Toward Comprehensive Benchmarking of the Biological Knowledge of Frontier Large Language Models

Sunishchal Dev, Charles Teague, Kyle Brady, Ying-Chiang Jeffrey Lee, Sarah L. Gebauer, Henry Alexander Bradley, Grant Ellison, Bria Persaud, Jordan Despanie, Barbara Del Castello, et al.

Published Feb 10, 2025

Recent artificial intelligence (AI) systems demonstrate deep knowledge across a broad variety of scientific domains, some of which could be misused for biological and chemical weapon development. Although most public general-purpose AI systems contain safety guardrails that refuse to assist with harmful tasks, these defenses are known to be vulnerable and prone to manipulation. Historical precedents show that nefarious groups are interested in developing biological and chemical weapons, but they are often prevented from doing so because of a lack of scientific expertise.

In this working paper, the authors describe an early component of a wider research effort analyzing the extent to which malicious actors with broad scientific knowledge may be able to use increasingly capable large language models to develop biological or chemical weapons, including through modified systems lacking safety measures. The authors evaluated 31 of the most capable models, as of January 2025, against public knowledge benchmarks (standardized tests or datasets used to evaluate the performance of different AI models on a specific task) relevant to biological and chemical threats. They tested the biological and chemical knowledge of the most cutting-edge large language models, with and without safety guardrails and with increased exposure to biological information.

Document Details

Citation

RAND Style Manual

Dev, Sunishchal, Charles Teague, Kyle Brady, Ying-Chiang Jeffrey Lee, Sarah L. Gebauer, Henry Alexander Bradley, Grant Ellison, Bria Persaud, Jordan Despanie, Barbara Del Castello, Alyssa Worland, Michael Miller, Dawid Maciorowski, Adrian Salas, Dave Nguyen, James Liu, Jason Johnson, Andrew Sloan, Will Stonehouse, Travis Merrill, Thomas Goode, Greg McKelvey, Jr., and Ella Guest, Toward Comprehensive Benchmarking of the Biological Knowledge of Frontier Large Language Models, RAND Corporation, WR-A3797-1, 2025. As of April 8, 2025: https://www.rand.org/pubs/working_papers/WRA3797-1.html

Chicago Manual of Style

Dev, Sunishchal, Charles Teague, Kyle Brady, Ying-Chiang Jeffrey Lee, Sarah L. Gebauer, Henry Alexander Bradley, Grant Ellison, Bria Persaud, Jordan Despanie, Barbara Del Castello, Alyssa Worland, Michael Miller, Dawid Maciorowski, Adrian Salas, Dave Nguyen, James Liu, Jason Johnson, Andrew Sloan, Will Stonehouse, Travis Merrill, Thomas Goode, Greg McKelvey, Jr., and Ella Guest, Toward Comprehensive Benchmarking of the Biological Knowledge of Frontier Large Language Models. Santa Monica, CA: RAND Corporation, 2025. https://www.rand.org/pubs/working_papers/WRA3797-1.html.

Research conducted by

This work was independently initiated and conducted within the Technology and Security Policy Center of RAND Global and Emerging Risks using income from operations and gifts from philanthropic supporters. A complete list of donors and funders is available at www.rand.org/TASP.

This publication is part of the RAND working paper series. RAND working papers are intended to share researchers' latest findings and to solicit informal peer review. They have been approved for circulation by RAND but may not have been formally edited or peer reviewed.

This document and trademark(s) contained herein are protected by law. This representation of RAND intellectual property is provided for noncommercial use only. Unauthorized posting of this publication online is prohibited; linking directly to this product page is encouraged. Permission is required from RAND to reproduce, or reuse in another form, any of its research documents for commercial purposes. For information on reprint and reuse permissions, please visit www.rand.org/pubs/permissions.

RAND is a nonprofit institution that helps improve policy and decisionmaking through research and analysis. RAND's publications do not necessarily reflect the opinions of its research clients and sponsors.