How AI Can Automate AI Research and Development

Commentary

Oct 24, 2024

Image by Blue Planet Studio/Getty Images

Technology companies are using AI itself to accelerate research and development (R&D) for the next generation of AI models, a trend that could lead to runaway technological progress. Policymakers and the public should be paying close attention to AI R&D automation to prepare for how AI could transform the future.

Under intensifying competition to improve the capabilities of new AI products, companies that can accelerate AI R&D have the best shot at capturing the growing AI market. Last month, OpenAI released a preview of o1, an AI model that achieved a significant advancement in reasoning. Notably, OpenAI indicated that o1 models can ace the coding interview the company gives to prospective research engineers, the people responsible for designing and implementing AI itself.

These capabilities build on years of technology companies using AI to accelerate software development. Tools like GitHub Copilot complete lines of code as developers type. Upgraded models like ChatGPT enable developers to converse with these tools to iterate on designs, troubleshoot errors, and understand technical concepts.

Companies have since observed widespread adoption and significant productivity benefits. In fact, AI tools now generate over 25 percent of the code at Google, while at Amazon they have saved “the equivalent of 4,500 developer-years of work” and “an estimated $260 million in annualized efficiency gains.” With these results, the industry is eager to test AI in larger roles.

Moving forward, technology companies want to use AI not just as tools but as autonomous software developers. AI agents, systems that can take action to execute complex tasks with minimal human oversight, could enable this prospect. An early example is the Amazon Q Developer agent, which, when given access to a codebase and general instructions for a new feature, can read the relevant files, design a solution, and make the necessary edits.

In their present state, agents are limited in their ability to reason through problems involving many steps and pieces of information, and they require substantial guidance from humans. However, the industry is determined to overcome these barriers, pouring billions of dollars into projects like o1 as well as startups like Cognition and Magic that aim to automate software development. As evidenced by rising scores on benchmarks of real-world programming skill such as MLE-bench, these efforts are making progress.

Advancing agents are particularly promising for use in AI R&D. In the industry, AI progress is driven mostly by experiments rather than theory. To test new ideas, companies must build numerous prototypes and pipelines, ensuring they are compatible and efficient. Many of these tasks are easy to specify and their objectives easy to measure, meaning companies could soon delegate these tasks to agents.

As agents become more reliable, they could take on increasingly important roles not only in implementing experiments but also in generating hypotheses and analyzing results, as AI already does in the physical sciences. These experiments identify methods that can deliver improved capabilities for future products. While the present impacts are minor, such automation could eventually produce a major compounding effect in which each generation of AI systems enables companies to reach the next generation faster.

Tracking R&D automation from the outside will be challenging, as companies will want to guard proprietary knowledge, just as OpenAI has with o1. While this trend could bring observable opportunities, such as accelerated AI capabilities, economic growth, and scientific discovery, it could also bring obscured challenges that are just as consequential.

By delegating R&D to agents, technology companies could lose insight into how their AI systems function. As a result, they might need to implement new practices for aligning AI systems with human values and monitoring societal risks. Companies like Anthropic, OpenAI, and Google have identified risks from R&D automation in their safety frameworks, committing to measure and plan for their emergence. To evaluate these efforts, governments might need to establish oversight for not only the external deployment and use of AI, but also the internal use of AI in R&D.

Moreover, R&D automation could produce highly powerful AI systems much earlier than many expect. If each advancement in AI facilitates the next, decades of progress could happen in years. Such acceleration could outpace efforts to build the technical capability, state capacity, and international coordination necessary for humans to maintain control over AI systems.

The emerging race dynamics within the AI industry and on the world stage could incentivize participants to hasten R&D automation even as efforts to maintain control fall behind. As companies and governments coordinate to define red lines for AI advancement, they may need to treat R&D automation as an important factor in their risk thresholds and prepared responses.

There is great uncertainty and disagreement over how fast AI can be expected to advance and how hard these systems will be to control. R&D automation demonstrates a path toward accelerating AI progress, one enabled by limiting human oversight. As this trend unfolds behind closed doors, the public will need more information to prepare for the futures it could bring.