Given the transnational risks posed by AI, the safety of AI systems, wherever they are developed and deployed, is of concern to the United States. Since China develops and deploys some of the world's most advanced AI systems, engagement with this U.S. competitor is especially important.
The U.S. AI Safety Institute (AISI)—a new government body dedicated to promoting the science and technology of AI safety—is pursuing a strategy that includes the creation of a global network of similar institutions to ensure AI safety best practices are “globally adopted to the greatest extent possible.”
As with cooperation with the Soviet Union during the Cold War on permissive action links (PALs), a technology for ensuring control over nuclear weapons, the United States may again wish to keep its competitors safer to ensure its own safety. The PALs case also shows how a track record of engagement between subject matter experts can be critical to enabling cooperation later. However, as with PALs, care must be taken to ensure that in helping make Chinese AI safer, the United States does not also help China advance its AI capabilities. For this purpose, the safer bet may be avoiding cooperation on technical matters and focusing instead on topics such as risk management protocols or incident reporting.
Cooperation Later Might Depend on Engagement Now
Emerging AI technologies may pose a variety of risks requiring international cooperation in the coming years, including risks related to the proliferation of dangerous biotechnology, geopolitical crises caused by failures of autonomous military systems, or large-scale accidents involving AI systems embedded in important parts of the world economy. For future cooperation to be effective, it might be important for the United States to engage Chinese subject matter experts on AI safety and governance now to build relationships, better knowledge of Chinese counterparts, and some degree of trust. This engagement need not constitute in-depth cooperation with ambitious, specific goals, such as sharing novel technical information or pursuing collaborative initiatives; it may be useful even if restricted to interaction in multilateral and bilateral meetings, reiteration of areas of consensus, and similar modest steps. Nor need it delve deeply into core technical information related to AI capabilities; it could instead focus on the governance systems surrounding AI models.
In the case of PALs, the United States shared technology with even its key rival, the Soviet Union, to help reduce the risk of accidental or rogue nuclear launches. Here, a history of prior engagement was an important element in making cooperation between the United States and the Soviet Union successful. By contrast, the lack of prior engagement impeded cooperation with China and Pakistan even though circumstances tended to favor sharing. If the same principles hold for AI, some form of engagement now may be important to build the capacities and relationships needed to cooperate effectively with China on AI safety in the future, if the United States so chooses.
Safety Technology Is Frequently Dual Use
AI safety would seem to present an opportunity for win-win engagement, as PALs did for the United States and the Soviet Union in years past. Inconveniently, however, many technologies developed and used to make AI systems safer also have the potential to be used to make AI systems more capable, as we will see. Collaboration with China, even on AI safety, could risk buoying China's position in technological competition with the United States, as well as arming it with capabilities that it might someday deploy against the United States and its allies.
Some safety technologies are directly dual use, such as reinforcement learning from human feedback (RLHF). RLHF, a technique in which AI systems are trained to generate outputs that receive better ratings from humans, can just as well be used to encourage systems to adhere to values Americans would support, such as freedom and equality, as to values they would not, such as Chinese Communist Party ideology.
Other safety technologies might be repurposed. For instance, an evaluation designed to measure an AI system's capability to help conduct offensive cyber operations could also be used to help train those capabilities into an AI system.
And some safety advances may furnish technical insights that could later catalyze capabilities advances. For example, the field of mechanistic interpretability is motivated largely by the goal of enabling technical assurance of advanced AI, such as providing evidence that a system is sufficiently safe for its intended use. However, discoveries in mechanistic interpretability have in a few cases also inspired architectural innovations, such as changes to the basic building blocks from which AI systems are constructed, that yield more efficient systems. These systems are no safer, but they are more accessible to developers with fewer resources, such as those in China subject to export controls on advanced hardware.
Sharing the technology of “seatbelts and brakes” could also help Chinese organizations “drive faster and more dangerously,” to extend the analogy. In the context of PALs, a similar concern was viewed by U.S. policymakers as a reason not to share the technology with Pakistan: they feared that greater control over launch would encourage Pakistan to maintain its nuclear weapons at a higher readiness level, leading to a net increase in risk. More robust, trustworthy, controllable AI would be more likely to be applied to safety-critical, strategically relevant domains across civilian and military applications. This could increase risks to U.S. interests in multiple ways: by making China a more formidable competitor, by putting marginally safer but still unreliable AI systems into military applications where they might spark a crisis, or by setting the scene for catastrophic accidents that could spill over China's borders.
A potential counterargument is that, sooner or later, China's highly skilled AI researchers are likely to discover most of the same technical secrets American researchers do, and that hoarding insights may simply delay their discovery. While there is truth in this counterargument, a technical lead of even a few months could still be an important advantage, especially if AI progress at leading labs at some point becomes significantly faster.
Nontechnical Cooperation Carries Lower Risk
By contrast, engagement on nontechnical topics is less likely to risk facilitating Chinese AI development and deployment. Examples of such topics include risk management frameworks, such as the U.S. National Institute of Standards and Technology's AI Risk Management Framework, the AI Risk Management System from China's Artificial Intelligence Industry Alliance, and “if-then commitments”; defining so-called “red lines” demarcating thresholds at which a system's capability or propensity to engage in dangerous behavior merits heightened concern; and lessons learned from establishing and operating a safety institute like the U.S. AISI. Although engagement on some such topics could have the same seatbelts-and-brakes effect described above, nontechnical cooperation would generally at least avoid leaking the most sensitive technical insights and details.
Perhaps even more so than on technical safety, where the U.S. ecosystem quite clearly leads the Chinese one, the United States could benefit from China's experience with nontechnical safety measures. As one example, all three of China's existing regulations on AI systems require some form of incident reporting. Incident reporting is a risk management tool that has also been discussed in the U.S. policy community, but its best implementation for AI governance is not obvious. Lessons learned from China's practical experience with reporting schemes could help inform the design of such systems in the United States.
For American policymakers who may seek to engage with Chinese counterparts, the lack of an official, national Chinese AI safety entity along the lines of the U.S. AISI need not be seen as an obstacle. A number of institutions in China are already doing similar work, and in some cases their key staff have even expressed concerns about shared planetary-scale risks and support for international cooperation to address them. With careful choice of subject matter, engagement with these organizations and others could be a prudent step to promote American interests and global security.