The Chinese firm DeepSeek's R1 model entered a crowded field of large language models (LLMs) that had been dominated by American firms including Anthropic, Google, Meta, and OpenAI.
R1 quickly became one of the top AI models when it was released a couple of weeks ago, and it was cheaper to develop and run than its key competitors. It is currently the top app in the Apple App Store, ahead of OpenAI's app. Fundamentally, this represents a failure of U.S. policy efforts to contain China's AI progress, including efforts to limit that progress by restricting chip sales. However, there are lessons to be learned from this experience that may help shape U.S. policy and advance U.S. interests in the AI competition with China.
The paper describing the model and its development details three major advances.
- Whereas most LLMs rely on supervised fine-tuning to improve performance, DeepSeek added a reinforcement learning (RL) stage. RL is a widely used type of machine learning in which a model learns a task by trying different approaches and receiving a reward based on the outcome. Over time, the model converges on strategies that are effective at the task. The DeepSeek team developed a novel reward function and strategy-generation approach that resulted in faster learning, reducing the cost of training the model (see the first sketch after this list).
- The next big advance was the use of a technique called distillation, which compresses a large model down to its most important parts, substantially reducing its overall size and therefore the cost of running it (see the second sketch after this list).
- Their final advancement involved a new “reasoning” approach similar to OpenAI's o1 model.
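To make the RL step concrete, here is a minimal sketch of an outcome-based reward with a group-relative baseline, the general idea behind the GRPO-style training described in DeepSeek's paper. The specific reward rules, the `<think>` tag convention, and the toy examples below are illustrative assumptions, not the paper's actual implementation.

```python
# Hypothetical rule-based reward: +1 if the completion ends with the
# reference answer, plus a small bonus for a well-formed reasoning trace.
def reward(completion: str, reference: str) -> float:
    score = 1.0 if completion.strip().endswith(reference) else 0.0
    if "<think>" in completion and "</think>" in completion:
        score += 0.1  # formatting bonus for an explicit reasoning trace
    return score

# Group-relative scoring: sample several completions per prompt and
# measure each against the group mean, so no separate learned value
# model is needed to provide a baseline.
def group_advantages(rewards: list[float]) -> list[float]:
    mean = sum(rewards) / len(rewards)
    return [r - mean for r in rewards]

# Toy illustration with canned "completions" standing in for model samples.
completions = [
    "<think>2 + 2 = 4</think> 4",
    "The answer is 5",
    "<think>add the numbers</think> 4",
]
rewards = [reward(c, "4") for c in completions]
print(group_advantages(rewards))  # positive advantages reinforce those samples
```

In training, samples with positive advantages are reinforced and those with negative advantages are discouraged; skipping a learned reward or value model is part of what keeps this approach cheap.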
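The distillation step can likewise be sketched. The classic recipe trains a small "student" to match a large "teacher's" output distribution (DeepSeek's paper instead fine-tunes smaller models on R1-generated outputs, but the goal is the same: capture the large model's behavior in a much cheaper one). The toy model sizes and temperature below are illustrative assumptions, and the sketch assumes PyTorch is available.

```python
import torch
import torch.nn.functional as F

VOCAB, T = 1000, 2.0  # toy vocabulary size and softening temperature

teacher = torch.nn.Sequential(          # stand-in for a large, frozen model
    torch.nn.Linear(64, 256), torch.nn.ReLU(), torch.nn.Linear(256, VOCAB)
)
student = torch.nn.Linear(64, VOCAB)    # much smaller model being trained
opt = torch.optim.Adam(student.parameters(), lr=1e-3)

for step in range(100):
    x = torch.randn(32, 64)             # toy batch of input features
    with torch.no_grad():
        teacher_probs = F.softmax(teacher(x) / T, dim=-1)
    student_logp = F.log_softmax(student(x) / T, dim=-1)
    # KL divergence pulls the student's distribution toward the teacher's;
    # the T*T factor keeps gradients comparable across temperatures.
    loss = F.kl_div(student_logp, teacher_probs, reduction="batchmean") * T * T
    opt.zero_grad()
    loss.backward()
    opt.step()
```

The student here has roughly a quarter of the teacher's parameters but is trained to reproduce its outputs, which is why distillation cuts inference costs so sharply.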
While DeepSeek's was not the first model to use any of these techniques, the team's implementation of each is novel. Further, they provided enough detail in their working paper that other researchers and developers can fold these techniques into their own work, demonstrating the benefit to all of conducting research in the open.
There are several implications for U.S. policymakers:
- While DeepSeek is not exactly a new competitor, its achievement demonstrates that the barrier to entry is low enough for new entrants to be competitive. It is therefore too soon to pick winners and losers.
- DeepSeek delivered R1 with open weights, as opposed to the closed-weight models released by most U.S. firms. Open-weight models shift the cost of inference compute from the model developer to the model host, which reduces the pricing power of closed-weight model providers.
- The shift to reasoning models moves computational costs from training to inference, at least in relative terms. Thus, open-weight models like R1 can be developed in China while the inference need not run there. This could increase Chinese firms' market share, since the compute for inference can be provided elsewhere.
- The development of AI agents will push even more computation to inference, further diminishing the share of computational time spent in the training phase.
- Firms that scale their models before optimizing their implementations are at a significant disadvantage because they will have higher training and inference costs.
U.S. policies that constrain China's access to chips for training pushed Chinese firms to optimize performance in ways that lowered both training and inference costs. Similarly, the U.S. policy focus on chips optimized for training makes sense in a world where most computing costs go into training ever-larger models, but as the field shifts more computational time to inference, the current constraints don't quite hit the mark.
While current U.S. policies have not succeeded in stopping Chinese AI firms from being peer competitors with American firms, the competition for AI is far from over. In the words of Winston Churchill, “…this is not the end. It is not even the beginning of the end. But it is, perhaps, the end of the beginning.”