DeepSeek struck a nerve
After rocking the world with DeepSeek, China faces potential new import/export restrictions.
DeepSeek is a company that produces an LLM family of the same name. It made waves in January by releasing DeepSeek-R1, a reasoning model with benchmark performance comparable to OpenAI’s o1. R1 built on techniques DeepSeek had published in earlier models, but it caused a splash because it was competitive with OpenAI’s reasoning model, was released under the MIT license with open weights[1], and was made by a Chinese company that wasn’t supposed to be able to do this.
LLMs have two relevant phases of compute usage: training, where the model is created, and inference, where the model is used to answer questions. To date, training has been very expensive and inference has been very cheap. Cheap inference is a tradeoff, since it means the LLM’s thinking is limited by the number of tokens it can output. Historically, LLMs are better problem solvers when they are encouraged to decompose their reasoning into individual steps, a technique called “chain of thought.” Reasoning models take this a step further, using reinforcement learning to teach the model to create its own logically consistent chains of thought. Unlike traditional LLMs, reasoning models aren’t limited to a fixed amount of compute; in fact, the quality of their answers scales with how much compute they are allowed to consume while thinking.
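To make the distinction concrete, here is a minimal sketch of chain-of-thought prompting. The prompt wording is illustrative only, not DeepSeek’s or OpenAI’s actual prompts; reasoning models internalize this decomposition via reinforcement learning rather than relying on a magic phrase.

```python
# Hypothetical prompt templates, for illustration only.
def direct_prompt(question: str) -> str:
    # Baseline: ask for the answer directly, spending few output tokens.
    return f"Q: {question}\nA:"

def cot_prompt(question: str) -> str:
    # Chain of thought: encourage the model to decompose its
    # reasoning into explicit steps before answering.
    return f"Q: {question}\nA: Let's think step by step."
```

The chain-of-thought variant trades more output tokens (and therefore more inference compute) for better answers, which is exactly the scaling knob reasoning models exploit.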
Why wasn’t DeepSeek supposed to be able to accomplish this? DeepSeek is based in China, which faces export restrictions on GPUs. To comply with the restrictions, Nvidia manufactured GPUs whose inter-chip data transfer rates are a fraction of what its top-of-the-line H100 can achieve. AI training is naturally a large-scale distributed workload, and training runs take months even with unthrottled chips. Throttling transfer rates is therefore an effective way to stop China from using American-made GPUs to train its models, effectively removing it from the competition.
However, DeepSeek has been working to squeeze the maximum out of its own collection of lesser GPUs. They use distillation to capture the knowledge of larger models in smaller ones. They use advanced mixture-of-experts techniques so that only a portion of the model needs to be loaded into memory at a time. Hell, they actually coded their own custom GPU logic that lets them use chips that would normally be dedicated to communication for computation instead. They truly pushed the GPUs to their limit.
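The mixture-of-experts idea can be sketched in a few lines. This is a toy illustration of top-k gating, not DeepSeek’s actual routing code: a gate scores every expert for each token, and only the top-scoring few actually run, so most of the model’s parameters sit idle on any given step.

```python
def top_k_experts(gate_scores, k=2):
    """Return the indices of the k highest-scoring experts for one token."""
    return sorted(range(len(gate_scores)),
                  key=lambda i: gate_scores[i],
                  reverse=True)[:k]

# With 8 experts and k=2, only a quarter of the expert
# parameters need to be active (and in memory) per token.
gate_scores = [0.1, 0.7, 0.05, 0.9, 0.2, 0.3, 0.0, 0.4]
active = top_k_experts(gate_scores)  # indices of the two chosen experts
```

In a real model the gate is itself a learned network and the experts are feed-forward layers, but the memory win is the same: compute and memory scale with k, not with the total expert count.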
At the time of writing, 1 million tokens costs $2.19 on DeepSeek[2] and $60 on OpenAI. But remember: it has open weights, so anyone can run it. In fact, Microsoft Azure is hosting it right now. So are others.
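Working out the gap at those quoted rates, with an illustrative (made-up) workload of 50 million tokens:

```python
deepseek_rate = 2.19  # USD per million tokens (DeepSeek, at time of writing)
openai_rate = 60.00   # USD per million tokens (OpenAI)

tokens_millions = 50  # hypothetical workload: 50 million tokens
deepseek_cost = tokens_millions * deepseek_rate  # $109.50
openai_cost = tokens_millions * openai_rate      # $3000.00

ratio = openai_rate / deepseek_rate  # roughly 27x cheaper
```

At these prices, the same workload costs well over an order of magnitude less on DeepSeek.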
DeepSeek made a huge splash: at the time of writing, its app is #1 on the iOS App Store. The US Government put real effort into stopping China from being able to do this. Surely they realized that they were beaten fair and square and that was the end of it?
On an unrelated note, Josh Hawley introduced a bill that would make it a crime to import or export AI technology, an attempt to force China and the US to develop their AI in full isolation. This would likely prevent AMD and Nvidia from selling GPU technology to China in any form, and, because it restricts imports of AI technology as well, might even amount to a de facto DeepSeek ban in the US. Independent of this, the Trump administration is considering tightening restrictions on chip exports to China.
Trump officials discussing tightening curbs on Nvidia's China sales, sources say
Conversations to restrict shipments of those chips to China are in very early stages among Trump officials, the people said, adding the idea has been under consideration since Democratic former President Joe Biden's administration. H20 chips can be used to run AI software and were designed to comply with existing U.S. curbs on shipments to China, spearheaded by Biden.
Not everyone is feeling glum about this. Andreessen Horowitz wrote a post comparing DeepSeek’s impact to Sputnik, as a follow-up to Marc Andreessen’s tweet saying the same thing. It’s worth reflecting on the fact that Sputnik and the space race spurred the US Government into spending tens of billions of dollars on the Apollo missions. Coincidentally, a venture capital firm is exactly the kind of company that would benefit from a massive influx of government spending and decreased regulation.
DeepSeek’s release is forcing OpenAI to consider what it can open-source, and to reveal more of its chain of thought. Sam Altman also admits that OpenAI will have less of a lead over its competitors than it has had previously. OpenAI also made its reasoning models available to free users for the first time.
[1] Meaning that anyone can run it.
[2] We have no way of knowing how heavily subsidized this price is. However, hosting provider Nebius is serving the model at $2.40 per million tokens, which strengthens the argument that this is a profitable price level.