
Margin of Safety #2: DeepSeek vs OpenAI

Jimmy Park, Kathryn Shih

February 4, 2025


If high-performance AI models can be copied at low cost, how do companies like OpenAI survive?

Unless you’ve been offline and under a rock for the last two weeks, you’ve probably heard (or read) about DeepSeek-R1, a new, highly performant, ultra-efficient open source LLM released by its namesake Chinese AI research firm. DeepSeek-R1 offers extremely competitive performance on valuable task types (especially mathematical and programming reasoning) at a fraction of the inference cost of existing models and at a reportedly very low training cost (NB: this low cost has massive caveats, excellently covered in this article from SemiAnalysis). But costs aside, anyone not on team rock has probably also seen a pile of discussion around some of the business impacts – from NVDA’s ~$600B drop to questions of Jevons paradox and when AGI is coming. But what does DeepSeek mean for OpenAI and other providers of paid models?

A big risk for providers like OpenAI is hiding in the discussion of how DeepSeek achieved their efficiency. At a high level, they’ve cited a mix of (1) determined low-level engineering, (2) insights about cheaper model and training architectures, and (3) potentially some model distillation from existing paid offerings like GPT-4o. Before going too far into the what-ifs around distillation, it’s worth emphasizing that this is not fully proven, even though it’s widely alleged. But if it turns out to be true, there are some interesting strategic implications for closed source model providers.

What’s model distillation and why is it restricted?

It’s useful to start with an explanation of distillation. There are a few different specific techniques that can be used for distillation, and Google Cloud’s AI team has a great extended example here. But the basic idea is that to train an LLM (or any neural network, really), you need a way to tell it what you want it to do – and, just as importantly, what you don’t want it to do. Imagine teaching a kid to play soccer – you need to give them examples of good outcomes (the ball goes into the opponent’s net), bad outcomes (your own net!), and trouble to avoid (red cards). Neural networks similarly need to have good and bad behaviors clearly and correctly demarcated, or else they might adopt strategies that surprise you in a bad way (picking up the ball and sprinting down the field). Even DeepSeek’s recently touted RL (Reinforcement Learning) approach does this; it just uses programmatic methods to decide what’s good or bad for the model to do.
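To make the “programmatic methods” idea concrete, here’s a minimal sketch of a rule-based reward for math answers. The extraction and scoring logic is invented for illustration – it isn’t DeepSeek’s actual reward code – but it shows how a simple program, rather than a human, can mark an answer as good or bad.

```python
# Minimal sketch of a rule-based ("programmatic") reward for math answers.
# The extraction and scoring logic is illustrative only.
import re

def extract_final_answer(completion):
    """Pull the last number out of a model completion, or None if there isn't one."""
    matches = re.findall(r"-?\d+(?:\.\d+)?", completion)
    return matches[-1] if matches else None

def reward(completion, ground_truth):
    """Return 1.0 if the completion's final number matches the known answer, else 0.0."""
    answer = extract_final_answer(completion)
    return 1.0 if answer == ground_truth else 0.0

print(reward("The total is 6 * 7, so the answer is 42.", "42"))  # 1.0
print(reward("I think the answer is 41.", "42"))                 # 0.0
```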

For many language and reasoning tasks, we don’t have a good way to automatically get examples of good or bad behavior out of thin air. The best choices are often to have a human write good and bad sample answers (slow and expensive) or to use an LLM (fast and relatively cheap, but only if you have an LLM to use). If you use an existing LLM, you can train a new model to acquire much of the knowledge from the one that generated the examples. Using one model to generate prompt/response pairs that then guide a new model is one form of distillation, but all the techniques share the same end result: distilling the knowledge from the first model into the new one.
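As a rough illustration of the prompt/response flavor of distillation, the sketch below asks a teacher model for answers to a list of prompts and writes the pairs out as supervised fine-tuning data for a student. The teacher_answer() helper is a placeholder invented for this example, not any provider’s actual API.

```python
# Rough sketch of prompt/response distillation data collection. The
# teacher_answer() helper is a placeholder invented for this example -- swap in
# a call to whatever teacher model you actually have access to (and are
# licensed to use this way). Here it returns a canned string so the sketch runs.
import json

def teacher_answer(prompt):
    """Placeholder for a teacher-model API call."""
    return f"[teacher response to: {prompt}]"

prompts = [
    "Explain why the sum of two even numbers is even.",
    "Write a Python function that reverses a linked list.",
]

# Each prompt/response pair becomes one supervised fine-tuning example for the
# student model, which is how the teacher's knowledge gets distilled into it.
with open("distillation_data.jsonl", "w") as f:
    for prompt in prompts:
        record = {"prompt": prompt, "response": teacher_answer(prompt)}
        f.write(json.dumps(record) + "\n")
```

Fine-tune the student on enough of these pairs and much of the teacher’s behavior transfers over – that’s the distillation.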

Notably, distillation lets you leverage the enormous investment that went into the first model to produce a second model at much lower cost. This is why the Terms of Service (ToS) of providers like OpenAI and Anthropic prohibit using their models to train competing ones; they don’t want competitive products to freeload on their tremendous research and training investment, and they presumably don’t want the market flooded with cheap, high-fidelity clones of their work. One more caveat is necessary here: even if distillation occurred, it’s not clear how much of DeepSeek’s quality depends on it. That said, distillation is a powerful tool, and I wouldn’t be surprised if it made a very significant contribution to model quality.

Is OpenAI Fighting a Losing Battle?

So what does it mean if those ToS restrictions aren’t sufficient to stop the risk? I’d argue it’s an existential business risk for closed source providers. Their business plan requires recouping the high fixed cost of model development by monetizing the resulting model. But how do you monetize effectively in the face of good free clones?

There are a few steps they could try to take, such as:

A. Lobbying for further restrictions on chip access

B. Attempting to detect and block when someone is using their system in a way consistent with distillation

C. Attempting to legally block commercial usage of offending models in select geographies

D. Attempting to deter commercial uptake of offending models, perhaps by arguing for [or changing their own licensing to create] violations if an unapproved derived model is used in a commercial context

However, all of these approaches have major drawbacks. Dario Amodei of Anthropic has come out arguing for (A) [link]. But to some extent, the toothpaste is out of the tube. To the extent that DeepSeek is deriving a material share of their quality gains by distilling other models, they clearly have enough chips to do so and those chips aren’t going anywhere.

OpenAI has already taken steps toward (B) with o1’s reluctance to fully share its reasoning, and they’ll likely continue to invest here. Some of the distillation reports also suggest that OpenAI detected the usage, though presumably not in time to prevent some level of data collection. And even if the usage is detected, it’s potentially difficult to distinguish distillation from (say) some forms of naive benchmarking. The ToS restriction hinges on your intent in using the API, not on the type of call you make. Intent, however, is very hard to police, and the most obvious defense (throttling the API or otherwise restricting access to a subset of trusted use cases or users) would also dramatically reduce its value.
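To see why that’s hard, here’s a purely hypothetical sketch of the kind of usage heuristic a provider might run: flag accounts that combine very high request volume with very high prompt diversity. Every field name and threshold here is invented for illustration – and the catch, as the second example shows, is that a large benchmarking run can trip the exact same rule.

```python
# Purely hypothetical usage heuristic: flag accounts that combine very high
# request volume with very high prompt diversity. Field names and thresholds
# are invented for illustration; real abuse detection is far more involved.
from dataclasses import dataclass

@dataclass
class AccountUsage:
    account_id: str
    requests_per_day: int
    unique_prompt_ratio: float  # unique prompts / total prompts, 0.0-1.0

def looks_like_distillation(usage, volume_threshold=100_000, diversity_threshold=0.95):
    """Flag accounts that hammer the API with mostly-unique prompts."""
    return (usage.requests_per_day > volume_threshold
            and usage.unique_prompt_ratio > diversity_threshold)

print(looks_like_distillation(AccountUsage("acct_distiller", 2_000_000, 0.99)))  # True
print(looks_like_distillation(AccountUsage("acct_benchmark", 500_000, 0.98)))    # also True
```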

It’s unclear to me if (C) is even a viable option. Blocking the spread of open source is incredibly difficult – who remembers the fight over DeCSS? And (D) would be an extreme option, because it could create risk for the providers themselves. LLM training requires such large amounts of data that, even with excellent hygiene around data provenance, it’s easy to imagine something like contractor negligence causing disallowed training data to be misattributed to permitted sources, albeit at a very low rate. The sheer volume of training data makes it difficult to guarantee perfection in data sourcing and attribution, especially when there’s an element of human trust in the supply chain – this is the same issue professors currently face as they try to detect which students are submitting human-authored work. Model providers would have similar challenges making sure that all of their human input is genuinely human, but that’s what (D) would require.

What does uncontrolled distillation mean for closed source providers?

This leaves two final options that I can see: outrun open source, perhaps in perpetuity, or monetize a different point in the stack. Outrunning in perpetuity would require grappling with any eventual stall or slowdown in quality gains, and it would also require a continual supply of new tranches of workloads that significantly benefit from the latest quality gains. The alternative is shifting monetization higher up the stack, which is already happening to some extent. OpenAI’s recent launches, like Operator or Deep Research, suggest a move toward packaged AI agents rather than just selling raw model access. These products offer greater secrecy, better control over how models are used, and the ability to differentiate through clever engineering and excellent toolkits – all of which could help protect OpenAI’s business from commoditization.

The battle lines are drawn: OpenAI (and other paid model providers) must either stay ahead of the open-source tide or redefine what it means to be a premium AI provider. Can they pull it off? What do you think will happen next?

Stay tuned for more insights on AI and cybersecurity. If you’re a startup building in this space, we would love to meet you. You can reach us directly at: kshih@forgepointcap.com and jpark@forgepointcap.com.

This blog is also published on Margin of Safety, Jimmy and Kathryn’s substack, as they research the practical sides of security + AI so you don’t have to.