Margin of Safety #23: First Mover Advantages in AI Agentic Development
Jimmy Park, Kathryn Shih
July 29, 2025

Does Agentic Development Have First Mover Advantages, Disadvantages, or Both?
As the world seemingly fills up with highly valued agentic startups, we think an interesting question is where first mover advantages versus disadvantages will dominate. We can see plenty of examples of both – at a high level, first movers can start building valuable data moats, and they will develop early AI expertise along with intuition and practical experience with the next wave of AI automation. In exchange, they’ll pay much higher per-token costs, will likely need to spend additional time and resources on strategies for handling higher model error rates, and will be competing for scarcer talent in their recruiting. In this post, we’ll break down those areas and look at where we expect durable versus non-durable advantages, with an eye toward where first movers (and their backers) should be enthusiastic.

Data moats
First, let’s consider data moats.
The whole point of a data moat is its relative unassailability, so you need your dataset to be unique, valuable, and hard to replicate. This means the most interesting data lives where large quantities of licensed alternatives are not readily available and where you believe you can capture enough market share that 17 competitors with a similar offering won’t be building up identical datasets: we could call this the “Cursor” strategy.
If your plan is to actually train an AI with your data, you had better plan on having a decently large amount of it. Ideally, you want data that can be generated from usage but is also hard for bulk vendors to replicate. Some categories of security are like that; you can pay Scale or their competitors as much money as you want, but they are going to have a hard time finding environments or contractors who can really simulate actual security incidents.
Finally, a data moat is valuable if and only if it actually lets you make a smarter agent – perhaps by carefully targeting your error handling (see later in this post) – or otherwise converts to a commercial advantage. So first movers with this strategy need a clear articulation of how the data will allow them to address a sufficiently major customer need to justify their first mover costs. We’ll call this one a win for the first movers, subject to that caveat.
AI expertise
Building on the idea of data moats, a first mover can also develop valuable AI expertise. Your people get smarter at both using AI tools and developing AI systems, and this can be a velocity improvement if you can keep the people.
Beyond velocity, we think it’s hard to successfully speculate on what AI is good or bad for without actual experience, so if your business strategy depends on making good bets, you either need help or you need to try some AI projects. AI is a muscle like any other, and projects will only get easier. While this doesn’t guarantee a first mover advantage, you probably don’t want to be the last mover anyway. We think this one nets out to be a tie.
Token costs
We can quibble about the exact rate of token cost decay, but it’s broadly agreed that token costs are decreasing by an order of magnitude on a roughly annual basis. So if your LLM-backed workload is 5x too expensive at inference time to be economically viable right now, waiting six to nine months and trying again is a reasonable strategy.
This means that startups with significant inference costs face a painful economic reality: competitors who start one year later will be able to replicate some operations for 1/10 the cost.
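To make that concrete, here is a minimal back-of-the-envelope sketch in Python, assuming per-token costs decay smoothly at 10x per year (the smooth decay is our simplifying assumption – real prices drop in discrete jumps with model releases):

import math

ANNUAL_DECAY = 10.0  # assumed 10x per-token cost reduction per year

def months_until_viable(cost_multiple: float) -> float:
    # Months of waiting before a workload that is cost_multiple-times too
    # expensive today breaks even, under smooth exponential cost decay.
    return 12 * math.log(cost_multiple) / math.log(ANNUAL_DECAY)

print(f"5x too expensive -> viable in ~{months_until_viable(5):.1f} months")  # ~8.4
print(f"A one-year laggard pays roughly 1/{ANNUAL_DECAY:.0f} of today's cost")

The same arithmetic produces the 1/10 figure for the one-year laggard above.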
So what does this mean? The head start probably makes sense if you think there is a long-term technical or go-to-market hill to climb, where in 3 years the outcomes for a 4-year-old company versus a 5-year-old company will be different. But the more you think a space will be commoditized or face diminishing marginal returns on quality, the less value this early start is going to have.
Cynically, we’d point out in both this category and the last one that expertise is attached to humans (or LLMs?), not organizations. A real risk for new companies is that they fund the expensive development of employee expertise and intuition, only to see those employees depart before the company can reap the rewards of said expertise.
Sadly, this one ends up too conditional to call in the general case – we think the advantage clearly goes to later players in commoditized areas, but that early movers who can sustain their velocity will keep an edge.
Resources spent managing errors
In the near term, error rates are potentially a problem, especially when a workflow involves many subtasks with relatively uncorrelated success rates [1]. There are a wide variety of techniques for detecting and addressing errors, ranging from automated approaches to human evaluation, but they all carry costs. At a high level, our expectation is that investments to address any individual error source may or may not be durable – after all, it’s typically feasible to tune any single error out of a core model – but that generalized investments in quality infrastructure will be durable if and only if there remains space for premium quality offerings.
This leaves us more optimistic about anti-error investments when spaces are longer tail, higher complexity, and/or ill-suited to scaled RL techniques from the biggest model providers – e.g., when a space is likely to have medium- to long-term room for delivering a higher-reliability offering than what new entrants can get off the shelf. As a result, we think the victory here goes to early movers in complex spaces – where error management will need to be a thing – and to later players in simple spaces.
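Concretely, the “automated approaches” above often take the shape of a verify-and-retry wrapper. Here is a minimal sketch of that generic pattern – call_model and verify are hypothetical stand-ins, not any particular vendor’s API – mainly to show where the money goes:

from typing import Callable, Tuple

def with_retries(
    call_model: Callable[[str], str],  # hypothetical: prompt -> response
    verify: Callable[[str], bool],     # hypothetical: does the response pass checks?
    prompt: str,
    max_attempts: int = 3,
) -> Tuple[str, int]:
    # Returns the first response that passes verification, plus the number of
    # model calls spent -- every retry is paid for in extra tokens and latency.
    for attempt in range(1, max_attempts + 1):
        response = call_model(prompt)
        if verify(response):
            return response, attempt
    # All attempts failed verification: escalate, e.g. to human review.
    raise RuntimeError(f"verification failed after {max_attempts} attempts")

Every retry multiplies inference spend, which is part of why error management shows up as its own line item alongside the token cost math above.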
Conclusion
In conclusion, we like AI in areas where there’s significant work to do outside the model, and where that work is unlikely to be baked into the broader API surface of a foundational provider. We also like early companies in areas where the work is on a long enough time horizon to generate a medium-term advantage. That said, security also has space for disruptive fast followers with a great user experience in less differentiated spaces.
[1] When per-task error rates are completely independent, failures compound exponentially: a 2% failure rate on each task becomes a ~18% (1 - 0.98^10) chance of at least one failure over a 10-step workflow. But in practice, most workflows involve tasks that revolve around a small number of central themes and capabilities, meaning that per-task error rates are not fully independent and the compounding math is far less brutal.
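For the curious, the footnote math in a few lines of Python:

def workflow_failure_rate(p_step: float, n_steps: int) -> float:
    # Chance of at least one failure when per-step errors are fully independent.
    return 1 - (1 - p_step) ** n_steps

print(f"{workflow_failure_rate(0.02, 10):.1%}")  # ~18.3%
# At the other extreme -- perfectly correlated steps -- the workflow-level
# failure rate stays pinned at the per-step rate, 2%. Real workflows land
# somewhere in between.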
Reach out to us if you are building in this space!
Kathryn – kshih@forgepointcap.com
Jimmy – jpark@forgepointcap.com
This blog is also published on Margin of Safety, Jimmy and Kathryn’s Substack, as they research the practical sides of security + AI so you don’t have to.