Margin of Safety #57: Rethinking the ‘Flight to Safety’ post Mythos
Jimmy Park, Kathryn Shih
May 27, 2026
- Blog Post
Why the Security Wave May Reward the Vendors You Were Rooting Against
The year to date has been full of theories about what capable code-generation models mean for incumbent software vendors. We previously wrote about the so-called ‘SaaSpocalypse’ — the theory that legacy SaaS would be replaced with do-it-yourself implementations — and others have questioned whether enterprises will continue to accept FOSS licenses when AI-powered reimplementation is an option. We think the question of AI-powered code replacement is far from settled, especially in light of Mythos.
Mythos currently tops many CISO priority lists. AI vulnerability discovery has created an arms race around legacy codebases, with maintainers rushing to patch newly revealed vulnerabilities before attackers exploit them using parallel methods. Worse, the same models can reconstruct underlying vulnerabilities from security patches and do so in bulk. This compresses the gap between a patch shipping and adversaries exploiting unpatched systems. Historic timelines that allowed multiple days for critical patch rollouts can no longer be treated as safe.
That puts every software maintainer under pressure, but the nature of the problem differs by codebase.
The high-order bits are language and resourcing. Evidence strongly suggests to us that Mythos and comparable models excel at finding memory safety errors in legacy codebases. Such errors are pervasive and serious, accounting for ~70% of historic vulnerabilities and zero-days.[1] They primarily exist at scale in legacy C/C++ codebases (things like operating systems, browsers, databases, and firmware), which have had decades to accumulate them. Newer, higher-level software is still exposed to the problem, since it runs on top of impacted operating systems, databases, and language runtimes, but doesn’t directly own it. As a result, all software owners are being forced to absorb and deploy upstream patches faster than historical norms allowed, but only the owners of memory-unsafe codebases should expect to be authoring many patches themselves. Unfortunately, for many organizations, the compressed window between patch release and active exploitation is almost as challenging as the core work of patch generation.
This challenge means that results will bifurcate with resourcing and maturity. For directly impacted developers, codebase hardening requires frontier model access, developer cycles, and the capacity to manage regression-free patch development at volume. Currently, this hardening is only available to the select few with Mythos and OpenAI Trusted Access Partner access, and we suspect the labs will maintain that position until key systems are secured.[2] But for all software owners, supporting a rapid patch cycle requires dependency tracking, tested deployment pipelines, and organizational discipline.
From the point of view of a buyer, betting on the latter is scary. Organizational discipline is hard to measure in the best of times, and right now cybersecurity is demanding not only discipline but also an unusual degree of adaptability. We believe this is likely to cause a flight to perceived safety among security buyers.
Such a flight would reward established organizations, including major FOSS projects, at the expense of smaller organizations. In theory it should also reward strong security execution, but the industry’s track record on that front is mixed: historically, some large vendors have been able to get away with security programs that were lackluster at best, even in the face of well-reported customer impact. We think that in practice, buyers differ in their positioning. For many, a reasonably accountable vendor is most of the value. Even if a vulnerability from a Microsoft-tier vendor leads to downstream compromise, it’s a modern case of “nobody ever got fired for buying IBM.”[3] The key feature for these buyers is security posture defensibility as much as efficacy. For the more diligent buyers, we expect an increased emphasis on supply chain diligence and rapid patching capabilities.
This is what undermines the SaaSpocalypse forces. Displacing an incumbent with AI-generated, first-party code assumes that producing the software is the major organizational cost. In reality, the subsequent hardening and patching may be just as hard, and full vibe coding is unlikely to be coupled with the level of operational discipline required to maintain aggressive patch acceptance and deployment pipelines. This means anyone replacing Salesforce with a DIY solution now has a choice: operationalize a patch and rollout process to secure the customer data inside, or personally own the security consequences. Confronted with these options, we suspect some buyers will rediscover the accountability benefits of being able to point at an entrenched and, potentially more importantly, external provider.
We also suspect that the accountability halo will extend to major FOSS providers. Linux is as much of a default choice as Microsoft for server workloads, and the core Python ecosystem is the industry-wide default choice for many AI/ML applications. Users can defensibly point to them as trusted and, in several cases, preemptively hardened by Anthropic and Project Glasswing.
The biggest open question in this scenario is how long the pressure of the vulnerability wave lasts and, as a result, how many organizations are forced to contend with it on a semi-permanent basis. Our belief is two-sided: the current wave will be the biggest, but organizations should plan on it lasting long enough that they must make structural accommodations. We believe it will be the biggest because the memory-safety issues being flushed out have historically been dominant in both key codebases and historic vulnerabilities. Other classes of security bugs, including those often generated by vibe coding, are either easier to find with pre-deployment checks (e.g., SQL injection can typically be discovered with existing SAST techniques) or are almost necessarily less common in codebases, since they take the form of higher-level logic errors whose number is bounded by the complexity of the application’s higher-level logic.
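To make the SQL injection comparison concrete, here is a minimal sketch of the pattern in question, using Python's standard sqlite3 module. The table, function names, and payload are our own illustrative assumptions and are not drawn from any real codebase. The unsafe version builds the query by string concatenation, which is the kind of syntactic pattern static checks are generally able to flag; the safe version uses a bound parameter.

```python
import sqlite3


def find_user_unsafe(conn, name):
    # Vulnerable: user input is concatenated directly into the SQL text,
    # so quote characters in `name` can change the query's structure.
    query = "SELECT id, name FROM users WHERE name = '" + name + "'"
    return conn.execute(query).fetchall()


def find_user_safe(conn, name):
    # Parameterized: the driver passes `name` as data, never as SQL.
    return conn.execute(
        "SELECT id, name FROM users WHERE name = ?", (name,)
    ).fetchall()


def demo():
    # Hypothetical in-memory table used only for this illustration.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany(
        "INSERT INTO users (name) VALUES (?)", [("alice",), ("bob",)]
    )
    payload = "nobody' OR '1'='1"  # classic injection string
    return find_user_unsafe(conn, payload), find_user_safe(conn, payload)


if __name__ == "__main__":
    unsafe_rows, safe_rows = demo()
    print("unsafe query returned", len(unsafe_rows), "rows")
    print("safe query returned", len(safe_rows), "rows")
```

With the injected payload, the concatenated query matches every row while the parameterized one matches none. The flaw is visible in the source text itself, which is part of why this class of bug is more amenable to pre-deployment checks than memory corruption buried in decades-old C/C++.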
On the other hand, we also know that the human-in-the-loop components for critical codebases (triage, code review, etc.) are struggling to keep pace with raw discovery.[4] We expect to be stuck in that world for months to a few years unless the next generation of models is able to once again produce a step change in capability and relax this chokepoint.
A long but finite wave potentially delays the SaaSpocalypse as large organizations seek accountable ownership of vulnerability exposure before they tolerate DIY. DIY may return, but it will do so enabled by hardened underlying infrastructure and tooling designed to protect vibe coders from the operational challenges that patching is currently encountering. On the other hand, if future models produce equally consequential waves, we believe that persistent changes in patching requirements will shift the security dynamic to favor better-resourced and more centralized projects — with the irony that the same AI capability gains driving the SaaSpocalypse argument are also the primary driver of the expanding vulnerability problem.
If you’re building in this space, we’d like to hear from you.
Feel free to reach out to jpark@forgepointcap.com and kshih@forgepointcap.com.
This blog is also published on Margin of Safety, Jimmy and Kathryn’s Substack, as they research the practical sides of security + AI so you don’t have to.
[1] Microsoft Security Response Center, “We Need a Safer Systems Programming Language,” July 2019 and Google Project Zero, “The More You Know, The More You Know You Don’t Know,” April 2022. Of 58 in-the-wild zero-days tracked in 2021, 39 (67%) were memory-corruption vulnerabilities. Google’s 2024 memory-safety whitepaper cites 68% across Project Zero’s broader dataset.
[2] The biggest question here is how fast DeepSeek and the like can catch up. In previous generations they were able to gain distillation access to frontier models. Providers have tightened restrictions, especially for Mythos-class models, but the Chinese labs are presumably highly motivated. Given the urgency with which infra owners are trying to harden, we are cautiously optimistic that safeguards will last long enough.
[3] While there has been an increase in regulatory activity in this space since IBM’s heyday, we do not believe there’s evidence of it (yet) shifting buyer behavior. We’re now several years out from the 8-K reporting requirement, so we don’t expect further regulation-driven shifts without more regulation.
[4] https://red.anthropic.com/2026/cvd/