
Margin of Safety #37: Asymmetric Impact of AI on Cyber Offense and Defense

Jimmy Park, Kathryn Shih

November 18, 2025

  • Blog Post

Does AI help attackers more or defenders more, and why?

We’ve previously written about specific attacker capabilities that are likely to benefit from Generative AI. But this skipped a bigger question: why are the benefits of GenAI so asymmetrically distributed between attackers and defenders? And why do vendor reports always say that defenders will disproportionately benefit, whereas CISO surveys seem to predict the opposite effect? In this post, we’ll explore the factors that drive these divergences and lay out a framework for predicting areas where AI will shift the balance of power.

We believe that most (all?) of the differences stem from underlying properties of modern machine learning – in particular, it is (1) inherently probabilistic; (2) dependent on data volume; and (3) cost-structure impacting, in that it converts variable operational costs into fixed infrastructure investments. Notably, these properties cover most forms of modern ML rather than being specific to generative AI. For example, a sophisticated, ML-based malware classifier may or may not rely on a large language model (or even the transformer architecture), but we would expect all three properties to apply to it.

So what does this translate to in practice? Low(er)-cost but probabilistic capabilities are extremely useful when there’s a very high volume of work and you need to change the unit economics to affordably complete it. AI classifiers and prediction systems can also significantly outperform human estimates, particularly in spaces where many factors must be weighed (for example, in fraud detection). These problems of unit economics or difficult classification should sound familiar to folks in cybersecurity – on the defender side, we see things like alert investigation and triage, playing whack-a-mole on policy exceptions and access violations, staying on top of ever-evolving attack surfaces, and so on. On the attacker side, the world’s potential targets and entry points that could be assessed and attacked aren’t quite infinite – but they’re certainly far more than any group could work through in a human lifetime.
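To make the unit-economics point concrete, here’s a minimal sketch of the breakeven calculation for converting a variable cost into a fixed one. All dollar figures and names are hypothetical, not drawn from any real deployment:

```python
def breakeven_volume(ai_fixed_cost: float, human_cost_per_item: float,
                     ai_cost_per_item: float) -> float:
    """Smallest workload at which the AI's fixed investment pays off.

    AI turns a variable cost (analyst time per item) into a fixed
    investment plus a much smaller marginal cost per item.
    """
    if ai_cost_per_item >= human_cost_per_item:
        raise ValueError("AI never breaks even without a lower marginal cost")
    return ai_fixed_cost / (human_cost_per_item - ai_cost_per_item)

# Hypothetical figures: $250k to build and tune a triage model,
# $15 of analyst time per alert vs. $0.50 of inference per alert.
print(round(breakeven_volume(250_000, 15.0, 0.50)))  # 17241 alerts
```

Below that volume the fixed cost never pays for itself – one reason, per criterion 3 later in the post, that low-volume niches may never be worth automating.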

In all of these cases, AI offers the possibility of capability expansion. For defenders, even if AI triage isn’t quite perfect, you weren’t going to touch all of that backlog anyway! There’s basically nothing to lose in having the lower-priority part of your backlog triaged 80% correctly versus having it completely ignored [1]. And in some cases, you may be able to tune things to, for example, throw out the simplest fraction of the backlog (even within high-priority items) and allow actual humans to focus on the complex, interesting cases.
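A sketch of that tuning idea – score-based routing with two thresholds. The thresholds and field names here are illustrative, not from any particular product:

```python
def route_alerts(scored_alerts, close_below=0.2, escalate_above=0.8):
    """Split model-scored alerts into three buckets: auto-close the easy
    bottom of the backlog, hand ambiguous cases to AI triage, and reserve
    humans for the complex, high-score cases."""
    auto_close, ai_triage, human_review = [], [], []
    for alert in scored_alerts:
        if alert["score"] < close_below:
            auto_close.append(alert)
        elif alert["score"] >= escalate_above:
            human_review.append(alert)
        else:
            ai_triage.append(alert)
    return auto_close, ai_triage, human_review

backlog = [{"id": 1, "score": 0.05}, {"id": 2, "score": 0.5},
           {"id": 3, "score": 0.95}]
closed, triaged, humans = route_alerts(backlog)
print(len(closed), len(triaged), len(humans))  # 1 1 1
```

Moving the thresholds is exactly the tuning knob described above: widening the auto-close band trades a small error rate for analyst time.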

Looking at things this way, we start to see a potential framework for thinking about AI advantages. It will favor places where there’s a hard classification/prediction task or a high volume of work and insufficient people available to clear that work in an economically viable manner. So tasks should:

1. Have high data volumes available [2]

2. Exist in contexts where failures are low cost (or at least not catastrophic)

3. Be economically amenable to AI – that is, the fixed cost of rolling out and testing an AI solution must be worthwhile. We think this will be particularly true for capabilities where adaptation is hard *or* that are likely to become table stakes, forcing everyone to maintain them.

Not all use cases will meet these criteria. Some (for example, predicting which code will have zero-day vulnerabilities discovered and exploited) would be valuable if you could build them, but suffer from a lack of public data. Others (for example, a threat actor attempting to fully automate lateral movement) may be ones where failure is very high cost, since it could unwind a rare and costly intruder foothold. And due to the diversity of devices, attacks on specific pieces of IoT infrastructure may not be economically valuable to automate unless an attacker believes that (A) the IoT device in question underlies critical, high-value infrastructure or (B) the attack can be generalized to an economically impactful swath of devices and infrastructure.

Loosely speaking, we think a good starting point for finding places that do meet all of these criteria is looking for places where an attacker or defender gets many bites at the apple. In those environments, you often have a lot of training data from previous activities. It also means that a probabilistic approach can be tolerated, since a 90% success rate over many attempts will result in a high probability of overall success. For attackers, we think this translates to the left side of the MITRE framework; reconnaissance and initial access are often places where there are many paths to the goal of initial access. On the other hand, as previously mentioned, once a toehold is gained in a high-value environment, stealth becomes of utmost importance, and attackers risk detection if they engage in spray-and-pray expansion. Looking at the recent Anthropic report on an AI-orchestrated cyber espionage campaign (link), we believe that 1) the cyber-attack was not quite as automated as readers are led to believe [3] and 2) the parts that were fully automated were more on the left side of the MITRE framework, such as reconnaissance, data collection, and credential harvesting. On the right side of the MITRE framework, by contrast, Claude was likely less effective and more human-intensive.
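The “many bites at the apple” arithmetic is worth spelling out. With independent attempts, the chance of at least one success compounds quickly (the specific rates below are illustrative):

```python
def p_at_least_one_success(p_single: float, attempts: int) -> float:
    """Probability that at least one of `attempts` independent tries
    succeeds: the complement of every attempt failing."""
    return 1 - (1 - p_single) ** attempts

# A 90% per-attempt success rate over just 3 attempts:
print(round(p_at_least_one_success(0.9, 3), 3))   # 0.999
# Even a 10% per-target hit rate compounds across 50 targets:
print(round(p_at_least_one_success(0.1, 50), 3))  # 0.995
```

The same formula works in the defender’s favor on detection: if an attack leaves several independently detectable indicators, the attacker must evade every one of them.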

For defenders, we think the picture is slightly more complicated. We hope that savvy defenders use AI to get ahead of attackers by proactively improving hardening. In that case, we’d expect many initial compromise strategies to remain arms races between competing investments in attacker automation and defender hardening. However, AI arms races rarely have clear outcomes – we expect to see the balance of power tip back and forth based on technical investment and data availability. Once defenders are triaging alerts, we think that detecting the right-hand side of the MITRE framework will typically favor defenders. Most attacks leave multiple indicators, and the challenge is that overwhelmed defenders aren’t always able to detect them all in a timely fashion. But this structure – multiple indicators that need to be rapidly triaged with at least one detected positive – is very amenable to AI methods. Even better, if defenders can get on top of existing alert queues with AI, we expect that they’ll be able to enable more alerts, including those that are currently infeasibly noisy to manage. The net result would be even more bites at the detection apple, and an ever-increasing ability to detect attackers who have gained initial compromise and are attempting to leverage it into an economically valuable outcome.

Overall, we think AI has strong potential to favor defenders. There are natural advantages on the right-hand side of MITRE and the potential to better simulate adversary behavior and enhance preparations on the left-hand side.

So why does a large albeit shrinking number of defenders [4] seem to feel that AI is benefiting attackers more than them? We think there are a few possibilities. One is that bringing AI systems online is often a frustrating experience; tuning ML processes can take time, elbow grease, and more than a little human frustration. This internal frustration is visible, whereas the fact that attackers are potentially grinding their teeth over the way a coding model produced an infinite loop instead of a working exploit is known only to the attacker in question. Another is, frankly, vendor hype. If you were promised a fully automated level-5 customer service agent and instead received an agent that could handle common requests but needed to hand anything unexpected off to a human, you might feel burnt by the under-delivery rather than fortunate to receive the opex benefits of the actual system.

If you’re a practitioner with opinions about how AI is delivering versus the hype or a creator building out solutions to help defenders win the arms race, we’d love to hear from you. You can contact us at kshih@forgepointcap.com and jpark@forgepointcap.com.

This blog is also published on Margin of Safety, Jimmy and Kathryn’s Substack, as they research the practical sides of security + AI so you don’t have to.

[1] Or triaged after 4 months have passed, which realistically is long enough that the attacker will have run off with your crown jewels before you’re able to deal with the alert they generated.

[2] Even for a classifier, you need enough volume to train and validate it – 3 examples won’t be sufficient for any actual AI/ML.

[3] Eagle-eyed readers of the report (link) will note a few details that cast doubt on the degree of agent autonomy.

All graphics show human oversight for the agent, and the attack is described as terminating when “The highest-privilege accounts were identified, backdoors were created, and data were exfiltrated with minimal human supervision” – minimal, not none. Particularly notable to us is that the exact structure of said human supervision is never clarified, and even Anthropic itself only claims that the attack was 80-90% automated, implying humans still performed up to 20% of the work.

We suspect that Anthropic’s motivations are to market Claude’s power at technical tasks and/or advocate for greater regulatory oversight of foundational models (a preference it has consistently expressed), which suggests the true values sit on the pessimistic end of any automation figures it provides.

[4] For example, Splunk’s annual CISO survey reports this sentiment from roughly half of CISOs: https://www.splunk.com/en_us/campaigns/ciso-report.html