
Margin of Safety #33: Learning is Hard

Jimmy Park, Kathryn Shih

October 22, 2025


Current AI models cannot safely “learn” in real time, and until they can, learning is more a liability than a leap forward

Readers of this blog don’t need us to tell them that the tech sector is currently keenly focused on advancing from static LLMs to Agentic AI. We’ve previously complained about the amorphous definition of Agentic AI and won’t do so again, but putting aside the definition of Agentic, we’ve noticed people increasingly talking about continual learning as a component of high-value agents. Such capability would let agents adapt to new information and environments in real time, without requiring expensive, asynchronous model training passes. However, we believe that anyone who is not a PhD researcher with a very smart plan should temper their optimism around self-improving agents. The technical requirements for safe, sustained evolution in production are considerable, with key challenges effectively unsolved. In this blog post, we’ll dive deeper into the issues we see with this capability.

The enthusiasm for continual learning stems from a desire to fix a core gotcha with today’s generative AI systems. Current LLM-based systems are inherently brittle because their core knowledge is fixed at the moment of their last training run, and their working capabilities are defined by that knowledge plus the information available to them via retrieval at runtime [1]. If regulations change, a key product is retired, or a competitor launches a new feature, a deployed LLM (or an agent based on a static LLM and equally static knowledge base) at a minimum needs an update to its knowledge base (often supplied via RAG) and may require an update to the core LLM’s training to deliver new concepts. If these updates don’t happen, the agent can deliver outdated and incorrect behaviors. Perhaps worse, human users can become intensely frustrated when they interact with systems that don’t learn, particularly ones that are otherwise easy to anthropomorphize. We’re told to treat AI like an eager intern, but a key feature of eager interns is their ability to learn and improve over time. An intern that doesn’t learn can feel more like a bad cartoon than a genuine help.

An AI system capable of genuine continual learning would address all these issues. Agentic systems could be rapidly updated to keep pace with business processes, and human frustration would be averted when feedback resulted in clear, correct change. People who see this as a key component for some major use cases are likely correct.

Technical Hurdles: The Reality of Rapid Learning

The technical hurdles, however, are where the vision collides with the unfortunate reality of current AI techniques. First, talk of continuous learning often implicitly assumes that models can (or will soon be able to) rapidly learn from one-off interactions or a handful of examples. That’s far from the current truth. AI today does not learn to generate flawless English or trounce humans at Go by learning faster than people do. It wins by being able to learn over huge datasets and tremendous numbers of examples. AlphaGo probably played 10,000 times more games of Go during its learning and training process than Lee Sedol ever practiced, and a competent human teenager could likely have trounced an AlphaGo restricted to the same number of games Lee has played in his lifetime. More generally, competent humans still dramatically outperform most neural networks (and basically all LLMs) when it comes to intelligently generalizing learnings from small numbers of examples. This behavior holds across disciplines.

To some extent, this is moot when it comes to agents. In practice, attempts to rapidly update model behavior often rely not on changes to the core training corpus, but rather on fine-tuning or RAG changes. We don’t think either of these will be fully sufficient. Fine-tuning is more akin to rote memorization in humans – it doesn’t typically translate into generalizable skills or knowledge, the same way that cramming for a test doesn’t result in mastery of the material. And RAG is both brittle and similarly unable to produce novel, generalizable capabilities. As a result, neither is a reliable path to incorporating new concepts or high-level skills, though they can be a way to deliver new knowledge for tactical contexts (e.g., a customer support bot correctly answering questions about refund policy). But at a high level, rapidly delivering real, generalizable knowledge into an existing model remains a research problem, not a solved feature.
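
To make the distinction concrete, here’s a deliberately minimal sketch of how a RAG-style update delivers that kind of tactical knowledge: the new fact lives in a retrievable store and gets injected into the prompt, while the model’s weights stay frozen. All names here (knowledge_base, call_llm, etc.) are our own placeholders rather than any vendor’s API, and a production system would use embeddings and a vector store rather than keyword overlap.

```python
import re

# Minimal sketch of delivering "new knowledge" via retrieval rather than weight
# updates. knowledge_base, retrieve, and call_llm are hypothetical placeholders,
# not any vendor's API.

knowledge_base = {
    "refund-policy-2025": "Refund policy: refunds are issued within 14 days "
                          "of purchase, down from 30 days as of October 2025.",
    "shipping-policy": "Shipping policy: standard shipping takes 5-7 business days.",
}

def tokens(text: str) -> set[str]:
    """Lowercased word tokens, punctuation stripped."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(query: str, k: int = 1) -> list[str]:
    """Rank stored documents by naive keyword overlap with the query."""
    ranked = sorted(knowledge_base.values(),
                    key=lambda doc: len(tokens(query) & tokens(doc)),
                    reverse=True)
    return ranked[:k]

def call_llm(prompt: str) -> str:
    """Stand-in for a chat-completion call to a static, frozen model."""
    return f"[model response conditioned on a prompt of {len(prompt)} characters]"

def answer(question: str) -> str:
    # The model's weights never change; only the prompt carries the updated policy.
    context = "\n".join(retrieve(question))
    prompt = f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    return call_llm(prompt)

print(answer("What is the current refund policy?"))
```

The new refund window reaches the model only as prompt text, which is exactly why this pattern works for discrete facts but doesn’t build new, generalizable skills into the model itself.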

However, even if these problems were solved, there’d be another, more subtle one: truth. To learn safely, an agent must be able to identify which information should be “learned.” Going back to the intern example, this is a skill that many humans innately possess, at least to some degree. A good intern will recognize that feedback from managers or highly experienced peers is likely more valuable. But the intern is also capable of questioning guidance and investigating (or discarding) feedback they deem suspicious or inappropriate. For example, very few interns would naively obey a customer or vendor, even a senior one, telling them it’s appropriate for interns to share proprietary business information. And people are broadly aware of, and take calibrated steps to guard against, the possibility of deception – ranging from questionable news to disingenuous coworkers.

LLMs, unfortunately, have no built-in concept of veracity. All information in their training dataset or their context window is more or less equal [2]. This creates a major risk: a self-improving agent might learn falsehoods with the same confidence as facts. Internal heuristics like novelty detection or surprise scoring are, in our minds, insufficiently mature to solve this. We suspect the answer might be true reasoning, in which the arguments for and against the veracity of a given piece of feedback are carefully collected and evaluated. But such reasoning would stress even current frontier models, and it certainly wouldn’t be cheap if applied to every piece of data an agent is supposed to be learning!
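
For intuition on why we’re skeptical of surprise scoring as a stand-in for truth, here’s a toy sketch – a unigram model of our own construction, not any real system – of what such a heuristic actually measures. A genuinely new (and true) policy can easily score as more “surprising” than a familiar-sounding falsehood.

```python
import math
from collections import Counter

# Toy sketch of "surprise scoring": flag candidate facts whose tokens the model
# finds statistically unlikely. Entirely illustrative; the point is that
# "surprising to the model" is not the same as "false", and "unsurprising" is
# not the same as "true".

# Stand-in for whatever the model has already seen during training.
corpus = "refunds are issued within thirty days of purchase for all customers".split()
counts = Counter(corpus)
total = sum(counts.values())
vocab = len(counts)

def surprise(claim: str) -> float:
    """Average negative log-probability of the claim's tokens (higher = more novel)."""
    nlls = []
    for tok in claim.lower().split():
        p = (counts[tok] + 1) / (total + vocab + 1)  # add-one smoothing
        nlls.append(-math.log(p))
    return sum(nlls) / len(nlls)

true_but_novel = "refunds now close after fourteen days"
false_but_familiar = "refunds are issued within ninety days of purchase"

for claim in (true_but_novel, false_but_familiar):
    print(f"{surprise(claim):.2f}  {claim}")
# The genuinely new (and true) policy scores as more "surprising" than the
# familiar-sounding falsehood -- exactly the failure mode described above.
```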

A Security Mindset: Acknowledging Insufficient Mitigations

For practitioners, this is where the conversation must shift from development to security. Continuous learning is currently more fiction than science. But if and when we start to see it commercialized, we think that buyers and users must consider the checks and balances on the learning systems. Options include human-in-the-loop systems, rigorous validation pipelines, and data provenance tracking. In the more immediate term, we think practitioners should be cautious about promises of continuous learning, especially in domains where (A) learning needs to cover generalizable concepts or (B) it’s unclear how you could strongly prevent the wrong lessons from being learned.
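
As a rough illustration of what those checks and balances might look like, here’s a hypothetical sketch of a provenance-and-approval gate sitting between candidate “lessons” and an agent’s knowledge base. The class, field, and channel names are ours, not drawn from any particular framework; the point is simply that nothing gets committed without a recorded source and an explicit review step.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Sketch of a provenance-and-approval gate: nothing enters the agent's knowledge
# base without a recorded source and an explicit human review. All names here
# are hypothetical.

@dataclass
class CandidateFact:
    claim: str
    source: str                 # who or what asserted it
    channel: str                # e.g. "manager_feedback", "customer_chat"
    captured_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat())
    approved: bool = False
    reviewer: str | None = None

TRUSTED_CHANNELS = {"manager_feedback", "policy_update_feed"}

review_queue: list[CandidateFact] = []
knowledge_base: list[CandidateFact] = []

def propose(fact: CandidateFact) -> None:
    """Route a candidate 'lesson' based on provenance; never auto-commit anything."""
    if fact.channel in TRUSTED_CHANNELS:
        review_queue.append(fact)   # even trusted input waits for a human
    else:
        print(f"rejected (untrusted channel): {fact.claim!r}")

def approve(fact: CandidateFact, reviewer: str) -> None:
    """Human-in-the-loop gate: only reviewed facts reach the knowledge base."""
    fact.approved, fact.reviewer = True, reviewer
    review_queue.remove(fact)
    knowledge_base.append(fact)

propose(CandidateFact("Refund window is now 14 days", "ops team", "policy_update_feed"))
propose(CandidateFact("Interns may share proprietary data", "external vendor", "customer_chat"))
approve(review_queue[0], reviewer="human_reviewer")
print([f.claim for f in knowledge_base])
```

None of this makes the underlying learning problem go away, but it does make the failure modes auditable: you can always ask where a “learned” fact came from and who signed off on it.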

But all that said, one person’s challenge is another person’s opportunity. If you believe you have the technology to put a dent in this problem, we’d love to talk to you.

Reach out to us if you are building in the space of AI & security. We have some thoughts!
Kathryn Shih – kshih@forgepointcap.com
Jimmy Park – jpark@forgepointcap.com

This blog is also published on Margin of Safety, Jimmy and Kathryn’s Substack, as they research the practical sides of security + AI so you don’t have to.

[1] RAG, but not always via vector lookup – we would define this to include other information fetched to augment generation, including via arbitrary tools.

[2] Technically, some of the information may be more or less statistically surprising to the model… but this isn’t remotely the same as suspicious to a human!