
Margin of Safety #5: Agentic Systems and Learning

Jimmy Park, Kathryn Shih

February 25, 2025


More agentic learning might not necessarily be better

If you listen to the marketing, one key benefit of agentic systems will be that they can continuously learn. Let’s unpack that. First, how do systems learn? And second, when do you want continuous learning? As a corollary, when might you find continuous learning undesirable?

 

Learning

An agentic system will be a combination of elements:

  • LLMs
  • Code, including tools (or code to execute other models) the agent may invoke
    • Other neural networks or AI models
  • Data sources, such as:
    • Saved state (past interactions)
    • Environmental data
    • Output from RAG systems
    • Tool execution results

Ignoring any imprecision-driven variation in the neural network computations[1] or explicit random number generation (which might be desirable – for example, people often prefer talking to chatbots that possess some variation in their response patterns), this is a deterministic system. The response of the agent to a given input or data will not change unless the underlying models or code are updated.
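To make that concrete, here’s a minimal sketch of how those elements typically compose. It’s a toy, not a framework: the `LLM.complete` interface, the `CALL tool:args` convention, and the tool signatures are our own illustrative assumptions. The point is simply that nothing in this loop updates itself – given the same saved state, code, and model, the same input produces the same output.

```python
from typing import Callable, Protocol


class LLM(Protocol):
    """Illustrative interface: any model object exposing a complete() method."""
    def complete(self, prompt: str) -> str: ...


ToolFn = Callable[[str], str]  # a tool takes text arguments and returns a text result


class Agent:
    def __init__(self, llm: LLM, tools: dict[str, ToolFn], saved_state: list[str]):
        self.llm = llm                   # the LLM
        self.tools = tools               # code / other models the agent may invoke
        self.saved_state = saved_state   # past interactions, RAG output, environmental data

    def step(self, user_input: str) -> str:
        # Assemble context from data sources, ask the LLM, maybe call a tool.
        prompt = "\n".join(self.saved_state + [user_input])
        decision = self.llm.complete(prompt)
        if decision.startswith("CALL "):               # toy convention: "CALL search:log4j"
            name, _, args = decision[5:].partition(":")
            return self.tools[name](args)
        return decision


# Nothing above writes back to saved_state or updates the model, so the same
# input against the same state and code yields the same output.
```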

Sometimes, an agent adapts naturally based on its actions. For example, if it modifies a filesystem and sees the change didn’t work, it might try a different approach. But adapting in the moment is not the same as learning for the future. True learning requires an agent to:

  1. Record past interactions
  2. Record the results it observed
  3. Reflect upon trends in those interactions
  4. Recognize a meaningful trend, such as a tool that consistently isn’t working
  5. Store that recognition in a persistent location for future use (or re-derive it every time it would potentially use the tool)

These steps don’t happen by default. They’re possible, but a raw LLM won’t do them unless it’s surrounded by additional engineering infrastructure. In practice, most engineers would probably design things so that agents learn about tools in slightly more generalizable ways – for example, a system might simply record all tool use and the observed result. But even in the generalized form, specific software implementation steps are required for the agent to be able to learn.
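A sketch of that generalized approach might look like the following. The JSONL ledger, the failure-rate threshold, and the file names are assumptions we’re making for illustration – the point is that recording, reflecting, and persisting each require explicit code.

```python
import json
from collections import defaultdict
from pathlib import Path

LEDGER = Path("tool_ledger.jsonl")   # hypothetical persistent record of tool use
LEARNINGS = Path("learnings.json")   # where recognized trends are kept for reuse


def record_tool_use(tool: str, args: str, succeeded: bool) -> None:
    """Steps 1-2: persist every interaction and the result observed."""
    with LEDGER.open("a") as f:
        f.write(json.dumps({"tool": tool, "args": args, "ok": succeeded}) + "\n")


def reflect() -> dict[str, str]:
    """Steps 3-5: look for trends across the ledger and store any conclusions."""
    outcomes: dict[str, list[bool]] = defaultdict(list)
    if LEDGER.exists():
        for line in LEDGER.read_text().splitlines():
            entry = json.loads(line)
            outcomes[entry["tool"]].append(entry["ok"])

    learnings = {}
    for tool, results in outcomes.items():
        # Hypothetical trend rule: a tool that fails most of the time is suspect.
        if len(results) >= 5 and sum(results) / len(results) < 0.2:
            learnings[tool] = "unreliable: prefer an alternative"

    LEARNINGS.write_text(json.dumps(learnings))   # persisted for future sessions
    return learnings
```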

What does it mean if agents don’t learn by default?

At a high level, this is an opportunity for creators to be thoughtful about learning. For example, you may want learning siloed differently depending on the use case.

Let’s say you’re making an agent that will automate SOC investigations – a popular domain! Such an agent could usefully learn at several different scopes:

  • Cross-customer learnings: “These logs indicate log4j being exploited — emergency situation”
  • Customer-specific learnings: “Company [X] only uses Microsoft—Mac devices are suspicious.”
  • User-specific learnings: “Alice prefers to review data in GMT, not local time.” [2]

When you design or review a system, be deliberate about the scope assigned to each type of learning: that’s what ensures the infrastructure scopes data access in ways that prevent inappropriate leakage. For example, it’s likely best if the instance of the agent that Alice interacts with has no ability to learn specific things about Bob, because then there’s no risk of that data leaking.
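One way to make that siloing concrete is to give each scope its own store and only ever merge downward. The sketch below is a simplified in-memory version with hypothetical keys; a real system would back each bucket with appropriately access-controlled storage.

```python
from dataclasses import dataclass, field


@dataclass
class ScopedMemory:
    cross_customer: dict[str, str] = field(default_factory=dict)            # shared by everyone
    per_customer: dict[str, dict[str, str]] = field(default_factory=dict)   # keyed by customer id
    per_user: dict[str, dict[str, str]] = field(default_factory=dict)       # keyed by user id

    def visible_to(self, customer_id: str, user_id: str) -> dict[str, str]:
        """Merge only the learnings this user is allowed to see."""
        merged = dict(self.cross_customer)
        merged.update(self.per_customer.get(customer_id, {}))
        merged.update(self.per_user.get(user_id, {}))
        return merged

    def learn_for_user(self, user_id: str, key: str, value: str) -> None:
        """User-specific learnings never leave that user's own bucket."""
        self.per_user.setdefault(user_id, {})[key] = value


# Example: Alice's timezone preference is invisible to Bob by construction.
mem = ScopedMemory()
mem.cross_customer["log4j_exploit_pattern"] = "emergency"
mem.learn_for_user("alice", "display_timezone", "GMT")
assert "display_timezone" not in mem.visible_to("acme", "bob")
```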

What should we be thinking about when we implement learning?

Once we recognize learning can be controlled, we can think about when it’s desirable – or not. At its heart, learning represents a modification to behavior. Classic engineering practices say behavioral modifications should be tested before production use, and the bigger the modification the more strenuous the testing. We don’t necessarily think about learning in this context, because we tend to assume learning is always good. But especially with ML, that’s not always the case.

Another important nuance is how a system handles mistakes—often referred to as the “fail-fast vs. fail-silent” dynamic. A fail-fast system loudly breaks or signals an error at the first sign of trouble, making it easier to catch and correct issues before they cause widespread damage. By contrast, a fail-silent system may continue operating in a flawed state, quietly drifting away from the intended behavior. This can be especially risky for AI-driven agents, because unintended learning can accumulate over time and remain hidden until it triggers a major incident. Incorporating clear error-handling and monitoring mechanisms is therefore critical: if the system veers off-course, you want it to do so transparently rather than silently.
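As a sketch of what “transparently rather than silently” can look like in code: reject and log a learned update that fails a sanity check, instead of quietly folding it into future behavior. The `apply_learning` function and its validation hook are hypothetical – the shape, not the names, is the point.

```python
import logging

logger = logging.getLogger("agent.learning")


class LearningRejected(Exception):
    """Raised so a bad learned update fails loudly instead of being absorbed."""


def apply_learning(store: dict, key: str, value: str, sanity_check) -> None:
    """Apply a learned update only if it passes validation; otherwise alert and raise."""
    if not sanity_check(key, value):
        logger.error("rejected learned update %s=%r", key, value)   # visible signal for monitoring
        raise LearningRejected(f"update {key!r} failed validation")
    logger.info("applied learned update %s=%r", key, value)         # audit trail of behavior change
    store[key] = value
```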

Unsupervised, a system might learn the digital equivalent of bad habits, where it develops strategies that achieve a technical target through unexpected means. For example, a chatbot might adopt strategies in which it hangs up on unsatisfied users before they can post a negative rating, or it might learn that the best way to optimize a user satisfaction score is to give massive discounts. In both cases, the bot is successfully identifying a new strategy to maximize its target metric, but it’s probably not a strategy you want to see deployed to production.
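One partial mitigation is to judge a learned strategy on guardrail metrics alongside the target metric. The metrics and thresholds below are invented for illustration, but they capture the idea: an “improvement” that wins by hanging up on users or giving away margin gets vetoed.

```python
from dataclasses import dataclass


@dataclass
class StrategyMetrics:
    satisfaction: float       # the target the agent is optimizing
    hangup_rate: float        # fraction of sessions ended by the bot
    avg_discount_pct: float   # average discount granted per session


def acceptable(candidate: StrategyMetrics, baseline: StrategyMetrics) -> bool:
    """Accept a learned strategy only if it beats the baseline without gaming the target."""
    if candidate.hangup_rate > baseline.hangup_rate * 1.1:
        return False   # suspiciously many conversations cut short
    if candidate.avg_discount_pct > baseline.avg_discount_pct + 2.0:
        return False   # buying satisfaction with margin
    return candidate.satisfaction > baseline.satisfaction
```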

So what does it mean if behavioral change isn’t necessarily positive? At a high level, it probably points to classic rollout methods – staged rollouts, percentage tests, etc. – being useful. The more sensitive a workload and the more unbounded the potential behavioral change, the more you may want to control and monitor the update. There may also be other techniques you can use. For example, per-user learnings might be less risky if you restrict the learnings to user-visible, user-controlled settings.
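For instance, a learned behavior can be promoted through the same kind of percentage gate you’d use for any code change. This is a sketch with placeholder stage names and numbers; deterministic bucketing by user ID keeps each user in a consistent arm while you watch the metrics.

```python
import hashlib

# Placeholder stages: "shadow" evaluates the learned behavior offline only.
ROLLOUT_STAGES = {"shadow": 0.0, "canary": 0.05, "partial": 0.25, "full": 1.0}


def uses_new_behavior(user_id: str, stage: str) -> bool:
    """Deterministically bucket users so each user stays in the same arm."""
    fraction = ROLLOUT_STAGES[stage]
    bucket = int(hashlib.sha256(user_id.encode()).hexdigest(), 16) % 10_000
    return bucket < fraction * 10_000
```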

The power of agentic systems doesn’t come from learning more, but from learning better. When designed thoughtfully, they can become smarter and more effective while staying reliable and aligned with human intent. Learning should be deliberate, tested, and monitored—because just like in human education, not all lessons are worth keeping.


[1] In practice, large neural networks, including LLMs, often have slightly unpredictable behavior – this is because they need so many mathematical operations that tiny amounts of silicon-level error in floating point math operations can compound into larger variances that materially affect output. While this behavior is important to acknowledge, it doesn’t really affect this topic. So we’ll make a simplifying assumption and ignore it (at least for now; let us know if you want more on this later).

[2] It might not be a coincidence that per-user learnings can look a lot like settings – UX best practice is to give users control over these kinds of things, and settings are the classic way to do it.

Stay tuned for more insights on securing agentic systems. If you’re a startup building in this space, we would love to meet you. You can reach us directly at: kshih@forgepointcap.com and jpark@forgepointcap.com.

 

This blog is also published on Margin of Safety, Jimmy and Kathryn’s substack, as they research the practical sides of security + AI so you don’t have to.