
Margin of Safety #26: The MIT report on AI failing, from people who actually read it

Jimmy Park, Kathryn Shih

August 27, 2025


Even if a company’s official AI initiatives are failing, most employees are successfully using shadow AI for their own tasks

Unlike some fraction of folks who are blogging about last week’s MIT report, we read the entire thing with 0% ChatGPT summarization assist. A bunch of it has already been discussed – 95% of companies are seeing 0x ROI – and we won’t repeat that. But there are some hidden gems in the report that we think merit a closer look. With that in mind, let’s have a lightning round of the less reported details!

(Source: MIT)

Some of our takeaways from the MIT report

– Shadow AI is huge

  •  Even if you think your organization is failing at AI, your employees are probably succeeding. Workers from over 90% of surveyed companies are quietly using AI to assist their jobs, often on consumer subscriptions and without IT knowledge.
  • Key quote: “In fact, almost every single person used an LLM in some form for their work.”
  •  We think this is a very overlooked portion of the report. Users are successfully identifying use cases that meet all the criteria for AI, while top down efforts overwhelmingly fail. What are users likely doing differently? We expect a couple things:
    • Individual users are far less resourced. This forces their individual ‘pilots’ to stay bite sized; they’re likely much more focused on automating repetitive tasks versus targeting moonshots that may be outside an organization’s capabilities.
    • Users likely try quick tests – think about your own usage of LLMs. If a model is far from capable, you typically don’t invest more resources in that use case. You probably drop it, move on, and experiment with something else. This translates to quickly testing many ideas to find things that are likely low hanging fruit versus investing outsized resources in chasing a pre-selected strategic priority that may or may not be technically feasible at any reasonable cost.
    • Users are good at picking tasks which are so annoying that the user is happy to babysit the AI in exchange for offloading the task. Our classic example of this is our AI note taker (shout out to Circleback [referral link]) – is it perfect? No. Is it way better than either of us having to waste hours taking, reviewing, and editing notes? Heck yes! By filtering for these sorts of tasks, you end up with users cheering the AI on versus quietly resenting it. This translates to patience with imperfection plus a willingness to help improve model quality over time, both key factors in successful AI quality iterations.
    • Users are more able to throw huge models at problems. When we want to dodge an annoying but non-sensitive task, we throw Deep Research at it. Realistically, OpenAI is likely subsidizing the consumer version of this model in order to harvest data. This makes us careful about what we personally use the consumer version for, but it also makes it hard for a small enterprise startup to compete, because the quality bar is highly subsidized.
  • What this means for enterprises in general – if you want to succeed at Gen AI, create a safe space in your organization for employees to fess up to the portions of their job that they’re already trying to automate! Bless and sanction their efforts and provide a mechanism for them to share it with coworkers. Let them experiment with finding low hanging fruit via generalist rather than specialist tools.
  •  What this means for security buyers – Early champions will be security analysts who adopt AI in their personal workflows, with adoption then spreading outward: for example, SOC analyst -> SOC team -> enterprise security org. The alternative is to build a highly vertical-focused security product that wins at workflow-specific AI, not broad “AI for security”. Either way, win first by embedding in narrow, painful, repetitive workflows, and then expand to an adjacent team.
  •  What this means for security solutions (and legal departments) – You better be thinking about Shadow AI, because MIT thinks it’s in place at basically every customer you could sell to or enterprise you could work at. The consumer services employees are using probably have the right to retain data for training.

Failures are framed as stemming from AI’s failure to live up to user expectations around learning.

  • This might be a topic for a future blog, but if there’s one thing humans trounce most models at, it’s fast learning (well, and having common sense: two things). AlphaGo may be able to wipe the floor with Lee Sedol, but it played orders of magnitude more games than he’s ever seen. As far as we know, there’s no Go AI that can come anywhere close to his level if you control for number of games played. And this doesn’t just apply to Go; (smart) people and even some old dogs learn new tricks much faster than most AI systems.
  • We wonder if the solution here is really going to be AI systems learning faster. This frankly has not been the way AI has succeeded to date; instead, successes have typically come from (A) finding areas where data volumes compensated for low learning rates, (B) adding system guardrails that allow for slower learning [e.g., keeping Waymo in fixed geographies and off highways], or (C) picking areas where learning mattered less [like good-enough versions of tasks users don’t love].

Reach out to us if you are building in the space of AI & security. We have some thoughts!
Kathryn Shih – kshih@forgepointcap.com
Jimmy Park – jpark@forgepointcap.com

This blog is also published on Margin of Safety, Jimmy and Kathryn’s Substack, as they research the practical sides of security + AI so you don’t have to.