
Margin of Safety #11: Rethinking Workflows in the Age of AI Native

Jimmy Park, Kathryn Shih

April 23, 2025

  • Blog Post

As AI drives down the cost of creation and analysis, the next wave of innovation will come from reimagining entire workflows

[Image: before a little re-imagination vs. after]

Previously [link], we wrote about the potential for AI native experiences to generate wholesale new markets. This led to a common question – is there a way to systematize the identification of areas ripe for AI native experiences? While there is no complete answer, we believe there are patterns that point toward novel experiences – in particular, areas where existing workflows are structured around high-cost activities subject to AI automation. Identifying these areas and workflows allows for rethinking and redesigning them into fully native experiences, akin to how Instagram rethought photo creation, editing, and sharing.

In this post, we’ll dive into identifying these workflows, plus what their AI native alternatives might look like.

Infrequent invocations of expensive processes

One class of disruption-ready workflows are those centered around infrequent, high-cost checks or reviews. These don’t exist only in high tech – medical checkups, for example, occur only annually because they’re expensive in both time and money – but they all share the property that their high cost has driven their low frequency. If a medical checkup were instant and free, you’d do it much more regularly. This time-and-money tradeoff isn’t unique to medicine; we also see it all the time in business and security.

In many business environments, such workflows manifest as periodic reviews – think of everything from quarterly business reviews to annual compliance checks. In reality, business owners would rather find issues as soon as they arise than wait for the infrequent review cycle, but the cost of the review process makes running it more often unrealistic.

What would an AI native experience look like, then? The naïve answer would be to automate the existing process and run it at a higher frequency – say, weekly rather than quarterly. But is that truly the native experience? What if you could instead detect whenever a change would cause a problem at the next review – for example, when a new process would break a compliance requirement, or when a newly discovered bug would cause a roadmap slip ahead of the quarterly business review? Why wait for a review when you can flag at the moment of deviation? Perhaps AI native automation could alert owners as soon as issues emerge but before they take hold, and manage notification and communication with the broader business units affected.
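To make the shift concrete, here’s a minimal sketch of the event-driven alternative: checks are registered once, then evaluated at the moment a change lands rather than on a quarterly calendar. The check names, event shape, and the 90-day retention rule are all invented for illustration – a real system would plug in an organization’s actual policies.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

@dataclass
class ChangeEvent:
    """A single change in the business, e.g. a process update or bug report."""
    kind: str
    payload: dict = field(default_factory=dict)

# Registry of (name, check) pairs; each check returns a finding or None.
checks: list[tuple[str, Callable[[ChangeEvent], Optional[str]]]] = []

def check(name: str):
    """Decorator that registers a compliance check under a readable name."""
    def register(fn):
        checks.append((name, fn))
        return fn
    return register

@check("data-retention")
def retention(event: ChangeEvent) -> Optional[str]:
    # Hypothetical policy: retention beyond 90 days violates compliance.
    if event.kind == "process_update" and event.payload.get("retention_days", 0) > 90:
        return "retention exceeds 90-day policy"
    return None

def on_change(event: ChangeEvent) -> list[str]:
    """Run every check at the moment of deviation, not at review time."""
    return [name for name, fn in checks if fn(event) is not None]

# A risky change is flagged immediately, months before any scheduled review.
flagged = on_change(ChangeEvent("process_update", {"retention_days": 365}))
```

The point of the sketch is the inversion: the review is no longer an event you schedule, but a property the system maintains continuously.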

One step further would be to proactively detect when an action might influence future results and nudge users at that point in time. To extend the medical analogy, this could be like a fitness tracker helping you hit personalized daily wellness goals, with an eye toward optimizing your long-term health. As evaluation capabilities become cheaper, we’re looking for creative ways to push evaluation upstream.

Workflows predicated on scarcity

Another class of workflows ripe for reinvention are those built around managing scarcity—particularly the scarcity of high-quality creative or expert output. Take A/B testing: today’s tools help derive insight from a small number of variations—because generating and testing each new version is relatively expensive. But as the cost of generating variations plummets (thanks to generative AI), the bottleneck shifts. Now, the challenge isn’t generating ideas—it’s knowing which ones are worth testing. The most valuable capabilities start to resemble “taste”: the judgment to prune ideas before experimentation even begins.

In many places, “taste” can be hard to model because it’s often deployed in contexts with few data points from which to generalize – or to tune an AI model on. For example, a company can only A/B test so many variations of a website because traffic is ultimately finite, and with limited traffic you can only gain statistical confidence about so many comparisons. Many categories that rely on human interaction to ultimately verify (or reject) an idea share this property; in these categories, “taste” can be thought of as predicting esoteric human preference from very sparse data. Research projects and high-level technical designs have similar gotchas: if you’re setting out to build a new hyperscaler, there are very few examples of successful architectures to build on – or train on. Taste will be part of your engineering design.
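To make the traffic constraint concrete, here’s a rough sketch of why you can only test so many variants. It uses a standard two-proportion sample-size approximation with a Bonferroni correction for multiple comparisons; the baseline rate, lift, and traffic figures are our illustrative assumptions, not benchmarks.

```python
from statistics import NormalDist

def sample_size_per_arm(p_base: float, lift: float,
                        alpha: float = 0.05, power: float = 0.8,
                        comparisons: int = 1) -> float:
    """Approximate visitors needed per arm to detect `lift` over `p_base`
    with a two-sided z-test, Bonferroni-corrected for several variants."""
    z = NormalDist()
    p_var = p_base + lift
    z_a = z.inv_cdf(1 - alpha / (2 * comparisons))  # corrected significance
    z_b = z.inv_cdf(power)                          # desired power
    variance = p_base * (1 - p_base) + p_var * (1 - p_var)
    return (z_a + z_b) ** 2 * variance / lift ** 2

# With a 5% baseline conversion and a hoped-for 1-point lift, each extra
# variant raises the per-arm sample size, so fixed traffic caps the count.
for k in (1, 3, 10):
    n = sample_size_per_arm(0.05, 0.01, comparisons=k)
    print(f"{k} comparison(s): {n:,.0f} visitors per arm, {n * (k + 1):,.0f} total")
```

Generative AI can produce a hundred candidate variants overnight, but this arithmetic doesn’t budge – which is exactly why pruning candidates before the experiment (“taste”) becomes the scarce skill.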

We expect versions of taste or judgment to persist in cybersecurity as well. SOC automation, for example, is a hot category, and there are SOC tasks – such as initial alert triage – with reams of training data for tuning AI models to perfection. But there are also tasks for which training data is incredibly scant, both because the events occur infrequently and because the resulting data is kept deeply confidential. How do you respond when you discover a true-positive alert? That workflow is defined by scarcity – finite defenders, short timeframes, and regulated reporting requirements – and it has virtually no outcome data with which to bring direct AI automation online. Could an AI native experience use AI to rapidly flesh out quarantine and response plans, letting human defenders exercise their judgment in selecting which strategic plan to deploy?

Yet today’s agentic tools largely stop at generation or lightweight evaluation. They rarely attempt to cooperate with a human operator’s evolving sense of “taste.” Reimagining tooling around this kind of human-AI collaboration – where taste is not an afterthought but a central design principle – remains an open frontier.

Extending the Instagram analogy

Looking at the Instagram use case through this lens, the workflow it disrupted – carefully capturing digital photos, meticulously editing them on a desktop computer, and sharing a noteworthy result – was ripe for reinvention. The workflow was slow and (relatively) infrequently performed due to its overall high effort, yet the capabilities of a new device were aligning to enable a dramatically easier alternative flow.

And to some extent, the influencer economy that Instagram and other social networks spawned is an example of a taste-centric revamp. As content became far cheaper to generate, the winners were the ones who figured out how to systematize their sense of taste – look at MrBeast’s extensive content playbook for an example – often using metrics and tooling to tune their judgment.

Continuing that analogy, perhaps we should all be looking at complex, multi-step workflows in which multiple individual steps are now within range of automation. Rather than automate each step, what happens if the entire workflow is re-envisioned, but with AI relaxing constraints around cost and effort?

Stay tuned for more insights on securing agentic systems. If you’re a startup building in this space, we would love to meet you. You can reach us directly at: kshih@forgepointcap.com and jpark@forgepointcap.com.


This blog is also published on Margin of Safety, Jimmy and Kathryn’s substack, as they research the practical sides of security + AI so you don’t have to.