
Margin of Safety #47: How White-Collar Workers Can Survive AI

Jimmy Park, Kathryn Shih

March 10, 2026


In an AI-saturated world, the winning strategy is to outsource what machines do well while doubling down on distinctly human advantages like judgment, contrarian thinking, taste, and relationships

It seems like the news cycle has moved away from the SaaSpocalypse and back to AI eating jobs. A headline report from Anthropic last week basically leads with the prediction that AI can automate more than half of all white-collar tasks.

*As an aside: Kathryn would like to cast some shade on the Anthropic write-up. It has one line suggesting the tasks are time-weighted, then completely fails to describe any caveats around how that time-weighting is calculated [1]. Plus, a literal read of the chart labels suggests the contents are a raw share of tasks, while the write-up's math appendix implies that the entirety of the analysis is time-weighted! Anthropic, please be clear about these things for those of us who are still consuming long-form content and not just giving it to a model to summarize : P*

But regardless of your take on the report, the question of how to personally resist AI disruption is top of mind for all of us with rent, mortgages, and other real-world bills to pay! We have a couple of thoughts here. Everyone says to lean in on AI. We think this takes the form of knowing what to outsource while focusing your energies on your comparative advantage.

Diving into the first half of this, we think that on the margin, humans still like to trust other humans. This means that at least half of your competition isn't a model; it's other people who are better at using AI and thus more efficient than you are. And at the end of the day, we'd argue that an employer doesn't really care how you got your job done; they just want you to get it done efficiently and effectively. So to win in this category, first you need intuition for what'll work. Different people build intuition in different ways, but we're personal believers in playing with the tech [2]. If you continue to lob some tasks at a tool every 3–6 months and see what sticks, you'll at least be informed! We think there are a few good tricks for effective play:

· Start with the age-old KISS principle (keep it simple, stupid) – decompose tasks into simple parts and see if each part works. If it does, try to assemble the parts.

· Be highly specific about what you want. The ideal prompt is so specific that it makes success or failure unambiguous, but many of us default to prompts that lack this specificity (e.g., “write a formal email” – well, how formal?).

· Be prepared to review the results, especially for more complex asks – for example, one of Jimmy's recent vibe-coding efforts resulted in significant time savings, but also a couple of sneaky errors from Claude that would have left us with some major confusion [3] had we blindly trusted its efforts.

· And if things fail, play around a little and then remember that even experts experience the occasional debacle (or at least a lost inbox).

· Mind the blast radius: maybe don’t play around with prod credentials or your personal routing numbers without extensive testing and some thoughtful guardrails.
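The decompose-test-assemble loop above can be sketched in a few lines of Python. This is purely illustrative: `call_model` is a hypothetical stand-in for whatever LLM API or CLI you actually use, and the review check is deliberately crude.

```python
def call_model(prompt: str) -> str:
    # Hypothetical stand-in: replace with a real API call from your vendor's SDK.
    return f"[model output for: {prompt}]"

def run_decomposed_task(subtasks: list[str]) -> list[str]:
    """Run each simple subtask separately so failures are unambiguous."""
    results = []
    for prompt in subtasks:
        out = call_model(prompt)
        # Review step: reject empty or suspiciously short outputs early,
        # rather than discovering problems after assembly.
        if not out or len(out.strip()) < 5:
            raise ValueError(f"Subtask failed review: {prompt!r}")
        results.append(out)
    return results

# Only assemble once every part has passed review.
parts = run_decomposed_task([
    "Summarize the Q3 numbers in two sentences.",
    "Draft a one-line subject for an email to finance.",
])
draft = "\n\n".join(parts)
```

The point isn't the code itself; it's the shape of the workflow – small unambiguous parts, a review gate on each, assembly only at the end.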

And then when you find something that works, be like a good engineer – be a little lazy! We're still not advocating Open Claw in environments with any blast radius, but creative mini automations have a lot going for them. In particular, think about how to make the working recipe repeatable and easy.

But those are the things AI is good at. What about the things you're good at? We think there are a few areas to mine here. One is being contrarian. Gen AI can do this when carefully prompted, but there's a reason it gets criticized for being sycophantic: contrarian feedback is not its default. Yet this is often how you beat the market – by having a smart idea that *isn't* what everyone else is doing or thinking. The same way it becomes possible to beat the market when everyone's in index funds, we think it'll be possible to beat the market by bringing your own opinions that diverge from the ChatGPT standard. And in most domains, you don't just need to be contrarian – you need an internally consistent, slightly contrarian view from which to develop a differentiated strategery. To that end, we believe it's worthwhile to interact with popular models and develop a point of view on where you agree with them and, more importantly, where you don't.

Along those lines, we’re big believers in taste as a differentiator. At a minimum, it’s currently hard to get consistent ‘taste’ out of LLMs without investing in significant infrastructure around the model to enforce consistency across time and items. Especially for infrequent activities, like picking investments (financial, R&D, etc), we think that AI can provide useful inputs, but taste born of experience is difficult to replicate or replace.

Finally, there are relationships. We've brought up the question of trust before, but we continue to believe that the human element of many jobs is undervalued. The report may suggest that ~90-95% of computer and math jobs are subject to automation, but we'd argue far more than 10% of the job for many technical leaders is figuring out how to (A) negotiate across stakeholders to agree on a set of technical requirements that are feasible within the constraints of a project and (B) cause those stakeholders to trust that the selected requirements are appropriate and will be delivered on time. Is this outweighed by the number of jobs in which the supermajority of task time is writing test cases and other activities that can be more easily automated? Potentially, but there's also plenty of room for the scope of communication jobs to increase. What technical user hasn't dealt with token limits, and what happens when we have to start thinking about whether the cost of those 6 Claude Code instances was well allocated?

Beyond local communication, we think AI may drive an increase in the value of content and community. When everyone's inbox is full of automated AI outreach, we'll need different channels for connection and communication. This is part of our motivation for this blog, but it's also an opportunity for others. Trust and reputation are likely to keep mattering, and the more you can figure out an authentic way to cultivate them, the more we think you increase your value in a post-AI world.

If you’re building something in this space, feel free to reach out to jpark@forgepointcap.com and kshih@forgepointcap.com.

This blog is also published on Margin of Safety, Jimmy and Kathryn’s Substack, as they research the practical sides of security + AI so you don’t have to.

[1] Yes, I have read their math appendix [here]. It cites this paper, also by Anthropic researchers, to justify its assumption about underlying task timing. However, when you read the cited paper, it basically uses Claude itself to estimate the task times.

As a sanity check, I just asked Claude Sonnet 4.6 how long it should take an adult to write a grocery list, write a 1200-word industry blog (hello!), or book a flight. It cited 5-15 minutes, 3-6 hours, and 5-20 minutes. But when I asked for color, it provided task breakdowns that didn't add up to the listed ranges! For grocery lists, its breakdown adds up to 5-30 minutes; for the blog, 2.75-5.5 hours (ok, pretty close); and for the flight, 5-40 minutes for simple itineraries or 45-100 minutes for complex ones. The variability of these timings, especially for the flight booking case, leaves me with some questions about estimate quality. I would argue that the more Anthropic is going to rely on these estimates as part of an ongoing line of inquiry, the more they should publish the raw data for folks to review.

To its credit, the paper in which these estimates were first produced does some sanity checks against the O*NET data, but the O*NET data is quite coarse and itself lacks useful timing benchmarks. And the sanity checks basically say the data is roughly aligned, which doesn't mean much given the amount of error that could then creep into downstream analysis like industry-level task time-weighting.

[2] In this case, playing with it is extra valuable because it can be hard to derive what Gen AI is good at from first principles, especially as quiet investments in things like tools for math or private datasets for structured reasoning continue.

[3] The biggest gotchas: it rolled its own distinctly non-random random number generator (why?) and incorrectly assumed that some financial math necessarily converges after a fixed number of iterations!
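For the curious, the convergence gotcha looks roughly like this. A minimal Python sketch (illustrative, not the actual code from Jimmy's project): an internal-rate-of-return solver should iterate to a tolerance and fail loudly, rather than running a fixed number of steps and assuming the answer is good.

```python
def npv(rate: float, cashflows: list[float]) -> float:
    """Net present value of cashflows at period 0, 1, 2, ..."""
    return sum(cf / (1 + rate) ** t for t, cf in enumerate(cashflows))

def irr(cashflows: list[float], guess: float = 0.1,
        tol: float = 1e-9, max_iter: int = 100) -> float:
    """Newton's method for IRR: stop on tolerance, not on iteration count."""
    rate = guess
    for _ in range(max_iter):
        f = npv(rate, cashflows)
        # Numerical derivative; fine for a sketch.
        h = 1e-7
        df = (npv(rate + h, cashflows) - f) / h
        step = f / df
        rate -= step
        if abs(step) < tol:  # converged: the update stopped moving
            return rate
    # Fail loudly instead of silently returning a half-converged number.
    raise RuntimeError("IRR did not converge within max_iter iterations")

print(round(irr([-100, 60, 60]), 4))  # -> 0.1307
```

The fix Claude missed is the last two pieces: the explicit tolerance check and the explicit failure mode. "Run N iterations and return whatever you have" works until the day it doesn't.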