The evidence for operating-model-first AI (and why 95% of pilots fail)

The 95% number, unpacked

In July 2025, MIT's NANDA initiative published The GenAI Divide: State of AI in Business 2025, a report based on a study of more than 300 publicly disclosed enterprise AI initiatives, structured interviews with 150 senior leaders, and a survey of 350 employees. The headline finding was sobering: 95% of generative AI pilots in their sample produced no measurable contribution to the profit and loss.

The number got coverage in Fortune and elsewhere as a kind of failure statistic, but the detail behind it is more interesting than the headline. The pilots that worked were not the ones with the best models. They were the ones grounded in a specific business process, owned by an operations leader rather than an IT or innovation team, and measured against an outcome the business already cared about.

Why pure-tools AI consulting fails

The dominant pattern in the failing 95% is what we call pure-tools consulting. A consulting firm shows up with a model, a workflow tool, and a deck of use cases. The team picks a use case the model handles well, demonstrates it, and the pilot ends. Nothing changes in how the business runs.

This is not a criticism of the consultants. It is a structural problem with the brief. If the engagement is defined as 'show us what AI can do', the answer will be a demo, and a demo will not move the P&L.

The operating-model-first reframe

We work the other way around. We start with the operating model, identify where the operation is constrained, and then ask which of those constraints AI can credibly relax. The model and the tool come third, not first.

The reframe sounds obvious. It is also unusual. It requires the consultant to understand the operation in enough detail to have a view on its constraints, which takes time that most engagements do not budget for.

What 'operating model' actually means in operational terms

Operating model is one of those phrases that means everything and therefore nothing. We use it specifically. For us it has three layers, and we look at each one before we make a recommendation.

Process: how work flows from a trigger to an outcome, who owns each step, where the handoffs are, and where the queues form.
Data: what is captured, where it lives, who owns its quality, and whether it can be trusted as the single source of truth for the process above.
Systems: which platforms hold the data, how they are integrated, and where the integration is brittle or absent.

BCG's complementary finding

BCG's 2024 research, including their 'Where's the Value in AI?' work, lands on a complementary finding: 74% of the companies they studied struggle to achieve and scale value from AI. The 26% that succeed do something specific: they redesign their processes around AI, rather than bolting AI onto existing processes.

The split between the two groups is not capability. The losing 74% have the same model access, the same vendor relationships, and often a bigger budget. What they do not have is a willingness to change how work flows. Their AI work lives in an innovation team or an IT function, runs as a series of pilots, and never enters the operation. The winning 26% put AI inside the operating model and accept the disruption that comes with it.

Read together, the MIT and BCG findings tell a single story. The constraint on AI value is not model capability and not tool selection. It is the willingness to change the operating model around the new capability. That is the work most engagements skip, and that is the gap we built our method to close.

The Margin Labs method as a response to the evidence

Our method, the free AI Profit Roadmap, is built around this evidence. We spend the first call understanding the operating model. We produce a costed roadmap that names the three or four constraints AI can credibly relax, sequences them by payback, and is explicit about the no-go list.
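As a rough illustration of what "sequences them by payback" can mean arithmetically, here is a minimal Python sketch. It assumes payback is the upfront cost divided by the net monthly benefit, and that anything slower than a chosen threshold goes on the no-go list. The class, the function, the 12-month threshold, and every figure are hypothetical and chosen for illustration; they do not describe how an actual roadmap is costed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Constraint:
    """One operating constraint AI might relax. All fields are hypothetical."""
    name: str
    upfront_cost: float      # one-off implementation cost
    monthly_benefit: float   # estimated monthly saving or margin gain
    monthly_run_cost: float  # ongoing licence, hosting, and support cost

    def payback_months(self) -> float:
        """Months until cumulative net benefit covers the upfront cost."""
        net = self.monthly_benefit - self.monthly_run_cost
        if net <= 0:
            return float("inf")  # never pays back
        return self.upfront_cost / net


def sequence_by_payback(constraints, max_months=12.0):
    """Sort by payback, then split into a go list and a no-go list.

    The max_months threshold is an assumption made for this sketch.
    """
    ranked = sorted(constraints, key=lambda c: c.payback_months())
    go = [c for c in ranked if c.payback_months() <= max_months]
    no_go = [c for c in ranked if c.payback_months() > max_months]
    return go, no_go


# Illustrative figures only.
candidates = [
    Constraint("demand forecasting", 60_000, 4_000, 1_000),
    Constraint("quote turnaround", 20_000, 3_000, 500),
    Constraint("invoice matching queue", 30_000, 8_000, 2_000),
    Constraint("chatbot demo", 15_000, 500, 800),
]

go, no_go = sequence_by_payback(candidates)
for c in go:
    print(f"{c.name}: {c.payback_months():.1f} months")
# invoice matching queue: 5.0 months
# quote turnaround: 8.0 months
print("no-go:", [c.name for c in no_go])
# no-go: ['demand forecasting', 'chatbot demo']
```

Treating a non-positive net benefit as an infinite payback keeps a demo that costs more to run than it returns at the bottom of the no-go list rather than raising a division error.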

The approach is unfashionable. We do not lead with the model, and we do not lead with the use case. We lead with the operating model and the numbers, and we are happy to be measured on the outcomes that result. The evidence says that is where the value is, so that is where we work.