Systems Over Scale: What Bridgewater Teaches Us About the Enterprise AI Plateau


I have lost count of how many client conversations this year have gone the same way. Someone tells me the model isn’t accurate enough yet for what they want to do, and the plan is to just wait for the next release. GPT whatever. Claude whatever. Gemini whatever. Something bigger and smarter is always around the corner, so why do the hard work now?

Bridgewater just published a paper that quietly pokes a hole in that thinking, and I think it deserves more attention than it’s getting outside finance circles.

They took an open-weight model, Qwen3-235B, and ran it through a serious reinforcement learning and distillation pipeline built with Thinking Machines Lab. The result was 84.7% accuracy on their internal financial evaluation suite at a fraction of the inference cost of the big commercial models. Those are impressive numbers.

But the numbers aren’t really the story.

The story is how they got there.

Everyone’s first assumption will be that Bridgewater won because they have proprietary data nobody else has. Sure, that helps. But I think the more interesting thing they built is the feedback loop around the data, not the data itself.

They didn’t have their best investment people label every single example. That would be a waste of very expensive time. Instead they trained a baseline model on cheaper vendor-labeled data first. An example was routed to an experienced investment professional for a second opinion only when the model disagreed with its vendor label.

So the expensive human judgment gets spent exactly where it matters most, on the cases that are genuinely ambiguous, not on the easy 90 percent that any reasonable process would get right anyway.
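To make the routing idea concrete, here is a minimal sketch of how I picture it. This is my own illustration, not Bridgewater’s code: the `Example` class, the `route_for_review` function, and the keyword-based `toy_predict` stand-in for the baseline model are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Example:
    text: str
    vendor_label: str


def route_for_review(examples, predict):
    """Split examples by whether the baseline model agrees with the vendor label."""
    accepted, needs_expert = [], []
    for ex in examples:
        if predict(ex.text) == ex.vendor_label:
            accepted.append(ex)       # agreement: keep the cheap vendor label
        else:
            needs_expert.append(ex)   # disagreement: escalate to a senior reviewer
    return accepted, needs_expert


def toy_predict(text):
    # Hypothetical stand-in for the baseline model: a crude keyword rule.
    return "relevant" if "rate" in text.lower() else "irrelevant"


examples = [
    Example("Central bank holds rates steady", "relevant"),
    Example("Company picnic moved to Friday", "irrelevant"),
    Example("Staff turnover rate rises at supplier", "irrelevant"),
]

accepted, needs_expert = route_for_review(examples, toy_predict)
print(len(accepted), "accepted;", len(needs_expert), "sent to an expert")
```

The useful property is that the expert queue shrinks as the baseline model and the vendor labels agree more often, so the review budget tracks ambiguity rather than volume.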

Then on the training side, they didn’t just keep distilling from the same fixed teacher model forever. The student gets promoted to teacher only once it proves it’s actually better on validation. That’s a small design choice that I think matters a lot. It keeps the whole system improving instead of plateauing around whatever the original teacher model was capable of.
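A rough sketch of that promotion rule, under my own assumptions: the function names are hypothetical, and the toy "models" below are just integers standing in for a validation score in percentage points. A real pipeline would train and evaluate actual models at each step.

```python
def distill_with_promotion(teacher, train_student, evaluate, rounds):
    """Iterated distillation where the student replaces the teacher
    only if it scores strictly better on validation."""
    teacher_score = evaluate(teacher)
    history = [teacher_score]
    for _ in range(rounds):
        student = train_student(teacher)
        student_score = evaluate(student)
        if student_score > teacher_score:
            # Promotion: the student becomes the next round's teacher.
            teacher, teacher_score = student, student_score
        history.append(teacher_score)
    return teacher, history


# Toy stand-ins (hypothetical): each training round shifts the score by a fixed delta.
deltas = iter([3, -2, 1])


def toy_train_student(teacher):
    return teacher + next(deltas)


def toy_evaluate(model):
    return model


final_teacher, history = distill_with_promotion(
    70, toy_train_student, toy_evaluate, rounds=3
)
print(final_teacher, history)
```

The validation gate is what stops a weak student from dragging the next round down: the teacher’s score in `history` can stay flat, but it never falls.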

If I had to boil the lesson down to one sentence, it’s this.

Good enterprise AI usually comes from a better feedback loop, not from more data or a bigger model.

I also think that’s why so many enterprise AI projects seem to plateau around the same accuracy range. Once you’ve exhausted prompt engineering and upgraded to the latest foundation model, the next gains usually don’t come from a smarter model. They come from better supervision, better routing, better feedback, and better systems.

Now, a few things I’d push back on if I were reviewing this paper with a client.

The cost savings headline needs a footnote. A 235 billion parameter model doesn’t run itself. You still need GPUs, batching, latency tuning, and people who know how to keep the thing running. If you’re processing enormous volumes every day, owning that infrastructure can absolutely pay off. If your workload is lumpy or unpredictable, a commercial API that turns fixed infrastructure cost into a variable line item might still be the smarter bet.

This isn’t a universal answer. It depends entirely on how much you actually use the thing.
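Here is the back-of-the-envelope version of that trade-off. Every number below is made up purely for illustration, and the model is deliberately simple: a fixed monthly cost plus a per-token marginal cost for self-hosting, against a flat per-token API price.

```python
def monthly_cost_api(volume_m_tokens, api_price_per_m):
    """API cost is purely variable: volume times price."""
    return volume_m_tokens * api_price_per_m


def monthly_cost_self_hosted(volume_m_tokens, fixed_monthly, marginal_per_m):
    """Self-hosting carries a fixed cost (GPUs, staff) plus a smaller marginal cost."""
    return fixed_monthly + volume_m_tokens * marginal_per_m


def break_even_volume(fixed_monthly, marginal_per_m, api_price_per_m):
    """Monthly volume (in millions of tokens) above which self-hosting is cheaper."""
    saving_per_m = api_price_per_m - marginal_per_m
    if saving_per_m <= 0:
        return None  # self-hosting never catches up
    return fixed_monthly / saving_per_m


# Hypothetical figures, not taken from the paper or any vendor price list.
FIXED_MONTHLY = 60_000   # infrastructure and staffing per month
MARGINAL_PER_M = 0.5     # self-hosted cost per million tokens
API_PRICE_PER_M = 3.0    # API price per million tokens

break_even = break_even_volume(FIXED_MONTHLY, MARGINAL_PER_M, API_PRICE_PER_M)
print("break-even volume (millions of tokens per month):", break_even)

for volume in (10_000, 40_000):
    api = monthly_cost_api(volume, API_PRICE_PER_M)
    own = monthly_cost_self_hosted(volume, FIXED_MONTHLY, MARGINAL_PER_M)
    print(volume, "->", "api" if api < own else "self-hosted")
```

With a lumpy workload the picture is worse for self-hosting than this suggests, because the fixed cost is paid in the quiet months too.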

I’d also gently push back on the framing of “replicating expert judgment.” Many of the evaluated tasks focus on document segmentation, filtering, classification, and finding the needle in a haystack of financial text. That’s genuinely useful work and it saves analysts a ton of time. But it is not the same as a model independently coming up with a macro thesis or an investment idea nobody has had yet.

Parsing information well and synthesizing new insight are two different skills. I’d want any vendor or internal team to be honest about which one they’re actually selling me.

And specialization has a cost that doesn’t show up in the benchmark table. A model tuned tightly to today’s financial reporting formats and today’s regulatory language will need care and feeding when those things change, and they always change.

That’s not a knock on the approach. It’s just the maintenance bill nobody talks about until the invoice shows up.

A lot of IT organizations aren’t set up yet to treat retraining and re-distillation as an ongoing operational cost the same way they’d treat patching a production system.

Here’s where I land on all this.

The Bridgewater paper isn’t proof that the big frontier models are becoming irrelevant. It’s evidence that enterprise AI is becoming an architectural discipline.

The organizations that win won’t necessarily be the ones with access to the biggest models. They’ll be the ones that build the best systems around them.

Use specialized models for the high-volume, close-to-the-data work. Save expensive frontier reasoning for the small slice of problems that are genuinely hard and ambiguous.

That’s a tiered architecture. It’s a lot more work than pointing everything at one API. But it’s also a lot harder for a competitor to copy, and that’s usually the kind of advantage worth building.
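In its simplest form, the tiering can be sketched as a confidence-gated router. This is an illustration under my own assumptions: I’m assuming the specialized model can return some confidence score, the 0.8 threshold is arbitrary, and both model functions are toy stand-ins.

```python
def tiered_answer(request, specialist, frontier, min_confidence=0.8):
    """Try the cheap specialized model first; escalate only when it is unsure."""
    answer, confidence = specialist(request)
    if confidence >= min_confidence:
        return answer, "specialist"
    return frontier(request), "frontier"


def toy_specialist(request):
    # Hypothetical: confident on short routine requests, unsure on anything longer.
    if len(request.split()) <= 5:
        return "classified: " + request, 0.95
    return "unsure", 0.40


def toy_frontier(request):
    # Hypothetical stand-in for an expensive frontier model call.
    return "reasoned answer for: " + request


routine = tiered_answer("tag this filing", toy_specialist, toy_frontier)
hard = tiered_answer(
    "reconcile conflicting guidance across these three quarterly filings",
    toy_specialist,
    toy_frontier,
)
print(routine)
print(hard)
```

The threshold is the knob that matters. Set it too low and bad answers slip through cheaply; set it too high and you are back to paying frontier prices for everything.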

Your AI Strategy Is About to Become the Most Expensive Commodity Bet in Your Company



In my last post, I argued that boards face a growing geopolitical risk in their AI strategy, that a single regulatory action or export control directive could sever access to a mission-critical workflow with zero warning.

If that is true, it raises an immediate question: why are so many enterprises deepening their dependency on a small number of frontier providers?

The answer is economic. And it deserves immediate scrutiny.


It isn’t regulation that concerns me most right now.

It’s economics.

Many organizations are investing in AI as though the cost of intelligence will remain permanently scarce and permanently expensive. That assumption is beginning to break down, and the enterprises built on it are about to find out what happens when pricing assumptions change mid-cycle.

Think about signing a five-year cloud infrastructure agreement just before hyperscale providers permanently reset the economics of compute. You were not buying a lasting advantage. You were locking yourself into yesterday’s pricing model. Many boards are about to make that same mistake with AI.

There is an important nuance here. Not all AI dependency is equal. An enterprise with an orchestration and governance layer between itself and model providers is in a fundamentally different position from an enterprise that has wired core workflows directly to a single API. Optionality does not require full in-house control. It requires the ability to switch, and someone in your AI value chain must demonstrably own that capability. If you are using a managed services partner to run agentic workflows, the question is not just whether they are using the best model today. It is whether their architecture allows them to substitute models without rebuilding your workflows from scratch.

That distinction matters more than which model is currently running.
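For readers who want to see what "the ability to switch" looks like in practice, here is a minimal sketch. It assumes the workflow only ever talks to a narrow interface, and every class name in it is hypothetical rather than any vendor’s actual SDK.

```python
from typing import Protocol


class ModelProvider(Protocol):
    """The only surface the workflow is allowed to depend on."""

    def complete(self, prompt: str) -> str: ...


class SummaryWorkflow:
    """Business logic written against the interface, not against a vendor."""

    def __init__(self, provider: ModelProvider):
        self.provider = provider

    def run(self, document: str) -> str:
        return self.provider.complete("Summarize: " + document)


class ProviderA:
    # Hypothetical adapter; a real one would wrap a vendor API call.
    def complete(self, prompt: str) -> str:
        return "[provider-a] " + prompt


class ProviderB:
    # A second hypothetical adapter with the same shape.
    def complete(self, prompt: str) -> str:
        return "[provider-b] " + prompt


workflow = SummaryWorkflow(ProviderA())
first = workflow.run("Q3 report")

workflow.provider = ProviderB()   # substitution without touching the workflow
second = workflow.run("Q3 report")
print(first)
print(second)
```

Real substitution is harder than this, because prompts, tool-calling formats, and output quirks differ between models. But if a partner cannot show you where this seam sits in their architecture, that tells you something.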


The Great Commoditization of Intelligence

For the past two years, the dominant assumption has been simple: frontier intelligence would remain scarce, and scarcity would justify premium pricing. That assumption is starting to break.

Across the ecosystem, capable models continue to improve at speed. Open-weight systems are advancing rapidly. Governments increasingly treat AI infrastructure as strategic capability rather than commercial software, which means they are actively funding alternatives to frontier providers.

China’s AI ecosystem illustrates this clearly. Systems such as DeepSeek, alongside models from Alibaba’s Qwen family, show that careful systems engineering and open-weight development can deliver highly competitive performance without relying exclusively on cutting-edge semiconductor supply chains.

Whether this is industrial policy or competitive pressure matters less than the outcome. The economics are shifting.

We have seen this before.

Servers became commodities. Operating systems became commodities. Cloud infrastructure became utilities.

Open source repeatedly compressed margins at the infrastructure layer while expanding value creation above it. AI is following the same trajectory.

That does not mean frontier innovation stops. It almost certainly will not. But it does mean boards need to ask a more uncomfortable question: will your business still pay frontier prices once today’s breakthrough becomes tomorrow’s baseline capability?


The Enterprise Contract Trap

Many CFOs are locking in multi-year AI agreements based on today’s economics. Frontier providers argue this is rational. They point to rapid capability gains in reasoning, multimodal systems, and autonomous agents as justification for sustained premium pricing.

They may be right. But that increases the risk rather than reducing it.

If capability leadership shifts every six to twelve months, long-term contracts become structurally fragile. Your architecture will evolve faster than your procurement cycle.

Every enterprise AI contract is implicitly a bet on scarcity. You are betting that the cost of intelligence will remain high enough to justify today’s terms. That is a historically fragile position in any technology that becomes foundational.

Some will argue open-weight systems are not meaningfully cheaper once governance, engineering, security, and compliance are included. That is true today. But it confuses current friction with structural cost.

Managed services compressed complexity in cloud infrastructure; managed inference is beginning to do the same for AI. The direction of travel matters more than the current state.

The honest board conversation is not “open-weight versus closed.” It is: at what capability and cost threshold does open-weight become the rational choice for each of our core workflows, and are we tracking that threshold over time?

Most organizations are not. Most managed service agreements do not surface it either, which means the question has to be owned explicitly at board level.

As intelligence becomes more accessible, advantage shifts away from the model and toward workflows, data, distribution, and execution.


The Metric Your Board Isn’t Measuring

Most boards still anchor on cost per million tokens. That is already the wrong unit.

Enterprise AI is moving into multi-agent systems where a single request triggers planning, reasoning, tool use, validation, and repeated self-correction before a result is produced. The user sees one interaction. The system may execute dozens of internal cycles.

This breaks the linear relationship between price and output. Token costs can fall while system costs rise if orchestration is poor.

The KPI that matters is not cost per token. It is cost per completed business outcome.

That requires measuring the full workflow: inference, orchestration, failure handling, human review, and retry logic.
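A simple way to see the difference is to compute both numbers side by side. The sketch below is illustrative only: the figures are invented, retries and failure handling are folded into the token count of each run, and human review is priced per minute.

```python
from dataclasses import dataclass


@dataclass
class WorkflowRun:
    tokens: int                  # all tokens consumed, including retries
    human_review_minutes: float
    completed: bool              # did the run produce a usable business outcome?


def cost_per_outcome(runs, price_per_million_tokens, reviewer_cost_per_minute):
    """Total spend across all runs divided by the number of completed outcomes."""
    total = sum(
        r.tokens / 1_000_000 * price_per_million_tokens
        + r.human_review_minutes * reviewer_cost_per_minute
        for r in runs
    )
    completed = sum(1 for r in runs if r.completed)
    if completed == 0:
        return float("inf")
    return total / completed


# Scenario A (hypothetical): higher token price, tight orchestration.
tight = [WorkflowRun(100_000, 0, True) for _ in range(4)]
cost_a = cost_per_outcome(tight, price_per_million_tokens=10.0, reviewer_cost_per_minute=2.0)

# Scenario B (hypothetical): token price halves, but runs retry heavily and half fail.
sloppy = [WorkflowRun(500_000, 0, done) for done in (True, False, True, False)]
cost_b = cost_per_outcome(sloppy, price_per_million_tokens=5.0, reviewer_cost_per_minute=2.0)

print("A:", cost_a, "B:", cost_b)
```

In this toy example the token price falls by half while the cost per completed outcome goes up, which is exactly the pattern a per-token dashboard would hide.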

It also exposes a second-order issue most boards do not yet track: whether their AI stack, or their partner’s AI stack, is genuinely provider-agnostic, or quietly optimized around a single vendor’s architecture.

That distinction matters because pricing and capability will not move in lockstep. When they diverge, lock-in becomes visible in operating costs rather than contracts.

An orchestration layer designed for model substitution is structurally more valuable than one optimized for a single provider, even if the single-provider version performs slightly better today.


The Geography Arbitrage Myth

A common response from executive teams is straightforward:

“We’ll build a team in Singapore. We’ll route inference through Dubai. We’ll separate legal entities.”

The assumption is that geography solves dependency risk. That assumption is weakening.

Regulators are increasingly focused on ownership, control, technology transfer, and operational influence rather than corporate domicile alone. Major economies are tightening rules around strategic technologies and data flows regardless of where a company is incorporated.

This does not mean global AI is splitting into fully isolated systems. But it is forming partially distinct ecosystems with increasing friction between them. For multinational companies, navigating those boundaries is becoming a capability, not an implementation detail, and it is doubly fragile when the underlying model economics are also in motion.


Where Competitive Advantage Actually Lives

None of this implies enterprises should abandon frontier AI systems. Many organizations will continue to rely on them for reliability, compliance, and legal protection. That is rational.

The mistake is assuming this architecture is permanent.

As intelligence becomes more accessible, advantage shifts away from the model and toward the things competitors cannot easily replicate: proprietary workflows, institutional knowledge, high-quality internal data, customer trust, and execution speed.

The same logic applies to AI partners. A partner whose value is deep integration with a single provider’s stack carries a different risk profile from one whose value is workflow design, domain expertise, and cross-model orchestration. Both can be valid choices. But they are not equivalent when infrastructure economics shift.

Models may become interchangeable. Operating systems of work will not.


Every Major Technology Shift Follows The Same Pattern


The scarce, expensive thing becomes abundant. The companies that treated scarcity as strategy lose. The companies that built differentiated capability on top of abundance win.

The geopolitical risk in the first post can sever AI access overnight. The economic risk in this post can erode AI advantage over time. Both point to the same conclusion: resilience comes from optionality, not dependency.

The question for your board is no longer which AI model to bet on. It is this: what happens to your competitive position when intelligence becomes widely available and competitively priced, and how quickly can you adapt when it does?

The biggest mistake a board can make today is assuming intelligence will remain the scarce asset.

The companies that win this cycle will be the ones that build everything around the moment it stops being one.

The Second-Order AI Trade


I’ve been thinking about one of the more interesting disconnects in public markets today: how investors are valuing IT services and BPO companies in the age of AI. The more I look at it, the more I think the market is asking the right question and arriving at the wrong conclusion.

The question is straightforward. If AI automates knowledge work, what happens to companies whose business model has historically depended on selling that work?

The conclusion the market seems to have reached is equally straightforward. AI replaces labor. IT services sell labor. Therefore IT services become less valuable over time.

There is truth in that argument. AI is already changing how software gets written, how applications are tested, how documentation is produced, and how support teams operate. Anyone who has watched a capable engineer work with modern AI tools knows this isn’t marketing hype anymore. Productivity gains are real and they are compounding.

But I think that line of reasoning stops one step too early.

It captures the first-order effect of AI, which is labor automation. What it misses is what that automation enables at a system level. That is where I think the more interesting question lives.

The work being automated is not the same as the capability being sold

There is certainly a future where enterprises need fewer developers, fewer testers, and fewer support engineers. Companies whose only competitive advantage is supplying lower-cost labor should probably be worried.

Not every services company falls into that category.

The best enterprise services firms have never been valuable because they employ thousands of engineers. They are valuable because they understand environments that outsiders struggle to navigate.

They know which undocumented application still produces a nightly file that another critical system quietly depends on. They know that touching one integration point can break five downstream processes that nobody has looked at in years. They know that the CIO wants transformation, the CFO wants predictability, and the compliance team wants nothing to change until every audit requirement has been addressed.

That is not just technical knowledge. It is organizational knowledge accumulated over years of operating inside a client’s environment. It is difficult to capture in a proposal document and even harder to encode into a model.

I am aware this argument has a shelf life. Tools are already emerging that attempt to map legacy dependencies and capture institutional context automatically. Over a long enough horizon they will likely succeed in parts of it. But enterprise transformation has never moved on the timeline that technology capability would suggest, and the window in which this knowledge remains economically valuable is probably longer than the market is currently pricing.

Enterprise complexity has a way of surviving every technology cycle

For years, CIOs, particularly in financial services, have been describing the same objective: retire the mainframe and move everything to the cloud.

Many have made substantial progress. Many are still running critical workloads on systems that were expected to disappear years ago.

This is not because enterprises resist innovation. It is because replacing core operational systems is rarely a technology problem alone.

Every major system sits inside a web of operational dependencies, regulatory obligations, contractual commitments, security controls, and organizational habits. Changing one component often requires changing dozens of others that were never part of the original plan.

AI does not remove that complexity. In many cases it adds another layer to it.

Who is accountable when an AI system makes a recommendation that turns out to be wrong? How should that output be audited? Which data can legally be processed in which jurisdictions? How do organizations monitor hundreds of AI-enabled workflows without introducing entirely new operational risks?

Those questions are becoming more important, not less.

The microservices lesson is worth remembering

When microservices became the dominant architecture for enterprise software, the promise was compelling. Smaller services. Faster releases. Independent teams. Greater flexibility.

Those benefits were real. So were the unintended consequences.

Entire categories of software companies emerged to solve problems that microservices themselves created. Observability platforms, distributed tracing, platform engineering, service meshes, and reliability engineering all became important because managing hundreds of services turned out to be substantially harder than managing one large application.

Automation did not eliminate operational work. It changed where the work lived. AI may follow a similar path.

As the cost of automating business processes falls, organizations are unlikely to automate fewer processes. They are more likely to automate many more. This is the Jevons paradox applied to enterprise automation, and I don’t think we should dismiss it casually. The difference between enterprise IT and markets where automation genuinely collapsed demand is that the underlying problem space keeps expanding. Travel booking was a fixed market. Enterprise technology complexity is not.

An enterprise managing 500 AI-enabled workflows has a very different operational challenge from one managing 50, even if each individual workflow becomes cheaper to build.

Somebody still has to integrate those systems, monitor them, govern them, secure them, and continuously improve them.

Where I think investors are making the real mistake

The label “IT services” has become too broad to be useful. It groups together companies with fundamentally different trajectories.

Some firms are still competing primarily on labor cost and billing by the hour. Those businesses face genuine structural pressure. If clients need fewer hours and AI keeps driving that number down, there is no natural floor. CIOs will demand that those productivity gains come back to them as rate reductions, and they will largely be right to do so.

I should be clear that the shift to outcome-based pricing is not inevitable and not frictionless. Enterprise procurement teams have been buying hours for decades, and outcome-based contracts transfer risk in ways that make buyers cautious. This transition has been discussed for years and has historically moved slowly. What is different now is that pressure on unit economics may become strong enough that both sides are pushed toward new structures. That does not imply speed. It implies necessity over time.

Others are quietly becoming something different.

They are automating their own delivery so repetitive work requires fewer people. They are building repeatable approaches for AI governance, implementation, and compliance. They are accumulating experience from dozens of enterprise AI deployments that clients cannot easily replicate internally.

Most importantly, they are shifting from selling effort to selling outcomes.

That distinction changes the economics in a very specific way. If you charge for hours worked, AI simply reduces the hours available to bill. But if you charge for a business outcome and AI allows that outcome to be delivered with fewer people, revenue stays stable while delivery costs fall. The firm that used to need fifty engineers to fulfill a contract might now need fifteen. The contract value does not change. The margin does.
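The arithmetic is worth spelling out. The fifty and fifteen engineers come from the example above; the revenue, cost, and contract figures are hypothetical numbers I picked so that both pricing models start from the same place.

```python
def margin_hourly(engineers, revenue_per_engineer, cost_per_engineer):
    """Time-and-materials: revenue scales with headcount, so margin does too."""
    return engineers * (revenue_per_engineer - cost_per_engineer)


def margin_outcome(contract_value, engineers, cost_per_engineer):
    """Outcome-based: revenue is fixed by the contract, only delivery cost moves."""
    return contract_value - engineers * cost_per_engineer


# Hypothetical annual figures, chosen so both models match at 50 engineers.
REVENUE_PER_ENGINEER = 200_000
COST_PER_ENGINEER = 150_000
CONTRACT_VALUE = 50 * REVENUE_PER_ENGINEER   # 10,000,000

for engineers in (50, 15):
    hourly = margin_hourly(engineers, REVENUE_PER_ENGINEER, COST_PER_ENGINEER)
    outcome = margin_outcome(CONTRACT_VALUE, engineers, COST_PER_ENGINEER)
    print(engineers, "engineers -> hourly margin:", hourly, "outcome margin:", outcome)
```

Same automation, same headcount reduction, opposite result: under hourly billing the margin shrinks with the team, while under an outcome contract it grows. That assumes the client does not renegotiate the contract value, which is the repricing risk discussed next.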

Not every management team will navigate this well. Some will automate delivery and pass the savings straight through to clients in the form of lower prices, which improves competitiveness but leaves the underlying economics unchanged. The firms worth watching are those that automate delivery and reprice the work at the same time.

From a distance, these two groups still look similar. Over the next several years, I suspect their financial performance will not.

The obvious AI trade has been building intelligence. The less obvious trade may be building the organizations that make intelligence usable inside large enterprises. I suspect the market is still treating those as the same thing.

As usual, these are strictly my personal views.