The Builder, the Critic, and the Circuit Breaker: How I’d Design AI Agents That Don’t Bankrupt You


Most multi-agent architectures look elegant on the whiteboard. Agent A generates the output. Agent B judges it against a strict checklist. No AI grading its own homework, clean separation of concerns, autonomous iteration until the job is done.

Then you open the billing dashboard.

Your agents are locked in a loop: Agent A revising, Agent B rejecting, neither making meaningful progress, while your token costs compound by the minute. This failure mode has been documented across production agent deployments at companies ranging from early-stage startups to large enterprises, and it almost never shows up in the architecture review.

The system passes every test in a supervised environment. It breaks in a specific way when nobody is watching.

That distinction matters more than it might seem. We are in the middle of a fundamental shift in how AI is actually deployed, from synchronous AI, where a human waits for a response and can intervene at any point, to unattended autonomy, where agents run complex multi-turn workflows in the background, entirely out of sight. When a human isn’t watching the screen, agents can quietly go to war with each other. You end up funding both sides.

The design constraint that most teams miss isn’t the AI itself. It’s the discipline of using thin, deterministic code to cage non-deterministic intelligence, knowing when to pull the plug, and building the infrastructure to do that gracefully before the token burn compounds.

What follows is the architecture required to enforce boundaries on unattended autonomy, and the reasoning behind every decision.

The Two Failure Modes of Cognitive Deadlock

The two primary failure modes of multi-agent systems are near-opposites, and a naive fix for one reliably triggers the other, trapping your system in a Cognitive Deadlock.

Sycophancy is the first. AI models are naturally drawn to fluent, confident-sounding text. Agent B, your critic, will frequently approve Agent A’s output simply because it reads well on the surface, missing underlying logic errors, hallucinated facts, or reasoning gaps. You think you have a quality gate. You actually have a mutual appreciation society.

Token Churn is what happens when you try to fix that. You instruct Agent B to be harsher: find flaws, if you don’t find one you’ve failed. Now it rejects everything. Agent A revises but hits the ceiling of its own capability, returning a marginally different version. Agent B rejects again.

Software agents have no concept of time, urgency, or money. A human stuck in this loop would eventually say, “I don’t think we’re getting anywhere.” An autonomous agent will simply keep burning tokens indefinitely at your expense.

One nuance worth carrying into the architecture: these failure modes don’t always arrive sequentially. On open-ended analytical or creative tasks, both can coexist within the same loop, Agent B approving weak outputs on some rubric criteria while rigidly rejecting on others. The failure landscape in practice is messier than a clean either/or, and the system needs to account for that from the start.

Why More AI Makes This Worse

The instinct when something breaks in an AI system is to add more AI. Another monitoring model. Another validation layer. Another agent to watch the agents.

This is almost always the wrong move. You’re solving an orchestration problem, which is fundamentally about enforcing hard boundaries, with a tool designed for open-ended reasoning. The result is compounding unpredictability at compounding cost.

The fix is simpler and older: a thin layer of traditional, deterministic code sitting above the models. It doesn’t reason or deliberate. It counts, measures, and cuts. Think of it less like a manager and more like a circuit breaker on an electrical panel, no understanding of what’s flowing through the system, but precise knowledge of when to shut it off.

This is Architectural Minimalism applied to agent systems: the sharpest possible line between what AI calculates and what traditional code enforces. The models handle open-ended reasoning. The orchestration layer holds the edges rigid. Three patterns form the core of that layer, each one a direct response to a failure mode that emerges without it.

Pattern One: The Hard Cap

The simplest and most important constraint: set a maximum iteration count and make it unconditional.

Four to five rounds is the absolute ceiling for enterprise workflows. If the loop hasn’t resolved by then, the agents aren’t converging on an answer; they’re producing token churn. The orchestrator ends the loop regardless of where things stand.

This feels blunt. That’s the point. The value of a hard cap isn’t intelligence, it’s unconditional enforcement. No amount of confident output from either agent can override a counter hitting five.

For engineers: The iteration counter must live in persistent state completely outside the model context, where agents have no visibility into it and cannot influence it. On each cycle, before invoking either model, the orchestrator checks the counter and raises a LoopLimitExceeded exception if the threshold is met. This state must survive process restarts; an in-memory counter that resets on an unhandled exception defeats the purpose entirely. Use your task queue’s native state store or a lightweight Redis key with a TTL set conservatively above your maximum expected loop duration. Tag every loop invocation with a correlation ID from initialization: you’ll need it for the observability layer, and retrofitting it later is painful.
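A minimal sketch of the hard cap, assuming SQLite as a stand-in for the task queue state store or Redis key described above. The IterationGuard class, the loop_counter table, and the choice to allow five rounds and refuse the sixth are illustrative assumptions; only the LoopLimitExceeded name comes from the text.

```python
import sqlite3
import uuid


class LoopLimitExceeded(Exception):
    """Raised by the orchestrator when a loop has used up its rounds."""


class IterationGuard:
    """Counts rounds per correlation ID in a store the models never see."""

    def __init__(self, db_path: str, max_rounds: int = 5) -> None:
        self.max_rounds = max_rounds
        self.conn = sqlite3.connect(db_path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS loop_counter "
            "(correlation_id TEXT PRIMARY KEY, rounds INTEGER NOT NULL)"
        )
        self.conn.commit()

    def start_round(self, correlation_id: str) -> int:
        """Check the cap, then record the round. Call before any model call."""
        row = self.conn.execute(
            "SELECT rounds FROM loop_counter WHERE correlation_id = ?",
            (correlation_id,),
        ).fetchone()
        rounds = row[0] if row else 0
        if rounds >= self.max_rounds:
            raise LoopLimitExceeded(
                f"{correlation_id}: cap of {self.max_rounds} rounds reached"
            )
        self.conn.execute(
            "INSERT OR REPLACE INTO loop_counter (correlation_id, rounds) "
            "VALUES (?, ?)",
            (correlation_id, rounds + 1),
        )
        self.conn.commit()  # committed to disk, so a restart keeps the count
        return rounds + 1


def new_correlation_id() -> str:
    """One ID per loop, created at initialization and attached to every log."""
    return str(uuid.uuid4())
```

Because the count is committed to a file, a fresh IterationGuard opened on the same path after a crash still sees the rounds already spent. The TTL-based cleanup that the Redis option would provide is not modeled here.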

Pattern Two: Stagnation Detection

Agents frequently fall into token churn well before hitting the hard cap. Agent A stops making structural changes and starts rewriting surface prose, “furthermore” becomes “in addition,” paragraphs get reorganized without changing substance. From the outside it looks like active progress. It isn’t. And it burns tokens at the same rate as genuine work.

The orchestrator catches this by measuring how much content actually changes between rounds. When the delta drops below a meaningful threshold, it recognizes the agent has exhausted its ideas and breaks the loop early, before the hard cap is reached.

The starting threshold sits at roughly 5%, derived from a straightforward observation: meaningful structural revision typically moves at least 15 to 20% of content, while token churn clusters near zero. A 5% line sits near the bottom of the gap between the two, so it catches churn without flagging slow but genuine structural revision. That said, the right threshold depends on your output type, and the only honest way to calibrate it is to instrument first and tune from data.

For engineers: Choose your similarity metric carefully. Token overlap and normalized edit distance behave differently depending on output structure, and picking the wrong one silently breaks your stagnation detection. For open-ended prose, token overlap is more stable. For structured enterprise outputs, such as JSON schemas, legal clauses, or templated compliance documents, normalized edit distance is more sensitive to meaningful changes.
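The two metrics can be sketched as follows. This is one plausible reading rather than a definitive one: "token overlap" is taken to mean Jaccard similarity over whitespace-separated token sets, and "normalized edit distance" to mean token-level Levenshtein distance divided by the longer draft’s length. The function names and the whitespace tokenizer are assumptions for illustration.

```python
from typing import Callable


def token_overlap_delta(prev: str, curr: str) -> float:
    """Change score as 1 minus Jaccard similarity over token sets."""
    a, b = set(prev.split()), set(curr.split())
    if not a and not b:
        return 0.0
    return 1.0 - len(a & b) / len(a | b)


def edit_distance_delta(prev: str, curr: str) -> float:
    """Token-level Levenshtein distance divided by the longer length."""
    a, b = prev.split(), curr.split()
    if not a and not b:
        return 0.0
    row = list(range(len(b) + 1))  # distances for the previous row
    for i, tok_a in enumerate(a, start=1):
        prev_diag, row[0] = row[0], i
        for j, tok_b in enumerate(b, start=1):
            cost = 0 if tok_a == tok_b else 1
            prev_diag, row[j] = row[j], min(
                row[j] + 1,        # delete a token from prev
                row[j - 1] + 1,    # insert a token from curr
                prev_diag + cost,  # keep or substitute
            )
    return row[-1] / max(len(a), len(b))


def is_stagnant(
    prev: str,
    curr: str,
    threshold: float = 0.05,
    metric: Callable[[str, str], float] = token_overlap_delta,
) -> bool:
    """True when the change between rounds falls below the threshold."""
    return metric(prev, curr) < threshold
```

One behavioral difference worth knowing before you choose: the set-based overlap ignores order, so a draft that only reshuffles its content scores as zero change, while edit distance registers the reshuffle as a large one.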

Calibration requires labeled examples: a set of round-pairs that a domain expert has manually classified as either genuine revision or surface churn. Fifty to one hundred labeled pairs is typically enough to validate your metric choice and threshold. Run both metrics against this set, compare precision and recall, and commit to whichever performs better on your specific output type. This step takes a day and saves weeks of silent miscalibration in production. A threshold that fires too early causes unnecessary rollbacks. One that never fires is invisible until you audit the bill, by which point the cost is already spent.
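The calibration step might look like the sketch below, assuming each labeled pair is a tuple of previous draft, current draft, and the expert’s verdict on whether it is churn. The names are hypothetical, and sequence_delta (built on the standard library’s difflib) is only a placeholder metric so the example runs on its own; substitute whichever metrics you are comparing.

```python
from difflib import SequenceMatcher
from typing import Callable, Iterable, Tuple

# (previous draft, current draft, expert labeled this pair as surface churn)
LabeledPair = Tuple[str, str, bool]


def sequence_delta(prev: str, curr: str) -> float:
    """Placeholder metric: 1 minus difflib's similarity ratio over tokens."""
    return 1.0 - SequenceMatcher(None, prev.split(), curr.split()).ratio()


def precision_recall(
    pairs: Iterable[LabeledPair],
    metric: Callable[[str, str], float],
    threshold: float,
) -> Tuple[float, float]:
    """Score how well `metric < threshold` agrees with the expert labels."""
    true_pos = false_pos = false_neg = 0
    for prev, curr, is_churn in pairs:
        flagged = metric(prev, curr) < threshold
        if flagged and is_churn:
            true_pos += 1
        elif flagged:
            false_pos += 1  # would break the loop on a genuine revision
        elif is_churn:
            false_neg += 1  # churn that keeps burning tokens unnoticed
    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0
    recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0
    return precision, recall
```

Low precision corresponds to the threshold that fires too early; low recall corresponds to the one that never fires. Running the function once per candidate metric and threshold on the same labeled set gives a like-for-like comparison.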

Enterprise constraint: Running text-distance calculations on large outputs inside the orchestrator can introduce latency. Compute the delta inline for outputs under roughly 10,000 tokens, and offload it to an async worker for longer documents.

Pattern Three: Temperature Stepping

Temperature controls how deterministic or exploratory a model’s output is. Low temperature produces focused, precise responses. High temperature introduces variance, and occasionally breaks a model out of a logical rut it cannot escape through refinement alone.

Rather than holding temperature constant, the orchestrator steps it based on loop position:

  • Rounds 1–2: Low temperature. Focused, precise output.
  • Round 3: The orchestrator injects a direct intervention prepended to the system message: “You have been rejected twice. Do not refine your previous approach, abandon it and try something structurally different.”
  • Round 4: Temperature is increased, forcing genuine exploration rather than iteration on a failing strategy.
  • Round 5: Hard cut.

The architecture mirrors a core tenet of classical optimization: a system stuck in a local minimum needs a controlled injection of variance to escape. That is the intuition simulated annealing relies on, although annealing starts hot and cools, whereas this schedule stays cool and heats only once refinement has stalled. The analogy still applies meaningfully to language model behavior. A model stuck at low temperature on a problem it cannot solve will keep producing the same wrong answer with increasing confidence. The temperature bump is a last resort before human escalation, not guaranteed to work, but cheap enough to always be worth attempting.

For engineers: Set temperature at the API call layer, not within the prompt. The Round 3 system message injection should be prepended, not appended, since position affects attention weighting in current transformer architectures. Log temperature values alongside each draft in your run record; when debugging a failed loop, temperature trajectory is often the fastest signal for distinguishing genuine exploration from a stuck model.
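The schedule can be expressed as a small pure function that the orchestrator consults before each call. This is a sketch under assumptions: the temperature values 0.2 and 0.9, the RoundPlan and plan_round names, and the decision to keep the intervention text in place for Round 4 are all illustrative choices the text does not specify.

```python
from dataclasses import dataclass
from typing import Optional

INTERVENTION = (
    "You have been rejected twice. Do not refine your previous approach, "
    "abandon it and try something structurally different."
)


@dataclass(frozen=True)
class RoundPlan:
    temperature: float   # passed as an API call parameter, never in the prompt
    system_prompt: str


def plan_round(
    round_number: int,
    base_system_prompt: str,
    low: float = 0.2,    # assumed "low" value
    high: float = 0.9,   # assumed "raised" value
) -> Optional[RoundPlan]:
    """Return the settings for a round, or None when the loop must be cut."""
    if round_number < 1:
        raise ValueError("rounds are numbered from 1")
    if round_number >= 5:
        return None  # hard cut
    prompt = base_system_prompt
    if round_number >= 3:
        # Prepended, not appended, as recommended above.
        prompt = f"{INTERVENTION}\n\n{base_system_prompt}"
    temperature = high if round_number == 4 else low
    return RoundPlan(temperature=temperature, system_prompt=prompt)
```

Keeping the schedule in one function also gives you a single place to log the temperature alongside each draft.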

Enterprise constraint: Several corporate API gateways restrict per-call temperature adjustments. If that’s your environment, substitute Prompt Hardening: at Round 3, replace the system prompt entirely, switching from an open-ended coaching prompt to an aggressive few-shot prompt that mandates strict structural compliance. The mechanism differs; the intent is identical.

Instrumentation: The Operational Audit Trail

Treat observability as a first-class design concern, not something to retrofit after the first production incident.

Without it, the circuit breaker layer is a black box. You know it fired. You don’t know which pattern triggered it, how often, on what input types, or whether your thresholds are correctly calibrated. You cannot improve what you cannot see, and in an unattended system, you may not notice what’s wrong until the bill arrives.

Log the following for every loop run: correlation ID, total iterations, exit reason (hard cap, stagnation, success, or human escalation), delta score per round, Agent B rubric result per round, temperature per round, and whether a best-effort draft was delivered or a handoff was triggered.
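One way to shape that record is a pair of dataclasses serialized to a JSON line. The field names and the ExitReason values below are hypothetical; they simply mirror the list above one for one.

```python
import json
from dataclasses import asdict, dataclass, field
from enum import Enum
from typing import List, Optional


class ExitReason(str, Enum):
    HARD_CAP = "hard_cap"
    STAGNATION = "stagnation"
    SUCCESS = "success"
    HUMAN_ESCALATION = "human_escalation"


@dataclass
class RoundRecord:
    delta_score: Optional[float]  # None for round 1, which has no prior draft
    rubric_passed: int            # criteria Agent B marked as passed
    rubric_total: int
    temperature: float


@dataclass
class LoopRunRecord:
    correlation_id: str
    exit_reason: ExitReason
    rounds: List[RoundRecord] = field(default_factory=list)
    best_effort_delivered: bool = False
    handoff_triggered: bool = False

    def to_json_line(self) -> str:
        """Serialize to one JSON line for a log pipeline to ingest."""
        payload = asdict(self)
        payload["exit_reason"] = self.exit_reason.value
        payload["total_iterations"] = len(self.rounds)
        return json.dumps(payload, sort_keys=True)
```

Deriving total_iterations from the per-round list, rather than storing it separately, keeps the two from drifting apart.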

These logs should aggregate into a simple operational dashboard displaying iteration count distribution, circuit break frequency by type, rollback rate, and handoff rate. Within a few hundred runs you’ll have enough signal to tune thresholds from data rather than intuition. This log also becomes your audit trail when a stakeholder asks why a specific high-value output was delivered as a partial draft rather than a completed one.

When the Circuit Breaks: Delivering Something Useful

Stopping the loop prevents runaway cost. But there’s still a user waiting for a result, and “the AI gave up” is not a product experience. Graceful failure is an engineering requirement, not an afterthought.

Save the best draft, not the last one. The orchestrator caches every draft alongside the structured rubric score Agent B assigned it. If Round 2 produced a draft passing 85% of quality criteria and later rounds failed to improve on it, the system rolls back and delivers Round 2. The user gets an imperfect but usable result instead of an error screen. In most business contexts, 85% complete is a workable starting point, not a failure.

That percentage is only meaningful if the rubric behind it is well-defined. Agent B’s checklist needs explicit, binary criteria: not “is this good?” but “does this section contain a pricing breakdown?” Yes or no. The score is passed criteria divided by total criteria.
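The scoring and rollback logic is small enough to sketch directly. The ScoredDraft and best_draft names are hypothetical, and breaking ties in favor of the earlier round is an assumption the text does not make.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class ScoredDraft:
    round_number: int
    text: str
    criteria: Dict[str, bool]  # criterion name -> Agent B's yes/no verdict

    @property
    def score(self) -> float:
        """Passed criteria divided by total criteria."""
        if not self.criteria:
            return 0.0
        return sum(self.criteria.values()) / len(self.criteria)


def best_draft(
    drafts: List[ScoredDraft], minimum_score: float = 0.0
) -> Optional[ScoredDraft]:
    """Return the highest-scoring cached draft, or None if none qualifies."""
    eligible = [d for d in drafts if d.score >= minimum_score]
    if not eligible:
        return None
    # Highest score wins; on a tie the earlier round is preferred (assumption).
    return max(eligible, key=lambda d: (d.score, -d.round_number))
```

A None result is the signal to skip delivery and go straight to the failure path.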

One example of a criterion that looks binary but isn’t: “is the tone professional?” That’s a judgment call dressed as a checkbox. Two evaluations of the same document will produce different scores, which corrupts your rollback logic and makes your quality threshold meaningless. The test for a valid rubric criterion is whether two different reviewers, given the same document, would independently reach the same answer. If they wouldn’t, decompose the criterion until they would, before it touches production.

Communicate failures in human terms. If no draft clears the minimum threshold, the response should be specific and actionable: “We generated 80% of your proposal but encountered a conflict in the pricing section. The draft has been saved and flagged for review.” Structure this as a typed failure object that downstream systems can parse and route, not a string that gets logged and forgotten.
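A typed failure object could be as simple as the frozen dataclass below. The LoopFailure name and its fields are assumptions for illustration; the point is that the human-readable message is derived from structured fields instead of being the only thing stored.

```python
from dataclasses import asdict, dataclass
from typing import Dict


@dataclass(frozen=True)
class LoopFailure:
    correlation_id: str
    exit_reason: str           # e.g. "hard_cap" or "stagnation"
    document_kind: str         # e.g. "proposal"
    completed_fraction: float  # best rubric score reached, 0.0 to 1.0
    blocked_section: str       # where the unresolved conflict sits
    draft_saved: bool

    def user_message(self) -> str:
        """Render the specific, actionable message shown to the user."""
        percent = round(self.completed_fraction * 100)
        outcome = (
            "The draft has been saved and flagged for review."
            if self.draft_saved
            else "No draft could be saved."
        )
        return (
            f"We generated {percent}% of your {self.document_kind} but "
            f"encountered a conflict in the {self.blocked_section} section. "
            f"{outcome}"
        )

    def to_dict(self) -> Dict[str, object]:
        """Plain dict for downstream systems to parse and route."""
        return asdict(self)
```

Downstream routing can then branch on exit_reason or blocked_section without parsing the message text.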

Hand off to a human with full context. When the loop breaks, the orchestrator packages the original prompt, the highest-scoring draft, the specific failed rubric criteria from Agent B, and the execution trace. A human reviewer sees precisely where the AI got stuck, fixes that specific gap, and approves. Targeted human judgment applied at the exact point of failure, not a restart from scratch.

The Enterprise Agent Paradox

This handoff introduces the sharpest organizational challenge in agentic system design, and it’s one the architecture alone cannot solve.

If an autonomous agent circuit-breaks on a high-value client document, who gets the notification? What is their SLA to respond? If a human reviewer takes four hours to log in, review, and patch the 15% gap the AI couldn’t close, has your unattended autonomy actually saved the enterprise any time, or did it just shift the operational friction downstream?

The bottleneck of an agentic system is rarely the AI. It is your human routing architecture.

The practical resolution is Omnichannel Exception Routing: do not build custom review interfaces from scratch. Export standard schemas, such as OpenTelemetry events or typed webhooks, that inject circuit-break exceptions directly into your existing enterprise ticketing infrastructure, like ServiceNow, Jira, or whatever queue your team already monitors. The human side of the handoff needs to live where human attention already lives, governed by SLAs that already exist.
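As a rough illustration, the handoff package can be flattened into a webhook body that a ticketing integration would accept. Everything here is hypothetical: the CircuitBreakEvent fields, the schema name, and the payload keys are placeholders, not the actual ServiceNow, Jira, or OpenTelemetry formats, which you would map to in your own integration.

```python
import json
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class CircuitBreakEvent:
    correlation_id: str
    exit_reason: str
    original_prompt: str
    best_draft: str
    failed_criteria: List[str]
    execution_trace: List[str]


def to_webhook_body(
    event: CircuitBreakEvent,
    sla_hours: Dict[str, int],
    default_sla_hours: int = 24,
) -> str:
    """Build a JSON body carrying the full handoff context for a reviewer."""
    payload = {
        "schema": "agent.circuit_break.v1",  # hypothetical schema name
        "correlation_id": event.correlation_id,
        "summary": (
            f"Agent loop stopped ({event.exit_reason}); "
            f"{len(event.failed_criteria)} rubric criteria need review"
        ),
        # The SLA comes from the existing queue's policy, keyed by exit reason.
        "sla_hours": sla_hours.get(event.exit_reason, default_sla_hours),
        "details": {
            "original_prompt": event.original_prompt,
            "best_draft": event.best_draft,
            "failed_criteria": event.failed_criteria,
            "execution_trace": event.execution_trace,
        },
    }
    return json.dumps(payload, sort_keys=True)
```

The sketch stops at building the body on purpose: the delivery mechanism, authentication, and retries belong to whatever ticketing system already owns the queue.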

Building a parallel review system is an organizational change management problem disguised as an engineering task. Teams that treat it as purely technical consistently underestimate the timeline by a factor of three. Plan for that gap explicitly; the technical handoff can be built in a day, but getting the human side right typically takes weeks and involves stakeholders well outside the engineering team.

The Cost Case for Building This Upfront

A loop running to the hard cap consumes roughly five to ten times the tokens of a successful two-round completion on the same task. At scale, with hundreds or thousands of daily agent invocations, uncontrolled loops don’t just create poor user experiences. They create billing events that compound faster than most teams notice before the monthly statement arrives.
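To see why the cost grows faster than the round count, here is a toy model. It assumes each round re-sends the original prompt plus every earlier draft and critique, and then produces a new draft and critique. The token figures are invented for illustration, not measurements.

```python
def loop_tokens(
    rounds: int,
    prompt_tokens: int = 1000,    # assumed size of the task prompt
    draft_tokens: int = 2000,     # assumed size of each Agent A draft
    critique_tokens: int = 500,   # assumed size of each Agent B critique
) -> int:
    """Total tokens processed across a loop of the given length."""
    per_round_output = draft_tokens + critique_tokens
    total = 0
    for k in range(1, rounds + 1):
        history = (k - 1) * per_round_output  # earlier rounds, re-sent
        total += prompt_tokens + history + per_round_output
    return total


two_rounds = loop_tokens(2)    # 9500
five_rounds = loop_tokens(5)   # 42500
ratio = five_rounds / two_rounds
```

Under these particular numbers a five-round loop costs about 4.5 times a two-round success, even though it runs only 2.5 times as many rounds. The gap comes entirely from the accumulated history being re-sent each round; your own ratio depends on how much context each call actually carries.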

The orchestration layer described here adds minimal compute overhead. The logic is deterministic code running entirely outside the model context. The investment is engineering time, specifically days to weeks to build and calibrate. The alternative is discovering your failure thresholds the hard way, at production volume, on your cloud bill.

Build the circuit breaker before you need it. By the time you need it, you’re already paying for not having it.

The Principle Underneath the Architecture

Every instinct in agent system design pulls toward more intelligence, a smarter critic, a more capable builder, another layer of AI judgment applied wherever the last layer fell short. That instinct is understandable and almost always counterproductive.

The pattern that holds up under pressure is a strict division of labor: AI handles the open-ended reasoning, traditional code enforces the boundaries, existing human workflows handle the exceptions. Not as a fallback for a broken system, but as a deliberate architectural choice in a working one.

There’s a counterintuitive implication worth carrying forward: as AI agents become more capable, the orchestration layer becomes more important, not less. A more capable agent running in an uncontrolled loop causes more damage, faster, at greater cost than a weaker one. The sophistication of the AI and the rigidity of its constraints need to scale together. Every increase in autonomy is an argument for a more robust circuit breaker, not a less necessary one.

Autonomy without boundaries isn’t architectural minimalism. It’s just risk with a better interface.

Build the builder. Build the critic. Make sure something that doesn’t think is always in charge of knowing when to stop.

Your AI Stack Has a Geopolitical Risk. Your Board Doesn’t Know It Yet.



In March 2022, the U.S. Office of Foreign Assets Control imposed sweeping sanctions on Russian entities following the invasion of Ukraine. Within 72 hours, Microsoft Azure, AWS, and Google Cloud began cutting off services to affected Russian customers. Businesses that had built mission-critical workflows on those platforms discovered, at speed, that a U.S. government directive could reach inside their operations regardless of where they were headquartered or where their data sat.


That moment was a warning. Most enterprise boards filed it under “geopolitical tail risk” and moved on.

The same logic has now arrived at the AI layer, and the dependencies are deeper, the warning period shorter.


We already have a preview. When Italy’s data protection authority banned ChatGPT over privacy concerns, local businesses relying on it for operational workflows lost access overnight. The Rome Court ultimately annulled the subsequent 15 million euro fine in March 2026, but it did so on a single jurisdictional point: once OpenAI established its Irish subsidiary, the Irish Data Protection Commission became the lead supervisory authority, stripping the Italian regulator of its right to issue a final sanction. The court never examined whether the underlying data practices complied with GDPR. 

Boards should not mistake a jurisdictional escape hatch for an operational green light. The real lesson was the speed of the initial disruption. One regulatory decision, and a core enterprise tool was gone with zero advance notice and zero transition period.


Italy was an early flashpoint. The regulatory landscape has since shifted structurally. The EU AI Act’s high-risk obligations are in the final stages of legislative revision, with a new enforcement deadline of December 2027 agreed in May 2026, a postponement that reflects political complexity rather than reduced intent. In the U.S., the export control framework administered by the Bureau of Industry and Security (BIS) is tightening. The question is no longer whether a regulatory action will disrupt your AI operations. It is when, and whether you will be ready.

The Dependency You Probably Haven’t Stress-Tested

Ask your CTO a simple question: if your primary frontier AI model provider became unavailable for 30 days due to a regulatory action, an export control directive, or a government-mandated review, what would break, and how quickly?


For most enterprises, the honest answer is deeply uncomfortable. Over the last 18 months, AI has moved far beyond experimental chatbots. It has been woven into autonomous, multi-agent workflows that run core operational pipelines: customer service execution, automated contract analysis, code generation, financial modelling, and compliance screening. When these integrated systems run on a single model, a vendor blackout does not just stall a user query; it halts the automated engine of the business.


Unlike a SaaS CRM or a cloud storage provider, frontier AI models are not commodities. They are concentrated in a handful of U.S.-headquartered companies (Anthropic, OpenAI, Google DeepMind) whose foundational IP and cloud infrastructure are now explicitly subject to tightening U.S. export controls, national security reviews, and data retention mandates that may directly conflict with local privacy regulation. GDPR is only the most obvious example. India’s DPDP Act, Brazil’s LGPD, and the EU AI Act’s transparency requirements all create potential collision points with U.S. vendor terms of service.


Your board does not need to understand transformer architecture. It needs to understand that treating frontier AI as a politically neutral utility, the way you might treat electricity or broadband, is now a critical governance error.

The Strategy: Sovereignty and Hedging

Navigating this requires moving the conversation out of the engineering backlog and into the boardroom, focusing on three strategic pivots.

1. Mandate a Hybrid Model Architecture

The open-weight vs. closed-source debate is no longer an engineering preference; it is a sovereignty conversation. Models like Meta’s Llama series or Mistral can be self-hosted within your own infrastructure perimeter, giving you operational custody and insulation from a foreign vendor’s sudden API kill-switches, executive orders, or unilateral changes to data retention policies.

The right architecture is a tiered model: closed frontier systems reserved strictly for high-stakes, hyper-complex reasoning tasks where capability genuinely justifies the concentration risk; open-weight models running in your own environment for core operational workflows where availability and data sovereignty matter more than the last percentage point of benchmark performance.

The board must demand clear accountability: who owns the decision about which corporate workflows are allowed to tolerate external model dependency, and what is the review cycle?

2. Implement an Independent Orchestration Layer

CFOs do not leave a company’s currency exposure unhedged on the grounds that exchange rates are probably fine. The same discipline should apply to model provider exposure. An intelligent orchestration layer, or model router, must sit between your applications and your model providers. If a primary provider goes offline or changes its terms in ways that conflict with local regulation, the router redirects traffic to a secondary provider or a locally hosted model automatically.

The parallel to treasury is precise: you are not predicting that a provider will fail; you are ensuring that if it does, your operations survive. Do not expect the frontier labs to build this for you. Their business model relies on maximising your consumption of their flagship compute, and they lack your specific business context to route effectively.


This architecture requires planning for graceful degradation. In practice, this means having a fallback ready before you need it. If your primary frontier model goes dark, your orchestration layer must route workflows to a localized, self-hosted model that can securely handle the baseline transaction, keeping core operations running even if advanced reasoning is temporarily unavailable. The cost of building this independent routing layer is a fraction of the operational cost of a 48-hour AI outage across a large enterprise.
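For the technical reader, the core of such a router is an ordered fallback. The sketch below assumes each provider is wrapped as a plain callable that takes a prompt and returns text; the route function, the Provider alias, and the AllProvidersFailed exception are hypothetical names, and no real vendor SDK is used.

```python
from typing import Callable, List, Sequence, Tuple

# (provider name, function that takes a prompt and returns the completion)
Provider = Tuple[str, Callable[[str], str]]


class AllProvidersFailed(Exception):
    """Raised when every provider in the chain has errored."""


def route(prompt: str, providers: Sequence[Provider]) -> Tuple[str, str]:
    """Try providers in priority order; return (provider name, response)."""
    errors: List[str] = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except Exception as exc:  # broad on purpose: any failure triggers fallback
            errors.append(f"{name}: {exc}")
    raise AllProvidersFailed("; ".join(errors))
```

Placing the self-hosted model last in the list gives the graceful degradation described above. A production router would also need timeouts, health checks, and a per-workflow policy on which tasks are allowed to run on the weaker fallback.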

3. Map the Fragmented Global Risk Profile

The exposure is not uniform, and a global business must audit its risk based on where its delivery stacks actually sit.


For enterprises with China operations or Chinese ownership structures: This is the category most likely to surface a legal exposure your board does not know it has. U.S. frontier AI models are simply unavailable in mainland China. OpenAI cut off API access in July 2024 following U.S. Treasury restrictions on technology investment flows into China. The risk for global enterprises runs deeper than geography: Anthropic updated its terms of service in September 2025 to prohibit access for any entity more than 50% owned by a company headquartered in a restricted region, regardless of where that entity actually operates. A joint venture with a Chinese majority shareholder, incorporated and operating in Singapore or the UAE, may already be outside the terms of your AI vendor contracts. This is a legal and compliance exposure that needs to be audited now, at the entity level, across your full ownership structure.


For enterprises with significant India operations: India has become the execution layer for global business process automation and autonomous agent deployment. Building those stacks entirely on U.S.-centric closed models imports downstream regulatory risk into every client delivery. Navigating this requires a dual-track strategy. While enterprises must continue to leverage established global models for current production baselines, they must simultaneously fund parallel validation tracks for sovereign alternatives. India’s BharatGen Param2, a 17-billion parameter mixture-of-experts model trained on 22 trillion tokens of multilingual data using government-backed indigenous compute infrastructure, proves that open-weight alternatives are ready for enterprise testing. The immediate mandate for boards is not an immediate shutdown of current APIs, but the funding of shadow testing environments to ensure long-term architecture flexibility.


For U.S.-headquartered enterprises: The regulatory line has been drawn at the computational threshold of 10^26 floating-point operations, the statutory boundary establishing a system as a frontier model under California’s Transparency in Frontier Artificial Intelligence Act (SB 53). Developers generating more than 500 million dollars in annual gross revenue face the most intensive obligations under the Act: they must publish annual catastrophic risk frameworks and report critical safety incidents to state emergency agencies within 15 days of discovery, shortened to 24 hours if the incident poses an imminent risk of death or serious physical injury. Violations are enforced by the California Attorney General and carry civil penalties of up to one million dollars per violation. An enforcement action by the California Attorney General against a primary lab could trigger an immediate operational blackout for any single-sourced enterprise. And if your organization also holds federal contracts or operates in regulated markets, that upstream compliance failure will bleed directly into your own legal and audit risk profiles overnight.


For European enterprises: The political agreement reached in May 2026 to defer the EU AI Act’s high-risk obligations to December 2027 gives enterprises more runway, but it does not change the architecture decision. The data retention and monitoring policies that U.S. AI vendors operate under remain on a collision course with what European regulation will ultimately require. Using the delay to build compliance-ready infrastructure is the opportunity; treating it as a signal to stand down is the mistake. Domestic alternatives, Mistral and Aleph Alpha, are not inferior substitutes. They are among the few providers built from the ground up to operate within European regulatory constraints.

What Should Be on the Next Board Agenda

Three governance actions, each with a clear executive owner:

  • Commission an AI dependency audit. Map every workflow that touches an external model provider, classify each by operational criticality, and calculate what a 30-day outage would cost. This risk quantification must produce a concrete number the board can act on.
  • Assign explicit ownership. Move AI vendor risk onto the enterprise risk register with a named executive owner, likely the CTO or CISO, and a defined quarterly review cadence. If it currently lives nowhere, that gap is itself a governance finding.
  • Establish a sovereignty threshold. Define what proportion of core operational workflows must run on infrastructure your organisation controls directly, and set a hard timeline for reaching it. This is a strategic policy decision that belongs in the boardroom, not buried in an engineering backlog.
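For the team that has to produce those numbers, the arithmetic behind the audit and the threshold is simple once the workflow inventory exists. The sketch below is illustrative only: the Workflow fields, the "self_hosted" marker, and the daily cost figures a caller would supply are all assumptions.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Workflow:
    name: str
    provider: str              # external vendor name, or "self_hosted"
    criticality: str           # "core" or "supporting"
    daily_cost_if_down: float  # estimated loss per day of outage


def outage_cost(workflows: List[Workflow], provider: str, days: int = 30) -> float:
    """Estimated cost if one provider is unavailable for `days` days."""
    daily = sum(w.daily_cost_if_down for w in workflows if w.provider == provider)
    return daily * days


def sovereignty_share(workflows: List[Workflow]) -> float:
    """Fraction of core workflows running on infrastructure you control."""
    core = [w for w in workflows if w.criticality == "core"]
    if not core:
        return 0.0
    return sum(1 for w in core if w.provider == "self_hosted") / len(core)
```

The first function yields the concrete outage figure for the board; the second yields the current value to compare against whatever sovereignty threshold is set.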

AI is core corporate infrastructure, as consequential to your operational continuity as your ERP or your payments stack. Boards that set a sovereignty threshold now, before an enforcement action forces it, will find that it costs far less to build the architecture than to explain why they didn’t.

The Art of the Email (Because Apparently, We Still Need to Talk About It)


It’s not a post I thought I’d be writing in 2026—but this week was one of those weeks where I felt a short refresher might be useful for everyone’s sanity.

Core Principle: Respect the time and cognitive load of the recipient.

When Not to Use Email

 When you are angry or sad: Don’t even risk typing a draft. Just close the app.

 When you need a nuanced or complex discussion: Pick up the phone or hop on a call. Talk it through.

 When it’s a quick, casual check-in: If it can be handled in a single sentence, move it to a messaging platform.

What a Useful Email Actually Looks Like

 A clear subject line: Make it easy to search for and crystal clear on intent (e.g., Action Required, FYI, URGENT). If you need my sign-off, a subject line like “Need approval for travel to London client meeting” will get my attention 10x faster than “Quick question.”

 BLUF (Bottom Line Up Front): Always. Don’t make someone read a three-paragraph thesis before they figure out why you’re writing to them. State the point immediately, put the explanation and context below it, and invite them to chat if they have questions.

 Bold the key takeaways: Let’s face it, very few people read every word of an email. If your message is longer than a couple of sentences, use bold text to guide the reader’s eye to the most critical information.

 Be explicit with the “Ask”: Don’t make the reader guess what their homework is. If you need a response by a certain deadline, say it plainly: “Need your review of the attached deck by 3 PM ET on 3/7/26.”

 Less is always more: Most people read emails on their phones between meetings. Two paragraphs are usually plenty. Don’t loop in random people just for “visibility,” and please, use Reply All sparingly.

Hot tip: Write the email body before you add the recipients. It is the single easiest way to prevent accidental half-written sends and catastrophic mistakes you will immediately regret.