
It's the Harness, Not the Model
What if the secret to trustworthy AI isn't smarter models, but simpler ones?
The Brief
This article argues that AI orchestration should live in an external harness rather than inside recursive model calls. It contrasts the opacity of models calling themselves as sub-agents with the traceability of harness-based orchestration, making the case that visibility and auditability outweigh token efficiency gains.
- What is an AI harness versus recursive model orchestration?
- A harness wraps around the model and handles tool access, approval gates, and context management externally, keeping every decision logged and traceable. Recursive orchestration has the model call itself as sub-agents, which is token-efficient but produces opaque decision chains where intermediate reasoning is compressed and lost.
- Why does the article favor harness-based AI architecture?
- Harness-based architecture produces logs for every decision, tool call, and data path. When a regulator asks how an AI reached a conclusion, harness logs provide a traceable answer. Recursive models compress intermediate reasoning at each level, making failures nearly impossible to diagnose in production systems.
- What is the tradeoff between token efficiency and visibility in AI systems?
- Recursive model decomposition uses fewer tokens than harness-orchestrated calls. However, the article argues that token costs are collapsing and context windows are expanding, while regulatory compliance costs and debugging difficulty are not decreasing. The tradeoff favors spending more tokens for greater visibility.
- Why does the article say observation requires distance?
- The article uses a mechanic analogy: you fix a car from outside the engine compartment, not from inside while it is running. Similarly, AI orchestration logic must live outside the model where engineers can see, log, and debug it, rather than inside opaque recursive calls.
I was reading an article the other day, one a podcaster I follow had been discussing.[1] The thesis was elegant: language models should orchestrate their own reasoning by calling themselves as sub-agents, spawning child instances like digital matryoshka dolls.
Technically impressive. Benchmark numbers that make investors salivate.
And yet something nagged at me.
I kept thinking about my mechanic. He doesn't try to fix my car from inside the engine compartment while it's running. Observation requires distance. Control requires access.
When a recursive model fails on a complex task, you're left playing archaeological detective. It called itself three times, condensed context at each level, made decisions at junctions you can't see. Somewhere in that recursive stack, something went sideways. Where? You don't know. You can't know. The intermediate reasoning evaporated, compressed into whatever summary the model deemed relevant at the time.
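To make that concrete, here is a deliberately simplified sketch of recursive self-orchestration. The `model.complete` client and the prompts are hypothetical, not drawn from the article in question; the point is structural. Each level keeps only a summary of its children's work, and nothing is written down along the way.

```python
# Hypothetical sketch of recursive self-orchestration (model.complete is assumed).
def recursive_solve(model, task: str, depth: int = 0, max_depth: int = 3) -> str:
    if depth == max_depth:
        return model.complete(f"Solve directly: {task}")

    # The model decides how to split the task; that decision is never logged.
    subtasks = model.complete(f"Split into subtasks: {task}").splitlines()

    # Each child's full output is immediately compressed into a summary;
    # the original reasoning is discarded right here.
    summaries = [
        model.complete(f"Summarize: {recursive_solve(model, sub, depth + 1, max_depth)}")
        for sub in subtasks
    ]

    # Only the summaries reach the parent. If something went sideways two
    # levels down, this is all the evidence that survives.
    return model.complete(f"Combine results for '{task}':\n" + "\n".join(summaries))
```

Nothing in that stack leaves an artifact you can inspect after the fact.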
This is fine for research papers. It's catastrophic for production systems where a regulator might ask "how did your AI reach this conclusion?" and you need an answer that isn't a shrug wrapped in technical jargon.[2]
Black boxes all the way down.
The Harness
What struck me was how simple the alternative is.
A harness wraps around the model. The model's job becomes almost monastic: take input, reason, produce output. Nothing more. Everything else, the tool access, the approval gates, the context management, lives in the harness. Outside the model. Where you can see it.
Where you can see it.
When recursion happens inside the model, you get elegance and opacity in equal measure. When it happens in the harness, you get logs. Every decision, every tool call, every data path, recorded and traceable.
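Here is a minimal sketch of that harness loop. It assumes a generic `model.complete` client that returns a JSON action (either a tool call or a final answer), a `tools` dict, and an `approve` callback; none of these names come from the article's sources, they're just there to show where the seams are. The model proposes. The harness decides, executes, and writes everything down.

```python
import json
import time

def run_harness(model, task: str, tools: dict, approve, max_steps: int = 10):
    """Hypothetical harness loop: the model only proposes actions; the harness
    decides, executes, and records every step."""
    context = [f"Task: {task}"]
    log = []

    for step in range(max_steps):
        # The model's monastic job: take input, reason, produce a structured action.
        action = json.loads(model.complete("\n".join(context)))
        entry = {"step": step, "time": time.time(), "action": action}
        log.append(entry)

        if action["type"] == "final_answer":
            return action["answer"], log

        # Approval gate: a human or a policy check sits between proposal and execution.
        if not approve(action):
            entry["approved"] = False
            context.append(f"Action was rejected by the approval gate: {action}")
            continue
        entry["approved"] = True

        # Tool access happens out here, where it can be restricted and audited.
        result = tools[action["tool"]](**action["args"])
        entry["result"] = result
        context.append(f"Tool {action['tool']} returned: {result}")

    return "Stopped: step limit reached.", log
```

Every entry in `log` is a decision you can replay: what the model proposed, when, whether the gate let it through, and what came back. When someone asks how the system reached its answer, that list is the answer.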
The recursion advocates have a point about token efficiency. Recursive decomposition uses fewer tokens than harness-orchestrated calls. But token costs are collapsing. Context windows are expanding. Meanwhile, regulatory compliance doesn't get cheaper.[3] Debugging production failures doesn't get easier.[4]
The tradeoff crystallizes: spend more tokens, gain visibility.
Observation Requires Distance
I think what's really happening is that the industry is exhausted by black boxes.
We've spent years dealing with models we can't interpret, decisions we can't explain, failures we can't diagnose. And the response from some quarters has been more black boxes? Recursive calls inside opaque models, complexity hidden behind complexity?
It's not the model's job to be recursive. Orchestration is a systems problem, not an intelligence problem. When we keep it in the harness, we keep it where we can explain it to the regulator, the stakeholder, the engineer staring at a failed job at 3 AM.
My mechanic knows this. You fix the car from the outside, where you can see what you're doing.
Footnotes

1. Gupta, A. (2026). "2025 Was Agents. 2026 Is Agent Harnesses." Medium.
2. IBM (2025). "Building Trustworthy AI Agents for Compliance." IBM Think.
3. Deloitte (2025). "Unlocking Exponential Value with AI Agent Orchestration." Deloitte Insights.
4. Liu, J., et al. (2025). "Large Language Model Guided Self-Debugging Code Generation." arXiv.