Opinion

The Next 12 Months Are Not About Better Models. They're About Better Harnesses.

The frontier-model improvement curve has bent. The product wins of 2025-2026 came from the harness, not the weights, and the founders still obsessing over GPT-6 are about to lose to founders obsessing over the agent loop.

GPT-5.5 is a marginal improvement over GPT-5. Claude Opus 4.8 is a marginal improvement over 4.5. Gemini 2.5 Pro is a marginal improvement over 2.0. The frontier-model improvement curve has **bent** — not stopped, but bent — and the founders still obsessing over the next model release are going to lose, in 2026, to founders who have figured out what to wrap around the model they already have. The next twelve months in AI belong to the harness layer, not the model layer.

The Model Has Stopped Being the Product

I have used every frontier model released in the last nine months. I have also used mid-tier and open-weights models wrapped in production-grade harnesses. The honest report: the gap between the best model and the second-best is now smaller than the gap between a well-harnessed mid-tier model and a poorly-harnessed frontier one. That is the entire market signal in one sentence.

The products that crossed nine-figure ARR in 2025 and 2026 — **Cursor, Claude Code, Devin, v0, Bolt, Replit Agent, Lindy, Glean** — won on the agent loop, the tool surface, the planning layer, the context management, the error recovery, the memory, the permissioning, the eval harness, and the deployment story. All harness work. None of it lives in the model weights.

Three Pieces of Technical Evidence

**One: Agent S3's Behavior Best-of-N.** The 72.6% OSWorld result from Simular last December came from a *harness* innovation — running eight rollouts in parallel and selecting the best via fact extraction — not a model improvement. The same Claude or Gemini inside an older harness would have hit 62%. The ten-point swing came from the loop, not the weights. ([arXiv:2510.02250](https://arxiv.org/abs/2510.02250))
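
To make the mechanism concrete, here is a minimal best-of-N sketch in Python. It is not Simular's implementation: `run_rollout`, `judge`, and the toy scoring rule are hypothetical stand-ins, and the real selection step is more involved than a single number. The point is only that nothing in the loop touches the weights.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, TypeVar

T = TypeVar("T")


def best_of_n(run_rollout: Callable[[int], T],
              judge: Callable[[T], float],
              n: int = 8) -> T:
    """Run n independent rollouts in parallel and return the highest-scoring one."""
    if n < 1:
        raise ValueError("n must be at least 1")
    with ThreadPoolExecutor(max_workers=n) as pool:
        # pool.map preserves input order, so rollouts[i] came from seed i.
        rollouts = list(pool.map(run_rollout, range(n)))
    # The judge is the harness-side selector; the model is called the same way each time.
    return max(rollouts, key=judge)


# Hypothetical stand-ins: a real rollout would drive an agent through a task,
# and a real judge would compare what each attempt actually accomplished.
def fake_rollout(seed: int) -> dict:
    return {"seed": seed, "facts_verified": (seed * 3) % 8}


def fake_judge(rollout: dict) -> float:
    return float(rollout["facts_verified"])


best = best_of_n(fake_rollout, fake_judge, n=8)
print(best)
```

Swap in a stronger judge or a larger `n` and the result can move without any change to the underlying model, which is the whole argument in miniature.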

**Two: vLLM Semantic Router v0.3 Themis.** Routing requests to the right model — a cheap one for 80% of traffic, an expensive one for the 20% that actually needs it — is a harness concern. Themis demonstrated a 3–5x cost reduction purely at the orchestration layer, with no retraining. That is the cost-curve story nobody in the press is covering.
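
The arithmetic behind that range is worth a quick check. The sketch below is not Themis's API: `route`, the keyword heuristic, the model names, and the 10:1 price ratio are all assumptions made for illustration. It shows that an 80/20 split caps the saving at 5x, and lands near 3.6x if the cheap model costs a tenth of the expensive one.

```python
def cost_reduction(cheap_share: float, cheap_price_ratio: float) -> float:
    """Savings factor versus sending all traffic to the expensive model.

    cheap_share: fraction of requests routed to the cheap model (0 to 1).
    cheap_price_ratio: cheap model's price divided by the expensive model's.
    """
    if not 0.0 <= cheap_share <= 1.0:
        raise ValueError("cheap_share must be between 0 and 1")
    if cheap_price_ratio < 0.0:
        raise ValueError("cheap_price_ratio must be non-negative")
    # Blended cost per request, in units of the expensive model's price.
    blended = cheap_share * cheap_price_ratio + (1.0 - cheap_share)
    if blended <= 0.0:
        raise ValueError("blended cost is zero; the savings factor is unbounded")
    return 1.0 / blended


def route(prompt: str, hard_markers=("prove", "refactor", "debug")) -> str:
    """Toy router: a hypothetical keyword-and-length heuristic, not a real classifier."""
    text = prompt.lower()
    is_hard = len(text) > 2000 or any(marker in text for marker in hard_markers)
    return "expensive-model" if is_hard else "cheap-model"


print(round(cost_reduction(0.8, 0.1), 2))  # cheap model at one tenth the price
print(round(cost_reduction(0.8, 0.0), 2))  # ceiling: cheap model costs nothing
print(route("What time zone is Lisbon in?"))
```

A production router would replace the keyword check with a learned classifier, but the cost math stays the same: the saving is bounded by the share of traffic that still needs the expensive model.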

**Three: MCP, browser-use CDP, Aider's repo map, Anthropic's "managed agents," Deerflow 2's super-harness, Microsoft Build 2026's declaration of independence from OpenAI.** Every one is a harness story. Anthropic is not selling a better Claude — they are selling a better way to *run* Claude in production. Microsoft did not ship a frontier model at Build 2026; they shipped a routing and orchestration story. The frontier is being absorbed by the harness layer.

The Steelmanned Counterargument

The honest counterargument: a true architectural shift (a 10x jump in test-time compute, a new training paradigm, a truly omnimodal model) could re-level the field in favor of whoever has the best raw model. That is what reasoning models did in late 2024. It could happen again.

It probably will not, on the timescale that matters for 2026 product decisions. The runway on harness wins is 12–18 months minimum. The first thing built around any new architecture will be its harness, by definition. The lead time between a new model release and a new harness company reaching product-market fit is now around six months, not eighteen. The harness is the moat, not the model.

The Take

If you are building in 2026, build the harness. The bet on a new model breakthrough is a venture-scale bet on something the open-weights community will replicate in 90 days anyway. The bet on a better loop, a better eval, a better tool surface, a better permissioning model — that bet compounds.

**Prediction:** a pure-harness company with no proprietary model (a Cursor, a LangChain, a Browser-Use, an n8n-for-agents) will be acquired for north of $1 billion before mid-2027. Cursor's rumored $30B valuation is the leading indicator, not the peak. **The model is the commodity. The harness is the company.**

*Sources: Simular [Agent S3 paper](https://arxiv.org/abs/2510.02250) (arXiv:2510.02250), vLLM Semantic Router v0.3 Themis release notes, Microsoft Build 2026 announcements, Anthropic managed-agents documentation.*

— *Mr. Technology*