
The first half of 2026 produced the strangest sustained developer revolt against a frontier model that the AI industry has seen. The complaint: Anthropic silently reduced the default reasoning effort on Claude Opus, between January and March, on the server side, without a changelog entry. By April, the data was public. By June, a meaningful slice of the Claude Code user base was actively pinning back to Opus 4.6 — the model that was current before the silent change — and refusing to move to 4.7 or 4.8 for the workflows that mattered. The story behind why is the story of how frontier AI models are actually maintained in 2026, and it's worse than the forum posts suggested.
What You Need to Know: Anthropic lowered the default reasoning effort on Claude Opus from "high" to "medium" via a server-side change in early 2026, documented by a 6,852-session independent study showing median thinking length dropped 73% (2,200 to 600 characters) and retries rose 80x. After Boris Cherny acknowledged the change on April 13, many developers pinned to Opus 4.6 (claude --model claude-opus-4-6) rather than adopt 4.7 or 4.8 for agentic workflows. As of June 2026, that workaround is still in production at multiple teams.
--effort=high) is real but invisible to the median user. Most production systems run defaults, not flags.Between January and March 2026, Claude's median thinking length — the character count of the model's internal reasoning trace per session — dropped from 2,200 characters to 600, a 73% reduction. This wasn't documented anywhere. It emerged from a 6,852-session independent study published on Scortier's Substack on April 12, 2026. The data was session-level: ID, thinking length, tool calls, retries, final output quality. Plot the curves and the trend is unambiguous.
The behavior changes that came with the reasoning reduction were observable. Plan-before-act behavior dropped from 71% of sessions in January to 19% in March. Median tool-call depth fell from 14 to 6. Retry rate on complex refactor and migration prompts rose up to 80x because the model's first answer now ships half-baked. Anush Elangovan, a senior director at AMD, posted a viral X thread on Sunday, April 12: "Claude can't be trusted anymore. Tasks that worked in January now silently fail. Retries are up 80x on some of my pipelines. I switched three of my automations back to GPT. This is not the model I onboarded my team to." The thread hit a million views in 36 hours. (Source)
Claude Code product lead Boris Cherny posted a measured thread on X on April 13, one day after the Scortier data went public. He didn't deny the complaint. He confirmed it, and he tried to reframe it.
The key quote: "We lowered the default reasoning effort to 'medium' earlier this year after internal benchmarks showed the old default was overkill for the majority of workloads. Users on paid plans can opt into 'high' via the --effort=high flag or the effort parameter in the SDK. We are evaluating whether 'medium' is the right default."
The concession is real — Cherny confirmed the change, didn't call it a "perception" issue, and said the team is "evaluating." The deflection is the framing: it puts the burden on the developer to know a flag exists that was never announced. As a 50-task before/after test from The Planet Tools showed, the April default with --effort=high was within 5% of the January default — so the "old Claude" is still in there, locked behind a flag most users will never know about. (Source)
In response, a meaningful slice of the Claude Code user base — particularly the teams running autonomous agentic workflows — began pinning to Opus 4.6, the model snapshot that was current before the silent change. The pin is a one-liner: claude --model claude-opus-4-6. The API equivalent is the anthropic-version header with the specific model ID. Cursor and other front-ends have model selectors that surface older snapshots.
The decision to pin is not free. Pinning solves the regression problem but locks you out of improvements. Opus 4.7 (April 16) shipped with software engineering benchmark gains of ~10% and visual reasoning of ~13%, but it was also the model that introduced the "first model that punishes bad prompting" complaint — sharper refusals, more aggressive judgment, less helpful on ambiguous agentic tasks. Opus 4.8 (May 28) shipped with a 4x reduction in silent-flaw-passing and dynamic workflows, but the same mid-default reasoning model — meaning the new "honesty" claim is partially built on top of the same cost-saving default that caused the original complaint.
The pattern: pin to 4.6 for project-based workflows where you know the prompts, use 4.8 for new code where the dynamic workflows and honesty improvements matter, and use 4.7 for almost nothing. That's the production advice circulating on Claude AI community forums and in Discord servers as of June 2026.
A model "nerf" implies intent. The evidence suggests the change was a deliberate cost optimization, justified internally as "old default was overkill." But the structural problem is more interesting than the motive.
Anthropic's API versioning system lets developers pin to specific model snapshots, but the behavior of a given version can still shift if Anthropic applies what they call "minor safety or quality updates" — a category with no public changelog. Pin to claude-opus-4-6 and you get Opus 4.6 as of the time you pinned it, not Opus 4.6 with whatever the inference cluster happens to be running today. That's the only safety guarantee in a world where the model is the API and the API is a moving target.
The deeper issue: the "default" was a quiet, server-side change that affected everyone. The 99% of users who never read changelogs and never set flags got a different product than the one they onboarded their team to. That's the structural problem that no API versioning fix can address. The right fix is to make default changes loud — public, opt-in, with a clear migration path — but that's a product decision, not a technical one, and the financial incentive to keep defaults cheap is permanent.
The 4.6 pin is the right move if you're running a production agentic workflow that you tuned against the pre-March behavior. It's also a tacit admission that you don't trust your model vendor to maintain quality over time. Both of those are true at the same time, and the contradiction is the real story of 2026 for anyone shipping AI products.
For builders, the immediate implications:
Run your own evals on a fixed schedule. Not vendor benchmarks, not public leaderboards — your own evals against your own production workloads, with frozen prompts and frozen model versions. The Scortier pattern (6,852 sessions, plotted over time) is reproducible by any team that logs model interactions. You want the curve, not the snapshot. The teams that don't have this data are the ones that will be surprised by a default change.
Pin to specific snapshots, not "the latest." Treat model versions like database versions. Pin a snapshot, run your eval suite against the next version before upgrading, and have a rollback path. This is boring infrastructure work, and it's the difference between a team that gets to take the new model improvements and a team that gets silently downgraded and finds out from customer tickets.
Diversify model providers for critical paths. A frontier-model industry problem is also a frontier-model industry risk. If your entire agentic stack runs on one provider's defaults, you're one quiet server-side change away from a quality crisis. The OpenAI / Anthropic / Google mix isn't just about cost optimization — it's about not having a single point of failure for a product behavior you can't see.
Treat default reasoning effort as a product feature you don't control. If your users depend on a particular level of model reasoning, the right place to enforce that is in your own orchestration layer, not in the model vendor's default. Run the model at the effort level you want, every time, and don't assume the vendor's default matches your users' expectations.
The longer-term story is that the labs are going to keep optimizing for cost, and the only durable defense is independent measurement. The teams that treat model behavior as a continuous monitoring problem — not a "ship the latest version" problem — are the ones that will still be in production in 2027.
Anthropic silently lowered Claude Opus's default reasoning effort from "high" to "medium" in early 2026, dropping median thinking length 73% (2,200 to 600 characters) on real workloads. After Boris Cherny acknowledged the change in April, a meaningful slice of Claude Code developers pinned to Opus 4.6 (claude --model claude-opus-4-6) rather than adopt 4.7 or 4.8 for agentic workflows, with the 4.8 honesty improvements layered on top of the same cost-saving default.