
Hey guys, Mr. Technology here.
On June 30, 2026, Anthropic shipped Claude Sonnet 5. No keynote, no livestream, no Mythos-class naming stunt. Sonnet 5 is the first Sonnet genuinely worth deploying for agentic production workloads. Not a "small Opus." A new tier in everything but name.
The headline numbers:
That last line is the one nobody is talking about. For the first time, a Sonnet-class model is not strictly dominated by Opus on any published evaluation Anthropic ships.
Pricing is $2 / $10 per MTok through August 31, 2026, then $3 / $15 per MTok at standard. Opus 4.8 is $5 / $25. Even after the step-up, Sonnet 5 is roughly 40% cheaper than Opus 4.8 at list.
But the new tokenizer inflates token counts by 1.28x–1.42x for English and code. List price flat; effective price up 28–42%. Multiply by 1.3 and re-run before August 31.
| Document | Sonnet 4.6 tokens | Sonnet 5 tokens | Ratio |
|---|---|---|---|
| UDHR (English) | 2,356 | 3,341 | 1.42x |
| UDHR (Spanish) | 3,572 | 4,747 | 1.33x |
| UDHR (Mandarin) | 3,334 | 3,360 | 1.01x |
| sqlite-utils/db.py | 44,014 | 56,113 | 1.27x |
Mandarin at 1.01x is the tell: a tokenizer change that hits English and code harder than CJK. BPE merges optimized for the dominant pretraining distribution. Not a bug. A choice.
Anthropic exposes four effort levels on Sonnet 5: low, medium, high, xhigh. Higher effort spends more tokens on reasoning, raising both quality and cost. **At xhigh, Sonnet 5 can cost more than Opus 4.8** for comparable quality.
Sonnet 5 is a single SKU covering a 3x cost band and a 20+ point capability band. Routing decides where it sits.
def select_model(task, accuracy_critical=False):
if task.is_high_volume_latency_sensitive:
return "claude-haiku-4-5" # $0.80/$4 per MTok
if accuracy_critical:
return "claude-opus-4-8" # $5/$25 per MTok
return "claude-sonnet-5" # $3/$15 (or $2/$10) — default**Sonnet 5 absorbs both the Sonnet 4.6 use case and a chunk of the Opus 4.8 use case at medium–high effort.** Opus is no longer the default. It is the exception.
Multi-step debugging without prompting. Replit engineers described Sonnet 5 investigating a bug, writing a reproducing test, implementing the fix, then stashing it to confirm the bug came back without the change. Unprompted. The model is maintaining a hypothesis and verifying it.
Brownfield reliability. Cursor reported Sonnet 5 traced failures to root causes on messy legacy code rather than patching symptoms.
End-to-end business workflows. Zapier gave it a two-part job — update Salesforce tiers, send a launch email — and it finished without stalling halfway. Sonnet 5 finishes the chain.
Computer-use agents. Pace runs insurance workflows (submission intake, FNOL, loss runs) on production systems; Sonnet 5 is "consistently taking the right action." In 2024 the same workloads required Opus.
This is the part Anthropic flagged in the system card — and the part that explains why this model could ship publicly at all after the Fable 5 / Mythos 5 export-control detonation of June 12.
"Sonnet 5 is significantly less capable at cyber tasks than Mythos 5: its safeguards are thus similar to those we apply to Opus 4.7 and Opus 4.8."
On the Firefox exploit evaluation developed with Mozilla, Sonnet 5 was unable to develop a full working exploit (0.0%), while Mythos 5 and Opus 4.8 did. The model is not as cyber-capable as Opus 4.8 by design. Anthropic shipped a model that is good enough at agents but not good enough at cyber to trigger the same export-control logic that took Fable 5 offline.
The tradeoff: Sonnet 5 has a higher rate of misaligned behavior than Opus 4.8 or Mythos Preview on the automated behavioral audit. Lower than Sonnet 4.6, higher than the top tier. Accept it or don't ship it.
This one is going to break things:
*"Sampling parameterstemperature,top_p,top_kare no longer supported."*
The model is now non-deterministic by default, and you cannot dial it back. If your agent harness logs and replays tool calls for reproducibility, your replay path is now a soft contract. Set "thinking": {"type": "disabled"} for the deterministic path and accept higher variance elsewhere. Architect for non-determinism. It is no longer optional.
Read 1: Anthropic is fighting the export-control regime with product segmentation. Fable 5 lasted 72 hours. Mythos 5 is restricted. Sonnet 5 is the model Anthropic can ship to every Free and Pro user in every country — nearly as capable on the workloads that matter. The flagship product is the one that is publicly available everywhere.
Read 2: The mid-tier is now the production tier. Sonnet 5 at medium undercuts Opus 4.8 by ~60% on list while delivering 91% of the SWE-bench Pro score. "Most of Opus for a third of the price" is the right tradeoff for most agents.
Read 3: Effort levels are the new moat. Sonnet 5 at low competes with Haiku 4.5; at xhigh it competes with Opus 4.8. One SKU, 6x cost band. Every other frontier lab has to match the pattern or lose the routing economics.
medium is the new high. xhigh is the new Opus-call. Stop routing by model name; route by effort.temperature is gone. Your replay path is soft. Build the eval harness accordingly.Sonnet 5 is not a flashy release. No Mythos-class naming, no 95% SWE-Bench Verified headline, no 72-hour shutdown drama. It is, however, the model 90% of production agents should now run on — shipped yesterday at the lowest list price Anthropic has offered on a frontier-tier SKU. The mid-tier is the new frontier. Sonnet 5 is the proof.
— Mr. Technology
*Released: June 30, 2026. Model: claude-sonnet-5. Pricing: $2/$10 per MTok introductory through August 31, 2026, then $3/$15 standard. Context: 1,000,000 input / 128,000 output. Adaptive thinking: on by default. Sampling parameters: temperature, top_p, top_k deprecated. Effort levels: low, medium, high, xhigh. Tokenizer: updated (same as Opus 4.7), 1.27x–1.42x inflation on English/code. Benchmarks: SWE-bench Pro 63.2%, Terminal-Bench 2.1 80.4%, OSWorld-Verified 81.2%, HLE-with-tools 57.4%, GDPval-AA v2 1,618 (Opus 4.8: 1,615). Cyber capability: 0.0% full-exploit success on Firefox 147 eval. Sources: Anthropic, Sonnet 5 System Card, Simon Willison, MarkTechPost, AWS, Thurrott.*