← Back to Payloads
2026-07-02

Claude Sonnet 5's 1M Context Window Is a Trap, and the New Tokenizer Is Why

Anthropic bragged about a 1M-token context. What they didn't tell you is the new tokenizer inflates token counts ~30%, so the window holds roughly 23% less text than five 200K Sonnet 4.6 windows would have — and three API fields now return hard 400s.
Quick Access
Install command
$ mrt install llm
Browse related skills
Claude Sonnet 5's 1M Context Window Is a Trap, and the New Tokenizer Is Why

Claude Sonnet 5's 1M Context Window Is a Trap, and the New Tokenizer Is Why

Anthropic shipped Claude Sonnet 5 on June 30, 2026, and the launch blog led with the 1M-token context window. Five times bigger than Sonnet 4.6. Headlines everywhere. Migration guides everywhere. Everyone flipped the model string from claude-sonnet-4-6 to claude-sonnet-5 and called it a day.

That migration breaks in three ways. Two of them return HTTP 400.

The window isn't actually 5× bigger

Sonnet 5 ships with a new tokenizer. Same input text now produces approximately 30% more tokens than on Sonnet 4.6. Anthropic disclosed this on the What's New in Sonnet 5 page, which is more transparency than most labs give, but the framing — "the same input text produces approximately 30% more tokens" — undersells how this compounds with the context window pitch.

Do the arithmetic. Sonnet 4.6 had a 200K-token window. Sonnet 5 has a 1M-token window. 1M / 200K = 5× on the spec sheet. But each Sonnet 5 token covers less text. If you measure text content (characters, words, source lines), the Sonnet 5 window holds roughly 770K tokens of Sonnet-4.6-equivalent text. That's 3.85× more text than Sonnet 4.6, not 5×.

Or, more usefully: a Sonnet-5 1M-token window holds 23% less text than five Sonnet-4.6 200K windows would have. If you're migrating from a Sonnet 4.6 multi-call compaction pipeline, you don't get to drop compaction the way the headline suggests. You get to compact less aggressively.

This is fine if you're a marketer. It's not fine if you're sizing context budgets for a long-context agent.

Three fields that now return 400

The migration guide lists three breaking changes. Two are silent. One is loud.

1. Sampling parameters — setting temperature, top_p, or top_k to anything other than the default returns a 400 Bad Request. Same constraint Anthropic shipped on Opus 4.7 earlier this year. If your client library sets temperature: 0.7 because that's what the docs said to do in 2024, every request dies. No deprecation warning. No header. Hard error.

2. Manual extended thinkingthinking: {type: "enabled", budget_tokens: N} was deprecated on Sonnet 4.6. On Sonnet 5 it's removed and also returns 400. The replacement is thinking: {type: "adaptive"} plus the effort parameter.

3. Adaptive thinking is on by default — this is the silent one. Sonnet 4.6 with no thinking field ran without thinking. Sonnet 5 with no thinking field runs with adaptive thinking. The total output budget (max_tokens) is now a hard ceiling on thinking + response text combined, not response text alone. If you tuned max_tokens against Sonnet 4.6 output lengths, you will truncate responses on Sonnet 5. Some of them badly.

The combined effect: a naïve model = "claude-sonnet-5" swap can produce a flood of 400s on Day 1 and quietly truncated outputs on Day 2.

Pricing math after the tokenizer

List pricing is unchanged from Sonnet 4.6: $3 per million input tokens, $15 per million output tokens at standard rates, with the $2 / $10 introductory pricing running through August 31, 2026. The 30% tokenizer inflation lands directly on the invoice.

A workload that billed 100M input tokens on Sonnet 4.6 will bill ~130M on Sonnet 5 at the same per-token rate. Real per-document cost rises ~30%, not 0%. Same output-side hit. Multiply your projected August bill by 1.3 before signing off on the migration.

The intro pricing partially absorbs this — but it expires. Plan for $3.90 effective input cost per equivalent Sonnet-4.6-million-tokens after August 31.

Where Sonnet 5 actually wins

The agentic benchmark numbers are real. Terminal-Bench 2.1: 80.4% beats Opus 4.8's 74.6% on the same harness. GDPval-AA v2: 1,618 vs Opus 4.8's 1,615 — Sonnet 5 wins on professional knowledge-work tasks, not just coding. HLE with tools: 57.4%, basically tied with Opus 4.8 at 57.9%. CursorBench at the IDE level: 57% vs Sonnet 4.6's 49%, the largest jump Cursor has reported between adjacent Sonnet releases.

If you're deploying for coding agents, terminal work, computer use, or knowledge-work pipelines, Sonnet 5 is the right model. The launch post's framing — "the most agentic Sonnet yet" — holds up. The capability story is the easy half.

The migration checklist

Before flipping the model string:

1. Remove every non-default sampling parameter. Find every temperature, top_p, top_k set in client code. Default-or-die. 2. Replace manual extended thinking with thinking: {type: "adaptive"} plus the effort parameter. Audit every agent loop that touches the thinking field. 3. Recount tokens against the new tokenizer. The usage field lies to anything calibrated on Sonnet 4.6. Use Anthropic's token-counting endpoint, not cached numbers. 4. **Raise max_tokens on long-output agents.** Adaptive thinking eats into the output budget. If your agent emits ~4K tokens of reasoning before the final answer, your max_tokens: 8192 is now wrong. 5. Recalibrate context-window planning. Stop thinking "1M tokens, 5× bigger." Think "~770K tokens of Sonnet-4.6-equivalent text, 3.85× bigger." Adjust compaction triggers accordingly.

The 1M context window is real. So is the tokenizer. So are the 400s. Anthropic shipped a great model wrapped in three migration footguns and a pricing change that isn't actually a pricing change. Read the What's New page before you ship.

Related Dispatches