Context Windows Are a Red Herring and the Industry Is Losing the Plot

I'm going to say something that will make half the people in this space upset: the context window arms race is mostly theater.

Every week brings another headline about a longer context window. GPT-5.5 Instant ships with 256K tokens. Gemini hit 2M. SubQ just announced 12M. The narrative is always the same: more context means better reasoning, better understanding, better everything. And it's mostly wrong.

Here's the uncomfortable truth: past a certain point, models don't reason better just because you hand them more context. They reason better with better reasoning architecture and training. The industry knows this, but admitting it would crater the context window marketing narrative, so we keep pretending that a 10M token context window is meaningfully different from a 1M token context window for the vast majority of actual use cases.

What Actually Happens When You Give Models More Context

Let me be specific about what I mean. For tasks that require reasoning over a specific piece of information — analyze this document, answer this question about this code, find the relevant section in this contract — context windows matter. A 200K context window is genuinely better than a 32K window for those tasks.

But that's not what the arms race is about. The arms race is about claiming that models can maintain coherent understanding across millions of tokens, and that this capability translates to better performance on real tasks.

It doesn't. Here's why:

1. Attention degrades with distance. In practice, the further a piece of information sits from the current position in the context, the less influence it tends to have on the model's output. This isn't strictly built into the transformer architecture, but it is a consistently observed behavior of trained models, and making the context window longer doesn't fix it. A model with a 10M token context doesn't necessarily attend to token #8M any better than a model with a 1M context attends to token #800K. The degradation curve is still there.

2. Models hallucinate more with more context. The "lost in the middle" effect is well documented: models use information at the beginning and end of a long context more reliably than information buried in the middle. Ask a model to reason about something mid-context and it tends to fill the gaps with details from the edges. The longer the context, the more room there is for this to happen, and the worse the reasoning about the information you actually care about.

3. The tasks that need millions of tokens are rare. When was the last time you actually needed to reason over 10M tokens? For most developers, even 200K is more than they ever use in practice. A 10M context addresses a use case almost nobody has, yet it gets used as a marketing differentiator for a capability most users will never need.
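If you'd rather test the first two claims against a specific model than take my word for it, a needle-in-a-haystack probe is the usual starting point: bury a known fact at different depths in filler text and see where recall falls off. What follows is a minimal sketch, not a benchmark. It assumes a hypothetical `ask_model(prompt) -> str` callable that you supply yourself; the filler text, the needle, and the function names are all illustrative and don't come from any library.

```python
from typing import Callable, Dict, List

# Illustrative filler and needle; swap in realistic text for a real test.
FILLER = "The quick brown fox jumps over the lazy dog. "
NEEDLE = "The secret code is 7421. "
QUESTION = "\n\nWhat is the secret code? Answer with the number only."


def build_prompt(total_sentences: int, depth: float) -> str:
    """Place NEEDLE at a relative depth (0.0 = start, 1.0 = end) in filler."""
    position = round(depth * total_sentences)
    sentences = [FILLER] * total_sentences
    sentences.insert(position, NEEDLE)
    return "".join(sentences) + QUESTION


def probe_depths(
    ask_model: Callable[[str], str],  # hypothetical: your own model wrapper
    total_sentences: int,
    depths: List[float],
) -> Dict[float, bool]:
    """Return, per depth, whether the model's answer contained the code."""
    results = {}
    for depth in depths:
        answer = ask_model(build_prompt(total_sentences, depth))
        results[depth] = "7421" in answer
    return results
```

Run it at several context sizes and look at the shape of the recall curve across depths. A single pass or fail tells you very little.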

What Would Actually Help

Here's what I wish the industry would focus on instead:

Better retrieval at lower context lengths. If you can surface the relevant 8K tokens from 10M tokens of source material more accurately than a competitor surfaces them from 1M, you win on actual use cases regardless of how big your context window is. Retrieval is the hard problem. Longer contexts don't solve retrieval; they just make the retrieval problem bigger.
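To make the retrieval point concrete, here is a minimal sketch of picking the most relevant chunks under a fixed budget instead of stuffing everything into the window. It assumes a crude word-overlap score and whitespace word counts as a stand-in for tokens; a production system would more likely use embeddings or BM25 and a real tokenizer. The names `chunk_text`, `score`, and `select_context` are my own, chosen for illustration.

```python
import re
from typing import List, Set


def _words(text: str) -> Set[str]:
    """Lowercased alphanumeric words, used for a crude overlap score."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def chunk_text(text: str, chunk_words: int) -> List[str]:
    """Split text into consecutive chunks of at most chunk_words words."""
    words = text.split()
    return [
        " ".join(words[i:i + chunk_words])
        for i in range(0, len(words), chunk_words)
    ]


def score(query: str, chunk: str) -> int:
    """Count distinct query words that also appear in the chunk."""
    return len(_words(query) & _words(chunk))


def select_context(query: str, chunks: List[str], budget_words: int) -> List[str]:
    """Greedily keep the highest-scoring chunks that fit in the budget."""
    ranked = sorted(chunks, key=lambda c: score(query, c), reverse=True)
    selected: List[str] = []
    used = 0
    for chunk in ranked:
        size = len(chunk.split())
        # Skip chunks with no overlap, and chunks that would exceed the budget.
        if score(query, chunk) > 0 and used + size <= budget_words:
            selected.append(chunk)
            used += size
    return selected
```

The quality of the `score` function decides the outcome here. The size of the window the selected chunks end up in matters far less.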

Reasoning architecture improvements. The model that reasons well from 32K tokens beats the model that reasons poorly from 1M tokens on the vast majority of practical tasks. The frontier labs know this: their best reasoning models are the ones with better reasoning training, not the ones with longer contexts.

Cost reduction on existing contexts. Making 200K context cheaper and faster does more for production AI than shipping a 10M context that nobody can afford to use at scale.
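Some back-of-the-envelope arithmetic shows why long contexts get expensive. This sketch assumes standard dense self-attention, where the attention score computation grows with the square of the sequence length. It ignores the parts of the model that scale linearly, caching, and any vendor-specific optimizations, so treat the ratios as rough intuition and not as a pricing model.

```python
def relative_attention_cost(tokens: int, baseline_tokens: int) -> float:
    """Attention compute relative to a baseline, assuming O(n^2) scaling."""
    return (tokens / baseline_tokens) ** 2


for n in (200_000, 1_000_000, 10_000_000):
    ratio = relative_attention_cost(n, 200_000)
    print(f"{n:>10,} tokens: {ratio:,.0f}x the attention compute of 200K")
```

Under that assumption, a 1M token context costs about 25 times the attention compute of 200K, and 10M costs about 2,500 times. That gap is what subquadratic architectures are trying to close.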

The Real Problem

The context window arms race exists because it's easy to market. "We have a bigger number than last month" is a simple story. "We improved retrieval accuracy by 12% on long-document tasks" is a nuanced claim that requires explaining benchmark methodology and doesn't fit in a tweet.

But the engineers building production AI systems know the truth: context length is the least of their problems. The problems are hallucination, retrieval accuracy, reasoning reliability, and cost per effective token. None of those are solved by making the context window bigger.

SubQ's 12M token context is interesting as an architecture research result. As a practical capability that changes how we build AI products, it's mostly irrelevant — at least until someone demonstrates that subquadratic attention actually enables better reasoning on real tasks, not just longer context windows.

Until then, the context window arms race is marketing dressed up as engineering. And I'm tired of pretending otherwise.
