Opinion · 2026-06-02 · 3 min read

Context Windows Are a Dead End, and You're All Counting the Wrong Number

Every frontier lab is racing to announce the biggest context window they can. 200K, 500K, 1M, 2M tokens. The number on the marketing slide is the metric that matters least. Here is why the long-context arms race is a distraction from the engineering work that actually moves production AI forward.

The single most useless metric in AI right now is context window size. Every vendor is racing to announce bigger numbers — 200K, 500K, 1M, 2M tokens — like they're competing for a medal in a sport nobody should be playing. And you, the buyers, are buying it. That's the part that drives me crazy.

Here's my take: a 1M-token context window is a marketing gimmick dressed up as a technical achievement. The industry is obsessing over the wrong number, and the people who actually have to ship AI products are going to pay for it.

Let me explain.

**Bigger context doesn't mean better understanding. It means worse signal-to-noise.** When you stuff a 500K-token document into a model's context, you haven't given it more information — you've given it more noise. Attention is not magic. The model has to allocate its attention across everything you handed it, and the more you hand it, the more diluted that attention becomes. Long-context models fail embarrassingly at simple reasoning tasks over their full context. They confabulate. They lose the thread. They forget what was said 200K tokens ago. The needle-in-a-haystack benchmarks vendors parade around are easier than the work you actually want the model to do. Real workloads aren't "find the magic number in paragraph 47." Real workloads are complex, multi-document reasoning with dependencies that span the entire input. The benchmarks hide this on purpose.
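The dilution argument can be illustrated with a toy softmax calculation. This is a sketch under loud assumptions, not a measurement of any real model: a single attention head, one relevant token whose logit beats every other token by a fixed `logit_margin`, and all other tokens tied. The function name and the margin value are hypothetical.

```python
import math


def relevant_token_weight(context_len: int, logit_margin: float) -> float:
    """Softmax weight on the one relevant token.

    Toy assumption: every other token in the context has the same logit,
    lower than the relevant token's logit by `logit_margin`.
    """
    boosted = math.exp(logit_margin)
    return boosted / (boosted + (context_len - 1))


if __name__ == "__main__":
    # The margin stays fixed; only the amount of surrounding text grows.
    for n in (8_000, 128_000, 1_000_000):
        weight = relevant_token_weight(n, logit_margin=8.0)
        print(f"{n:>9} tokens -> weight on relevant token: {weight:.4f}")
```

Under these assumptions the weight on the relevant token shrinks roughly in proportion to context length unless the model learns a larger margin. Real models do sharpen their attention, so treat this as an illustration of the pressure, not a prediction.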

**The cost curve is brutal and the economics get worse, not better.** The compute for standard transformer self-attention scales quadratically with sequence length. Double the context and the attention term costs 4x, not 2x. Multiply the context by 10 and that term costs 100x. The rest of the model scales roughly linearly, so the total bill lands somewhere between linear and quadratic, and it drifts toward quadratic as the context grows. Vendors don't show you this math because it would end the conversation. When a vendor proudly announces 1M tokens of context at 90% accuracy on a benchmark, ask what it costs per inference. Then ask what it costs at 100K tokens. The 10x context window isn't 10x more useful. It costs somewhere between 10x and 100x more compute, for marginal quality gains on actual workloads. The unit economics are broken and nobody wants to talk about it.
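The scaling arithmetic is easy to check. The sketch below assumes a simple cost model that is not from any vendor: attention compute grows with the square of sequence length, everything else grows linearly, and `attention_share` (a hypothetical parameter) is the fraction of compute spent in attention at the baseline length.

```python
def attention_cost_ratio(base_len: int, new_len: int) -> float:
    """Growth of the quadratic attention term alone."""
    return (new_len / base_len) ** 2


def total_cost_ratio(base_len: int, new_len: int, attention_share: float) -> float:
    """Growth of total compute under the assumed linear-plus-quadratic model.

    attention_share: fraction of compute spent in attention at base_len
    (an assumption you would have to measure for a real model).
    """
    growth = new_len / base_len
    return (1 - attention_share) * growth + attention_share * growth ** 2


if __name__ == "__main__":
    print(attention_cost_ratio(100_000, 200_000))    # attention term, 2x context
    print(attention_cost_ratio(100_000, 1_000_000))  # attention term, 10x context
    for share in (0.1, 0.5, 0.9):
        ratio = total_cost_ratio(100_000, 1_000_000, share)
        print(f"attention share {share:.0%}: total cost grows {ratio:.0f}x")
```

Whatever share you plug in, a 10x longer context costs more than 10x the compute in this model, and the gap widens the more attention dominates. Actual pricing also depends on caching, batching, and hardware, none of which this sketch models.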

**Most of the training didn't happen at this length anyway.** Look at the training recipe of the long-context models that publish one. The vast majority of pretraining was done at 8K, 16K, maybe 32K tokens. The long-context capability is bolted on afterward with continued training and position interpolation hacks. The model is extrapolating to contexts it never really learned. Sometimes that extrapolation works. Often it doesn't. You're shipping production workloads on a model's best guess.
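To make "position interpolation" concrete, here is a minimal sketch of the linear variant for rotary position embeddings. It assumes the common formulation where positions from the longer window are rescaled into the range seen during training. The function names are hypothetical, and this is not a claim about what any particular vendor ships.

```python
def interpolated_position(position: int, trained_len: int, target_len: int) -> float:
    """Squeeze a position from the long window into the trained range."""
    scale = min(1.0, trained_len / target_len)
    return position * scale


def rope_angles(position: float, dim: int, base: float = 10_000.0) -> list[float]:
    """Rotation angles for one position, one angle per pair of dimensions."""
    return [position * base ** (-2 * i / dim) for i in range(dim // 2)]


if __name__ == "__main__":
    trained_len, target_len = 32_000, 1_000_000
    a = interpolated_position(500_000, trained_len, target_len)
    b = interpolated_position(500_001, trained_len, target_len)
    # Neighbouring tokens end up a small fraction of a position apart.
    print(f"adjacent tokens are {b - a:.3f} trained positions apart")
    print(rope_angles(a, dim=8))
```

The rescaling keeps every position inside the range the model saw during training, but adjacent tokens now sit a fraction of a position apart. That lost resolution is part of why continued training is needed on top of it.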

**RAG and structured retrieval beat raw context length every single time.** Here's what I keep telling the teams I work with: if you think you need a million tokens of context, you don't have a context window problem — you have a search problem. A well-built RAG pipeline with proper chunking, embedding, and reranking will outperform a 1M-token context on essentially any real workload. It's faster, cheaper, more reliable, and you can actually debug it. The "just stuff it all in the context" approach is the sign of an engineering team that hasn't thought hard about the problem yet.
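The chunk, embed, and rerank stages can be sketched in a few dozen lines. Everything here is a stand-in chosen so the example runs with the standard library alone: the "embedding" is a bag-of-words count vector, the "reranker" scores query-term coverage, and all function names are hypothetical. A production pipeline would swap in a real embedding model, a vector index, and a learned reranker.

```python
import math
import re
from collections import Counter


def chunk(text: str, size: int = 40, overlap: int = 10) -> list[str]:
    """Split text into overlapping windows of `size` words."""
    if size <= overlap:
        raise ValueError("size must be larger than overlap")
    words = text.split()
    if not words:
        return []
    step = size - overlap
    last_start = max(len(words) - overlap, 1)
    return [" ".join(words[i:i + size]) for i in range(0, last_start, step)]


def embed(text: str) -> Counter:
    """Toy embedding: word counts (stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, chunks: list[str], k: int = 5) -> list[str]:
    """First stage: cheap similarity search over every chunk."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]


def rerank(query: str, candidates: list[str], keep: int = 2) -> list[str]:
    """Second stage (toy): share of distinct query terms the chunk covers."""
    terms = set(embed(query))

    def coverage(candidate: str) -> float:
        return len(terms & set(embed(candidate))) / len(terms) if terms else 0.0

    return sorted(candidates, key=coverage, reverse=True)[:keep]


def build_prompt(question: str, corpus: list[str], k: int = 5, keep: int = 2) -> str:
    """Assemble a small prompt from the best chunks instead of the whole corpus."""
    chunks = [piece for doc in corpus for piece in chunk(doc)]
    context = rerank(question, retrieve(question, chunks, k), keep)
    return "\n\n".join(context) + f"\n\nQuestion: {question}"


if __name__ == "__main__":
    corpus = [
        "the billing service retries failed payments three times",
        "the search index is rebuilt every night at two",
        "employees accrue vacation days monthly",
    ]
    print(build_prompt("how often is the search index rebuilt", corpus, k=2, keep=1))
```

The debugging point shows up even in the toy: each stage returns a plain list you can print and inspect, so when an answer goes wrong you can see whether the right chunk was never retrieved or was retrieved and then dropped.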

**The counterargument, briefly, then dismissed.** Yes, long context is useful for some things. Large codebases. Whole-book analysis. Long conversation histories. I'm not arguing for zero context. I'm arguing for honest context. A 64K or 128K context window is plenty for nearly every production workload when it is paired with the right retrieval and prompting. Pretending you need 1M tokens is a fantasy sold by vendors who need a number for the slide. The real differentiation is going to come from inference quality, latency, tool use, and cost per task, not from who can post the biggest number on a benchmark.

**The take.** Stop counting tokens. Start counting whether the model actually understood your problem. The vendor with the 1M-token context window is not your friend — they're selling you a metric that flatters their roadmap and punishes your bill. The teams shipping real AI products in 2026 are the ones that figured this out early: context is a budget, not a feature.