Opus 4.8 hits today with adaptive thinking, a 1M context window, and benchmark wins that make GPT-5.5 look expensive. The pricing is unchanged from 4.7. The capability jump is not. Here's what matters and what the press release won't tell you.

Claude Opus 4.8 Drops Today and It's Anthropic's Most Aggressive Play Yet

Anthropic dropped Claude Opus 4.8 on May 28th, and I am going to do something the press release won't: tell you what it actually means.

The headline is impressive on paper. Hybrid reasoning model. Adaptive thinking that automatically allocates more compute to harder problems. One million token context window. Stronger across coding, agentic tasks, and professional work. Available today on Claude for Pro, Max, Team, and Enterprise — plus Amazon Bedrock, Google Cloud, and Microsoft Foundry. Same pricing as Opus 4.7: five dollars per million input tokens, twenty-five dollars per million output tokens.
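
To put that pricing in per-request terms, here is a back-of-the-envelope sketch. Only the two prices come from the announcement; the function name and the token counts are made up for illustration.

```python
# List prices quoted in the announcement (USD per million tokens).
INPUT_USD_PER_MTOK = 5.00
OUTPUT_USD_PER_MTOK = 25.00


def request_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Return the list-price cost in USD of a single request."""
    total = input_tokens * INPUT_USD_PER_MTOK + output_tokens * OUTPUT_USD_PER_MTOK
    return total / 1_000_000


# Hypothetical request: a 200,000-token prompt and a 4,000-token answer.
cost = request_cost_usd(200_000, 4_000)
print(f"${cost:.2f}")  # prints "$1.10"
```

Output tokens cost five times as much as input tokens, so on long-context requests like this one the prompt dominates the bill, while on short prompts with long answers the output does.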

Those are the table stakes. Let me get to the part that actually matters.

Adaptive Thinking Is Not a Gimmick

Every model vendor right now is trying to solve the same problem: how do you make a model that can reason deeply about hard problems without charging enterprise prices for every single query, including the easy ones? The naive solution is to make the model slower and more thorough by default. That works. It also runs up your token bill and your latency.

Opus 4.8's adaptive thinking is Anthropic's answer. The model looks at the task and decides how much thinking it needs. Simple extraction? Quick pass. Multi-service architecture design with conflicting constraints? Spend the tokens. This sounds obvious in theory. In practice, it means the model is making runtime decisions about its own reasoning process — not following a fixed prompt template, but actively modulating how it approaches a problem based on what it sees.

This is not Chain-of-Thought stapled on as an afterthought. This is the model architecture itself being more selective about where to invest reasoning effort. If it works as described — and the benchmark results suggest it does — it's a meaningful efficiency advance. Better results at the same cost is the only metric that matters in production.

The Super-Agent Benchmark Numbers Are Not Small

Let's talk about the benchmark that got buried in the quote matrix: Anthropic's Super-Agent benchmark. Opus 4.8 is the only model to complete every test case end-to-end. Not almost every case. Every case. And it did so while beating GPT-5.5 at cost parity.

Read that again. Cost parity. Not "marginally cheaper." Not "within striking distance." The same price as GPT-5.5, with better completion rates on a benchmark designed to stress-test autonomous agent workflows. If you are running agentic pipelines today and paying for GPT-5.5, you should be running the numbers on Opus 4.8 by end of week.

CursorBench tells a similar story — Opus 4.8 exceeds prior Opus models across every effort level, and tool calling is meaningfully more efficient. Fewer steps for the same intelligence. That efficiency gap compounds at scale. An agent that uses twenty steps to complete a task versus one that uses twelve is a different product in production, not just a different benchmark score.
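
Here is a rough sketch of why step count compounds. It assumes an agent loop that re-sends its whole context on every step, with the context growing as tool results accumulate. Every number except the $5/$25 prices is hypothetical, and the model ignores prompt caching, which would shrink the gap.

```python
def agent_run_cost_usd(
    steps: int,
    base_context: int = 8_000,     # hypothetical: system prompt plus task description
    growth_per_step: int = 1_500,  # hypothetical: tokens each tool result adds to context
    output_per_step: int = 400,    # hypothetical: tokens generated per step
    input_price: float = 5.0,      # USD per million input tokens (from the announcement)
    output_price: float = 25.0,    # USD per million output tokens (from the announcement)
) -> float:
    """Estimate the cost of one agent run that re-sends a growing context each step."""
    # Step k sends the base context plus everything accumulated in steps 0..k-1.
    input_tokens = sum(base_context + k * growth_per_step for k in range(steps))
    output_tokens = steps * output_per_step
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


twenty = agent_run_cost_usd(20)
twelve = agent_run_cost_usd(12)
print(f"20 steps: ${twenty:.3f}")
print(f"12 steps: ${twelve:.3f}")
print(f"cost ratio: {twelve / twenty:.2f}")
```

Under these assumptions, cutting steps by 40% cuts cost by more than half, because the steps you skip are the late ones that carry the largest context.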

The Computer Use Score Is the One Nobody Is Talking About

Online-Mind2Web, 84%. That puts Opus 4.8 ahead of both Opus 4.7 and GPT-5.5 on computer use and browser-agent tasks. The benchmark tests models operating in realistic web environments — not toy examples, not synthetic HTML, actual browser-based task completion.

The reason this matters: computer use is the proxy for the workflows that most enterprise teams are actually trying to build. Browsing, form filling, multi-step web navigation, cross-domain task completion — these are the boring, tedious, high-value workflows that justify AI investments in practice. "Can you summarize my emails" is a demo. "Can you complete my travel booking across three different portals while maintaining context" is a product.

Opus 4.8 scoring 84% on that task class — a meaningful jump over GPT-5.5 — is not a rounding error. It is a signal that the capability gap in agentic computer use has swung, and it swung today.

Legal Agent: The Benchmark That Translates to Revenue

Highest score ever recorded on the Legal Agent Benchmark. First model to break ten percent on the all-pass standard. For substantive legal work, the release notes say this translates directly into how much real attorney work can be handed off with confidence.

I want to pause on this because the legal vertical is where AI adoption actually pencils out in enterprise. A law firm or legal department evaluating AI does not care about abstract benchmark scores. They care about whether the model can review a contract, flag the clauses that need human attention, and do it consistently enough that the attorneys can delegate with actual confidence rather than just checking everything afterward. Ten percent on all-pass sounds small until you consider the baseline: no earlier model had cleared it. Breaking that barrier for the first time is a milestone worth noting.

The Timing Is Not Accidental

Anthropic is three weeks away from a potential IPO. OpenAI is in the race to go public too. The LLM market is consolidating around a small number of players who can credibly compete on frontier capability while maintaining the enterprise trust story. Opus 4.8 dropping today is not coincidental — it is positioning. The model is strong enough to justify continued enterprise investment in Anthropic's platform, and the benchmark results are specific enough to be quotable in procurement conversations.

The ad-free pledge, the essay on why advertising incentives are incompatible with genuine helpfulness, the consistent investment in Constitutional AI — all of it is trust infrastructure that lets Anthropic charge premium prices and win enterprise contracts in regulated industries. Opus 4.8 is the technical layer that makes that trust story credible. No tension between safety and capability — just a model that performs and a company that has decided what it will not do.

That combination is increasingly rare and increasingly valuable.

The Actual Recommendation

If you are running production agentic workflows today and your current model is GPT-5.5 or an earlier Opus, you owe it to your infrastructure to run Opus 4.8 through your eval harness this week. Not because the benchmarks are impressive — benchmarks are proxies. Run it on your actual workload, measure the completion rate, measure the step count, measure the token cost per successful task. Those numbers are what will tell you whether the upgrade is worth it.
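
A minimal sketch of those three measurements, assuming your harness can log one record per task run. The `RunRecord` fields and the sample runs are hypothetical; only the default prices come from the announcement.

```python
from dataclasses import dataclass


@dataclass
class RunRecord:
    """One task attempt, as a hypothetical eval harness might log it."""
    succeeded: bool
    steps: int
    input_tokens: int
    output_tokens: int


def summarize(runs, input_price=5.0, output_price=25.0):
    """Return completion rate, mean steps per success, and cost per successful task."""
    if not runs:
        raise ValueError("need at least one run")
    # Failed runs still cost money, so total spend covers every run.
    total_cost = sum(
        (r.input_tokens * input_price + r.output_tokens * output_price) / 1_000_000
        for r in runs
    )
    successes = [r for r in runs if r.succeeded]
    n = len(successes)
    return {
        "completion_rate": n / len(runs),
        "mean_steps_per_success": sum(r.steps for r in successes) / n if n else None,
        "cost_per_successful_task_usd": total_cost / n if n else None,
    }


# Hypothetical sample: four runs, one failure.
runs = [
    RunRecord(True, 12, 200_000, 5_000),
    RunRecord(True, 14, 240_000, 6_000),
    RunRecord(False, 20, 400_000, 8_000),
    RunRecord(True, 10, 160_000, 4_000),
]
print(summarize(runs))
```

Dividing total spend, failures included, by the number of successes is the design choice that matters here. A model that is cheaper per token but fails more often can still lose on cost per successful task.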

My bet is the answer is yes for most agentic pipelines. The CursorBench result alone, fewer steps for the same intelligence, suggests meaningful efficiency gains that translate directly to production cost reduction. The computer use score suggests better reliability on the long-running, multi-tool tasks that are hardest to debug when they fail.

Anthropic has been shipping at a pace that suggests they understand the market is moving fast. Opus 4.8 is their clearest shot yet at making the case that they are not just a safety-first alternative to OpenAI — they are the capability-first option for serious production work.

The data will tell us if they are right. Run the evals.

Claude Opus 4.8: released May 28, 2026. $5/$25 per million tokens. Hybrid reasoning with adaptive thinking. 1M context window. Super-Agent benchmark: only model to complete every case end-to-end, beats GPT-5.5 at cost parity. Online-Mind2Web: 84%. Available on Claude, Bedrock, Google Cloud, Microsoft Foundry.