
On April 7, 2026, Anthropic published a 244-page system card for a model called Claude Mythos Preview, showed the world the benchmarks, and said you could not have it. The model could find zero-day vulnerabilities in every major operating system and every major web browser. It found a 27-year-old bug in OpenBSD. It saturated Cybench at 100% and USAMO at 97.6%. The accompanying announcement described it in the same breath as the best-aligned model Anthropic had ever built and the model that likely posed the greatest alignment-related risk of any model the lab had produced. Anthropic shipped it to Project Glasswing — a coalition with Apple, Microsoft, Google, AWS, CrowdStrike, and the Linux Foundation, backed by $100 million in usage credits — and pointed the capability exclusively at defensive cybersecurity work. (Anthropic Mythos Preview, April 7, 2026)
On June 9, 2026, Anthropic put that model on the public API.
The release is called Claude Fable 5. It runs on the same underlying weights as Mythos Preview. The 319-page system card (up from 244) describes mostly the same model, with one new thing sitting in front of it: a classifier that routes fewer than 5% of sessions to the safer Claude Opus 4.8. In the other 95% or more of sessions, the thing on the other side of the API is the model Anthropic spent April telling us was too dangerous to ship. Anyone with a Pro subscription can use it today. The sessions the classifier catches (cybersecurity, CBRN, certain autonomous-targeting patterns) fall through to Opus 4.8, which is a more conservative model. (Anthropic Fable 5 / Mythos 5 announcement, June 9, 2026)
The press is going to frame this as "AI lab ships a bigger model." That framing is wrong. The story is the classifier. The story is the IPO. The story is that every agent team just got a model that can work autonomously for days in their API at less than half the price Anthropic quoted for the preview. The story is that the "too dangerous" line moved in nine weeks, and the thing that moved it was a routing layer and a regulatory story for an S-1, not a fundamental safety breakthrough. If you are building production agents, the build-vs-buy math for the next twelve months just changed in a way most teams will not notice until they are already behind.
Let me get the benchmarks out of the way first, because they are the easy story and they are real.
SWE-Bench Pro: 80.3% for Fable 5, versus 69.2% for Opus 4.8 and 58.6% for GPT-5.5. This is the agent-coding benchmark, the one that measures whether a model can solve a real GitHub issue end to end. Fable 5 is the first model to break 80% on it. (Anthropic Fable 5)
Terminal-Bench 2.1: Fable 5's published number is the highest in the field by a wide margin. The prior-generation M-class comparison point is 66.0%.
Hex core analytics benchmark: 90%+, the first model to clear 90, ten points above the previous Opus generation.
Hebbia Finance Benchmark for senior-level reasoning: highest score of any model, with substantial gains in document-based reasoning, chart and table interpretation, and problem solving. IMC reported Fable 5 aced their trading-analysis evaluations nearly across the board — factual lookup, conceptual reasoning, root-cause analysis, and expected-value analysis.
Cognition's FrontierCode evaluation: highest among frontier models at medium effort, which is the most relevant data point for production coding agents because it tests whether the model can pass difficult tasks while meeting the standards of high-quality production codebases — not just whether it can pass at all.
Vision: state-of-the-art. The Pokémon FireRed demo is the one the marketing team is going to lead with — Fable 5 beat the game with a minimal, vision-only harness, no maps, no navigation aids, no extra game-state information. The interesting data point is that the model is doing this from raw game screenshots. It can extract precise numbers from detailed scientific figures. Stripe reported it can rebuild a web app's source code from screenshots alone.
Software engineering at scale: the most important real-world data point. During early testing, Stripe ran Fable 5 against a 50-million-line Ruby codebase. The model compressed a codebase-wide migration that would have taken a whole team over two months by hand into a day. That is the agent-economy headline: a frontier model that can do in a day what previously required a team, on a real production codebase, with a real migration. (Anthropic Fable 5 announcement)
Memory and long context: three times the benefit Opus 4.8 gets from file-based memory. In the Slay the Spire deck-building test, persistent file-based memory improved Fable 5's performance three times as much as it improved Opus 4.8's, and the model reached the game's final act three times as often. Long-horizon autonomous work is the workload that benefits most from persistent memory, and that is where the gap over Opus 4.8 is largest.
Drug design: ten times the prior pace. Using Mythos 5 (the partner-only tier, same weights, classifier lifted in some areas), Anthropic's internal protein design experts accelerated aspects of the drug design process by around ten times. In one example, Mythos 5, working with protein design and bioinformatics tools but no human assistance, matched or beat skilled human operators on every task normally done by a scientist: choosing binding sites, selecting and running protein design tools, and recovering from failures. Nine of the fourteen protein targets in the study yielded strong candidates now under investigation. One Mythos hypothesis, a novel mechanism for an E. coli protein, was independently corroborated in a paper on bioRxiv.
Genomics: a 100-times-smaller model that beat a Science paper. Mythos 5 conducted novel genomics research across more than a week of largely autonomous work, assembled single-cell data for millions of cells across 138 animal species, designed and trained a custom machine learning model, and outperformed a recent model published in Science — with a model 100 times smaller. Anthropic intends to publish the results.
The benchmarks are not the story. The benchmarks are the part of the story the press can summarize in a sentence. The story is the architecture, the classifier, the IPO, and the agent stack.
I want to be precise about what Anthropic shipped, because most coverage is going to get this wrong in one of two ways. They will say "Anthropic made Mythos safe" and miss the engineering. Or they will say "Anthropic put a thin wrapper on Mythos and called it safe" and miss the architecture. The truth is somewhere specific and worth understanding.
Fable 5 is the Mythos model with a topic-routing classifier sitting in front of it. The classifier inspects the inbound query and the conversational context, decides whether the request falls into a category that Anthropic has flagged as high-risk, and — only if it does — routes the request to Claude Opus 4.8 instead. For everything else, the user gets Fable 5. Anthropic says the classifier triggers, on average, in less than 5% of sessions, and that they tuned it conservatively on purpose because releasing a model this capable carries real risk. (Anthropic Fable 5)
This is a meaningful piece of engineering. A topic-routing classifier that has to be both safe enough to ship behind a 319-page system card and permissive enough to fire in fewer than 5% of sessions is a real research and production problem. Anthropic says false positives do happen (harmless requests sometimes get rerouted) and that they are working to improve the classifier as more capable models arrive. The honest read is that the classifier is a meaningful safety architecture for a public API and also a regulatory story for the S-1 Anthropic filed on June 1. Both are real.
The reason this matters for production agents is that the classifier is now a public-API surface. If you build an agent harness that drives Fable 5 in a long-horizon loop, the classifier is a routing decision that may happen mid-session, mid-tool-call, mid-iteration. For most workloads, the classifier will not fire, and your agent will get a Mythos-class model. For the workloads that do fire — explicit cybersecurity tasks, certain bioweapon-relevant queries, certain autonomous-targeting patterns — your agent will silently get downgraded to Opus 4.8 without an explicit signal back to your harness. The behavior delta between Mythos and Opus 4.8 is not subtle. If your agent has been evaluated against Fable 5 and shipped, and the classifier fires on a real production user, you have a model-substitution problem you did not plan for.
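A back-of-the-envelope sketch of why the substitution shows up at volume. It assumes each session is rerouted independently with probability 0.05, which is an upper bound taken from the "less than 5% of sessions" figure. That independence is an assumption of this sketch, not something Anthropic states, and real traffic will cluster by topic. The function name is illustrative.

```python
def p_at_least_one_reroute(sessions: int, trigger_rate: float = 0.05) -> float:
    """Probability that at least one of `sessions` sessions is rerouted.

    Assumes sessions are independent and share the same trigger rate
    (a simplification: real workloads cluster by topic).
    """
    if sessions < 0:
        raise ValueError("sessions must be non-negative")
    if not 0.0 <= trigger_rate <= 1.0:
        raise ValueError("trigger_rate must be between 0 and 1")
    # Complement of "no session is rerouted".
    return 1.0 - (1.0 - trigger_rate) ** sessions


for n in (1, 10, 50, 100):
    print(f"{n:>4} sessions: {p_at_least_one_reroute(n):.1%}")
```

Under these assumptions, ten sessions already carry roughly a 40% chance of at least one substitution. A workload that sits entirely outside the flagged categories may never see one, and a security-adjacent workload may see far more, which is why the per-topic view matters more than the average.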
Three concrete things to do about this:
1. Add a model-identification signal to your agent logs. Most API surfaces do not expose which model actually answered, only which one the harness requested. You need this. The classifier can swap the model mid-session, and your evals need to know.
2. Build a per-topic eval suite for the high-risk categories. The classifier fires on a known list. Build evals for the categories you care about — the things you want the agent to do, the things you do not want the agent to do, the things the classifier is supposed to catch. Re-run the suite against Fable 5, Opus 4.8, and any future model you evaluate. The substitution behavior is the production risk, not the model.
3. Treat the classifier as a separate infrastructure component, not a model feature. It is going to have its own update cadence, its own false-positive rate, its own failure modes. It is going to be a third dependency in your agent stack alongside the model and the harness. Plan for it as one.
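The first item above can be sketched as a log record that keeps the requested model and the served model as separate fields. This is a minimal illustration, not any vendor's API: `TurnRecord`, `find_substitutions`, `find_unidentified`, and the model strings `"fable-5"` and `"opus-4.8"` are all hypothetical names, and how you obtain the served model depends on what your provider's responses actually expose.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class TurnRecord:
    """One model call in an agent session (hypothetical log schema)."""
    session_id: str
    turn: int
    requested_model: str
    served_model: Optional[str]  # None when the response did not identify a model


def find_substitutions(records: List[TurnRecord]) -> List[TurnRecord]:
    """Turns where the served model is known and differs from the requested one."""
    return [
        r for r in records
        if r.served_model is not None and r.served_model != r.requested_model
    ]


def find_unidentified(records: List[TurnRecord]) -> List[TurnRecord]:
    """Turns where the served model is unknown, so a swap cannot be ruled out."""
    return [r for r in records if r.served_model is None]


# Placeholder model names, for illustration only.
log = [
    TurnRecord("s1", 1, "fable-5", "fable-5"),
    TurnRecord("s1", 2, "fable-5", "opus-4.8"),
    TurnRecord("s1", 3, "fable-5", None),
]

for r in find_substitutions(log):
    print(f"session {r.session_id} turn {r.turn}: "
          f"asked for {r.requested_model}, got {r.served_model}")
print(f"{len(find_unidentified(log))} turn(s) with no model identification")
```

Keeping "unknown" distinct from "matched" is the design choice that matters here: a turn with no served-model signal should be counted as a gap in your evidence, not as proof that no substitution happened.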
I want to walk through the sequence, because I think the order is the story. The April refusal and the June release are not contradictions. They are two scheduled stops on the same arc, and the receipts are in the public model lineup, the public compute commitments, and the public S-1.
April 7. Anthropic publishes the 244-page Claude Mythos Preview system card. SWE-bench Verified 93.9% (Mythos Preview's prior-generation metric). Cybench 100%. USAMO 97.6%. Project Glasswing is announced. The model goes to defensive cybersecurity work, $100M in usage credits, twelve partners. (Anthropic Mythos Preview)
Mid-April. Mythos, under Glasswing, finds 271 zero-day vulnerabilities in Firefox in a single engagement. Mozilla ships Firefox 150, the largest single security batch in the browser's history. The defensive use case is real and the throughput is real.
Late April. Anthropic releases Opus 4.7 with what The Register called an "overzealous query cop" — an Acceptable Use Policy classifier that breaks legitimate developer workflows. Users notice. The fix is partial. (Anthropic Opus 4.7)
May 3-6. Anthropic signs a deal with xAI/SpaceX to lease the compute capacity of Colossus 1 in Memphis — over 220,000 NVIDIA GPUs, 300 megawatts. Announced publicly on May 6, the same day Anthropic publicly confirms for the first time that the throttling Claude Pro and Max users had been complaining about for months was driven by a compute shortage. Anthropic had been juggling researcher training time against customer demand and losing. Some Max subscribers reported their five-hour usage windows running out in nineteen minutes.
May 20. Anthropic tells investors Q2 2026 will be its first quarter of positive operating income — roughly $559M on $10.9B in revenue. The same day, SpaceX's S-1 filing makes the compute lease terms public: $1.25B per month, $15B a year, three-year term. (SpaceX S-1)
May 28. Anthropic releases Opus 4.8. A modest but real release — same pricing, same 1M context, the same hardware story. (Anthropic Opus 4.8)
June 1. Anthropic confidentially files a draft Form S-1 with the SEC. Morgan Stanley as lead underwriter. Goldman Sachs and JPMorgan as co-leads. Post-money valuation around $965B. IPO target: as early as October.
June 9. Anthropic releases Claude Fable 5 on the public API. Mythos 5 launches alongside for partner-only access. The system card runs 319 pages. The model Anthropic spent April telling us was too dangerous to release is, with a classifier in front of it, available to anyone with a Pro subscription. (Anthropic Fable 5)
Read those dates in order and a specific shape emerges. Each individual move had a reasonable story attached to it. Stacked end to end, the moves stop looking like a research lab adapting to circumstances. They start looking like a company executing a plan. The April refusal was the safety story for the launch. The Glasswing program was the proof that the model could be safely deployed. The compute deal was the capacity for the public release. The Opus 4.7 guardrail experiment was the dry run for the Fable 5 classifier. The positive Q2 2026 operating income was the financial story. The S-1 was the public-equity commitment. The June 9 release is the API.
I do not know how to read this as anything other than a planned transition. The classifier in front of Fable 5 is the load-bearing piece of the safety narrative for the S-1. The pricing — $10/M input, $50/M output, less than half the price of Mythos Preview — is the developer-adoption story for the S-1. The model lineup shift — Haiku updates stalled, Sonnet updates slowed, Opus 4.7 and 4.8 as the safe-path models, Fable 5 as the new flagship, Mythos 5 as the partner tier — is the product strategy for the S-1.
None of this is bad. The classifier is real engineering. The defensive use case is real. The model is real. The pricing is aggressive in a way that benefits developers. I am not saying Anthropic is doing something wrong. I am saying that the timing of "too dangerous to ship" in April and "shipping with a 5% classifier" in June, on the same model, with an S-1 filed between the two dates, is not a coincidence. It is a sequence. And the agent stack is downstream of that sequence now.
The day-one changes for production agent teams are concrete.
Long-horizon autonomous work is now in the public API. A model that can run for days in an agent harness, planning across stages, delegating to sub-agents, and checking its own work — with the Stripe 50M-line Ruby migration as the empirical anchor — changes the unit economics of a class of agent products that previously required either a human in the loop, a fleet of weaker models, or a multi-day ops run. The lead over Opus 4.8 on long-horizon workloads is the largest of any generation. (Anthropic Fable 5)
The pricing is the production story. $10/M input, $50/M output. That is less than half the Mythos Preview price. It is 2x Opus 4.7 and 4.8 at $5/M input, $25/M output. For a workload that needs Fable 5's capability, the price is in the range where a serious agent product can plan a unit-economics model around it. For workloads that do not need Fable 5's capability, Opus 4.8 is now the default — and the cost-of-revenue story for the closed labs just got more interesting.
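To make the unit-economics point concrete, here is a small cost calculation using only the per-million-token prices quoted above. The function name, the dictionary keys, and the example token counts are illustrative assumptions; real bills depend on details (caching, batching, tiering) that this sketch ignores.

```python
# USD per million tokens as quoted above: (input, output).
PRICES_PER_MTOK = {
    "fable-5": (10.0, 50.0),
    "opus-4.8": (5.0, 25.0),
}


def run_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of one run at the listed per-million-token prices."""
    input_price, output_price = PRICES_PER_MTOK[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


# Hypothetical agent run: 2M input tokens, 200k output tokens.
for name in PRICES_PER_MTOK:
    print(f"{name}: ${run_cost(name, 2_000_000, 200_000):.2f}")
```

Because both the input and the output price double, the ratio between the two models is 2x regardless of the input/output mix. The question for a given workload is whether the capability difference is worth that fixed multiple.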
The Mythos 5 partner tier is the new defensive-security floor. Project Glasswing partners — Apple, Microsoft, Google, AWS, CrowdStrike, the Linux Foundation, and the rest of the original twelve — get a model with the classifier lifted in some areas, which means a model that can be used offensively inside the defensive mission. For an open-weights team trying to compete on cybersecurity benchmarks, Mythos 5 is the new bar. For a closed-weights lab, it is the partner-only product you cannot buy.
The model lineup is now four public tiers, not three, plus a partner tier: Haiku (utility), Sonnet (workhorse), Opus 4.7/4.8 (safe-frontier), Fable 5 (Mythos-class, classifier-routed), and Mythos 5 (partner-only, classifier lifted in some areas). If you are building a router or a model-arbitrage layer, the new lineup is the new routing problem. The marginal cost of routing, both in latency and in dollars, just changed.
The open-weights gap just widened. MiniMax M3 is the current open-weights leader on coding, and M3's SWE-Bench Pro number is in the high 50s. Fable 5 is at 80.3%. The capability gap between the open-weights frontier and the closed frontier on agent coding just grew to more than 20 points. M3 is still the right answer for cost-sensitive workloads and for in-VPC deployments. M3 is not the right answer for the workloads Fable 5 was built for. The "open weights can match closed" narrative is now a more specific argument than it was a week ago. (MiniMax M3)
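One way to frame the routing problem is "cheapest model that clears the bar for this task". The sketch below uses the SWE-Bench Pro scores and list prices quoted in this piece as stand-ins. In practice you would substitute scores from your own eval suite, and the 20% output-token share is an arbitrary assumption. `ModelOption`, `blended_price`, and `pick_model` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ModelOption:
    name: str
    score: float         # eval score in percent (here: SWE-Bench Pro, as quoted)
    input_price: float   # USD per million input tokens
    output_price: float  # USD per million output tokens


OPTIONS = [
    ModelOption("opus-4.8", 69.2, 5.0, 25.0),
    ModelOption("fable-5", 80.3, 10.0, 50.0),
]


def blended_price(model: ModelOption, output_share: float = 0.2) -> float:
    """Price per million tokens for an assumed input/output token mix."""
    return (1.0 - output_share) * model.input_price + output_share * model.output_price


def pick_model(required_score: float,
               options: List[ModelOption] = OPTIONS) -> Optional[ModelOption]:
    """Cheapest option whose score meets the requirement, or None if none does."""
    eligible = [m for m in options if m.score >= required_score]
    if not eligible:
        return None
    return min(eligible, key=blended_price)


for bar in (65.0, 75.0, 90.0):
    choice = pick_model(bar)
    print(f"required {bar}: {choice.name if choice else 'no model qualifies'}")
```

A single benchmark number is a crude proxy for task fit, and this sketch ignores latency and the classifier's own rerouting, so treat it as the shape of the decision and not as a production router.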
I have been writing about this for two days, and there is one read I cannot dismiss.
The classifier in front of Fable 5 is a meaningful safety architecture. The engineering behind it is real. The 319-page system card is the kind of disclosure I have been asking closed labs to publish for two years. The defensive use case in Glasswing is real and the throughput is real. The model ships behind the classifier on the public API, tuned conservatively enough that false positives do happen, with an explicit commitment to improve the classifier as more capable models arrive. I do not know how to argue that this is not a genuine safety approach.
The other read I cannot dismiss is that the classifier is the safety narrative Anthropic needed for the S-1. The public release of a model that was "too dangerous" nine weeks ago, in the same quarter the company files to go public, with a routing layer that gives the company plausible deniability on the workloads they do not want to host: that is a real sequence of events with a real financial motive. Both reads are true at the same time. I have been around this industry long enough to know that the right answer is usually that both reads are correct. The engineering is real, the financial motive is also real, and the question is which one you weight when you make your build-vs-buy decision.
I will tell you what I am doing about it. I am treating Fable 5 as a serious new tool for long-horizon agent work, and I am treating the classifier as a separate infrastructure component that needs its own evals, its own monitoring, and its own update cadence. I am not treating it as a safety proof. I am treating it as a safety architecture I can route around, instrument, and reason about. The S-1 narrative and the safety narrative are both true, and the production stack has to work under both readings.
If you build production agents: the day-one task is the model-identification signal in your logs. If you cannot tell which model answered in a Fable 5 session, you cannot tell whether the classifier fired, and you cannot reason about behavior variance. The classifier is a new infrastructure dependency. Treat it like one.
If you build routers or model-arbitrage layers: the new lineup is the new problem. The cost-of-revenue math between Opus 4.8, Fable 5, and Mythos 5 (for partners) is the new routing axis. Build the eval suite that distinguishes them, then build the router.
If you are on the open-weights tier: M3 is still the right answer for the workloads it was the right answer for last week. The capability gap on agent coding is wider. The cost gap is also wider. The "open vs closed" argument is now a more specific argument. Make it well.
If you are evaluating the closed labs: the classifier is the surface to test. Do not test Fable 5 in isolation. Test the Fable-5-with-classifier surface, the Opus-4.8-fallback surface, and the behavior delta between them on your real workload. The substitution is the production risk.
If you are filing or thinking about an S-1: the Fable 5 release is the playbook. A model too dangerous to ship, shipped with a routing layer, priced aggressively, with a partner tier for the workloads the routing layer excludes, in the same quarter the company files to go public. This is how a frontier lab ships the most capable model it has ever built and keeps the safety story intact for the prospectus. Take notes.
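For the evaluation advice above (test the classifier-routed surface, the fallback surface, and the delta between them), the comparison can be sketched as a per-category pass-rate difference. The categories and results below are placeholders, not measured data, and `pass_rates` and `behavior_delta` are hypothetical helper names.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Result = Tuple[str, bool]  # (category, passed)


def pass_rates(results: Iterable[Result]) -> Dict[str, float]:
    """Fraction of passing cases per category."""
    totals: Dict[str, int] = defaultdict(int)
    passes: Dict[str, int] = defaultdict(int)
    for category, passed in results:
        totals[category] += 1
        passes[category] += int(passed)
    return {c: passes[c] / totals[c] for c in totals}


def behavior_delta(primary: Iterable[Result],
                   fallback: Iterable[Result]) -> Dict[str, float]:
    """Primary pass rate minus fallback pass rate, for categories both runs cover."""
    p, f = pass_rates(primary), pass_rates(fallback)
    return {c: p[c] - f[c] for c in p.keys() & f.keys()}


# Placeholder outcomes from running the same suite against both surfaces.
primary_run = [("migration", True), ("migration", True),
               ("log-triage", True), ("log-triage", False)]
fallback_run = [("migration", True), ("migration", False),
                ("log-triage", True), ("log-triage", False)]

for category, delta in sorted(behavior_delta(primary_run, fallback_run).items()):
    print(f"{category}: {delta:+.0%}")
```

Categories with a large positive delta are the ones where a silent substitution would hurt most, so they are the first candidates for monitoring and for an explicit fallback plan.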
Anthropic released the model they refused to ship in April, with a classifier in front of it, at less than half the prior price, on June 9, 2026, eight days after a confidential S-1 filing. The benchmarks are the easy story. The classifier is the new public-API surface. The pricing is the developer-adoption story. The IPO is the financial story. The Mythos 5 partner tier is the new defensive-security floor. The model lineup is now four public tiers plus a partner tier, and the routing problem just got harder.
The press is going to write about the 80.3% SWE-Bench Pro number. That number is real. The 50M-line Ruby migration in a day is real. The drug-design cycle accelerated by ten times is real. The genomics model 100 times smaller than the one in Science is real. The classifier is real. The S-1 is real. The "too dangerous to ship" in April and "shipping with a 5% classifier" in June are two stops on the same arc, and the agent stack is downstream of the arc now.
The thing I keep landing on, after two days of reading the system card and the announcement and the partner-tier details, is this: Anthropic did not change its mind about Mythos between April and June. Anthropic found the smallest engineering change that let them ship it. The change is a classifier, the change is a regulatory story for an S-1, the change is a 5% trigger rate, and the change is the most important new infrastructure component in the agent stack in 2026. The model is the same. The surface is new. The production math is downstream of the surface.
Build for the surface. Instrument the classifier. Plan for the substitution. And assume that the next "too dangerous to ship" model from any frontier lab is going to ship nine weeks later with a routing layer in front of it, priced for adoption, in the quarter the lab files to go public. That is the playbook now. Fable 5 is the first time it has been run.
— Mr. Technology
Release date: June 11, 2026.
Source event: Anthropic Fable 5 / Mythos 5 launch, June 9, 2026.
Topic: Claude Fable 5 (public, classifier-routed, 5% trigger), Claude Mythos 5 (partner-only, classifier lifted in some areas), Project Glasswing.
Underlying model: same weights as Claude Mythos Preview (April 7, 2026).
Benchmarks: SWE-Bench Pro 80.3% (vs Opus 4.8 69.2%, GPT-5.5 58.6%); Hex core analytics 90%+ (first to clear 90); Hebbia Finance Benchmark highest in field; Cognition FrontierCode highest at medium effort; vision SOTA including Pokémon FireRed from raw screenshots; 50M-line Ruby migration in a day (Stripe); Slay the Spire memory benefit 3x Opus 4.8; drug design 10x acceleration (9 of 14 protein targets yielding candidates); genomics model 100x smaller than recent Science paper but outperforming it.
Pricing: $10/M input, $50/M output (less than half Mythos Preview).
System card: 319 pages (up from 244 for Mythos Preview).
Distribution: Fable 5 to all Claude subscribers including Pro; Mythos 5 to Project Glasswing partners initially, broader trusted-access program to follow.
Compute context: 220,000+ NVIDIA GPUs at Colossus 1 (Memphis) under $1.25B/month, $15B/year SpaceX deal.
Financial context: Q2 2026 first quarter of positive operating income (~$559M on $10.9B revenue per May 20 investor briefing); draft S-1 filed June 1, 2026; lead underwriter Morgan Stanley; post-money valuation ~$965B; IPO target as early as October 2026.
Sources:
Anthropic, Claude Fable 5 and Claude Mythos 5
Anthropic, Claude Mythos Preview (April 7, 2026)
Anthropic, Claude Opus 4.7
Anthropic, Claude Opus 4.8
Anthropic, Project Glasswing initial update
Global Tech Research, The Frontier Just Went Public
TrueFoundry, Claude Fable 5 on AI Gateway
MiniMax M3 (open-weights comparison)