ai · 2026-06-13

GPT-5.5 just beat Claude's best model


GPT-5.5 just beat Claude's best model — and a $1,500 from-scratch model shipped the same week

Hey guys, Mr. Technology here — let me break this one down.

What You Need to Know: GPT-5.5 outscored Anthropic's Fable 5 on the Agents Last Exam (ALE), the new tool-use eval that the labs are using to gate frontier model releases. The same week, an independent team trained a 7B foundation model from scratch for $1,500 on rented H100s, and it scored just over 80% on MMLU. The frontier isn't just shifting; it's fracturing.

Why It Matters

  • GPT-5.5 vs Fable 5 isn't just a benchmark story. It's the moment "tool use" became the canonical metric. The Agents Last Exam (ALE) is now what MMLU was in 2023: the test every new model has to clear. Whoever leads ALE controls the "frontier" narrative for the next 12 months.
  • A $1,500 from-scratch model is the more interesting story. A team trained a 7B-parameter foundation model for under $2K using spot-priced H100s. That's roughly three orders of magnitude below the $2-5M this model class used to cost. It implies the marginal cost of intelligence is collapsing, which changes the build-versus-buy math for everyone building on top of these models.
  • "Frontier" is no longer a single thing. GPT-5.5 wins on tool use. Fable 5 wins on long-context coding. The $1,500 from-scratch model wins on the dollar-per-IQ axis. The market is fragmenting by use case, not consolidating around one model.

What Actually Happened

Fable 5, announced in Anthropic's June 9 launch post, set a new bar on long-horizon coding and multi-agent workflows, but the Agents Last Exam, the new tool-use benchmark that Anthropic, OpenAI, and Google all contributed to, was the real test. By Friday, June 13, GPT-5.5 had taken the lead on the public leaderboard, edging out Fable 5 by 4 points on the tool-use track and 6 points on the multi-step planning track.

The more interesting side story: a small research team published a writeup of training a 7B-parameter model from scratch for $1,500. They used spot-priced H100s, the Llama 2 tokenizer, the SlimPajama dataset, and a custom training loop, and the resulting model scored 80.2% on MMLU. Two years ago, this model class cost $2-5M to train. The efficiency curve is now steep enough that a single weekend on a GPU cluster can produce a foundation model that competes with last year's flagship.
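To put the cost figures above side by side, here is a minimal Python sketch that computes how many times cheaper the $1,500 run is than the $2-5M range quoted for the same model class. The only inputs are the dollar figures from this post; the function names are made up for illustration, and this is back-of-the-envelope arithmetic, not anything from the team's writeup.

```python
import math

# Dollar figures quoted in this post; nothing else is assumed.
NEW_COST_USD = 1_500
OLD_COST_RANGE_USD = (2_000_000, 5_000_000)


def cost_ratio(old_cost: float, new_cost: float) -> float:
    """Return how many times cheaper the new run is than the old one."""
    return old_cost / new_cost


def orders_of_magnitude(ratio: float) -> float:
    """Express a ratio as a base-10 order of magnitude."""
    return math.log10(ratio)


for old in OLD_COST_RANGE_USD:
    ratio = cost_ratio(old, NEW_COST_USD)
    print(
        f"${old:,} -> ${NEW_COST_USD:,}: "
        f"{ratio:,.0f}x cheaper, ~{orders_of_magnitude(ratio):.1f} orders of magnitude"
    )
```

On those numbers the drop works out to somewhere between about 1,300x and 3,300x, depending on which end of the $2-5M range you start from.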

VentureBeat's coverage captures both stories as part of the same weekly wrap. The framing, "Fable 5 set records Monday; by Thursday, GPT-5.5 had beaten it," is fair, but the more important shift is the second story. The frontier is now a price band, not a single point.

The Take

The "Fable 5 vs GPT-5.5" race will dominate coverage for the next quarter, but it's the wrong story. The right story is that a $1,500 from-scratch model just scored 80% on MMLU. The pricing power of "frontier" is being undercut from below. Within 18 months, the question won't be "which frontier model do I use" but "do I need a frontier model at all, or is a $1,500 from-scratch model enough for my use case?" For builders, the implication is clear: the agents you ship in 2027 will increasingly be a mix of frontier (for the hard stuff) and from-scratch (for the routine stuff), and the second category is going to get a lot cheaper.
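For a rough picture of what that "frontier for the hard stuff, from-scratch for the routine stuff" mix could look like in an agent, here is a minimal Python sketch of a two-tier router. Everything in it is hypothetical: the tier names, the per-call costs, and the routing rule (tool use or more than a few steps goes to the frontier tier) are placeholders for illustration, not a description of any real product or API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelTier:
    name: str
    cost_per_call_usd: float  # placeholder number, not a real price


# Hypothetical tiers; swap in whatever models you actually run.
CHEAP = ModelTier("small-from-scratch", 0.0005)
FRONTIER = ModelTier("frontier", 0.05)


def pick_tier(needs_tools: bool, estimated_steps: int, max_cheap_steps: int = 3) -> ModelTier:
    """Route tool-using or long multi-step tasks to the frontier tier, the rest to the cheap tier."""
    if needs_tools or estimated_steps > max_cheap_steps:
        return FRONTIER
    return CHEAP


# Example tasks: (description, needs_tools, estimated_steps)
tasks = [
    ("classify a support ticket", False, 1),
    ("refactor a service across 40 files", True, 12),
]
for description, needs_tools, steps in tasks:
    tier = pick_tier(needs_tools, steps)
    print(f"{description}: {tier.name}")
```

A real router would need a better signal for "hard" than a step count, but the shape is the point: the routine path never touches the expensive model.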

Quick Summary

GPT-5.5 edged out Fable 5 on the Agents Last Exam tool-use benchmark. The same week, a 7B model trained from scratch for $1,500 hit 80% on MMLU. The frontier is no longer a single point. It's a price band, and the lower bound is moving fast.

