
Hey guys, Mr. Technology here — let me break this one down.
What You Need to Know: GPT-5.5 posted better scores than Anthropic's Fable 5 on Agents Last Exam, the new tool-use benchmark the labs are using to gate frontier model releases. The same week, an independent team trained a 7B foundation model from scratch for $1,500 on rented H100s, and it scored just over 80% on MMLU. The frontier isn't just shifting; it's fracturing.
Anthropic's Fable 5 launch post on June 9 set a new bar for long-horizon coding and multi-agent workflows, but the real test was Agents Last Exam, the new tool-use benchmark that Anthropic, OpenAI, and Google all contributed to. By Friday, June 13, GPT-5.5 had taken the lead on the public leaderboard, edging out Fable 5 by 4 points on the tool-use track and 6 points on the multi-step planning track.
The more interesting side story: a small research team published a writeup on training a 7B-parameter model from scratch for $1,500. They used spot-priced H100s, the Llama 2 tokenizer, the SlimPajama dataset, and a custom training loop, and the resulting model scored 80.2% on MMLU. Two years ago, a model in this class cost $2–5M to train. The efficiency curve is now steep enough that a weekend on a single rented GPU cluster can produce a foundation model that competes with last year's flagship.
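The $1,500 figure can be sanity-checked with the common rule of thumb that training a dense transformer takes about C ≈ 6·N·D FLOPs (N parameters, D training tokens). This is a back-of-envelope sketch, not the team's actual accounting: the spot price ($2 per H100-hour), the roughly 989 TFLOP/s dense BF16 peak, and the 40% utilization are all assumptions chosen for illustration.

```python
# Back-of-envelope check on a "$1,500 from scratch" 7B training run.
# Every input below is an ASSUMPTION for illustration; none come from the writeup.


def training_budget(
    budget_usd: float,
    usd_per_gpu_hour: float,
    peak_flops_per_gpu: float,
    mfu: float,
    n_params: float,
) -> dict:
    """Estimate GPU-hours, total compute, and trainable tokens via C ~= 6 * N * D."""
    gpu_hours = budget_usd / usd_per_gpu_hour
    effective_flops = peak_flops_per_gpu * mfu        # sustained FLOP/s per GPU
    total_flops = gpu_hours * 3600 * effective_flops  # total compute C
    tokens = total_flops / (6 * n_params)             # solve C = 6 * N * D for D
    return {"gpu_hours": gpu_hours, "total_flops": total_flops, "tokens": tokens}


est = training_budget(
    budget_usd=1_500,
    usd_per_gpu_hour=2.0,       # assumed spot price per H100-hour
    peak_flops_per_gpu=989e12,  # approximate H100 SXM dense BF16 peak
    mfu=0.40,                   # assumed model FLOPs utilization
    n_params=7e9,               # 7B parameters
)

print(f"GPU-hours: {est['gpu_hours']:.0f}")
print(f"Total compute: {est['total_flops']:.2e} FLOPs")
print(f"Tokens trainable: {est['tokens'] / 1e9:.1f}B")
print(f"GPUs needed for a 48-hour weekend: {est['gpu_hours'] / 48:.1f}")
```

Under those assumptions, $1,500 buys about 750 H100-hours (roughly 16 GPUs for a 48-hour weekend) and enough compute for a 7B model to see on the order of 25B training tokens. The token count scales linearly with the price and utilization you plug in.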
VentureBeat's coverage (source) covers both stories in the same weekly wrap. Its framing ("Fable 5 set records Monday; by Thursday, GPT-5.5 had beaten it") is fair, but the more important shift is the second story: the frontier is now a price band, not a single point.
The "Fable 5 vs. GPT-5.5" race will dominate coverage for the next quarter, but it's the wrong story. The right story is that a $1,500 from-scratch model just scored 80% on MMLU. The pricing power of "frontier" is being squeezed from below. Within 18 months, the question won't be "which frontier model do I use?" but "do I need a frontier model at all, or is a $1,500 from-scratch model, fine-tuned for my use case, good enough?" For builders, the implication is clear: the agents you ship in 2027 will increasingly mix frontier models (for the hard stuff) with from-scratch models (for the routine stuff), and the second category is going to get a lot cheaper.
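The frontier-plus-cheap-model mix can be sketched as a simple router that escalates only the hard tasks. This is a minimal sketch assuming a hand-written rule: the model names, the `Task` fields, and the tool-call threshold are hypothetical placeholders, not any vendor's API.

```python
from dataclasses import dataclass

# Hypothetical tier names; these are NOT real model identifiers.
SMALL_MODEL = "small-from-scratch-7b"
FRONTIER_MODEL = "frontier-model"


@dataclass
class Task:
    prompt: str
    needs_multi_step_planning: bool  # hypothetical flag set by the caller
    num_tool_calls: int              # expected number of tool calls


def route(task: Task, max_routine_tool_calls: int = 2) -> str:
    """Send routine tasks to the cheap tier and escalate hard ones to the frontier tier."""
    if task.needs_multi_step_planning or task.num_tool_calls > max_routine_tool_calls:
        return FRONTIER_MODEL
    return SMALL_MODEL


# Usage: a one-tool lookup stays on the cheap tier; a planning-heavy task escalates.
print(route(Task("Extract the invoice total", False, 1)))
print(route(Task("Plan and book a multi-city trip", True, 8)))
```

A real system might replace the hand-written rule with a learned classifier or with the small model's own confidence score to decide when to escalate; the rule-based version just makes the split explicit.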
GPT-5.5 edged out Fable 5 on the Agents Last Exam tool-use benchmark. The same week, a 7B model trained from scratch for $1,500 scored 80% on MMLU. The frontier is no longer a single point; it's a price band, and the lower bound is moving fast.
Sources: