Anthropic disclosed on June 4, 2026 that Claude now authors more than 80% of the code merged into Anthropic's own systems, with engineers shipping roughly 8× more per quarter than in 2021–2025. The same week, SWE-Bench Pro and Terminal-Bench 2.0 became the field's new standard agent-coding benchmarks.

Claude now writes 80% of Anthropic's code

Anthropic just told the world, on the record, that Claude now writes more than 80% of the code that gets merged into its own systems, up from "low single digits" in 2021–2024. This is the most concrete admission yet that the AI-coding-replaces-engineers narrative is now playing out inside the model labs themselves.

What You Need to Know: In a June 4, 2026 blog post titled "When AI Builds Itself," Anthropic's Institute team disclosed that Claude now authors more than 80% of the merged code across Anthropic's own codebase, and that the average engineer ships roughly 8× as much code per quarter as they did in 2021–2025. The same week, two new agent-coding benchmarks — SWE-Bench Pro and Terminal-Bench 2.0 — cemented the "agentic coding" category as the next evaluation battleground.

Why It Matters

Self-reported 80% is the strongest admission yet. Most AI-coding claims are vendor-led and unaudited. Anthropic's number is also self-reported, but it comes from its own codebase, on the record, with a stated methodology (merge count, not lines written). Treat it as a leading indicator of what the rest of the industry will look like in 12–18 months.
The 8× productivity claim is harder to verify. "Code shipped per quarter per engineer" is a noisy metric — it conflates AI productivity with simpler codebases, better tooling, and changed scope. The number is directionally correct but not a clean apples-to-apples comparison.
SWE-Bench Pro and Terminal-Bench 2.0 are the new yardsticks. SWE-Bench Verified was saturated in 2025; the new Pro version requires full repo context, longer-horizon tasks, and harder bugs. Terminal-Bench 2.0 measures terminal/CLI workflows, where agents live or die. GPT-5.5 reports 82.7% on Terminal-Bench 2.0 and 58.6% on SWE-Bench Pro as of June 2026.
The competitive gap is closing fast. Codingfleet's June 4, 2026 SWE-Bench Pro explainer notes that Terminal-Bench (CLI/DevOps), LiveCodeBench (algorithms), and OSWorld (desktop agents) have all become required reading. If you ship coding agents, you need to track all three.
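The "merge count, not lines written" distinction above matters more than it might seem, because the two ways of counting can give very different shares for the same history. The sketch below is purely illustrative: the `MergedChange` record, the `author_kind` label, and the sample numbers are all made up for this example, and nothing here reflects how Anthropic actually attributes authorship.

```python
from dataclasses import dataclass


@dataclass
class MergedChange:
    """One merged change. Both fields are hypothetical labels for illustration."""
    author_kind: str    # "ai" or "human"
    lines_changed: int


def share_by_merge_count(changes):
    """Fraction of merged changes authored by AI, counting each merge once."""
    ai_merges = sum(1 for c in changes if c.author_kind == "ai")
    return ai_merges / len(changes)


def share_by_lines(changes):
    """Fraction of changed lines that came from AI-authored merges."""
    total_lines = sum(c.lines_changed for c in changes)
    ai_lines = sum(c.lines_changed for c in changes if c.author_kind == "ai")
    return ai_lines / total_lines


# Invented sample: four small AI-authored merges and one large human one.
sample = [
    MergedChange("ai", 40),
    MergedChange("ai", 25),
    MergedChange("ai", 60),
    MergedChange("ai", 15),
    MergedChange("human", 860),
]

print(f"by merge count: {share_by_merge_count(sample):.0%}")
print(f"by lines:       {share_by_lines(sample):.0%}")
```

On this invented sample the AI share is 80% by merge count but only 14% by lines changed. The real distribution could just as easily skew the other way; the point is only that a headline percentage is hard to interpret without knowing which unit was counted.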

What Actually Happened

The 80% number, from Anthropic itself

Anthropic's Institute page on recursive self-improvement, published June 4, 2026, states the productivity claim in one line: "today, Anthropic engineers on average ship 8x as much code per quarter as they did from 2021-2025." The Scientific American summary, written by Chris Stokel-Walker, makes the related claim explicit: "Anthropic said Claude now writes more than 80 percent of the code merged into its systems, up from low single digits before the [latest model generation]." That stat is now part of the public record. Multiple outlets have reported it, including the Metaintro coverage and the Tom's Hardware analysis.

The post also walks through the four eras of internal AI usage: 2021–2023 (laptops), 2023–2025 (chatbots suggesting code), 2025–2026 (coding agents writing entire files), and "today" (autonomous agents delegating hours of work to other agents). That trajectory is the substance behind the 80% number.

The two new agent-coding benchmarks

Codingfleet's June 4, 2026 SWE-Bench Pro explainer lays out the new evaluation stack. SWE-Bench Verified — the 2024–2025 standard — was saturated by every frontier model by late 2025. SWE-Bench Pro (released by Scale AI in late 2025, updated in 2026) requires agents to handle long-horizon, multi-file tasks in full repositories, with private test sets. Terminal-Bench 2.0 — from the Terminal-Bench paper on arXiv (January 2026) — measures CLI and DevOps workflows, where "agents live or die" in production.

The current leaderboard picture, per Firecrawl's June 2026 ranking: GPT-5.5 at 82.7% on Terminal-Bench 2.0 and 58.6% on SWE-Bench Pro. Claude Opus 4.6 sits at 80.8% on SWE-Bench Verified (the older metric). The takeaway: the new benchmarks are doing their job — differentiating models that the old ones had flattened.

The Take

The 80% number is real, but it's not the whole story. What it actually says is that Anthropic's engineering team has reorganized around AI-first workflows: humans set the spec, Claude does the implementation, humans review. The 8× productivity multiplier is downstream of that reorganization — you don't get 8× by typing faster, you get it by changing what your job is. For the rest of the industry, the lesson isn't "fire your engineers" — it's "stop writing code by hand for problems that an agent can spec, implement, and test."

Quick Summary

Anthropic says Claude now writes more than 80% of the code merged into its own systems, with engineers shipping roughly 8× more per quarter than they did in 2021–2025. Two new agent-coding benchmarks, SWE-Bench Pro and Terminal-Bench 2.0, are now the field's standard yardsticks: GPT-5.5 reports 82.7% on Terminal-Bench 2.0 and 58.6% on SWE-Bench Pro, while Claude Opus 4.6 sits at 80.8% on the older SWE-Bench Verified.

Sources

Source: VentureBeat | mr.technology — The Master Skill Index
