Google shipped Gemini 3.5 Flash to GA with a 4x speed claim and half-price tag. Andrej Karpathy joined Anthropic for a return to R&D. OpenAI launched Guaranteed Capacity — a new tier of 1-3 year compute commitments.

Gemini 35 Flash , Karpathy joins Anthropic , OpenAI Guarante

Three announcements landed in the same 24 hours: Google shipped Gemini 3.5 Flash to general availability with a 4x speed claim and a half-price tag, Andrej Karpathy left OpenAI/Education and joined Anthropic for a return to R&D, and OpenAI launched Guaranteed Capacity — a new tier where customers can lock in 1-3 year compute commitments. The news flow reads like a snapshot of the entire AI-industry operating model on a single day.

What You Need to Know: Google's Gemini 3.5 Flash is generally available, with a developer guide describing Thinking, structured outputs with tools, multimodal function responses, code execution with images, and combined tool use (but no Computer Use yet). Andrej Karpathy announced he has joined Anthropic, citing the next few years at the LLM frontier as "especially formative" for a return to R&D. OpenAI launched a Guaranteed Capacity offering for 1-, 2-, and 3-year compute commitments, available until the current allocation sells out.

Why It Matters

The "Flash tier" just got competitive with the flagship. Gemini 3.5 Flash claims to beat 3.1 Pro on most benchmarks, runs 4x faster than other frontier models, and is priced at less than half the cost. If true at the API level, this is the model that disrupts the per-token pricing structure of the entire industry — including OpenAI's and Anthropic's.
Karpathy's move is a signal, not a surprise. A co-founder of OpenAI returning to R&D at Anthropic isn't a routine hire; it's a statement about where the frontier research is happening. The "return to R&D" framing is the polite version of "the next 24 months of model research are at Anthropic."
Guaranteed Capacity is the new enterprise sales motion. OpenAI selling 1-3 year compute contracts is the first time a frontier model vendor has explicitly turned the cloud-style capacity reservation model into an AI product. For enterprise procurement teams, this is the SKU that makes an OpenAI build-vs-buy decision tractable.

What Actually Happened

Gemini 3.5 Flash, and what "frontier at half the cost" actually means

Google's blog post for Gemini 3.5 Flash (May 19) and the developer guide on Google AI Studio describe the model as the first in a new series "combining frontier intelligence with action." Features include Thinking, structured outputs with tools, multimodal function responses, code execution with images, and combined tool use — but explicitly not Computer Use yet, which is the gap Anthropic and OpenAI have been working to close. Simon Willison's read of the announcement is that the model is "more expensive, but Google plan to use it for everything" — the trade-off is that 3.5 Flash is 22x more expensive than the old Flash per output token ($9/M output), but it's the model Google is using internally for Antigravity 2.0 and across its own products.

The pricing math is the actual story. Pichai's I/O keynote made the case explicitly: "If companies used a mix of Flash and other frontier models they could save a lot of money. To put this in perspective, top companies are processing about 1 trillion tokens a day. If they shifted 80% of their workloads from other frontier models to 3.5 Flash, they'd save over $1 billion dollars annually." For builders, this is a real workload-shaping opportunity: the question of "which model do I call for this query" just got a concrete answer. For OpenAI and Anthropic, it's the moment they have to decide whether to match on price, lean into reasoning depth, or push the closed-loop "agent platform" story.

Sundar Pichai's I/O blog post also confirmed the deployment surface: Gemini 3.5 Flash ships across Search, enterprise tools, Android Studio, and Google's developer platforms on day one. Internal Google usage is up to 3 trillion tokens per day, doubling every few weeks from the half-trillion baseline of March. The flywheel is real, and the build-vs-buy calculus for any startup building an agent product is now: "could I just use the Gemini API for the inference layer and skip building my own model serving?"

Karpathy to Anthropic, and what a co-founder move signals

Andrej Karpathy's announcement (X post, May 19) is short: he's joined Anthropic, the next few years at the LLM frontier are "especially formative" for him, and he plans to return to education work "later" — signaling this is a research-focused move, not a permanent pivot away from teaching. Reuters' coverage of the move frames it as OpenAI co-founder and former Tesla AI executive joining Anthropic to lead R&D; the timing (right after I/O, where Google made the agentic case for Gemini) is doing a lot of work in the news cycle.

The signal content here is the research-direction question. Anthropic has been on a sustained pretraining + mechanistic interpretability + agentic safety push for 18 months, and Karpathy's background (Tesla Autopilot, OpenAI founding, Eureka Labs education) maps onto exactly the kind of "production-grade autonomous systems" work Anthropic is staffing up for. The competitive read: Anthropic is positioning itself as the research destination for senior talent who want to do frontier model work without the OpenAI product-shipper culture, and the recruiting pitch is "come do the actual science." Whether that holds depends on whether Anthropic's interpretability work produces a public research artifact that justifies the cost of the move.

OpenAI Guaranteed Capacity, and the cloud-style sales motion for AI compute

CNBC's coverage of OpenAI's Guaranteed Capacity offering (May 19) is the third announcement of the day and probably the most consequential for enterprise builders. The new product lets customers "secure long-term access to compute to power AI products, agents, and workflows" via 1-, 2-, and 3-year commitments, with discounts based on length. The current allocation is finite — OpenAI will sell until it runs out, and offer again in the future. This is the cloud-style capacity reservation model ported to the LLM era: a hedge against price spikes and availability throttling, in exchange for a multi-year commitment.

The enterprise sales motion is now codified. For a CFO, Guaranteed Capacity turns the "OpenAI bill is variable and scary" objection into a predictable line item. For a builder, it's the SKU that makes the "we should use OpenAI for this production workload" conversation winnable in a procurement review. For OpenAI, it's the revenue-recognition structure that supports the September IPO story. The TLDR AI newsletter's coverage of "Anthropic's revenue set to more than double in Q2 to $10.9B" (WSJ) and "Cheap AI could derail OpenAI and Anthropic's IPOs" (CNBC) is the immediate competitive context: both companies are now selling capacity in a market where Chinese open-weight models and Google's Flash-tier pricing are pushing the per-token cost down 10x year-over-year.

The Take

Three announcements, one operating model: the AI industry is now selling the same three things — model access, agent infrastructure, and reserved compute — at the same time. The differentiator for the next 18 months is not the model. It's the unit economics of the inference layer and the procurement story around long-term commitments. The builders who win are the ones who figure out which model to call for which query (because the per-token price gap is now 10x), and which platform to commit to (because the agent runtime is the new lock-in surface).

Karpathy's move is a research-direction signal, not a talent-acquisition story. The competitive read is that Anthropic is the only lab where "frontier R&D" is the explicit pitch right now — OpenAI is product-IPO mode, Google is vertically-integrated-agent-platform mode, xAI is compute-and-ego mode, and Meta is in restructuring mode. If you are a senior researcher deciding where the next 3 years of model work is going to be most consequential, Anthropic is the obvious answer. Whether that produces a public research artifact that justifies the recruiting cost is the open question.

On Guaranteed Capacity: if you are a builder with a production agent workload, the right move this quarter is to get pricing from OpenAI, Anthropic, and Google for a 12-18 month compute commitment. The "free market" per-token pricing is going to look very different 6 months from now, and the discount on a 3-year commitment is going to look obvious in hindsight. The risk is being locked into a model that gets out-priced by a frontier-tier Flash equivalent. The hedge is to commit to a fraction of your projected usage, not all of it.

Quick Summary

Gemini 3.5 Flash is GA, with a 4x speed and half-price claim that, if true, disrupts the per-token pricing of the entire industry; Andrej Karpathy joined Anthropic to lead R&D, signaling the lab's positioning as the research destination for senior AI talent; and OpenAI's new Guaranteed Capacity offering turns the cloud-style capacity reservation model into an AI product. The industry is now selling model access, agent infrastructure, and reserved compute at the same time. The builders who win are the ones who figure out which model to call for which query, and which platform to commit to.

Sources: