MiniMax M3 at $0.30/M input with 59% SWE-Bench Pro is the first credible open-weight challenge to closed-source frontier coding models. Anthropic's 31.5% browser-agent hijack rate is real but cannot be meaningfully compared with any other lab's number.

MiniMax-M3 is 30 cents per million tokens, Opus is $5

MiniMax shipped M3 on June 1, 2026, and the pricing math is brutal. MiniMax-M3 starts at $0.30 per million input tokens, a fraction of Claude Opus 4.7's $5 per million. M3 also scores 59.0% on SWE-Bench Pro, edging out GPT-5.5 (58.6%) and approaching Claude Opus 4.7 (64.3%), with a 1-million-token context window and native multimodal input. The follow-up: according to a VentureBeat analysis of prompt-injection disclosures, Anthropic's own browser agent was hijacked 31.5% of the time before safeguards engaged.

What You Need to Know: MiniMax's M3 lands at $0.30 per million input tokens with a 59.0% SWE-Bench Pro score, undercutting Claude Opus 4.7 by roughly 17x on price. The same week, a VentureBeat analysis found Anthropic's browser agent was hijacked 31.5% of the time before safeguards engaged — and four labs have four different ways of measuring the same problem.

Why It Matters

$0.30/M vs $5/M is a roughly 17x gap in input-token price on comparable coding tasks. If M3's SWE-Bench Pro numbers hold under independent testing, this is the most disruptive pricing event in frontier models since DeepSeek V3.
1M context + native multimodal changes the agent build equation. Most coding agents in 2026 are wired together from a 200K-context text model plus a separate vision model. M3 collapses both into one call.
The Anthropic browser agent disclosure exposes a measurement crisis. Four frontier labs (Anthropic, OpenAI, Google, Meta) all published prompt-injection evaluations in 2026 — and none of them measure the same thing. A 31.5% hijack rate sounds catastrophic. Without context, it's almost meaningless.
Open-weight release within 10 days. MiniMax said open weights and the technical report will land on Hugging Face roughly 10 days after the June 1 launch — putting the open release around June 11. Self-hosted frontier coding performance is about to become a real option for enterprises with GPU budgets.

What Actually Happened

MiniMax M3 launches at $0.30/M, 1M context, multimodal, open-weights pending

MiniMax shipped M3 on June 1, 2026, and the marketing position is unambiguous: "the first open-weight model to combine frontier coding, a 1M-token context window, and native image and video understanding" in a single model.

The architecture is MiniMax Sparse Attention (MSA), a sparse-attention scheme the company developed for M2, shelved during the M2 generation, and brought back for M3. MiniMax reports that per-token compute at 1M context drops to one-twentieth of M2.7's, with 9.7x faster prefill and 15.6x faster decoding at full context length. A guaranteed minimum context of 512K tokens applies even under heavy load.

OpenRouter's listing confirms the price: $0.30 per million input tokens and $1.20 per million output tokens. With cache optimization, the blended cost drops to roughly $0.06 per million tokens, per Fello AI's breakdown.
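To make the pricing concrete, here is a minimal sketch of the arithmetic using only the list prices quoted above. The function names and the sample workload are hypothetical, and the calculation ignores caching discounts. The article gives no output price for Opus, so the comparison is limited to input tokens.

```python
# Prices quoted in the article, in USD per million tokens.
M3_INPUT_PER_M = 0.30
M3_OUTPUT_PER_M = 1.20
OPUS_INPUT_PER_M = 5.00  # input price only; no Opus output price is given


def m3_cost(input_tokens: int, output_tokens: int) -> float:
    """Uncached M3 cost in USD for one workload at the listed prices."""
    total = input_tokens * M3_INPUT_PER_M + output_tokens * M3_OUTPUT_PER_M
    return total / 1_000_000


def input_price_ratio() -> float:
    """How many times more Opus charges per input token than M3."""
    return OPUS_INPUT_PER_M / M3_INPUT_PER_M


if __name__ == "__main__":
    # Hypothetical workload: 800K input tokens and 20K output tokens.
    print(f"M3 cost: ${m3_cost(800_000, 20_000):.3f}")
    print(f"Input price ratio: {input_price_ratio():.1f}x")
```

The ratio works out to about 16.7, which is where the "roughly 17x" figure comes from.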

On SWE-Bench Pro, Codersera's analysis reports M3 at 59.0% versus GPT-5.5 at 58.6%, Claude Opus 4.7 at 64.3%, and Gemini 3.1 Pro at 54.2%. On BrowseComp (agentic browsing), M3 hits 83.5% — better than Opus 4.7's 79.3%. M3 was also reported to autonomously reproduce an ICLR 2025 paper across 18 commits over 12 hours.

The catch: most of these benchmarks come from MiniMax's own infrastructure, with MiniMax's own agent scaffolding. Independent third-party validation is still pending, and the open-weights drop is promised "within roughly 10 days" of launch.

Anthropic's browser agent was hijacked 31.5% of the time

On June 1, 2026, VentureBeat published Louis Columbus's analysis of prompt-injection disclosures from four frontier labs. The headline: "Anthropic's browser agent got hijacked 31.5% of the time before safeguards engaged."

The piece is more interesting for what it says about disclosure than about any single number. Columbus notes that "Anthropic, OpenAI, Google, and Meta published prompt injection disclosures in 2026 — but no two measure the same thing." Different labs use different attack taxonomies, different success criteria, and different definitions of "safeguard engaged." A 31.5% figure from one lab is not comparable to a 12% figure from another. The disclosure is real. The comparability is not.
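A toy example shows why the success criterion matters so much. The sketch below scores one synthetic log of ten injection attempts under three plausible definitions of "hijacked". The data, field names, and criteria are all invented for illustration and do not reflect any lab's actual methodology.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Attempt:
    """One injection attempt in a made-up evaluation log."""
    followed_injection: bool   # the model started acting on the injected text
    safeguard_blocked: bool    # a downstream safeguard stopped the action
    harm_completed: bool       # the attacker's goal was actually achieved


def rate(attempts, is_success) -> float:
    """Share of attempts counted as a successful attack under one criterion."""
    return sum(1 for a in attempts if is_success(a)) / len(attempts)


# Synthetic log of 10 attempts, for illustration only.
LOG = (
    [Attempt(True, True, False)] * 3      # followed, then blocked
    + [Attempt(True, False, True)] * 1    # followed, not blocked, harm done
    + [Attempt(True, False, False)] * 1   # followed, not blocked, attack fizzled
    + [Attempt(False, False, False)] * 5  # ignored the injection
)

before_safeguards = rate(LOG, lambda a: a.followed_injection)
after_safeguards = rate(
    LOG, lambda a: a.followed_injection and not a.safeguard_blocked
)
end_to_end_harm = rate(LOG, lambda a: a.harm_completed)

print(before_safeguards, after_safeguards, end_to_end_harm)
```

The same ten attempts yield rates of 50%, 20%, and 10% depending on which definition is applied, so a headline number means little without the definition behind it.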

For practitioners, the takeaway is that prompt injection is unsolved, and the four biggest labs in the world cannot agree on how to measure the threat. If you're building production agents that touch untrusted web content, you should assume the model will be hijacked at some point and design the action surface to be safe even when the prompt is compromised.
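One way to read "safe action surface" is a gate that sits between the model and its tools and enforces policy regardless of what the prompt says. The sketch below is a minimal illustration of that idea, not a complete defense. The class names, action names, and hosts are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    """An action the agent wants to take (all names are hypothetical)."""
    name: str    # e.g. "read_page" or "submit_form"
    target: str  # hostname the action touches


@dataclass(frozen=True)
class ActionGate:
    """Policy check that does not depend on the model's output being trustworthy."""
    allowed_hosts: frozenset
    read_only_actions: frozenset = frozenset({"read_page", "scroll"})
    approval_actions: frozenset = frozenset({"submit_form", "download"})

    def is_allowed(self, action: Action, human_approved: bool = False) -> bool:
        # Anything outside the host allowlist is refused outright.
        if action.target not in self.allowed_hosts:
            return False
        # Read-only actions on allowed hosts pass without review.
        if action.name in self.read_only_actions:
            return True
        # Side-effecting actions need explicit human approval.
        if action.name in self.approval_actions:
            return human_approved
        # Unknown actions are denied by default.
        return False


if __name__ == "__main__":
    gate = ActionGate(allowed_hosts=frozenset({"docs.example.com"}))
    print(gate.is_allowed(Action("read_page", "docs.example.com")))
    print(gate.is_allowed(Action("submit_form", "docs.example.com")))
    print(gate.is_allowed(Action("read_page", "attacker.example.net")))
```

The design choice is deny-by-default: a hijacked model can still request anything, but only allowlisted, low-risk actions run without a human in the loop.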

The Take

MiniMax M3 is the first open-weight model that's plausibly good enough for serious agentic coding at roughly one-seventeenth the input price of the closed-source alternative. If the SWE-Bench Pro number survives independent replication, which the open weights will let the community test, this is the moment open-weight models stop being "good enough for prototypes" and become the default.

The Anthropic disclosure is the more embarrassing story. The labs have been publishing vulnerability disclosures for years. The fact that they still can't agree on a measurement framework tells you how immature the threat model is. Until there is a shared, public benchmark for prompt-injection resistance, every "our model is X% safe" claim is a marketing line. The work is in standardizing the test, not running it.
