MiniMax M3 dropped June 1, 2026 as the first open-weight model combining frontier coding, 1M context, and native multimodality on an MIT license, at a launch-promo price of $0.30/M input. MSA sparse attention cuts per-token compute at 1M context to 1/20 of the previous-generation MiniMax model, and the agent demos are real.

MiniMax Shipped the First Open-Weights Frontier Multimodal Model, and the MSA Architecture Is the Real Story

MiniMax released MiniMax M3 on June 1, 2026, and the AI press is mostly framing it as "another open-weights release." That framing is wrong. M3 is the first open-weight model to land frontier coding, 1M-token context, and native multimodality in a single checkpoint, on an MIT license, at a launch-promo price of $0.30 per million input tokens ($0.60 standard). The closed frontier sells that combination as three separate products. MiniMax bundled all three and cut the price by an order of magnitude. The architectural move that makes it possible is MSA, MiniMax Sparse Attention. It is the most consequential open-weights architecture release of the year, and the production math for serious agent work just changed.

The Architecture Is The Story, Not The Benchmark

A 59% SWE-Bench Pro is a number. MSA is a platform. The closed frontier has been operating on dense attention, paying quadratic compute on every context-lengthening cycle, and hiding the cost inside enterprise contracts. Open-weights labs have been trying to scale context with the same primitive and getting crushed on inference economics. MiniMax rewrote the primitive.

MSA partitions the KV cache into blocks more precisely than DSA or MoBA, the two leading open-source sparse-attention approaches, and adopts a "KV outer gather Q" pattern that reads each block exactly once with contiguous memory access. MiniMax's benchmarks put MSA at more than 4x faster than Flash-Sparse-Attention and flash-moba. At a 1M context length, per-token compute drops to 1/20 of the previous-generation MiniMax model, with 9x prefilling speedup and 15x decoding speedup. Across ablations, MSA matched full attention on the vast majority of capabilities.
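To make the loop order concrete, here is a minimal pure-Python sketch of a "KV outer, gather Q" block-sparse attention pass. It is not MiniMax's kernel. The block-selection rule (top-k blocks by mean-key score), the function names, and the lack of causal masking are all assumptions made for illustration. What it does show is the access pattern described above: the outer loop walks contiguous KV blocks, each block is sliced once, and only the queries that selected that block are gathered and updated with an online softmax.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def select_blocks(queries, keys, block_size, top_k):
    """Hypothetical selector: each query keeps the top_k blocks by mean-key score."""
    n_blocks = len(keys) // block_size
    dim = len(keys[0])
    means = []
    for b in range(n_blocks):
        blk = keys[b * block_size:(b + 1) * block_size]
        means.append([sum(row[j] for row in blk) / block_size for j in range(dim)])
    chosen = []
    for q in queries:
        ranked = sorted(range(n_blocks), key=lambda b: dot(q, means[b]), reverse=True)
        chosen.append(set(ranked[:top_k]))
    return chosen

def kv_outer_sparse_attention(queries, keys, values, block_size, top_k):
    """Outer loop over KV blocks; inner loop over the queries that picked each block."""
    n_blocks = len(keys) // block_size
    scale = 1.0 / math.sqrt(len(queries[0]))
    chosen = select_blocks(queries, keys, block_size, top_k)
    # Online-softmax state per query: running max, denominator, weighted value sum.
    run_max = [-math.inf] * len(queries)
    denom = [0.0] * len(queries)
    acc = [[0.0] * len(values[0]) for _ in queries]
    for b in range(n_blocks):
        gathered = [i for i, blocks in enumerate(chosen) if b in blocks]  # gather Q
        if not gathered:
            continue  # no query selected this block, so it is never loaded
        k_blk = keys[b * block_size:(b + 1) * block_size]    # one contiguous read
        v_blk = values[b * block_size:(b + 1) * block_size]
        for i in gathered:
            logits = [dot(queries[i], k) * scale for k in k_blk]
            new_max = max(run_max[i], max(logits))
            correction = math.exp(run_max[i] - new_max)  # 0.0 on a query's first block
            weights = [math.exp(l - new_max) for l in logits]
            denom[i] = denom[i] * correction + sum(weights)
            acc[i] = [a * correction + sum(w * v[j] for w, v in zip(weights, v_blk))
                      for j, a in enumerate(acc[i])]
            run_max[i] = new_max
    return [[a / d for a in row] for row, d in zip(acc, denom)]

def reference_attention(queries, keys, values, block_size, top_k):
    """Query-outer reference: softmax over each query's selected blocks only."""
    scale = 1.0 / math.sqrt(len(queries[0]))
    chosen = select_blocks(queries, keys, block_size, top_k)
    out = []
    for q, blocks in zip(queries, chosen):
        idx = [t for b in sorted(blocks)
               for t in range(b * block_size, (b + 1) * block_size)]
        logits = [dot(q, keys[t]) * scale for t in idx]
        m = max(logits)
        w = [math.exp(l - m) for l in logits]
        out.append([sum(wi * values[t][j] for wi, t in zip(w, idx)) / sum(w)
                    for j in range(len(values[0]))])
    return out
```

Both functions compute the same result. The difference is which loop is outermost: the reference re-reads a KV block once per query that selected it, while the KV-outer version reads it once and fans the work out to the gathered queries.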

This is the move that turns "1M context is a marketing line" into "1M context is a deployable surface." When context costs scale linearly instead of quadratically, the 1M window stops being a benchmark and starts being an architectural primitive you can plan product around. Long-running agents that have to read a real codebase, a real corpus, a real session log finally have a model whose cost matches the workload. Nemotron 3 Ultra last week had the inference-speed story; M3 has the long-context-cost story.
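The linear-versus-quadratic point can be checked with back-of-the-envelope arithmetic. The sketch below counts only attended key/value positions under a causal mask. It ignores block-selection overhead, MLP compute, and memory traffic. The per-token budget is a made-up number chosen for illustration, not a published MSA parameter.

```python
def prefill_positions(context_len, budget=None):
    """Total key/value positions attended while processing a causal prompt.

    Dense attention: token t attends to t positions, so the sum is n(n+1)/2.
    Fixed-budget sparse attention: token t attends to min(t, budget) positions.
    """
    n = context_len
    if budget is None or budget >= n:
        return n * (n + 1) // 2
    return budget * (budget + 1) // 2 + (n - budget) * budget

def decode_positions(context_len, budget=None):
    """Positions attended by one new token appended to the context."""
    return context_len if budget is None else min(context_len, budget)

BUDGET = 8_192  # hypothetical per-token budget, NOT a published MSA parameter

for n in (128_000, 1_000_000):
    dense = prefill_positions(n)
    sparse = prefill_positions(n, BUDGET)
    decode_ratio = decode_positions(n) / decode_positions(n, BUDGET)
    print(f"{n:>9,} tokens: dense/sparse prefill = {dense / sparse:.1f}x, "
          f"decode = {decode_ratio:.1f}x")
```

Under these assumptions, doubling the context roughly quadruples the dense prefill count and roughly doubles the sparse one. That is the sense in which a fixed attention budget turns context length from a quadratic cost into a linear one.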

The Benchmark, Honestly

The numbers MiniMax published: SWE-Bench Pro 59.0%, Terminal-Bench 2.1 66.0%, SWE-fficiency 34.8%, KernelBench Hard 28.8%, MCP Atlas 74.2%. On coding, M3 surpasses GPT-5.5 and Gemini 3.1 Pro and approaches Opus 4.7. On SVG-Bench, M3 surpasses Opus 4.7. On OmniDocBench, the multimodal document understanding benchmark, M3 scores above Gemini 3.1 Pro. On Claw-Eval, the end-to-end agent harness, M3 posts the highest score in the field.

Honest read: M3 is not the best model in the world. Opus 4.7 is still ahead on SWE-Bench Pro by a couple of points, and Opus 4.8 is still ahead on the AA intelligence index. But it is the best open-weight model on coding, on long-context agent harnesses, on multimodal document understanding, and on computer use, all in one checkpoint. The previous best open-weights releases had one or two of these. M3 has all of them, at a launch-promo price of $0.30 per million input tokens. That is the open-weights bar for 2026, set in the first week of June.

The Real-World Demonstrations Are the Differentiator

The benchmark table is the headline. The agent traces are the story.

The ICLR 2025 paper reproduction. MiniMax gave M3 the "Learning Dynamics of LLM Finetuning" paper and asked it to reproduce the work end to end. The model ran autonomously for nearly 12 hours, produced 18 commits and 23 experimental figures, and successfully replicated the SFT prediction-probability trend, the DPO squeezing effect, and the Extend mitigation method. Only M3 was able to combine multimodal document reading, 1M context for the paper plus code plus experiment logs, and the agent capabilities needed to drive 12 hours of unsupervised iteration.

The FP8 GEMM kernel optimization. On NVIDIA Hopper, M3 was given a Triton skeleton that could not run, a benchmark script, a task description, and no reference implementation. Over 24 hours of continuous execution, M3 made 147 benchmark submissions and 1,959 tool calls, and improved Hopper FP8 hardware peak utilization from 7.6% to 71.3% — a 9.4x speedup. Most other models stopped making progress within the first 30 submissions and exited. M3's best solution appeared on submission 145. That is the agentic behavior production teams need: the willingness to keep iterating past the point where a weaker model would give up.

Pricing And Distribution

OpenRouter lists MiniMax M3 at $0.30/M input and $1.20/M output during the 50% launch promo, with standard pricing at $0.60/M input and $2.40/M output. Weights are on Hugging Face under the MIT license — no $20M revenue clause, no community-license restrictions, no API-only carveout.

Opus 4.7 is roughly $5/M input and $25/M output. At the launch promo, M3 is about 17x cheaper on input ($5 vs $0.30) and about 21x cheaper on output ($25 vs $1.20) than Opus 4.7, and within 2 points on SWE-Bench Pro. At standard pricing the gap is about 8x on input and 10x on output. The package: open weights at roughly 1/20th the cost, frontier coding, 1M context, multimodal, deployable in your own infrastructure.
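The price ratios can be recomputed from the listed prices. A small sketch, using the per-million-token figures quoted in this section (the Opus 4.7 numbers are the "roughly" values from the text). The example job size of 800k input and 20k output tokens is hypothetical, as are the dictionary keys and function names.

```python
# USD per million tokens as (input, output), taken from the prices quoted above.
PRICES = {
    "m3_promo": (0.30, 1.20),
    "m3_standard": (0.60, 2.40),
    "opus_4_7": (5.00, 25.00),  # approximate, per the text
}

def job_cost(model, input_tokens, output_tokens):
    """Cost in USD of one job at the listed per-million-token prices."""
    in_price, out_price = PRICES[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000

def price_ratio(expensive, cheap):
    """How many times pricier `expensive` is than `cheap`, as (input, output)."""
    (e_in, e_out), (c_in, c_out) = PRICES[expensive], PRICES[cheap]
    return e_in / c_in, e_out / c_out

for tier in ("m3_promo", "m3_standard"):
    r_in, r_out = price_ratio("opus_4_7", tier)
    print(f"opus_4_7 vs {tier}: {r_in:.1f}x input, {r_out:.1f}x output")

# Hypothetical long-context agent job: 800k tokens in, 20k tokens out.
for model in PRICES:
    print(f"{model}: ${job_cost(model, 800_000, 20_000):.3f}")
```

Because long-context agent jobs are dominated by input tokens, the input ratio matters more than the output ratio for the workloads discussed here.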

What To Do With It Today

If you build production agents: download the M3 weights from Hugging Face, deploy on a single H100, and benchmark against your current closed-frontier model on your real harness. The 1M context lets you stop chunking long sessions. MSA lets you run the same load at a fraction of the GPU cost. The Claw-Eval leadership makes M3 worth testing for computer-use workflows in particular.

If you are on the closed frontier: evaluate M3 for the 60-70% of your traffic that is routine agent execution, and reserve the closed budget for the hardest slice, the requests that genuinely need Opus 4.8-class intelligence. The MIT license removes the procurement objection. The benchmark gap is narrow. The cost gap is enormous.

If you are on a smaller open-weights model: test M3 against your current model on coding and long-context specifically. M3 is going to be the new default for the agent workloads most teams are running.
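For teams on the closed frontier, the effect of moving 60-70% of traffic to M3 can be estimated with a blended price. This is a rough sketch, not a procurement model. It considers input tokens only, assumes every request is the same size, and uses the input prices quoted earlier. The function names are illustrative.

```python
OPUS_INPUT = 5.00         # USD per million input tokens, "roughly", from the text
M3_PROMO_INPUT = 0.30     # launch promo
M3_STANDARD_INPUT = 0.60  # standard pricing

def blended_input_price(m3_share, m3_price, closed_price=OPUS_INPUT):
    """Average USD per million input tokens when m3_share of traffic goes to M3."""
    if not 0.0 <= m3_share <= 1.0:
        raise ValueError("m3_share must be between 0 and 1")
    return m3_share * m3_price + (1.0 - m3_share) * closed_price

def savings_vs_all_closed(m3_share, m3_price, closed_price=OPUS_INPUT):
    """Fraction of input spend saved relative to sending everything to the closed model."""
    return 1.0 - blended_input_price(m3_share, m3_price, closed_price) / closed_price

for share in (0.60, 0.70):
    for label, price in (("promo", M3_PROMO_INPUT), ("standard", M3_STANDARD_INPUT)):
        blended = blended_input_price(share, price)
        saved = savings_vs_all_closed(share, price)
        print(f"{share:.0%} to M3 ({label}): ${blended:.2f}/M input, {saved:.0%} saved")
```

The savings track the routed share closely because M3's price is small next to the closed model's. What drives the bill is how much traffic can safely move, not the promo discount.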

The Take

MiniMax M3 is the most consequential LLM release of the past seven days because it is the first open-weight release that genuinely changes the production math. The MSA architecture is the right primitive for the next two years of long-context agent work, and MiniMax just handed it to the open community on a permissive license.

The closed frontier is still ahead on raw intelligence. Kimi K2.6 is still ahead on absolute open-weights score. But the gap on coding closed to a couple of points, the gap on cost blew open by 20x, and the gap on license just disappeared. The "open weights cannot compete with closed" argument just lost its strongest data point.

— Mr. Technology

Release date: June 1, 2026.
Architecture: MSA (MiniMax Sparse Attention), 1M-token context, native multimodal (text, image, video, computer use), 100T training tokens, mixed-modality from Step 0.
Benchmarks: SWE-Bench Pro 59.0%, Terminal-Bench 2.1 66.0%, SWE-fficiency 34.8%, KernelBench Hard 28.8%, MCP Atlas 74.2%, OmniDocBench above Gemini 3.1 Pro, SVG-Bench above Opus 4.7, Claw-Eval highest in field.
Performance: more than 4x faster than Flash-Sparse-Attention/flash-moba; at 1M context, 1/20 per-token compute vs previous gen, 9x prefilling and 15x decoding speedup; runs on a single H100.
Pricing: $0.30/M input and $1.20/M output on OpenRouter launch promo (50% off), $0.60/$2.40 standard.
License: MIT.
Day-0 access: Hugging Face weights, MiniMax API, MiniMax Code, Token Plan, OpenRouter.
Sources: MiniMax M3 release blog, OpenRouter pricing, The Decoder coverage.