
Hey guys, Mr. Technology here. On Tuesday, June 30, Meituan — a company you know primarily for the orange-and-yellow food-delivery scooters clogging every Beijing intersection — open-sourced LongCat-2.0, a 1.6-trillion-parameter Mixture-of-Experts LLM. Native 1M-token context. MIT license. Trained end-to-end on a 50,000-card cluster of domestic Chinese ASICs. No Nvidia in the training loop, no Nvidia in the serving loop. The press treated this as a chip story. It is not. It is a pricing story. Let me show you why.
LongCat-2.0 is a sparse MoE with 1.6T total parameters and ~48B active per token on average, anywhere from 33B to 56B depending on query complexity. Meituan calls the activation system "Zero-Compute Experts": routine tokens pass through lighter subnetworks, so they never pay the full compute cost that a dense model charges every token.
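To make "dynamic activation" concrete, here is a toy router in Python. It assumes zero-compute experts work the way the name suggests: the router can pick an expert that simply returns its input, so that slot costs no parameters and no FLOPs. The expert counts, sizes, and function names below are hypothetical and nowhere near LongCat-2.0's real configuration.

```python
import math

# Toy sizes, NOT LongCat-2.0's real configuration (hypothetical).
NUM_REAL_EXPERTS = 8      # ordinary FFN experts, indices 0..7
NUM_ZERO_EXPERTS = 4      # zero-compute experts, indices 8..11: return the input
TOP_K = 4                 # experts selected per token
PARAMS_PER_EXPERT_B = 3   # billions of parameters per real expert (illustrative)
SHARED_PARAMS_B = 20      # always-on parameters: attention, shared layers, etc.


def route(logits, top_k=TOP_K):
    """Indices of the top_k experts by router logit."""
    return sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:top_k]


def active_params_b(chosen):
    """Only real experts add parameters; zero-compute picks are free."""
    real = sum(1 for i in chosen if i < NUM_REAL_EXPERTS)
    return SHARED_PARAMS_B + real * PARAMS_PER_EXPERT_B


def moe_forward(x, logits, experts):
    """Mix the chosen experts' outputs with softmax weights over the chosen logits."""
    chosen = route(logits)
    top = max(logits[i] for i in chosen)
    weights = [math.exp(logits[i] - top) for i in chosen]
    total = sum(weights)
    out = 0.0
    for w, i in zip(weights, chosen):
        # Real experts compute; zero-compute experts pass x through unchanged.
        y = experts[i](x) if i < NUM_REAL_EXPERTS else x
        out += (w / total) * y
    return out, active_params_b(chosen)


# Toy scalar "experts": expert k scales its input by (k + 1).
experts = [lambda x, k=k: (k + 1) * x for k in range(NUM_REAL_EXPERTS)]

# A routine token: the router prefers zero-compute experts.
easy = [1.0] + [0.0] * 7 + [2.0, 2.0, 2.0, 0.5]
# A hard token: the router sends every slot to a real expert.
hard = [3.0] * 4 + [0.0] * 8

print(moe_forward(1.0, easy, experts))  # 1 real expert  -> 23B active
print(moe_forward(1.0, hard, experts))  # 4 real experts -> 32B active
```

In this toy, the same router turns a routine token into 23B active parameters and a hard one into 32B. The average active count is a property of the traffic the router sees, which fits Meituan quoting a 33B to 56B range around the ~48B figure instead of one fixed number.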
The context window is one million tokens, native, not extrapolated. To make 1M usable without choking memory, Meituan shipped LongCat Sparse Attention (LSA), an evolution of DeepSeek Sparse Attention with three orthogonal optimizations: Streaming-aware Indexing (coalesced HBM reads), Cross-Layer Indexing (amortize one index pass across adjacent layers via cross-layer distillation), and Hierarchical Indexing (coarse-to-fine two-stage recall).
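Coarse-to-fine recall is easy to sketch. The toy below first scores blocks of cached keys by their mean key, then scores individual tokens only inside the winning blocks. It is not Meituan's indexer: DeepSeek Sparse Attention uses a learned lightweight indexer, and the plain dot-product scoring and the function names here are my own stand-ins.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def centroid(keys, block):
    """Mean key vector of one block (recomputed here; cacheable in practice)."""
    dim = len(keys[0])
    return [sum(keys[i][d] for i in block) / len(block) for d in range(dim)]


def hierarchical_topk(query, keys, block_size, top_blocks, top_tokens):
    """Two-stage, coarse-to-fine selection of which cached tokens to attend to."""
    blocks = [range(s, min(s + block_size, len(keys)))
              for s in range(0, len(keys), block_size)]
    # Stage 1 (coarse): rank blocks by query . centroid and keep the best few.
    best = sorted(blocks, key=lambda b: dot(query, centroid(keys, b)),
                  reverse=True)[:top_blocks]
    # Stage 2 (fine): score individual tokens only inside the surviving blocks.
    candidates = [i for b in best for i in b]
    chosen = sorted(candidates, key=lambda i: dot(query, keys[i]),
                    reverse=True)[:top_tokens]
    return sorted(chosen)  # positions to feed into sparse attention


# 16 cached tokens in 4 blocks; only the last block points along the query.
keys = [[0.0, 1.0] for _ in range(12)] + [[1.0, 0.0], [2.0, 0.0], [3.0, 0.0], [4.0, 0.0]]
print(hierarchical_topk([1.0, 0.0], keys, block_size=4, top_blocks=1, top_tokens=2))
# -> [14, 15]
```

The saving is in how many scores you compute. With a hypothetical 64-token block at 1M context, stage 1 scores 15,625 block centroids and stage 2 scores only `top_blocks × 64` tokens, instead of scoring all 1,000,000 keys; a serious implementation would presumably maintain centroids as the KV cache fills rather than recomputing them per query the way this sketch does. Cross-Layer Indexing, as described above, amortizes further by reusing one selection across adjacent layers.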
They also bolted on an N-gram Embedding module: 135B extra parameters that sit orthogonal to the MoE expert layout and encode dense 5-gram token relationships. The embedding space expands roughly 100x, large-batch inference gets faster, and memory I/O bottlenecks shrink. It is the kind of architectural add-on you build when you actually understand your serving cost structure.
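Meituan's blog does not say how the N-gram table is indexed, so treat this as an assumption: a common way to give every n-gram its own embedding row without a combinatorially huge vocabulary is to hash it into a fixed number of buckets. The sketch adds one hashed row per trailing 2- to 5-gram to the ordinary token embedding. All sizes and names are hypothetical.

```python
import hashlib
import random

# Hypothetical toy sizes; the real module is ~135B parameters.
DIM = 4
VOCAB = 100
NGRAM_BUCKETS = 1_000
MAX_N = 5  # bigrams up to 5-grams

rng = random.Random(0)
token_table = [[rng.uniform(-1, 1) for _ in range(DIM)] for _ in range(VOCAB)]
ngram_table = [[rng.uniform(-1, 1) for _ in range(DIM)] for _ in range(NGRAM_BUCKETS)]


def bucket(ngram):
    """Deterministically hash an n-gram tuple to a row of ngram_table."""
    digest = hashlib.blake2b(repr(ngram).encode(), digest_size=8).digest()
    return int.from_bytes(digest, "little") % NGRAM_BUCKETS


def ngram_embed(token_ids, pos):
    """Token embedding plus one looked-up row per trailing n-gram ending at pos."""
    vec = list(token_table[token_ids[pos]])
    for n in range(2, MAX_N + 1):
        start = pos - n + 1
        if start < 0:
            break  # not enough left context for longer n-grams
        row = ngram_table[bucket(tuple(token_ids[start:pos + 1]))]
        vec = [a + b for a, b in zip(vec, row)]
    return vec


seq = [7, 1, 2, 3, 4, 5]
print(ngram_embed(seq, 5))  # token 5 plus rows for (4,5), (3,4,5), (2,3,4,5), (1,2,3,4,5)
```

Each extra n-gram costs one table lookup and a vector add, not a matrix multiply, which is how a module like this can add a lot of parameters without adding much per-token compute.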
LongCat-2.0 is positioned for agentic coding. The headline numbers from the Meituan tech blog:
| Benchmark | LongCat-2.0 | Comparison |
|---|---|---|
| SWE-bench Pro | 59.5 | GPT-5.5 = 58.6 |
| Terminal-Bench 2.1 | 70.8 | — |
| SWE-bench Multilingual | 77.3 | — |
| FORTE (corporate workflow sim) | 73.2 | — |
Beating GPT-5.5 on SWE-bench Pro by 0.9 points (59.5 vs. 58.6) is a real number. Meituan also claims parity with Gemini 3.1 Pro, which is a different kind of claim: capability spread across more benchmarks, not single-benchmark dominance. For an open-weight, MIT-licensed model, those numbers put LongCat-2.0 in the top tier of agentic coding models you can download.
Here is the part the Western press missed. For two months before Meituan claimed it, an anonymous model called "Owl Alpha" was eating OpenRouter's global leaderboard. By the time Meituan stepped forward, Owl Alpha was processing roughly 10.1 trillion tokens a month and running at about 559B tokens a day, up 242% month-over-month and ranked third globally across all categories. It was number one on Hermes Agent workspaces, second on Claude Code deployments, and third across international OpenClaw environments.
The community had been benchmarking it for weeks. Early comparisons put it near the Qwen3.6-27B class on coding tasks. Nobody knew who built it. That it stayed anonymous at that volume is itself a story: most stealth models get identified within days. Whoever was running it knew how to keep a server quiet.
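Meituan launched LongCat-2.0 with a two-tier API:

- Standard: $0.75 per 1M input tokens, $2.95 per 1M output tokens
- Limited-time promo: $0.30 per 1M input tokens, $1.20 per 1M output tokens
- Cache hits: free
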
The $0.30 / $1.20 promo price undercuts MiniMax-M3 and DeepSeek V4 Flash on input and matches them on output. Against Gemini 3.1 Flash-Lite it costs a nickel more on input ($0.30 vs. $0.25) but 30 cents less on output ($1.20 vs. $1.50). For a 1.6T model with 1M context that beats GPT-5.5 on SWE-bench Pro, this is not aggressive pricing. It is a war crime against the closed-source enterprise tier.
Here is how it lines up against the rest of the field:
| Model | Input ($ per 1M tokens) | Output ($ per 1M tokens) |
|---|---|---|
| LongCat-2.0 (promo) | $0.30 | $1.20 |
| DeepSeek V4 Pro | $0.435 | $0.87 |
| Gemini 3.1 Flash-Lite | $0.25 | $1.50 |
| Claude Sonnet 5 | $2.00 | $10.00 |
| GPT-5.6 Sol | $5.00 | $30.00 |
| Claude Fable 5 / Mythos | $10.00 | $50.00 |
A model that benchmarks near Gemini 3.1 Pro, runs 1M context, and is priced below every Western frontier offering. Cache hits free. The math for any team running agentic workloads at scale just got rewritten. If you are paying Claude Fable 5 prices for long-running agentic coding tasks today, you have no excuse in two weeks.
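Here is that math as a small Python calculator over the list prices in the table above. The workload size is hypothetical (1B input and 100M output tokens a month, input-heavy, as agentic loops tend to be), and the cache model is deliberately simple: cache hits are free for LongCat, per Meituan, and no other provider's cache discount is modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Price:
    input_per_m: float   # USD per 1M input tokens
    output_per_m: float  # USD per 1M output tokens


# List prices from the table above.
PRICES = {
    "LongCat-2.0 (promo)": Price(0.30, 1.20),
    "LongCat-2.0 (standard)": Price(0.75, 2.95),
    "DeepSeek V4 Pro": Price(0.435, 0.87),
    "Gemini 3.1 Flash-Lite": Price(0.25, 1.50),
    "Claude Sonnet 5": Price(2.00, 10.00),
    "GPT-5.6 Sol": Price(5.00, 30.00),
    "Claude Fable 5 / Mythos": Price(10.00, 50.00),
}


def monthly_cost(price, input_tokens, output_tokens, cache_hit_rate=0.0):
    """USD per month. Cache hits are billed at zero (LongCat's stated policy);
    other providers' cache discounts are not modeled."""
    billed_input = input_tokens * (1.0 - cache_hit_rate)
    return (billed_input * price.input_per_m + output_tokens * price.output_per_m) / 1_000_000


# Hypothetical input-heavy agentic workload: 1B input, 100M output tokens/month.
IN_TOKENS, OUT_TOKENS = 1_000_000_000, 100_000_000

for name, price in sorted(PRICES.items(),
                          key=lambda kv: monthly_cost(kv[1], IN_TOKENS, OUT_TOKENS)):
    print(f"{name:<26} ${monthly_cost(price, IN_TOKENS, OUT_TOKENS):>9,.2f}")

# LongCat promo if half of all input tokens are cache hits.
promo_cached = monthly_cost(PRICES["LongCat-2.0 (promo)"], IN_TOKENS, OUT_TOKENS, 0.5)
print(f"{'LongCat promo, 50% cache':<26} ${promo_cached:>9,.2f}")
```

On this mix, at list prices, Gemini 3.1 Flash-Lite ($400) edges out the LongCat promo ($420) thanks to its cheaper input; at a 50% cache-hit rate the promo drops to $270. Claude Fable 5 comes to $15,000 for the same tokens. Swap in your own `IN_TOKENS`, `OUT_TOKENS`, and cache rate from real logs before drawing conclusions.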
The training claim is the geopolitically loud one: 50,000 domestic ASICs, full end-to-end — pre-training and inference — with zero Nvidia in either loop. Compare to DeepSeek V4 Pro, which used domestic silicon only for inference, with pre-training still on restricted-export hardware.
End-to-end pre-training on non-Nvidia silicon at 1.6T scale is the milestone Washington was specifically trying to prevent. Export controls were designed to keep China on the inference tier — let them run trained models, but starve them of the compute to train frontier ones. LongCat-2.0 is Meituan saying that ceiling has been crossed, at least on a single large training run.
Caveat: the training claim rests on Meituan's account of its own infrastructure. The community can verify model quality by running the weights, but the silicon claim can only be audited by people inside Meituan's cluster, and they are not opening that up. Take it as direction-of-travel evidence, not audited fact.
Meituan is a food-delivery company. That is the line that should rattle you. A food-delivery company just trained a 1.6T MoE on a 50,000-chip domestic cluster, beat GPT-5.5 on SWE-bench Pro, open-sourced it under MIT, priced it at under a tenth of Claude Fable 5 even at standard rates, and watched it go top-3 on OpenRouter while still anonymous. The companies whose entire pitch is "China is X years behind on frontier AI" need a new pitch.
The export-control era assumed that frontier training was bottlenecked on Nvidia. If Chinese labs can replicate the frontier on domestic ASICs, the export controls do not slow China down — they just accelerate the development of a parallel compute stack that, once mature, competes with Nvidia on price everywhere it is allowed to sell.
LongCat-2.0 is not the end of that story. It is the first public paragraph. Read the weights when they drop fully. Run the benchmarks yourself. And update your cost projections for agentic workloads this week, not next quarter.
— Mr. Technology
*Released: June 30, 2026, by Meituan. Models: LongCat-2.0 MoE — 1.6T total parameters, ~48B active per token (33B–56B dynamic), native 1M-token context, MIT license. Architecture: Zero-Compute Experts, LongCat Sparse Attention with Streaming-aware / Cross-Layer / Hierarchical Indexing, 135B-parameter N-gram Embedding. Training: end-to-end on a 50,000-card domestic Chinese ASIC cluster (claim unverified by third parties); no Nvidia in training or serving. Post-training: MOPD (Multi-Teacher Optimization via Mixture of Specialized Experts). Benchmarks: SWE-bench Pro 59.5, Terminal-Bench 2.1 70.8, SWE-bench Multilingual 77.3, FORTE 73.2. Pricing: standard API $0.75 / $2.95 per 1M tokens in/out; limited-time promo $0.30 / $1.20; cache hits free. Open-source: architecture, eval, and inference code on GitHub and Hugging Face under meituan-longcat; full safetensors flagged "coming soon" at launch. Sources: VentureBeat — Meituan open sources LongCat-2.0, LongCat AI blog, SCMP — biggest AI model trained on local chips, Reuters, Hugging Face — meituan-longcat/LongCat-2.0.*