On June 30, 2026, Huawei dropped openPangu-2.0-Flash — a 92B-total/6B-active MoE with 512K context, 34T pretraining tokens, and a non-Apache license. SWE-bench Verified 63.1%, LiveCodeBench V6 85.1%, AIME 2026 98.1% w/ Python. The training stack is the real news.

Huawei Just Open-Sourced a 92B/6B MoE on Ascend — And Gave Away the Training Ops

On June 30, 2026, Huawei dropped openPangu-2.0-Flash on Hugging Face and GitCode. Weights, inference code, and the training and inference operators are all up. This is not another "weights only" open-source drop. The training stack ships with it. That is the headline the launch coverage is sleepwalking past.

The numbers worth knowing:

Architecture: 92B total / 6B active MoE, MLA attention with DSA + SWA layered 1:2
Context: 512K tokens (4× Qwen 3.6-Plus, ~2× most open-weight MoEs)
Pretraining: ~34T tokens
Post-training: unified SFT (slow + fast thinking), multi-specialist RL, on-policy distillation
Hardware target: native Ascend NPU; Huawei claims up to 2× per-card throughput vs. comparable open models
License: OpenPangu Model License Agreement v2.0 — permissive, but not Apache 2.0. Read it.

The Flash SKU is the lead; openPangu-2.0-Pro (505B total / 18B active) follows in July, per Huawei's HDC 2026 roadmap.
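For sizing either SKU, a back-of-the-envelope weight-memory estimate is a useful first check. The sketch below assumes every expert stays resident in device memory (no expert offloading). Its bytes-per-parameter values are generic precision assumptions, not figures from the model card, and it ignores KV cache, activations, and runtime overhead.

```python
# Weight-only memory estimate for an MoE checkpoint.
# Assumption: all experts are resident, so TOTAL (not active) parameters
# determine the footprint.

BYTES_PER_PARAM = {"bf16": 2, "int8": 1, "int4": 0.5}  # assumed precisions

def weight_memory_gb(total_params_billion: float, precision: str) -> float:
    """Return approximate weight memory in GB (1 GB = 1e9 bytes)."""
    return total_params_billion * 1e9 * BYTES_PER_PARAM[precision] / 1e9

for name, total_b in [("openPangu-2.0-Flash", 92), ("openPangu-2.0-Pro", 505)]:
    for precision in BYTES_PER_PARAM:
        gb = weight_memory_gb(total_b, precision)
        print(f"{name:22s} {precision:5s} ~{gb:.0f} GB")
```

At 2 bytes per parameter the Flash weights alone come to roughly 184 GB, so even the "small" SKU is a multi-card deployment before any context is loaded.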

The Price-Performance Story: Free, But With a Real Cost Number

There is no API price because you are the runtime, so the cost story is what it takes to self-host a 92B MoE on Ascend. For reference, independent community benchmarks put comparable 70B-class open-weight models at around $0.30–$0.50 per MTok on US cloud GPUs (Fireworks, Together). Self-hosting on Ascend 910B/910C changes the unit economics only if Huawei's claim of up to 2× single-card throughput over mainstream open-source peers holds up.
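To see how a throughput claim turns into a cost number, here is a minimal sketch of the per-MTok arithmetic for a self-hosted node. The hourly node cost and tokens-per-second figures are hypothetical placeholders, not measurements of any Ascend or GPU system, and the formula assumes the node is fully utilized.

```python
def cost_per_million_tokens(hourly_cost_usd: float, tokens_per_second: float) -> float:
    """Dollars per 1M generated tokens for a node running at full utilization."""
    tokens_per_hour = tokens_per_second * 3600
    return hourly_cost_usd / tokens_per_hour * 1_000_000

# Hypothetical inputs: an $8/hour node sustaining 5,000 tok/s in aggregate.
baseline = cost_per_million_tokens(8.0, 5_000)
# Same node cost with throughput doubled, mirroring a "2x per card" claim.
doubled = cost_per_million_tokens(8.0, 10_000)

print(f"baseline: ${baseline:.3f}/MTok")
print(f"doubled throughput: ${doubled:.3f}/MTok")
```

Doubling throughput at fixed node cost halves the cost per token, which is why the throughput claim matters more than any single benchmark score for the self-hosting case. Real deployments rarely run at full utilization, so treat the result as a floor.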

For comparison against published API-priced peers:

Model	Params (total / active)	Context	SWE-bench Verified	AIME 2026 (w/ Python)
openPangu-2.0-Flash (Thinking)	92B / 6B	512K	63.1%	98.1%
openPangu-2.0-Flash (Non-Thinking)	92B / 6B	512K	57.6%	—
Qwen 3.7-Plus	open MoE	1M	~58% (third-party)	~88% (third-party)
Claude Sonnet 5 (June 30)	closed	1M	~63.2% (Anthropic)	—

A 6B-active MoE landing within a tenth of a point of Sonnet 5 on SWE-bench Verified (63.1% vs. ~63.2%) and posting 98.1% on AIME 2026 with Python, where no comparable Sonnet 5 figure is listed, is not a "Chinese copy" headline. It is the first time an open-weight MoE has crossed the line on coding agent evals that matter to Western engineering teams.

The catch is the license. OpenPangu License v2.0 is permissive for research and commercial use but includes a use-based clause Huawei reserves the right to enforce — similar in spirit to the OpenRAIL family. The weights themselves are free; the legal envelope is not Apache.

The Technical Novelty: Three Things the Model Card Got Right

1. DSA + SWA 1:2 layer split, not a bolt-on. Most long-context models glue sliding-window attention on top of full attention and call it a day. Pangu interleaves sparse global aggregation (DSA) layers and local-window (SWA) layers in a 1:2 ratio, so the model pays for global attention in only one layer out of three. The README's claim: compute, memory footprint, and memory-access cost all drop on long-context inference without measurable accuracy loss. If this holds up in third-party evals, it is the most practical long-context trick of 2026.
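A rough way to see why the 1:2 split helps is to count how many keys a single query token attends to across the stack. The sketch below is illustrative only: the layer count, window size, sparse budget, and the order of layers within each group of three are all assumptions, since the source gives only the 1:2 ratio. It also ignores whatever it costs the sparse layers to choose their keys.

```python
def layer_schedule(num_layers: int) -> list[str]:
    """Repeat [DSA, SWA, SWA] to get a 1:2 global-to-local ratio (assumed order)."""
    pattern = ["DSA", "SWA", "SWA"]
    return [pattern[i % 3] for i in range(num_layers)]

def keys_attended_per_token(context_len: int, schedule: list[str],
                            window: int, sparse_budget: int) -> int:
    """Sum, over layers, of how many keys one query token attends to."""
    total = 0
    for kind in schedule:
        limit = sparse_budget if kind == "DSA" else window
        total += min(context_len, limit)
    return total

schedule = layer_schedule(48)   # hypothetical depth
context = 512 * 1024            # 512K-token context from the spec list

# Hypothetical window and sparse-selection sizes.
hybrid = keys_attended_per_token(context, schedule, window=4096, sparse_budget=2048)
dense = context * len(schedule)  # full attention in every layer, for comparison

print(f"DSA layers: {schedule.count('DSA')}, SWA layers: {schedule.count('SWA')}")
print(f"hybrid / dense attended keys: {hybrid / dense:.4%}")
```

Under these made-up sizes the per-token attention work no longer grows with context length once the context exceeds the window and sparse budget, which is the property the README's efficiency claim depends on.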

2. 4-stream mHC residual topology. Standard transformers have one residual path. Pangu swaps that for four parallel residual streams, claiming better representation diversity and generalization. mHC is one of the more interesting architectural bets out of any Chinese lab in the last 12 months.
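The source does not describe how the four streams interact, so the following is only a hypothetical sketch of a multi-stream residual update in the general style of hyper-connections. The read weights, write weights, and mixing matrix are illustrative names and should not be read as Pangu's actual formulation.

```python
def multi_stream_block(streams, layer_fn, read_w, write_w, mix):
    """One residual update over several parallel streams (hypothetical form).

    streams: list of n vectors (lists of floats), one per residual stream
    read_w:  n weights that combine the streams into the layer input
    write_w: n weights that distribute the layer output back to the streams
    mix:     n x n matrix that mixes the streams with each other
    """
    n, d = len(streams), len(streams[0])
    # Read: a weighted sum of the streams forms the single input the layer sees.
    layer_in = [sum(read_w[s] * streams[s][j] for s in range(n)) for j in range(d)]
    layer_out = layer_fn(layer_in)
    new_streams = []
    for s in range(n):
        # Mix: each new stream is a weighted combination of the old streams.
        mixed = [sum(mix[s][t] * streams[t][j] for t in range(n)) for j in range(d)]
        # Write: add the layer output, scaled by this stream's write weight.
        new_streams.append([mixed[j] + write_w[s] * layer_out[j] for j in range(d)])
    return new_streams

# Sanity check: identical streams, identity mixing, and averaging reads
# reduce to the ordinary single-path residual x + f(x).
identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
streams = [[1.0, 2.0] for _ in range(4)]
double = lambda v: [2.0 * a for a in v]
out = multi_stream_block(streams, double, [0.25] * 4, [1.0] * 4, identity)
print(out[0])
```

The check at the bottom shows that a single residual path is the special case where all four streams carry the same signal. Any benefit would have to come from learned read, write, and mix weights that let the streams diverge.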

3. Three-head MTP for self-speculative decoding. Multi-Token Prediction drafts three future tokens per step from a single forward pass. Pangu trained the MTP heads with the main objective, so drafts are accurate enough to use as a built-in speculative decoder. On Ascend this compounds with the DSA/SWA efficiency win. On H100 it depends on how well the kernels port.
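The draft-and-verify loop behind self-speculative decoding can be sketched in a few lines. This is a generic greedy-verification sketch, not Pangu's inference code. The acceptance rate passed to the second function is a hypothetical input, and the expected-value formula assumes each draft is accepted independently with the same probability.

```python
def accept_drafts(draft_tokens, verified_tokens):
    """Greedy verification: keep the longest draft prefix the main model agrees with.

    verified_tokens has one more entry than draft_tokens: the main model's own
    prediction at each draft position, plus one token past the last draft.
    Returns the tokens emitted this step.
    """
    emitted = []
    for draft, verified in zip(draft_tokens, verified_tokens):
        if draft != verified:
            emitted.append(verified)  # first disagreement: take the main model's token
            return emitted
        emitted.append(draft)
    # Every draft was accepted, so the extra verified token comes for free.
    emitted.append(verified_tokens[len(draft_tokens)])
    return emitted

def expected_tokens_per_step(accept_rate: float, num_drafts: int = 3) -> float:
    """Expected tokens per step: 1 + p + p^2 + ... + p^num_drafts."""
    return sum(accept_rate ** i for i in range(num_drafts + 1))

print(accept_drafts([5, 7, 9], [5, 7, 2, 4]))   # third draft rejected
print(accept_drafts([5, 7, 9], [5, 7, 9, 4]))   # all three drafts accepted
print(f"{expected_tokens_per_step(0.8):.3f} tokens/step at 80% acceptance")
```

With three drafts the ceiling is four tokens per step, and the payoff falls off quickly as acceptance drops. That is why jointly training the MTP heads with the main objective matters.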

The optimizer choice — Muon instead of AdamW — is the bonus. Muon has been quietly producing better convergence in open training runs throughout 2026.

The Benchmark Sheet That Matters

Pulling from the model card (Thinking mode unless noted):

Benchmark	Metric	Flash-Thinking	Flash-Non-Thinking
AIME 2026	Avg@16	93.3	86.5
AIME 2026 w/ Python	Avg@16	98.1	—
HMMT Feb 2025	Avg@16	91.5	67.1
IMO-AnswerBench	Acc	76.5	62.3
GPQA-Diamond	Avg@4	83.7	79.8
LiveCodeBench V6	Avg@3	85.1	50.9
SWE-bench Verified	Avg@3	63.1	57.6
TAU2-Bench (agent)	Avg@3	88.0	74.0
MCP-Atlas (agent)	Acc	58.9	47.9
BrowseComp (agent)	Acc	57.0	—

The 34-point gap on LiveCodeBench V6 between Thinking and Non-Thinking (85.1 vs 50.9) is unusual. Most MoEs separate by 15–25 points. Pangu either baked reasoning into the slow path unusually well, or the Non-Thinking path is being measured too cold. Either way, Thinking mode is the one to benchmark.
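To put the LiveCodeBench gap in context, the snippet below computes the Thinking minus Non-Thinking difference for every benchmark in the table that reports both modes. The scores are copied from the table; nothing else is assumed.

```python
# (Thinking, Non-Thinking) scores copied from the benchmark table.
SCORES = {
    "AIME 2026": (93.3, 86.5),
    "HMMT Feb 2025": (91.5, 67.1),
    "IMO-AnswerBench": (76.5, 62.3),
    "GPQA-Diamond": (83.7, 79.8),
    "LiveCodeBench V6": (85.1, 50.9),
    "SWE-bench Verified": (63.1, 57.6),
    "TAU2-Bench": (88.0, 74.0),
    "MCP-Atlas": (58.9, 47.9),
}

def mode_gaps(scores):
    """Return (benchmark, gap) pairs sorted by largest Thinking advantage first."""
    gaps = [(name, round(t - nt, 1)) for name, (t, nt) in scores.items()]
    return sorted(gaps, key=lambda item: item[1], reverse=True)

for name, gap in mode_gaps(SCORES):
    print(f"{name:20s} {gap:+.1f}")
```

LiveCodeBench V6 is the clear outlier at 34.2 points, followed by HMMT at 24.4. SWE-bench Verified separates the two modes by only 5.5 points and GPQA-Diamond by 3.9, so the size of the Thinking advantage depends heavily on the task.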

Three Reads

Read 1: The export-control clock just sped up for everyone. Sonnet 5 launched on June 30 too, and Anthropic explicitly tuned it below the cyber-capability threshold that took Fable 5 down on June 12. The same week, Huawei shipped an open-weight 92B MoE with 98.1% AIME w/ Python and 88.0% TAU2-Bench, and gave away the training stack that produced it. The U.S. dual-use story is no longer "China is two years behind." It is "China is shipping open-weight frontier on hardware that does not depend on TSMC." Expect the next round of chip restrictions to name Ascend explicitly.

Read 2: OpenPangu License v2.0 is the real precedent, not the weights. Apache 2.0 is the comfort blanket of the open-LLM era. Pangu just demonstrated a Chinese hyperscaler can ship frontier-tier weights under a permissive-but-not-Apache license and still land on every Hugging Face trending list within 48 hours. The license pattern is now more important than the parameter count. Watch Alibaba, ByteDance, and Baidu adopt the same template by Q4.

Read 3: 6B active is the new "small." DeepSeek shipped the template; Mistral iterated; Qwen and Pangu are now confirming. A 6B-active MoE on 92B of total capacity is matching models that spend 4–8× more compute per token. Anyone building a serving stack that cannot do MoE-routing well is paying a tax they no longer need to pay.
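The compute comparison can be made concrete with the common rule of thumb that a forward pass costs about 2 FLOPs per active parameter per token. The sketch below applies that approximation, which ignores attention cost and routing overhead, to the figures in the text. The dense-model sizes it prints are derived from the 4–8× multiple, not taken from any specific model.

```python
def forward_flops_per_token(active_params: float) -> float:
    """Rule-of-thumb forward cost: ~2 FLOPs per active parameter per token."""
    return 2.0 * active_params

FLASH_TOTAL, FLASH_ACTIVE = 92e9, 6e9  # from the spec list

active_fraction = FLASH_ACTIVE / FLASH_TOTAL
flash_flops = forward_flops_per_token(FLASH_ACTIVE)
print(f"active fraction: {active_fraction:.1%}")
print(f"Flash: ~{flash_flops / 1e9:.0f} GFLOPs per token")

for multiple in (4, 8):
    dense_params = FLASH_ACTIVE * multiple
    dense_flops = forward_flops_per_token(dense_params)
    print(f"{multiple}x compute ~ a {dense_params / 1e9:.0f}B-parameter dense model "
          f"(~{dense_flops / 1e9:.0f} GFLOPs per token)")
```

Only about 6.5% of the weights fire per token. In these terms, "4–8× more compute" corresponds to dense models in the 24B to 48B range.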

The Practical Take

If you are on Huawei Ascend, this is your Llama 4. Native kernels, native ops, the training stack that produced it. The 2× per-card throughput claim is the number to validate first.
If you are on H100/B200, port and benchmark before celebrating. DSA + SWA + mHC will need real kernel work. The weights drop is real, but the inference perf claim is Ascend-specific.
Read the LICENSE. OpenPangu v2.0 is not Apache 2.0. The use-based clause is the kind of thing legal will flag in a 200M-token-per-month deployment.
Thinking mode is the only mode that matters for agents. The 85.1 vs 50.9 LiveCodeBench V6 gap, together with the 14-point TAU2-Bench gap (88.0 vs 74.0), tells you the Non-Thinking path is not competitive for coding and agent work. Route Thinking by default; fall back only when latency dominates.
Pro is coming in July. 505B total / 18B active, same architecture. If you were waiting on a Qwen 3.7-Max–class open-weight model from a Chinese hyperscaler, wait three weeks.
The U.S.–China AI stack story now has two simultaneous inflection points. Sonnet 5 going deliberately below cyber threshold on June 30. OpenPangu going deliberately open-weight on June 30. Both moves are responses to the same regulatory regime. Don't read either in isolation.
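The routing advice in the list above (Thinking by default, Non-Thinking only when latency dominates) can be expressed as a small policy function. This is a hypothetical sketch: the mode names, the `Request` fields, and the latency estimate are illustrative, and the source does not say how the actual inference code selects a mode.

```python
from dataclasses import dataclass

@dataclass
class Request:
    prompt: str
    latency_budget_s: float  # how long the caller is willing to wait

def choose_mode(request: Request, thinking_latency_estimate_s: float) -> str:
    """Default to Thinking; fall back only when the latency budget cannot cover it."""
    if request.latency_budget_s >= thinking_latency_estimate_s:
        return "thinking"
    return "non_thinking"

# Hypothetical estimate: Thinking responses take about 20 seconds end to end.
agent_task = Request(prompt="Fix the failing test in the repo", latency_budget_s=120.0)
autocomplete = Request(prompt="Complete this line", latency_budget_s=1.0)

print(choose_mode(agent_task, thinking_latency_estimate_s=20.0))
print(choose_mode(autocomplete, thinking_latency_estimate_s=20.0))
```

In practice the latency estimate would come from your own measurements on your own hardware, since the published throughput claims are Ascend-specific.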

OpenPangu-2.0-Flash is not the biggest LLM release of the week — Sonnet 5 is. But Flash is the most strategically significant release of the month: a frontier-tier open-weight MoE trained entirely on a non-NVIDIA stack, with the training operators to match, under a license the open-source community will spend the next quarter arguing about. The weights are free. The training stack changes the calculation. The hardware target tells you where this is all heading.

— Mr. Technology

*Released: June 30, 2026. Model: openPangu-2.0-Flash (92B total / 6B active MoE). Pricing: open weights, self-hosted. License: OpenPangu Model License Agreement v2.0 (permissive, not Apache 2.0). Context: 512K. Hardware target: native Ascend NPU; Huawei claims up to 2× per-card throughput vs. comparable open-weight peers. Pretraining: ~34T tokens. Architecture: MLA attention + DSA/SWA 1:2 layered + 4-stream mHC residuals + 3-head MTP + Muon optimizer. Benchmarks (Thinking mode): SWE-bench Verified 63.1%, LiveCodeBench V6 85.1%, AIME 2026 93.3 (98.1 w/ Python), GPQA-Diamond 83.7, TAU2-Bench 88.0. Sources: Hugging Face model card, AIBase news, Pandaily, Tech in Asia, Dealroom, AI Policy Daily, Latent Space AINews.*