
Auto-route tasks to the cheapest z.ai (GLM) model that works correctly. Three-tier progression: Flash → Standard → Plus/32B. Classify before responding.
pip install zai-router zai-router configure --provider zai
# Simple query → Flash zai-router route "What's the weather in Berlin?" # Analytical task → Plus/32B zai-router route "Analyze the tokenomics trends for the last 30 DeFi protocols by TVL"
| Model | Use When | Cost |
|---|---|---|
| Flash | Q&A, greetings, reminders, lookups | ~$0.001/1K tokens |
| Standard | Code generation, summaries, reasoning | ~$0.01/1K tokens |
| Plus/32B | Complex analysis, multi-step agents | ~$0.10/1K tokens |
| Pros | Cons |
|---|---|
| Significant cost savings at scale | Requires careful prompt classification |
| Preserves output quality for complex tasks | Flash may miss nuance in edge cases |
| Automatic tier escalation when needed | Some latency overhead from classification |
If you're building on z.ai, smart-model-routing-for-zai is mandatory. The cost difference between Flash and Plus/32B is 100x — and most user queries don't need the big model. Smart routing is the first thing you should ship.