Let me start with the thing nobody else is leading with: Cohere just released a 218B parameter mixture-of-experts model under Apache 2.0, and the discourse is treating it like a footnote.
That's wrong. Command A+ (May 20, 2026) is the most consequential open-weight release since DeepSeek V4, and the fact that it's not getting the attention it deserves is probably because Cohere doesn't have the brand recognition of an OpenAI or Anthropic. But let's be clear about what just happened: a well-funded, technically serious lab just put a production-grade, frontier-adjacent model with tool use, multimodal understanding, and 48-language support into the hands of anyone who wants to run it, modify it, and deploy it without asking permission.
That's not a footnote. That's a problem for every closed API provider charging a premium for capabilities you can now self-host.
Command A+ is a sparse MoE model — 218B total parameters, 25B active per token. That active count is what matters for inference cost. You get frontier-adjacent reasoning at a fraction of the active parameter count, which means the cost per token is dramatically lower than a dense model of equivalent quality.
The headline specs:
The model consolidates what used to be four separate specialized Command models — Command A, Command A Reasoning, Command A Vision, Command A Translate — into a single unified model. That's significant from a deployment standpoint. One model to deploy, one model to maintain, one model that handles reasoning AND vision AND translation AND tool use. That's the direction the industry is moving, and Cohere got there before most of the big-name labs did with their open weights.
I want to be precise about what open weights actually means here, because the term is used loosely. Command A+ is genuinely open. Apache 2.0 means you can use it commercially, modify it, fork it, quantize it, deploy it on your own infrastructure, and nobody is going to come after you with a licensing team. The weights are on Hugging Face. The quantization options are already available. If you have the hardware, you can run this without touching Cohere's infrastructure at all.
This matters for a specific reason: enterprise teams with data sovereignty requirements, healthcare companies dealing with PHI, financial services firms with regulatory constraints — these organizations have been stuck paying premium API prices because the open-weight models available weren't good enough for their use cases. Command A+ changes that calculus. It's not just good enough for internal tooling. It's good enough for production workloads that require the model to operate on sensitive data that can't leave their infrastructure.
The quantization support is particularly important here. Cohere explicitly mentions "near lossless quantizations" available on Hugging Face. For teams running on-premises, that means you can get the model down to sizes that run on hardware configurations that were previously insufficient, without the quality degradation that typically comes with aggressive quantization.
I want to zoom in on the tool use capability, because I think this is the most underrated aspect of the release. Command A+ has tool use built into the base model. Not as an afterthought, not as a fine-tuning add-on — as a first-class capability that works out of the box.
What does that mean in practice? The model can receive tool definitions in its context, reason about which tool to call, execute the call, and incorporate the result into its next reasoning step. This is the primitive that every agentic architecture is built on, and most open-weight models either don't have it or require significant fine-tuning to get decent performance.
Command A+ ships it in the base model. For developers building agentic workflows — which is essentially every developer building with LLMs in 2026 — this is the difference between spending two weeks fine-tuning a model to do tool calling reliably and spending two hours integrating a model that does it out of the box.
The practical implication: if you're building a coding agent, a research synthesis pipeline, an automation workflow, or anything that involves LLMs interacting with external systems, Command A+ gives you a foundation that previously required either GPT-4o or Claude Sonnet 4, both of which are closed API models with usage costs that add up fast at scale.
I want to be honest about the MoE efficiency claims, because they're worth examining rather than just accepting. Sparse MoE models like Command A+ are cheaper to serve per token than dense models of comparable quality because only a fraction of the parameters are active for any given token. The routing mechanism decides which experts handle which tokens, and the rest of the model sits idle.
The 25B active / 218B total ratio means you're running about 11% of the model's parameters per token. That's where the cost savings come from — and it's real, assuming the routing works well in practice. The challenge with MoE models has historically been that expert load balancing can become uneven, leading to some experts being overloaded while others sit mostly idle, which degrades quality on certain types of inputs.
I haven't seen independent benchmarks on Command A+'s routing quality yet, so I'm going to hold judgment on whether the 25B active count delivers quality consistent with the 218B total parameter count. The benchmark claims from Cohere show strong performance, but I've learned to treat vendor benchmarks with appropriate skepticism.
What I can say: the architectural direction is sound. MoE is the right choice for anyone who wants frontier quality at lower inference cost, and Command A+ is the most capable open-weight MoE model I've seen from a commercial lab with a permissive license.
The open-weight frontier has gotten genuinely crowded in 2026. Llama 4, DeepSeek V4, Gemma 4, Qwen 3.x, ZAYA1-8B — the open-weight ecosystem is producing capable models at a pace that would have seemed impossible two years ago. Command A+ lands into that environment with some distinct advantages.
First-mover advantage on tool use integration: most open-weight models don't have reliable tool use out of the box. You can get it through fine-tuning or through prompting frameworks, but it's not native. Command A+ ships it natively.
Multilingual breadth: 48 languages is genuinely wide for a model this capable. Most open-weight models are strong in English and reasonable in a handful of European languages, then fall off significantly for Asian languages, Arabic, and lower-resource languages. If you're building multilingual products, the 48-language support is a real differentiator.
Apache 2.0 clarity: the licensing situation in the open-weight world is still a bit of a minefield. Some models that claim to be open have restrictions that create legal exposure for commercial use. Apache 2.0 is clean. It's the most permissive mainstream open-source license. If your legal team has been nervous about using open-weight models commercially, Apache 2.0 removes that objection.
Two things I want to see before I'd recommend this as a default recommendation for production workloads:
Independent benchmark validation. The performance claims are strong, but I want to see how it performs on real-world tasks — particularly agentic tool use, long-context reasoning, and multilingual quality — against the models it claims to compete with. Cohere has a good track record, but the jump from specialized models to a unified model is significant, and the unified model needs to prove it matches the specialists it replaced.
Community fine-tunes. One of the things that makes open-weight models valuable is the fine-tuning ecosystem that grows around them. I want to see what the community does with Command A+ — particularly quantized versions for consumer hardware, domain-specific fine-tunes, and RLHF wrappers that improve the model's alignment for specific use cases. The Apache 2.0 license means the fine-tuning ecosystem has no legal friction, which should accelerate development.
Command A+ matters because it's the right model at the right time with the right license. The open-weight ecosystem needed a production-grade agentic model with tool use, multimodal understanding, and broad language support that enterprises could actually deploy without legal review. Cohere delivered that on May 20, 2026, and the fact that it's not dominating the discourse is more of a statement about the media's attention economy than about the model's significance.
If you're building agentic workflows, running multilingual products, or operating in an environment where data sovereignty prevents you from using closed APIs, Command A+ is worth your attention. Download the weights, spin up a vLLM instance, and see how it performs on your workload. The only cost is the hardware.
That's a sentence I couldn't have written twelve months ago about a model of this capability. The open-weight frontier just got more competitive, and that's good for everyone building with AI.
*Command A+ is available on Hugging Face at CohereLabs/command-a-plus-05-2026-w4a4. Apache 2.0 license. No API costs, no usage limits, no data sharing.*