
The most over-engineered pattern in AI right now is multi-agent orchestration. Every team I talk to is building a "supervisor agent" that delegates to worker agents, who delegate to sub-agents, who call tools, who spawn more agents, who keep a shared scratchpad in vector memory. It is the new microservices. The new Kubernetes. The new thing every platform engineer insists you need before you have a problem that justifies the complexity. You do not have that problem.
I have run production agent traces for about two years. The failure modes are remarkably consistent. The supervisor picks the wrong worker. A worker hallucinates a tool result. The shared scratchpad drifts from what the user actually asked. A retry loop burns 80,000 tokens to recover from a context corruption. The user gets a 14-minute response that answered a question they did not ask. Multi-agent systems compound errors geometrically, not linearly. Three agents in a chain, each 90% reliable, give you 0.9³ = 73% end-to-end. Six agents in a graph, each 90% reliable, give you 53%. The math punishes ambition.
LangGraph surpassed CrewAI in GitHub stars in early 2026 because every enterprise wanted "a graph." Most of those graphs are doing what a 200-line prompt with three tool calls would do, except slower, harder to debug, and four times the token cost. The framework is a feature, and the feature is the product, and the product is wrong.
A well-prompted frontier model with a clean tool list, a tight system prompt, and a real retry strategy will outperform a hand-orchestrated multi-agent system on at least 95% of the workflows I have seen pitched in 2026. The reasons are not mysterious.
A single agent has one context. There is no message-passing overhead. There is no shared-state reconciliation. There is no graph traversal to debug. You can read the trace end-to-end. You can reproduce a failure with a single prompt and a single seed. You can regression-test it. The model thinks across the whole task instead of being interrupted every turn by a dispatcher who may or may not have the latest context.
The 5% where multi-agent is genuinely correct is real: long-running research with parallelizable subtasks, adversarial verification where two agents check each other, and clean, well-typed handoffs between independently trained domain models. If you have not named which 5% you are in, you are not in the 5%.
The strongest defense of multi-agent is parallelization. Some tasks are genuinely slow because they are sequential, and a well-designed graph can fan out, run subtasks concurrently, and stitch results together. For long-horizon research, this matters. For a 12-step customer-support workflow, it does not. The cost of a parallel graph is not the compute. It is the cognitive load on the team that has to maintain it. Most teams cannot debug the graph they already have, let alone scale it to the next workflow.
If you are about to build a supervisor agent, stop. Ask whether a single agent with a longer context, a cleaner tool list, and a real retry strategy would solve the same problem. The answer is almost always yes. Build that first. Promote to multi-agent only when you have a specific, named, measured failure mode that a single agent cannot address — and write that failure mode down before you reach for the framework. Multi-agent orchestration is a tool, not an architecture. The teams that treat it as an architecture will spend 2026 building graphs they cannot ship. The teams that treat it as a tool will ship the agent in March and be in production by April.
— Mr. Technology
Posted June 9, 2026. One good agent beats a tangled graph. The complexity tax compounds.