
Let me say it clearly: function calling is the most overrated capability in modern LLMs, and the industry has built the entire "agentic AI" stack on top of it. That bet is going to age badly.
I have watched too many engineering teams ship production systems on function calling. Every one hit the same wall: function calling is the model's way of failing in ways that look exactly like success. The tool gets called. The parameters get passed. The system does the thing. And the thing is wrong in a way no test will catch until the customer does.
The pitch was clean. Give the model a JSON schema describing your tools, and it will pick the right tool, with the right parameters, at the right time. A deterministic interface between a probabilistic system and the deterministic world.
What we got is a probabilistic system that picks the right tool with the right parameters most of the time, call it roughly 95%. The rest of the time it calls the right tool with parameters that are confidently wrong. The model does not ask for clarification. It does not hesitate. It executes. The schema is not a contract. It is a suggestion the model has been trained to respect most of the time.
This is not a deterministic interface. It is a probabilistic interface wearing a deterministic costume.
I have read more production function-calling traces than I care to admit. The model calls `send_email` with the recipient swapped from a different conversation. It calls `query_database` with a hallucinated table name smuggled into the filter clause. It calls `update_user_record` with a user ID that exists but belongs to a different tenant. Every call is syntactically valid. Every call does real damage.
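To make that failure mode concrete, here is a minimal sketch. Everything in it is invented for illustration: the user table, the parameter spec for `update_user_record`, and both helper functions are hypothetical, not any real library's API. It shows a call that passes a schema-style check while failing the tenant check that a schema cannot express.

```python
# Hypothetical data: which tenant owns each user ID (invented for illustration).
USER_TENANT = {"u-100": "acme", "u-200": "globex"}

# Hypothetical parameter spec for an `update_user_record` tool.
UPDATE_USER_RECORD_PARAMS = {"user_id": str, "email": str}


def is_structurally_valid(args: dict) -> bool:
    """Schema-style check: exactly the expected keys, each with the expected type."""
    return (
        set(args) == set(UPDATE_USER_RECORD_PARAMS)
        and all(isinstance(args[k], t) for k, t in UPDATE_USER_RECORD_PARAMS.items())
    )


def is_authorized(args: dict, session_tenant: str) -> bool:
    """Semantic check: does the target user belong to the caller's tenant?"""
    return USER_TENANT.get(args["user_id"]) == session_tenant


# A model-emitted call made during an "acme" session that points at a globex user.
call_args = {"user_id": "u-200", "email": "new@example.com"}

structurally_ok = is_structurally_valid(call_args)
authorized = is_authorized(call_args, session_tenant="acme")
print(structurally_ok, authorized)
```

The structural check passes and the authorization check fails. The second check only works because the tenant comes from the session, not from anything the model produced.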
The industry response is always the same: add more validation, add more guardrails, add a second model to verify the first model's tool calls. We have built an entire secondary industry of function-call validators. A multibillion-dollar scaffolding layer to fix the problems function calling was supposed to solve.
That is the smell. A real primitive does not require another primitive to babysit it. Function calling does. That is the definition of a crutch.
Language models are not function callers. They are text generators trained on examples of function calls. There is a meaningful difference, and the industry has spent three years pretending there isn't.
A real function-calling system understands the tool, the parameters, and the consequences of invoking it. A language model has been trained on millions of JSON blobs that look like function calls and produces output that statistically resembles good function calls. When the situation is novel, the model falls back to what looks like a function call rather than what actually is one.
This is fine for demos. This is a disaster for the "agentic" workflows being sold to enterprises as the future of automation.
So what should we build instead? Less of it. Fewer tools. Narrower scope. Deterministic code paths for the things that should be deterministic, and language models for the things that benefit from language understanding.
The teams that stopped trying to build agents with twenty tools and started building systems with one or two carefully scoped tool calls per task are shipping reliable products. The teams still trying to build the twenty-tool agent are debugging hallucinations at 3am and rebuilding their validation layer for the third time this quarter.
Build the deterministic scaffolding first. Add language understanding inside the system, not as the spine. The agentic future is not a pile of function calls. It is carefully scoped language understanding wrapped in a deterministic shell.
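One way to picture a deterministic shell around scoped language understanding is the sketch below. It is an assumption-laden illustration, not a reference design: the `Intent` labels, the `handle_message` router, and the `fake_classifier` stand-in for a model call are all hypothetical names. The model's only job is to emit a label. Code maps that label onto a closed set and decides what happens next.

```python
from enum import Enum
from typing import Callable


class Intent(Enum):
    """Closed set of things the system is allowed to do (hypothetical labels)."""
    REFUND = "refund"
    ORDER_STATUS = "order_status"
    UNKNOWN = "unknown"


def parse_intent(raw: str) -> Intent:
    """Map raw model text onto the closed set; anything unexpected becomes UNKNOWN."""
    try:
        return Intent(raw.strip().lower())
    except ValueError:
        return Intent.UNKNOWN


def handle_message(message: str, customer_id: str, classify: Callable[[str], str]) -> str:
    """Deterministic shell: the model only labels the message; code picks the path."""
    intent = parse_intent(classify(message))
    if intent is Intent.REFUND:
        # customer_id comes from the authenticated session, never from model output.
        return f"refund_flow:{customer_id}"
    if intent is Intent.ORDER_STATUS:
        return f"status_flow:{customer_id}"
    return "escalate_to_human"


def fake_classifier(message: str) -> str:
    """Stand-in for a model call (hypothetical); returns a known label or junk."""
    return "refund" if "money back" in message else "delete_all_users"


result_ok = handle_message("I want my money back", "c-42", fake_classifier)
result_junk = handle_message("hello", "c-42", fake_classifier)
print(result_ok, result_junk)
```

In this sketch a junk label like `delete_all_users` cannot trigger anything, because the shell only recognizes the labels it defines and routes everything else to a human.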
Function calling is a crutch. The industry has bet the agentic future on it, and the bill is coming due. Stop building on top of it. Build the scaffolding. Add language understanding where it actually helps. The teams that figure this out will ship. The teams that don't will keep writing validator layers and calling it progress.