Hugging Face dropped smolagents in December 2024, which in AI time is approximately three years ago. But the framework has matured into something worth talking about. Not because it's flashy, but because it makes a specific architectural bet that most agent frameworks have quietly abandoned, and it's betting correctly.
Smolagents is a minimalist framework for building AI agents that execute actions by writing code. The core premise: instead of having an agent output a JSON tool call such as `{"tool": "search", "query": "..."}`, you give it the ability to generate and execute Python code directly. The agent writes its actions as code, and the framework runs them.
That's the entire differentiation. And it's the right one.
The standard tool-calling paradigm has a fundamental ceiling. You define a fixed set of tools, each with a schema, and the agent selects from that menu. What happens when the agent needs to do something you didn't anticipate? You add another tool. Then another. Then you've built a sprawling tool library that nobody fully understands, with edge cases that only surface in production.
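Code agents don't have that problem. Within the tools and imports you allow, the agent can do anything Python can express: loop over results, branch on conditions, keep intermediate values, and chain several tool calls in a single step.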
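To make the difference concrete, here is a toy sketch (not smolagents internals). A JSON tool call invokes one registered tool per step; a code action can loop over and combine tool calls in a single step. The `get_price` tool and its prices are hypothetical, and the bare `exec` is for illustration only, not a sandbox.

```python
import json

# Hypothetical tool with canned data; a real tool would call an API.
PRICES = {"apple": 0.5, "bread": 2.25, "milk": 1.25}


def get_price(item: str) -> float:
    return PRICES[item]


TOOLS = {"get_price": get_price}


def run_json_action(action_json: str) -> float:
    """Tool-calling style: the model emits one structured call per step."""
    action = json.loads(action_json)
    return TOOLS[action["tool"]](**action["arguments"])


def run_code_action(code: str) -> dict:
    """Code style: the model's snippet can loop over and combine tool calls.
    Bare exec() is NOT a sandbox; this only illustrates the action format."""
    namespace = dict(TOOLS)
    exec(code, namespace)
    return namespace


# One JSON action answers one lookup; a basket total would take several steps.
single = run_json_action('{"tool": "get_price", "arguments": {"item": "bread"}}')

# One code action computes the whole basket total in a single step.
ns = run_code_action("total = sum(get_price(i) for i in ['apple', 'bread', 'milk'])")
print(single, ns["total"])  # 2.25 4.0
```

The point is not that JSON tool calls can't reach the same answer, but that they need one round trip to the model per lookup, while the code action expresses the whole plan at once.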
Smolagents ships three agent types:
**CodeAgent**: The flagship. It generates Python code, executes it through a restricted local interpreter by default, reads the output, and loops until the task is done. Execution is built into the framework (this is not an LLM generating text that you then run manually), so the agent sees each result and can respond to errors, revise, and continue.
**ToolCallingAgent**: The traditional tool-calling agent, for cases where you specifically want structured JSON tool calls. Useful where code execution isn't available or where you'd rather constrain the agent's action space.
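**ManagedAgent**: Wraps a sub-agent so a parent agent can delegate tasks to it. Useful for multi-agent orchestration if you need it, though smolagents keeps this optional rather than enforcing it. (Later releases deprecate the wrapper: you give the sub-agent a `name` and `description` and pass it to the parent through `managed_agents`.)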
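A rough sketch of the delegation idea, assuming a parent that sees each sub-agent only as a name plus a description, much like a tool. `SubAgent` and `ParentAgent` are hypothetical classes written for illustration, not smolagents' API; in a real framework the model, not the caller, chooses which delegate to invoke.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class SubAgent:
    """Hypothetical stand-in for a managed sub-agent: a name, a description
    the parent's prompt can show the model, and a callable that does the work."""
    name: str
    description: str
    run: Callable[[str], str]


class ParentAgent:
    def __init__(self, managed: list[SubAgent]):
        self.managed = {a.name: a for a in managed}

    def describe_team(self) -> str:
        # What a parent's system prompt might list so the model can pick a delegate.
        return "\n".join(f"- {a.name}: {a.description}" for a in self.managed.values())

    def delegate(self, name: str, task: str) -> str:
        # In a real framework the model picks `name`; here the caller does.
        return self.managed[name].run(task)


web = SubAgent("web_searcher", "Runs web searches.", lambda t: f"results for {t!r}")
parent = ParentAgent([web])
print(parent.describe_team())  # - web_searcher: Runs web searches.
print(parent.delegate("web_searcher", "Pont des Arts length"))
```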
The code execution sandbox is worth understanding. CodeAgent doesn't hand generated code to a raw Python `eval` or `exec`. By default it runs through smolagents' own interpreter, which parses the code and evaluates it operation by operation, permitting only the tools you've registered and imports on an explicit allow-list. The agent can't just `import os` and start reading files unless you've authorized that import (via `additional_authorized_imports`). For stronger isolation, smolagents can also execute code in a remote sandbox such as E2B.
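This is the right tradeoff. You get the flexibility of code generation without turning the agent into an open RCE vulnerability. Keep in mind that the local interpreter is a guardrail rather than a hard security boundary; for untrusted inputs, a remote sandbox is the safer choice.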
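The allow-list idea can be illustrated with Python's standard `ast` module. This is a hypothetical static pre-check, far weaker than smolagents' interpreter, which evaluates code operation by operation rather than just scanning it; the allowed module names are assumptions chosen for the example.

```python
import ast

# Hypothetical allow-list, loosely analogous to an authorized-imports setting.
AUTHORIZED_IMPORTS = {"math", "statistics"}


class UnauthorizedImport(Exception):
    pass


def check_imports(code: str, allowed: set[str] = AUTHORIZED_IMPORTS) -> None:
    """Reject any import whose top-level module is not on the allow-list.
    This is a static pre-check only: it does not stop other escapes such as
    __import__ or getattr tricks, so it is not a sandbox on its own."""
    tree = ast.parse(code)
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            modules = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            modules = [node.module or ""]  # relative imports have module=None
        else:
            continue
        for module in modules:
            if module.split(".")[0] not in allowed:
                raise UnauthorizedImport(f"import of {module!r} is not allowed")


check_imports("import math\nprint(math.sqrt(2))")  # passes silently

try:
    check_imports("import os\nos.listdir('.')")
except UnauthorizedImport as err:
    print(err)  # import of 'os' is not allowed
```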
The smolagents launch blog post makes a point that should be obvious but apparently isn't: agency is a spectrum. Most systems that call themselves "agents" are doing something trivial: routing, classification, single tool calls. Real agency means the LLM controls the iteration: it decides what to do next, does it, reads the result, and decides again.
CodeAgent implements this loop cleanly:
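```python
from smolagents import CodeAgent, DuckDuckGoSearchTool, HfApiModel

# HfApiModel calls a model through the Hugging Face Inference API.
# (Newer smolagents releases rename it InferenceClientModel; check your version.)
agent = CodeAgent(
    tools=[DuckDuckGoSearchTool()],
    model=HfApiModel(),
)

agent.run("How many seconds would it take for a leopard at full speed to run through Pont des Arts?")
```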
That's the whole agent. The model will typically search for the leopard's top speed, look up the length of Pont des Arts, calculate the time, and return the answer. If a search comes back empty, it can try a different query; if the length data is ambiguous, it can make a reasonable assumption and state it.
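For a sense of the arithmetic the agent ends up writing, here is the final calculation as plain Python. The speed and length are assumed figures for illustration, not values retrieved by an actual run; a real CodeAgent would obtain them from its search results and pass the result to its `final_answer` tool.

```python
# Assumed figures for illustration only, not output from a real agent run.
LEOPARD_TOP_SPEED_KMH = 58.0    # assumed top speed
PONT_DES_ARTS_LENGTH_M = 155.0  # assumed bridge length

# Convert km/h to m/s, then divide distance by speed.
speed_m_per_s = LEOPARD_TOP_SPEED_KMH * 1000 / 3600
seconds = PONT_DES_ARTS_LENGTH_M / speed_m_per_s
print(f"{seconds:.2f} s")  # 9.62 s
```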
The framework handles the loop, the tool execution, and the memory of what happened in prior steps. You write the agent definition and the task. The model does the rest.
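The loop itself is conceptually small. Below is a hypothetical `ToyCodeAgent`, not smolagents' implementation: a scripted stand-in for the model emits code, the loop executes it, records each step's code and observation in memory, and feeds errors back so the next step can revise. The `final_answer` variable convention is an assumption of this sketch, and `exec` here is unsandboxed.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Step:
    code: str
    observation: str


@dataclass
class ToyCodeAgent:
    """Hypothetical think-act-observe loop. `model` maps (task, memory) to the
    next code snippet; a snippet that sets `final_answer` ends the run."""
    model: Callable[[str, list], str]
    max_steps: int = 5
    memory: list = field(default_factory=list)

    def run(self, task: str):
        for _ in range(self.max_steps):
            code = self.model(task, self.memory)
            namespace = {}
            try:
                exec(code, namespace)  # NOT sandboxed; illustration only
                observation = str(namespace.get("result"))
            except Exception as err:
                observation = f"Error: {err!r}"  # fed back so the model can revise
            self.memory.append(Step(code, observation))
            if "final_answer" in namespace:
                return namespace["final_answer"]
        return None  # gave up after max_steps


def scripted_model(task: str, memory: list) -> str:
    # Stand-in for an LLM: fails once, then "reads" the error and fixes the code.
    if not memory:
        return "result = 10 / 0"
    return "result = 10 / 2\nfinal_answer = result"


agent = ToyCodeAgent(model=scripted_model)
print(agent.run("divide 10 by something sensible"))  # 5.0
print([s.observation for s in agent.memory])
```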
**The simplicity is genuine.** At launch, the core agent logic was roughly 1,000 lines. You can read it. You can fork it. You can debug it without a degree in the framework's internal architecture. This is not true of LangChain, which has become its own complexity universe.
**The tool model is right.** Tools are Python functions decorated with `@tool`; the decorator builds the tool's description and input schema from the function's type hints and docstring. The agent sees them, calls them, gets results back. No hand-written JSON schemas, no fighting with the framework to make it recognize your tool. If you can write a typed, documented Python function, you can add a tool.
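To show how a schema can come from an ordinary function, here is a toy decorator built on `inspect` and `typing.get_type_hints`. It imitates the idea behind smolagents' `@tool` (metadata derived from the signature and docstring), but the decorator and its schema format are hypothetical; the real `@tool` also expects each argument to be documented in the docstring.

```python
import inspect
from typing import get_type_hints


def tool(fn):
    """Hypothetical decorator: attach a schema built from type hints and docstring."""
    hints = get_type_hints(fn)
    output_type = hints.pop("return", None)
    fn.schema = {
        "name": fn.__name__,
        "description": inspect.getdoc(fn) or "",
        "inputs": {name: t.__name__ for name, t in hints.items()},
        "output_type": output_type.__name__ if output_type else None,
    }
    return fn


@tool
def get_weather(city: str, celsius: bool) -> str:
    """Return a short weather report for a city."""
    return f"Sunny in {city}, {'21 C' if celsius else '70 F'}"  # canned data


print(get_weather.schema)
print(get_weather("Paris", True))  # Sunny in Paris, 21 C
```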
**Hugging Face ecosystem integration is seamless.** smolagents works well with HF's model hub, their inference API, and their existing libraries. If you're already in that ecosystem, the integration story is the best available for open-source agents.
**The code execution model is sound.** Sandboxed code execution with controlled tool access is the correct architecture for agents that need to do real work. It avoids both the paralysis of fixed tool sets and the danger of unrestricted execution.
**Production-grade observability is thin.** For a framework targeting real workloads, the logging and debugging story is underdeveloped. When a code agent takes an unexpected path, you want trace-level visibility into what code it generated, what the execution result was, and where it went off-script. smolagents has basic output, but it's not Datadog for agents yet.
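Until the built-in tracing matures, a thin wrapper can capture the essentials per step. The hypothetical `StepTracer` below records the generated code, the observation or error, and the wall-clock duration, then serializes the trace as JSON lines that a log pipeline can ingest. It is a sketch under the assumption that you control the step loop, not a smolagents hook.

```python
import json
import time
from contextlib import contextmanager


class StepTracer:
    """Hypothetical trace collector: one record per agent step."""

    def __init__(self):
        self.records = []

    @contextmanager
    def step(self, index: int, code: str):
        record = {"step": index, "code": code, "observation": None, "error": None}
        start = time.perf_counter()
        try:
            yield record  # caller fills record["observation"]
        except Exception as err:
            record["error"] = repr(err)
            raise  # tracing must not swallow failures
        finally:
            record["duration_s"] = round(time.perf_counter() - start, 4)
            self.records.append(record)

    def to_jsonl(self) -> str:
        return "\n".join(json.dumps(r) for r in self.records)


tracer = StepTracer()
code = "x = 2 + 2"
with tracer.step(1, code) as rec:
    ns = {}
    exec(code, ns)
    rec["observation"] = ns["x"]

try:
    with tracer.step(2, "1 / 0"):
        exec("1 / 0", {})
except ZeroDivisionError:
    pass  # the error is still recorded in the trace

print(tracer.to_jsonl())
```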
**Error recovery is limited.** CodeAgent feeds execution errors back to the model so it can try again, but it doesn't have sophisticated recovery strategies for cases where the generated code has logical bugs: not syntax errors, but code that runs successfully and produces wrong answers. The loop will continue with wrong data. You'd need to add your own validation layer.
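One shape such a validation layer could take: wrap the agent call, check the answer against a domain rule, and re-run with the rejection reason appended to the task. `run_with_validation` and `must_be_positive` are hypothetical helpers, and the scripted answers stand in for a real `agent.run`.

```python
from typing import Callable, Optional


def run_with_validation(
    run: Callable[[str], object],
    task: str,
    validate: Callable[[object], Optional[str]],
    max_attempts: int = 3,
):
    """Re-run the agent when its answer fails a domain check.
    `validate` returns None if the answer is acceptable, else a reason string
    that is appended to the task so the next attempt can correct course."""
    prompt = task
    for _ in range(max_attempts):
        answer = run(prompt)
        problem = validate(answer)
        if problem is None:
            return answer
        prompt = f"{task}\n\nA previous answer ({answer!r}) was rejected: {problem}"
    raise RuntimeError(f"no valid answer after {max_attempts} attempts")


def must_be_positive(x) -> Optional[str]:
    # Domain rule: a duration must be a positive number.
    return None if isinstance(x, (int, float)) and x > 0 else "duration must be positive"


answers = iter([-9.62, 9.62])  # scripted: first answer is wrong, second is fine


def fake_agent(prompt: str):
    return next(answers)  # stands in for agent.run(prompt)


print(run_with_validation(fake_agent, "How many seconds...?", must_be_positive))  # 9.62
```

This catches only the errors your checks can express, which is the point: the domain knowledge lives in `validate`, not in the framework.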
**The managed agent pattern feels unfinished.** The multi-agent coordination story in smolagents is the weakest part. It works, but it doesn't have the sophistication of frameworks built around multi-agent from the start (CrewAI, LangGraph's supervisor patterns). If you're building a system that requires complex delegation hierarchies, smolagents will leave you wanting.
**Documentation is improving but gaps remain.** The HF course on smolagents is good. The API docs are adequate. But the gap between "intro tutorial" and "production deployment" has sparse coverage, particularly around deployment patterns, scaling, and the operational side of running agents in production environments.
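Use smolagents when:

- you want agents that take real actions by writing and running code, not just emit tool calls
- you value a small codebase you can read, fork, and debug
- you're already working in the Hugging Face ecosystem

Don't use it when:

- you need complex multi-agent delegation hierarchies out of the box
- you need mature, built-in production observability and operational tooling today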
Smolagents is the framework you build when you've used LangChain, AutoGen, and CrewAI and concluded that they all solved a different problem than the one you actually have. The problem most teams have is not "how do we orchestrate complex multi-agent conversations." It's "how do we get the model to actually do the thing instead of just describing what it would do."
Code execution is the honest answer to that question. Smolagents implements it simply, without the framework overhead that makes other tools feel like infrastructure projects. That's worth something.
The gaps — observability, error recovery, production ops — are real. But they're the kind of gaps you can fill yourself if the core architecture is sound. The opposite is also true: a framework with great observability and the wrong core model is just a well-monitored failure.
For teams that want agents that execute, not just agents that pontificate, smolagents is the right starting point. Tune it, instrument it, and ship it.
*smolagents is open source and available at [github.com/huggingface/smolagents](https://github.com/huggingface/smolagents). The framework is actively maintained; check the repo for the latest capabilities and release notes.*