Tutorial · 2026-06-09

Build a Tool-Calling Loop in Pure Python (No Framework)

Every agent framework is a loop. The model stops, you run its tools, you feed the results back, repeat. Here is the whole agent runtime in ~40 lines of plain Python against the Anthropic API.

Every agent framework (LangGraph, CrewAI, Pydantic AI, even OpenAI's Agents SDK) is a loop. The model stops. You execute tools. You feed results back. The model stops again, or you hit a step limit.

If you understand the loop, you can debug any framework. If you can write the loop, you can skip 90% of them for anything that fits in one file. Here is the whole thing in ~40 lines of Python against the Anthropic API.

The Mental Model

A tool-calling conversation is just an array of messages that grows each turn:

```
user:      "What's the weather in Paris?"
assistant: tool_use(get_weather, {city: "Paris"})
user:      tool_result("22C, clear")
assistant: text("It's 22°C and clear in Paris right now.")
```

The model is the only thing that decides what happens next. You are the courier: detect tool calls, run them locally, ship the results back, repeat.
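Concretely, here is roughly what that array looks like in the Anthropic Messages API's plain-dict form after one round trip. This is a sketch under a few assumptions: the id "toolu_01" is a placeholder (real ids are generated by the API), the assistant turns are written as dicts rather than the SDK's response objects, and the helper names blocks and tool_ids_paired are hypothetical. The helper is a simplified local version of the pairing rule the API enforces: every tool_use id in an assistant turn must be answered by a tool_result in the very next user turn.

```python
from typing import Any

Message = dict[str, Any]

# One round trip in plain-dict form. "toolu_01" is a placeholder id;
# in a real run it comes back in the model's tool_use block.
conversation: list[Message] = [
    {"role": "user", "content": "What's the weather in Paris?"},
    {"role": "assistant", "content": [
        {"type": "tool_use", "id": "toolu_01", "name": "get_weather", "input": {"city": "Paris"}},
    ]},
    {"role": "user", "content": [
        {"type": "tool_result", "tool_use_id": "toolu_01", "content": "22C, clear"},
    ]},
    {"role": "assistant", "content": [
        {"type": "text", "text": "It's 22°C and clear in Paris right now."},
    ]},
]


def blocks(msg: Message) -> list[dict[str, Any]]:
    # String content is shorthand for a single text block, so it holds no tool blocks.
    return [] if isinstance(msg["content"], str) else msg["content"]


def tool_ids_paired(conversation: list[Message]) -> bool:
    """Simplified check: each assistant turn's tool_use ids are all answered
    by tool_result blocks in the immediately following user turn."""
    for i, msg in enumerate(conversation):
        if msg["role"] != "assistant":
            continue
        use_ids = {b["id"] for b in blocks(msg) if b["type"] == "tool_use"}
        if not use_ids:
            continue
        if i + 1 >= len(conversation) or conversation[i + 1]["role"] != "user":
            return False
        nxt = conversation[i + 1]
        result_ids = {b["tool_use_id"] for b in blocks(nxt) if b["type"] == "tool_result"}
        if use_ids != result_ids:
            return False
    return True


print(tool_ids_paired(conversation))      # True
print(tool_ids_paired(conversation[:2]))  # False: the tool_use was never answered
```

The tool_use_id on the result is the only thing tying it to the call, which is why the assistant turn has to be replayed verbatim.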

The Code

```python
import anthropic
from typing import Any, Callable

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

TOOLS = [{
    "name": "get_weather",
    "description": "Get current weather for a city in Celsius.",
    "input_schema": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}]

HANDLERS: dict[str, Callable[..., str]] = {
    "get_weather": lambda city: f"22C, clear in {city}",  # your real impl
}


def run(prompt: str, model: str = "claude-sonnet-4-5", max_steps: int = 8) -> str:
    messages: list[dict[str, Any]] = [{"role": "user", "content": prompt}]

    for _ in range(max_steps):
        msg = client.messages.create(
            model=model,
            max_tokens=1024,
            tools=TOOLS,
            messages=messages,
        )

        # Anything other than a tool request (end_turn, max_tokens, ...) ends the run;
        # otherwise an empty tool_results turn would be sent and rejected.
        if msg.stop_reason != "tool_use":
            return "".join(b.text for b in msg.content if b.type == "text").strip()

        # Replay the assistant turn before any tool_result
        messages.append({"role": "assistant", "content": msg.content})

        # Run every tool_use block, then send all results back in one user turn
        tool_results = []
        for block in msg.content:
            if block.type == "tool_use":
                handler = HANDLERS[block.name]
                output = handler(**block.input)
                tool_results.append({
                    "type": "tool_result",
                    "tool_use_id": block.id,
                    "content": output,
                })
        messages.append({"role": "user", "content": tool_results})

    return "max steps reached"


print(run("What's the weather in Paris?"))
# -> "The weather in Paris is currently 22°C and clear."
```

That is the whole thing. No graph state, no abstractions, no magic.

The Three Things Frameworks Hide

1. Replay the assistant turn. Anthropic's API requires you to send the model's tool-use blocks back exactly as you received them, before any tool_result. Skip this and you get a 400. This is the single most common bug when people roll their own.

2. Parallel tool calls. A model can emit multiple tool_use blocks in one turn. Loop over msg.content, execute each, collect all results, send them in a single user turn. The SDK does not batch this for you.

3. Token cost compounds. Every tool result stays in the context on the next turn, and on every turn after that. A 2KB result is not paid for once; it is re-sent as input on every remaining turn. If your tool returns a large blob, summarize it before appending. Over a long run, that can be the difference between a $0.04 task and a $4 one.
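A back-of-the-envelope sketch of why the cost compounds. The model here is deliberately simplified and the numbers are hypothetical: a fixed prompt of P tokens, one tool result of R tokens appended per turn, with the model's own output and any prompt caching ignored. Turn k re-sends the prompt plus the k results accumulated so far, so over n turns the input total is nP + R·n(n-1)/2, quadratic in the number of turns. total_input_tokens is an illustrative helper, not an SDK function.

```python
def total_input_tokens(prompt_tokens: int, result_tokens: int, turns: int) -> int:
    """Input tokens sent across a run where every turn appends one tool result.

    Simplified model (assumption): turn k re-sends the prompt plus the k results
    accumulated so far; assistant output tokens and caching are ignored.
    """
    return sum(prompt_tokens + k * result_tokens for k in range(turns))


# Hypothetical sizes: a summarized result vs. a raw blob, over eight turns.
summarized = total_input_tokens(prompt_tokens=500, result_tokens=100, turns=8)
raw_blob = total_input_tokens(prompt_tokens=500, result_tokens=2_000, turns=8)
print(summarized, raw_blob)  # 6800 60000
```

The result term grows with n(n-1)/2, so doubling the number of turns roughly quadruples what large results cost, which is why trimming them before they enter the history pays off.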

The Two Things to Add Before Production

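A budget guard. max_steps caps how many turns a run takes, not what those turns cost: a misbehaving tool that returns huge payloads, or a model that keeps re-calling it, can burn through a lot of tokens well within eight steps. Track cumulative input tokens per run() and bail at a hard cap. Sum msg.usage.input_tokens across turns: each call's count already includes the full history sent on that call, so the sum is your total input for the run.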
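One way to sketch that guard, assuming you call it right after each client.messages.create(...). TokenBudget and BudgetExceeded are hypothetical names; the only SDK detail the sketch relies on is that the response's usage object has an input_tokens attribute, and the demo feeds it SimpleNamespace stand-ins instead of real responses.

```python
from dataclasses import dataclass
from types import SimpleNamespace
from typing import Any


class BudgetExceeded(RuntimeError):
    """Raised when a run's cumulative input tokens pass the hard cap."""


@dataclass
class TokenBudget:
    cap: int        # hard cap on input tokens for one run() call
    spent: int = 0  # running total across turns

    def charge(self, usage: Any) -> None:
        # usage is msg.usage; each call's input_tokens already covers the
        # whole history sent on that call, so summing gives the run's total.
        self.spent += usage.input_tokens
        if self.spent > self.cap:
            raise BudgetExceeded(f"spent {self.spent} input tokens, cap is {self.cap}")


# In run(): create budget = TokenBudget(cap=...) before the loop, then call
# budget.charge(msg.usage) after each client.messages.create(...).
budget = TokenBudget(cap=10_000)
for tokens in (1_200, 3_400, 4_100):  # stand-ins for three turns of msg.usage
    budget.charge(SimpleNamespace(input_tokens=tokens))
print(budget.spent)  # 8700
```

Raising rather than returning a string leaves the caller to decide whether a blown budget is a failure or just a truncated answer.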

A structured error result. When a tool throws, do not raise. Return this instead:

```python
{"type": "tool_result", "tool_use_id": block.id, "content": str(e), "is_error": True}
```

The model reads the error, pivots, and tries again. Letting the exception bubble kills the loop and forces you to re-prompt from scratch.
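A minimal sketch of that pattern as a drop-in for the inner tool call, assuming the block objects expose .id, .name and .input the way the SDK's tool_use blocks do. execute_tool is a hypothetical helper name; it also catches the KeyError from an unknown tool name and the TypeError from bad arguments. It prefixes the exception type because str(KeyError("x")) is just "'x'", which tells the model very little.

```python
from types import SimpleNamespace
from typing import Any, Callable


def execute_tool(block: Any, handlers: dict[str, Callable[..., str]]) -> dict[str, Any]:
    """Run one tool_use block and always return a tool_result; never raise."""
    try:
        handler = handlers[block.name]     # KeyError if the model names an unknown tool
        output = handler(**block.input)    # TypeError if the arguments don't fit
        return {"type": "tool_result", "tool_use_id": block.id, "content": output}
    except Exception as e:  # Exception, not BaseException, so Ctrl-C still stops the run
        # Hand the failure back to the model instead of killing the loop.
        return {
            "type": "tool_result",
            "tool_use_id": block.id,
            "content": f"{type(e).__name__}: {e}",
            "is_error": True,
        }


def flaky_weather(city: str) -> str:
    raise TimeoutError("weather service timed out")


handlers = {"get_weather": flaky_weather}
bad = SimpleNamespace(id="toolu_02", name="get_weather", input={"city": "Paris"})
print(execute_tool(bad, handlers)["content"])  # TimeoutError: weather service timed out
```

Inside run(), the body of the if block.type == "tool_use": branch becomes tool_results.append(execute_tool(block, HANDLERS)).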

The Take

The frameworks are not pulling a trick. They wrap this loop with persistence, retries, observability, and tool catalogs. If your agent fits in one process and one prompt, the framework is overhead. Write the loop, ship the feature, reach for a framework the day you need a second agent in a second process talking to a shared store. Not before.

Mr. Technology


*Tested June 2026 with the anthropic Python SDK on claude-sonnet-4-5. Stop-reason values used: end_turn (final answer), tool_use (call one or more tools), max_tokens (truncated: bump max_tokens or split the task). For an OpenAI version, the loop is the same but the wire format differs: tool_calls instead of tool_use blocks, tool_call_id instead of tool_use_id, the assistant message carries the tool_calls field directly, each result goes back as its own message with role "tool" rather than batched into one user turn, and the call arguments arrive as a JSON string you have to parse.*
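For comparison, a sketch of the same round trip in OpenAI Chat Completions form, using plain dicts only (no client call). Assumptions: "call_1" is a placeholder id, and to_openai_tool is a hypothetical helper that maps the Anthropic tool definition onto the Chat Completions "function" tool shape, where the JSON Schema moves from input_schema to parameters.

```python
import json
from typing import Any


def to_openai_tool(tool: dict[str, Any]) -> dict[str, Any]:
    """Map an Anthropic tool definition onto the Chat Completions function-tool shape."""
    return {
        "type": "function",
        "function": {
            "name": tool["name"],
            "description": tool["description"],
            "parameters": tool["input_schema"],  # same JSON Schema, different key
        },
    }


weather_tool = {
    "name": "get_weather",
    "description": "Get current weather for a city in Celsius.",
    "input_schema": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}

# One round trip in Chat Completions form. "call_1" is a placeholder id.
messages: list[dict[str, Any]] = [
    {"role": "user", "content": "What's the weather in Paris?"},
    {"role": "assistant", "content": None, "tool_calls": [{
        "id": "call_1",
        "type": "function",
        "function": {"name": "get_weather", "arguments": json.dumps({"city": "Paris"})},
    }]},
    # One role="tool" message per call, instead of one user turn holding every result.
    {"role": "tool", "tool_call_id": "call_1", "content": "22C, clear"},
]

# Arguments arrive as a JSON string, so decode them before calling the handler.
args = json.loads(messages[1]["tool_calls"][0]["function"]["arguments"])
print(to_openai_tool(weather_tool)["function"]["parameters"]["required"], args)
# ['city'] {'city': 'Paris'}
```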
