
If you've spent any time with LLM APIs, you've hit the same wall everyone does: single-shot prompts are great until they aren't. The model hallucinates, ignores edge cases, or just decides to do its own thing. Prompt chaining with function calling fixes most of that.
Function calling (also called tool use) lets you define structured operations the model can request, which your code then executes. Chaining means you sequence those calls so each step's output feeds the next step's input. The result: workflows that are predictable, testable, and actually do what you expect.
**The Setup**
I'm going to assume you're using the Claude Messages API with the Anthropic SDK, but the pattern translates anywhere. First, define your tools. Here's a lookup function:
```json
{
  "name": "lookup_order",
  "description": "Fetch order details by order ID",
  "input_schema": {
    "type": "object",
    "properties": {
      "order_id": {"type": "string"}
    },
    "required": ["order_id"]
  }
}
```
Register it, along with any other tools you define, in your API call under `tools`. The chaining starts with how you handle the response:
```python
import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-opus-4-5",
    max_tokens=1024,
    # lookup_order, update_status, send_notification are tool schema dicts like the one above
    tools=[lookup_order, update_status, send_notification],
    messages=[{"role": "user", "content": user_input}],
)
```
The model returns `stop_reason: "tool_use"` when it wants to call a tool, and `stop_reason: "end_turn"` with plain text content when it's done. You execute each tool call yourself, feed the result back as a `tool_result` block in a new user message, and keep looping until you get a text response.
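To make that loop concrete, here's a minimal sketch of the handle-and-feed-back cycle. It reuses the `client` and tool schemas from above; `TOOL_HANDLERS` is an assumed plain dict mapping each tool name to your own Python function, not something the SDK provides.

```python
messages = [{"role": "user", "content": user_input}]

while True:
    response = client.messages.create(
        model="claude-opus-4-5",
        max_tokens=1024,
        tools=[lookup_order, update_status, send_notification],
        messages=messages,
    )

    if response.stop_reason != "tool_use":
        break  # plain text answer, the chain is done

    # Echo the assistant turn back, then answer each tool call with a tool_result
    messages.append({"role": "assistant", "content": response.content})
    tool_results = []
    for block in response.content:
        if block.type == "tool_use":
            # TOOL_HANDLERS: your own dispatch table, tool name -> Python function
            result = TOOL_HANDLERS[block.name](**block.input)
            tool_results.append({
                "type": "tool_result",
                "tool_use_id": block.id,
                "content": str(result),
            })
    messages.append({"role": "user", "content": tool_results})

final_text = "".join(block.text for block in response.content if block.type == "text")
```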
**The Chaining Pattern**
Here's where it gets interesting. Instead of one big prompt that does everything, chain specialized prompts:
```python
def process_user_request(input_text):
    # Step 1: extraction only -- the model's single job is to pull out the order
    extracted = client.messages.create(
        model="claude-opus-4-5",
        max_tokens=1024,
        system="Extract order_id and action from text",
        tools=[lookup_order],
        messages=[{"role": "user", "content": input_text}],
    )

    # Step 2: run the tool call ourselves
    order_data = None
    for block in extracted.content:
        if block.type == "tool_use" and block.name == "lookup_order":
            order_data = fetch_order(block.input["order_id"])  # your own DB/API lookup

    # Step 3: a second focused prompt decides what to do with the result
    return client.messages.create(
        model="claude-opus-4-5",
        max_tokens=1024,
        system="Determine next action based on order status",
        messages=[{"role": "user", "content": str(order_data)}],
    )
```
Each step has one job. The model isn't juggling classification, execution, and formatting simultaneously. Failure becomes granular—if something breaks, you know exactly where.
**Why This Beats Monolithic Prompts**
Splitting into focused prompts reduces model cognitive load, which translates directly to fewer hallucinations. You get better adherence to your output schema. Testing becomes trivial because you can validate each link in the chain independently. Debugging is cleaner—you can inspect intermediate outputs without hunting through a 2000-token prompt.
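As a sketch of what testing one link might look like (assuming pytest, the `client` and `lookup_order` schema from above, and a made-up order ID), you can pin down the extraction step on its own:

```python
def test_extraction_step_picks_out_order_id():
    response = client.messages.create(
        model="claude-opus-4-5",
        max_tokens=1024,
        system="Extract order_id and action from text",
        tools=[lookup_order],
        messages=[{"role": "user", "content": "Where is order A1234?"}],
    )
    tool_calls = [b for b in response.content if b.type == "tool_use"]
    assert tool_calls, "extraction step should request lookup_order"
    assert tool_calls[0].name == "lookup_order"
    assert tool_calls[0].input["order_id"] == "A1234"
```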
The tradeoff: more round trips means higher latency and token usage. Your orchestration code gets more complex. But for anything beyond toy examples, the reliability gains are worth it.
Start with two-step chains. Classify, then execute. Once that works, add a validation step, then a routing step. Before you know it, you've got a proper workflow that doesn't need hand-holding.
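As an example of that validation step, a cheap deterministic check between extraction and execution catches most garbage before it ever hits your database; the ID format here is made up:

```python
import re

ORDER_ID_RE = re.compile(r"^[A-Z]\d{4}$")  # substitute whatever your real order IDs look like

def is_valid_lookup(tool_call):
    """Deterministic gate between chain steps -- no model call involved."""
    if tool_call.name != "lookup_order":
        return False
    return bool(ORDER_ID_RE.match(tool_call.input.get("order_id", "")))
```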
That's prompt chaining. Not glamorous, but it works.