You shipped a feature that calls Claude. Two weeks in your Anthropic bill is 4x what you projected and you cannot tell which endpoint or which prompt is the culprit. You added print(len(msg.content)) somewhere and forgot to remove it. This post is the fix: a single decorator that wraps every LLM call and writes a JSON receipt — tokens, cost, latency, model — for every invocation. Time-to-complete: 10 minutes.
Table of contents
Hey guys, Mr. Technology here.
You need the Anthropic SDK and somewhere durable to ship logs. We append to a JSONL file — swap that for your logger of choice.
bash pip install anthropic
Drop this in llm_track.py:
```python import anthropic, functools, json, time, os from datetime import datetime, timezone
PRICING = { "claude-opus-4-8": (15.0, 75.0), "claude-sonnet-4-5": (3.0, 15.0), "claude-haiku-4-5": (0.80, 4.0), } LOG_PATH = os.environ.get("LLM_LOG", "/tmp/llm_usage.jsonl")
def track_llm(call_name: str): def deco(fn): @functools.wraps(fn) def wrapper(*args, **kwargs): t0 = time.perf_counter() msg = fn(*args, **kwargs) # must return anthropic.Message dt = (time.perf_counter() - t0) 1000 in_t = getattr(msg.usage, "input_tokens", 0) out_t = getattr(msg.usage, "output_tokens", 0) pin, pout = PRICING.get(msg.model, (0.0, 0.0)) cost = (in_t pin + out_t * pout) / 1_000_000 rec = { "ts": datetime.now(timezone.utc).isoformat(), "call": call_name, "model": msg.model, "in": in_t, "out": out_t, "cost_usd": round(cost, 6), "ms": round(dt, 1), } with open(LOG_PATH, "a") as f: f.write(json.dumps(rec) + "\n") return msg return wrapper return deco ```
Wrap every call. The decorator does not care about arguments, returns the Message untouched, and writes one JSON line per call.
```python import anthropic from llm_track import track_llm
client = anthropic.Anthropic()
@track_llm("summarize_email") def summarize_email(text: str): return client.messages.create( model="claude-sonnet-4-5", max_tokens=256, messages=[{"role": "user", "content": f"Summarize in one sentence:\n{text}"}], )
@track_llm("classify_intent") def classify_intent(text: str): return client.messages.create( model="claude-haiku-4-5", max_tokens=16, messages=[{"role": "user", "content": f"Classify intent of: {text}\nReply with one label."}], ) ```
Run the app. Tail the log:
bash tail -f /tmp/llm_usage.jsonl | jq
json {"ts":"2026-06-16T18:42:01+00:00","call":"classify_intent", "model":"claude-haiku-4-5","in":312,"out":4, "cost_usd":0.000266,"ms":412.3} {"ts":"2026-06-16T18:42:01+00:00","call":"summarize_email", "model":"claude-sonnet-4-5","in":2410,"out":88, "cost_usd":0.00855,"ms":1830.7}
You now know which call is burning money, on which model, at what latency.
The decorator is the cheapest place to instrument. Every code path that talks to the model goes through the same gate. No more scattered print(len(content)) across five files.
Cost attribution comes from one field. Pass call_name="onboarding.step3.extract_company" instead of "summarize". Group by call in your log aggregator and you have a per-feature breakdown — the report you show the PM who wants the bill explained.
Latency is free. time.perf_counter() costs nothing. Compute p95 over the JSONL and you have a feature-level SLO you can alert on.
Pricing drifts. The PRICING dict is wrong the moment a new model ships. Keep it in one file, log a startup line that prints its version.
Cached tokens. Anthropic and OpenAI charge reduced rates for cache hits. The SDK exposes cache_creation_input_tokens and cache_read_input_tokens on msg.usage. Subtract those before multiplying, or you over-bill yourself on cache hits.
Streaming. client.messages.stream(...) returns a stream manager, not a Message. Wrap the call, then log final_message().usage after the loop.
Async. Swap def wrapper for async def wrapper and fn(...) for await fn(...). Same logic, one keyword.
Per-tenant cost. Capture tenant id in a ContextVar; have the decorator read it. One line, full multi-tenant billing.
Daily rollups. Crontab jq -s 'group_by(.call) | map({call: .[0].call, cost: (map(.cost_usd) | add)})' /tmp/llm_usage.jsonl. Cost-per-feature report in your inbox for free.
Alerts. Five-line script that greps the JSONL for any line above a threshold and posts to Slack — catches runaway loops before they become bills.
One decorator. One JSONL file. One source of truth. Every call writes its own receipt, so you never again wonder which prompt is melting the budget. Add it today, before you ship the next feature that calls the model.
— Mr. Technology
*Tested June 2026 with the anthropic Python SDK on claude-sonnet-4-5 and claude-haiku-4-5. PRICING is illustrative — check your provider's current rate card. The decorator is provider-agnostic: drop in any SDK that returns a .usage object with input_tokens and output_tokens.*