Tutorial · 2026-06-16

Track Every LLM Token With One Python Decorator (and Stop Guessing Your Bill)

One decorator wraps every Anthropic / OpenAI call and logs tokens, cost, and latency to a JSONL file. No per-call instrumentation, no forgotten prints, no surprise bill at the end of the month.

You shipped a feature that calls Claude. Two weeks in, your Anthropic bill is 4x what you projected, and you cannot tell which endpoint or which prompt is the culprit. You added print(len(msg.content)) somewhere and forgot to remove it. This post is the fix: a single decorator that wraps every LLM call and writes a JSON receipt (tokens, cost, latency, model) for every invocation. Time-to-complete: 10 minutes.

Table of contents

  • The Setup (~30 sec)
  • The Decorator
  • Use It Everywhere
  • Why This Matters
  • What To Watch Out For
  • Variations
  • The Take

Hey guys, Mr. Technology here.

The Setup (~30 sec)

You need the Anthropic SDK and somewhere durable to ship logs. We append to a JSONL file — swap that for your logger of choice.

    pip install anthropic

The Decorator

Drop this in llm_track.py:

```python
import functools
import json
import os
import time
from datetime import datetime, timezone

# $ per million tokens, (input, output)
PRICING = {
    "claude-opus-4-8": (15.0, 75.0),
    "claude-sonnet-4-5": (3.0, 15.0),
    "claude-haiku-4-5": (0.80, 4.0),
}
LOG_PATH = os.environ.get("LLM_LOG", "/tmp/llm_usage.jsonl")


def track_llm(call_name: str):
    def deco(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            t0 = time.perf_counter()
            msg = fn(*args, **kwargs)  # must return anthropic.Message
            dt = (time.perf_counter() - t0) * 1000  # elapsed milliseconds
            in_t = getattr(msg.usage, "input_tokens", 0)
            out_t = getattr(msg.usage, "output_tokens", 0)
            # A model missing from PRICING is logged at $0.
            pin, pout = PRICING.get(msg.model, (0.0, 0.0))
            cost = (in_t * pin + out_t * pout) / 1_000_000
            rec = {
                "ts": datetime.now(timezone.utc).isoformat(),
                "call": call_name,
                "model": msg.model,
                "in": in_t,
                "out": out_t,
                "cost_usd": round(cost, 6),
                "ms": round(dt, 1),
            }
            # One JSON object per line, appended on every call.
            with open(LOG_PATH, "a") as f:
                f.write(json.dumps(rec) + "\n")
            return msg
        return wrapper
    return deco
```

Use It Everywhere

Wrap every call. The decorator does not care about arguments, returns the Message untouched, and writes one JSON line per call.

```python
import anthropic
from llm_track import track_llm

client = anthropic.Anthropic()


@track_llm("summarize_email")
def summarize_email(text: str):
    return client.messages.create(
        model="claude-sonnet-4-5",
        max_tokens=256,
        messages=[{"role": "user", "content": f"Summarize in one sentence:\n{text}"}],
    )


@track_llm("classify_intent")
def classify_intent(text: str):
    return client.messages.create(
        model="claude-haiku-4-5",
        max_tokens=16,
        messages=[{"role": "user", "content": f"Classify intent of: {text}\nReply with one label."}],
    )
```

Run the app. Tail the log:

    tail -f /tmp/llm_usage.jsonl | jq .

    {"ts":"2026-06-16T18:42:01+00:00","call":"classify_intent",
     "model":"claude-haiku-4-5","in":312,"out":4,
     "cost_usd":0.000266,"ms":412.3}
    {"ts":"2026-06-16T18:42:01+00:00","call":"summarize_email",
     "model":"claude-sonnet-4-5","in":2410,"out":88,
     "cost_usd":0.00855,"ms":1830.7}

You now know which call is burning money, on which model, at what latency.

Why This Matters

The decorator is the cheapest place to instrument. Every code path that talks to the model goes through the same gate. No more scattered print(len(content)) across five files.

Cost attribution comes from one field. Pass call_name="onboarding.step3.extract_company" instead of "summarize". Group by call in your log aggregator and you have a per-feature breakdown — the report you show the PM who wants the bill explained.

Latency is free. time.perf_counter() costs nothing. Compute p95 over the JSONL and you have a feature-level SLO you can alert on.
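
Here is a minimal sketch of that rollup in Python, assuming the log fields the decorator writes (call, cost_usd, ms). The summarize and p95 helper names and the nearest-rank percentile method are my own choices for illustration, not part of the original script.

```python
import json
from collections import defaultdict


def p95(values):
    """Nearest-rank 95th percentile of a non-empty list of numbers."""
    ordered = sorted(values)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n), 1-based
    return ordered[rank - 1]


def summarize(lines):
    """Return {call: {"calls", "cost_usd", "p95_ms"}} from JSONL lines."""
    costs = defaultdict(float)
    latencies = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines in the log
        rec = json.loads(line)
        costs[rec["call"]] += rec["cost_usd"]
        latencies[rec["call"]].append(rec["ms"])
    return {
        call: {
            "calls": len(ms),
            "cost_usd": round(costs[call], 6),
            "p95_ms": p95(ms),
        }
        for call, ms in latencies.items()
    }
```

Pass it an open file, for example `with open("/tmp/llm_usage.jsonl") as f: print(summarize(f))`. Nearest-rank is a blunt instrument on small samples, so treat the p95 as meaningful only once a call has a few hundred records.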

What To Watch Out For

Pricing drifts. The PRICING dict is wrong the moment a new model ships, and a model that is missing from it gets logged at $0 without complaint. Keep the dict in one file, give it a version, and log that version at startup.
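
One way to make pricing gaps loud instead of silent is sketched below. It assumes the API may report a dated model ID (an alias followed by a date suffix) rather than the alias you requested, and that the dated ID shares the alias's price. Both are assumptions to verify against your own logs. PRICING_VERSION and lookup_price are hypothetical names.

```python
import logging

log = logging.getLogger("llm_track")

PRICING_VERSION = "2026-06-16"  # hypothetical: bump whenever the table changes
PRICING = {
    "claude-sonnet-4-5": (3.0, 15.0),
    "claude-haiku-4-5": (0.80, 4.0),
}


def lookup_price(model):
    """Return (input, output) $ per million tokens, or None if unknown."""
    if model in PRICING:
        return PRICING[model]
    # Assumption: "<alias>-<date>" is billed at the same rate as "<alias>".
    matches = [name for name in PRICING if model.startswith(name + "-")]
    if matches:
        return PRICING[max(matches, key=len)]  # longest alias wins
    log.warning("no price for model %r (pricing table %s)", model, PRICING_VERSION)
    return None
```

In the decorator, writing cost_usd as null when lookup_price returns None keeps unknown models visible in the report instead of hiding them as free calls.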

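Cached tokens. Anthropic and OpenAI both charge reduced rates for cache hits, but they report them differently. The Anthropic SDK exposes cache_creation_input_tokens and cache_read_input_tokens on msg.usage as counts separate from input_tokens, so the decorator above never sees them and under-reports cost when prompt caching is on. Price each bucket at its own rate. OpenAI includes cached tokens in its prompt token count, so there you subtract them before applying the full rate, or you over-bill yourself on cache hits.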
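
A rough sketch of cache-aware costing for Anthropic follows. As far as I can tell, input_tokens on the Anthropic usage object excludes cached tokens, so the three input buckets are priced separately and summed. The two multipliers are assumptions for illustration, so check the current rate card before trusting the numbers. anthropic_cost is a hypothetical helper name.

```python
# Assumed multipliers relative to the base input price; verify before use.
CACHE_WRITE_MULT = 1.25
CACHE_READ_MULT = 0.10


def anthropic_cost(usage, price_in, price_out):
    """Cost in USD for one call; prices are $ per million tokens."""
    # `or 0` covers fields that are absent or set to None.
    base_in = getattr(usage, "input_tokens", 0) or 0
    cache_write = getattr(usage, "cache_creation_input_tokens", 0) or 0
    cache_read = getattr(usage, "cache_read_input_tokens", 0) or 0
    out = getattr(usage, "output_tokens", 0) or 0
    total = (
        base_in * price_in
        + cache_write * price_in * CACHE_WRITE_MULT
        + cache_read * price_in * CACHE_READ_MULT
        + out * price_out
    )
    return total / 1_000_000
```

Logging the three input counts as separate fields alongside the cost lets you recompute old records if the multipliers turn out to be wrong.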

Streaming. client.messages.stream(...) returns a stream manager, not a Message. Wrap the call, consume the stream, then log stream.get_final_message().usage after the loop.
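
A sketch of one way to do that: consume the stream inside the wrapped function and return the final Message, so the decorator needs no changes. stream_and_collect and its on_text callback are hypothetical names; text_stream and get_final_message() are the Anthropic Python SDK stream helpers as I understand them.

```python
def stream_and_collect(client, on_text=print, **request):
    """Stream a response, forward each text chunk, return the final Message."""
    with client.messages.stream(**request) as stream:
        for chunk in stream.text_stream:
            on_text(chunk)
        # The final Message carries .usage and .model, like messages.create().
        return stream.get_final_message()
```

Decorate a thin function that calls stream_and_collect(client, model=..., max_tokens=..., messages=[...]) and the receipt is written when the stream finishes. The logged ms is then the full stream duration, not time to first token.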

Async. Swap def wrapper for async def wrapper and fn(...) for await fn(...). Same logic, one keyword.
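
If you have both sync and async callers, one decorator can serve both by checking the wrapped function with inspect.iscoroutinefunction. This is a sketch under a few assumptions: cost is left out for brevity, the _write_record helper is a name I made up, and the log path is read from LLM_LOG at write time rather than at import.

```python
import functools
import inspect
import json
import os
import time
from datetime import datetime, timezone


def _write_record(call_name, msg, started):
    """Append one usage record (cost omitted here for brevity)."""
    rec = {
        "ts": datetime.now(timezone.utc).isoformat(),
        "call": call_name,
        "model": msg.model,
        "in": getattr(msg.usage, "input_tokens", 0),
        "out": getattr(msg.usage, "output_tokens", 0),
        "ms": round((time.perf_counter() - started) * 1000, 1),
    }
    path = os.environ.get("LLM_LOG", "/tmp/llm_usage.jsonl")
    with open(path, "a") as f:
        f.write(json.dumps(rec) + "\n")


def track_llm(call_name):
    def deco(fn):
        if inspect.iscoroutinefunction(fn):
            @functools.wraps(fn)
            async def async_wrapper(*args, **kwargs):
                t0 = time.perf_counter()
                msg = await fn(*args, **kwargs)
                _write_record(call_name, msg, t0)
                return msg
            return async_wrapper

        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            t0 = time.perf_counter()
            msg = fn(*args, **kwargs)
            _write_record(call_name, msg, t0)
            return msg
        return wrapper
    return deco
```

The file append is a short blocking call inside the event loop. That is usually fine for a local file, but hand it to a queue or a logging handler if your sink is slow.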

Variations

Per-tenant cost. Capture tenant id in a ContextVar; have the decorator read it. One line, full multi-tenant billing.
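
A small sketch of the ContextVar approach. The names current_tenant, tenant, and with_tenant are hypothetical. Set the tenant once per request (for example in web middleware) and the decorator picks it up without any change to call signatures.

```python
import contextlib
import contextvars

# Hypothetical name; holds the tenant id for the current request or task.
current_tenant = contextvars.ContextVar("current_tenant", default=None)


@contextlib.contextmanager
def tenant(tenant_id):
    """Set the active tenant for the duration of a with-block."""
    token = current_tenant.set(tenant_id)
    try:
        yield
    finally:
        current_tenant.reset(token)  # restore whatever was set before


def with_tenant(rec):
    """Return a copy of a log record with the active tenant id attached."""
    return {**rec, "tenant": current_tenant.get()}
```

Inside the decorator, the one-line change is adding "tenant": current_tenant.get() to rec. ContextVar values are isolated per asyncio task, so concurrent requests should not leak tenant ids into each other's records.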

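Daily rollups. Schedule this from cron: jq -s 'group_by(.call) | map({call: .[0].call, cost: (map(.cost_usd) | add)})' /tmp/llm_usage.jsonl. It sums the whole file, so rotate the log daily or filter on ts if you want per-day totals. Cost-per-feature report in your inbox for free.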

Alerts. Five-line script that greps the JSONL for any line above a threshold and posts to Slack — catches runaway loops before they become bills.
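
A slightly longer take on that script, as a sketch. It assumes a Slack incoming webhook that accepts a JSON body with a text field; the webhook URL is yours to supply, and find_expensive, format_alert, and post_to_slack are hypothetical names.

```python
import json
import urllib.request


def find_expensive(lines, threshold_usd):
    """Yield log records whose cost_usd exceeds threshold_usd."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        rec = json.loads(line)
        if rec.get("cost_usd", 0) > threshold_usd:
            yield rec


def format_alert(rec):
    """Build a one-line alert message from a log record."""
    return f"LLM cost alert: {rec['call']} on {rec['model']} cost ${rec['cost_usd']:.4f}"


def post_to_slack(webhook_url, text):
    """POST a plain-text message to a Slack incoming webhook."""
    body = json.dumps({"text": text}).encode("utf-8")
    req = urllib.request.Request(
        webhook_url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.status
```

A per-line threshold catches single oversized calls. A runaway loop of cheap calls only shows up if you also sum cost_usd over a recent time window and alert on that total.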

The Take

One decorator. One JSONL file. One source of truth. Every call writes its own receipt, so you never again wonder which prompt is melting the budget. Add it today, before you ship the next feature that calls the model.

Mr. Technology


*Tested June 2026 with the anthropic Python SDK on claude-sonnet-4-5 and claude-haiku-4-5. PRICING is illustrative; check your provider's current rate card. The decorator is provider-agnostic: it works with any SDK whose response exposes .model and a .usage object with input_tokens and output_tokens.*
