One decorator wraps every Anthropic / OpenAI call and logs tokens, cost, and latency to a JSONL file. No per-call instrumentation, no forgotten prints, no surprise bill at the end of the month.

Track Every LLM Token With One Python Decorator (and Stop Guessing Your Bill)

You shipped a feature that calls Claude. Two weeks in, your Anthropic bill is 4x what you projected, and you cannot tell which endpoint or which prompt is the culprit. You added print(len(msg.content)) somewhere and forgot to remove it. This post is the fix: a single decorator that wraps every LLM call and writes a JSON receipt (tokens, cost, latency, model) for every invocation. Time-to-complete: 10 minutes.

Table of contents

The Setup (~30 sec)
The Decorator
Use It Everywhere
Why This Matters
What To Watch Out For
Variations
The Take

Hey guys, Mr. Technology here.

The Setup (~30 sec)

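You need the Anthropic SDK and somewhere durable to ship logs. We append to a JSONL file; swap that for your logger of choice. The default path in the code below is under /tmp, which many systems clear on reboot, so point LLM_LOG at a persistent location in production.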

bash

pip install anthropic

The Decorator

Drop this in llm_track.py:

python

import functools, json, os, time  # no SDK import needed: the decorator only reads attributes
from datetime import datetime, timezone

# $ per million tokens, (input, output)
PRICING = {
    "claude-opus-4-8":   (15.0, 75.0),
    "claude-sonnet-4-5": (3.0,  15.0),
    "claude-haiku-4-5":  (0.80, 4.0),
}
LOG_PATH = os.environ.get("LLM_LOG", "/tmp/llm_usage.jsonl")


def track_llm(call_name: str):
    def deco(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            t0 = time.perf_counter()
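            # If fn raises, the exception propagates and no record is written.
            msg = fn(*args, **kwargs)  # must return anthropic.Message
            dt = (time.perf_counter() - t0) * 1000  # elapsed wall-clock time in ms
            in_t  = getattr(msg.usage, "input_tokens", 0)
            out_t = getattr(msg.usage, "output_tokens", 0)
            # A model name missing from PRICING is priced at 0, so cost_usd logs as 0.0.
            pin, pout = PRICING.get(msg.model, (0.0, 0.0))
            cost = (in_t * pin + out_t * pout) / 1_000_000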
            rec = {
                "ts": datetime.now(timezone.utc).isoformat(),
                "call": call_name, "model": msg.model,
                "in": in_t, "out": out_t,
                "cost_usd": round(cost, 6),
                "ms": round(dt, 1),
            }
            with open(LOG_PATH, "a") as f:
                f.write(json.dumps(rec) + "\n")
            return msg
        return wrapper
    return deco

Use It Everywhere

Wrap every call. The decorator does not care about arguments, returns the Message untouched, and writes one JSON line per call.

python

import anthropic
from llm_track import track_llm

client = anthropic.Anthropic()


@track_llm("summarize_email")
def summarize_email(text: str):
    return client.messages.create(
        model="claude-sonnet-4-5",
        max_tokens=256,
        messages=[{"role": "user", "content":
            f"Summarize in one sentence:\n{text}"}],
    )


@track_llm("classify_intent")
def classify_intent(text: str):
    return client.messages.create(
        model="claude-haiku-4-5",
        max_tokens=16,
        messages=[{"role": "user", "content":
            f"Classify intent of: {text}\nReply with one label."}],
    )

Run the app. Tail the log:

bash

tail -f /tmp/llm_usage.jsonl | jq .

json

{"ts":"2026-06-16T18:42:01+00:00","call":"classify_intent",
 "model":"claude-haiku-4-5","in":312,"out":4,
 "cost_usd":0.000266,"ms":412.3}
{"ts":"2026-06-16T18:42:01+00:00","call":"summarize_email",
 "model":"claude-sonnet-4-5","in":2410,"out":88,
 "cost_usd":0.00855,"ms":1830.7}

You now know which call is burning money, on which model, at what latency.

Why This Matters

The decorator is the cheapest place to instrument. Every code path that talks to the model goes through the same gate. No more scattered print(len(content)) across five files.

Cost attribution comes from one field. Pass call_name="onboarding.step3.extract_company" instead of "summarize". Group by call in your log aggregator and you have a per-feature breakdown — the report you show the PM who wants the bill explained.

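Latency is nearly free. A pair of time.perf_counter() calls adds negligible overhead next to a network round trip. Compute p95 per call name over the JSONL and you have a feature-level latency number you can set an SLO against and alert on.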
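Here is a minimal sketch of that p95 calculation, assuming the record format written by the decorator above (a "call" field and an "ms" field per line). The function name p95_by_call is illustrative, not part of any library. statistics.quantiles needs at least two samples, so a call name with a single record just reports that one value.

```python
import json
import statistics
from collections import defaultdict


def p95_by_call(lines):
    """Return {call_name: p95 latency in ms} from an iterable of JSONL lines."""
    latencies = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        rec = json.loads(line)
        latencies[rec["call"]].append(rec["ms"])

    result = {}
    for call, values in latencies.items():
        if len(values) < 2:
            result[call] = values[0]
        else:
            # n=100 yields 99 cut points; index 94 is the 95th percentile.
            cuts = statistics.quantiles(values, n=100, method="inclusive")
            result[call] = cuts[94]
    return result
```

To run it against the log, open the file and pass the handle straight in: `with open("/tmp/llm_usage.jsonl") as f: print(p95_by_call(f))`.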

What To Watch Out For

Pricing drifts. The PRICING dict goes stale the moment a new model ships or a rate changes, and a model missing from the table is silently logged at $0. Keep the table in one file and log a startup line that prints its version.
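One way a model goes missing from the table: the API may report a dated snapshot ID in msg.model rather than the short alias you passed in, and an exact-match dict lookup then falls through to zero. The sketch below is one possible guard, assuming that snapshot IDs start with the alias. lookup_price is a hypothetical helper, and the dated ID in the usage note is made up for illustration.

```python
import warnings

# Same illustrative table as llm_track.py: $ per million tokens, (input, output)
PRICING = {
    "claude-opus-4-8":   (15.0, 75.0),
    "claude-sonnet-4-5": (3.0,  15.0),
    "claude-haiku-4-5":  (0.80, 4.0),
}


def lookup_price(model: str):
    """Exact match first, then the longest PRICING key that prefixes `model`."""
    if model in PRICING:
        return PRICING[model]
    matches = [key for key in PRICING if model.startswith(key)]
    if matches:
        return PRICING[max(matches, key=len)]
    # Fail loudly instead of quietly logging a cost of zero.
    warnings.warn(f"no price for model {model!r}; logging cost as 0")
    return (0.0, 0.0)
```

With this in place, lookup_price("claude-sonnet-4-5-20990101") resolves to the claude-sonnet-4-5 row, and an unknown name produces a warning you can spot in your logs.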

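Cached tokens. Anthropic and OpenAI charge reduced rates for cache reads, and the two report them differently. The Anthropic SDK exposes cache_creation_input_tokens and cache_read_input_tokens on msg.usage as counts separate from input_tokens, so the decorator above ignores them and under-reports cost when prompt caching is on. Price each of the three input counts at its own rate. OpenAI instead includes cached tokens in its prompt token count, so there you subtract them and price them at the discounted rate, or you over-bill yourself on cache hits.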
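A sketch of cache-aware pricing, assuming the Anthropic usage shape in which input_tokens excludes both cache counts. The two multipliers are placeholder assumptions, not quoted rates, so check the rate card before trusting the numbers. cost_with_cache is a hypothetical helper name.

```python
from types import SimpleNamespace

# Assumed multipliers relative to the base input rate; placeholders only.
CACHE_WRITE_MULT = 1.25
CACHE_READ_MULT = 0.10


def cost_with_cache(usage, price_in: float, price_out: float) -> float:
    """Dollar cost of one call; prices are $ per million tokens."""
    # `or 0` covers fields that are present but set to None.
    in_t = getattr(usage, "input_tokens", 0) or 0
    out_t = getattr(usage, "output_tokens", 0) or 0
    cache_write = getattr(usage, "cache_creation_input_tokens", 0) or 0
    cache_read = getattr(usage, "cache_read_input_tokens", 0) or 0
    total = (
        in_t * price_in
        + cache_write * price_in * CACHE_WRITE_MULT
        + cache_read * price_in * CACHE_READ_MULT
        + out_t * price_out
    )
    return total / 1_000_000


# Stand-in for msg.usage so the example runs without an API call.
usage = SimpleNamespace(
    input_tokens=1000,
    output_tokens=100,
    cache_creation_input_tokens=0,
    cache_read_input_tokens=2000,
)
print(cost_with_cache(usage, 3.0, 15.0))
```

Logging the two cache counts as their own fields in the record is also worth considering, since the cache hit rate is often the first thing you want to see when the bill moves.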

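Streaming. client.messages.stream(...) returns a stream manager, not a Message, so the decorator cannot read .usage from it. Consume the stream inside the wrapped function, then return or log stream.get_final_message(), whose usage totals are complete once the stream ends.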
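A sketch of the streaming case, assuming the anthropic Python SDK's messages.stream(...) context manager with its text_stream iterator and get_final_message() method. stream_tracked is a hypothetical helper; it takes the client as an argument and returns the record instead of writing it, so you can route it to the same JSONL file yourself.

```python
import time


def stream_tracked(client, call_name, on_text=print, **kwargs):
    """Stream a response, then return (final_message, usage_record).

    `client` is expected to be an anthropic.Anthropic instance; `on_text`
    receives each text chunk as it arrives; `kwargs` go to messages.stream.
    """
    t0 = time.perf_counter()
    with client.messages.stream(**kwargs) as stream:
        for text in stream.text_stream:
            on_text(text)
        # Usage totals are only complete once the stream has finished.
        msg = stream.get_final_message()
    rec = {
        "call": call_name,
        "model": msg.model,
        "in": msg.usage.input_tokens,
        "out": msg.usage.output_tokens,
        "ms": round((time.perf_counter() - t0) * 1000, 1),
    }
    return msg, rec
```

Note that "ms" here measures time to the last token, which is a different number from time to first token. If your users watch the text arrive, you may want to record both.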

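Async. Swap def wrapper for async def wrapper and fn(...) for await fn(...). Same logic, two keywords. The synchronous file append still blocks the event loop briefly, so move it to a thread or a queue under heavy load.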
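A sketch of the async variant, assuming the wrapped function is a coroutine that returns an object with .model and .usage. The names track_llm_async and log_path are illustrative. Cost is left out to keep the block short; compute it exactly as in the sync version. The file append runs through asyncio.to_thread, which needs Python 3.9 or later.

```python
import asyncio
import functools
import json
import os
import time
from datetime import datetime, timezone

DEFAULT_LOG_PATH = os.environ.get("LLM_LOG", "/tmp/llm_usage.jsonl")


def _append_line(path: str, line: str) -> None:
    with open(path, "a") as f:
        f.write(line)


def track_llm_async(call_name: str, log_path: str = DEFAULT_LOG_PATH):
    def deco(fn):
        @functools.wraps(fn)
        async def wrapper(*args, **kwargs):
            t0 = time.perf_counter()
            msg = await fn(*args, **kwargs)  # fn must be a coroutine function
            dt = (time.perf_counter() - t0) * 1000
            rec = {
                "ts": datetime.now(timezone.utc).isoformat(),
                "call": call_name, "model": msg.model,
                "in": getattr(msg.usage, "input_tokens", 0),
                "out": getattr(msg.usage, "output_tokens", 0),
                "ms": round(dt, 1),
            }
            # Run the blocking file append in a worker thread.
            await asyncio.to_thread(_append_line, log_path, json.dumps(rec) + "\n")
            return msg
        return wrapper
    return deco
```

Use it the same way as the sync decorator, on an async def that awaits the async client's messages.create call.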

Variations

Per-tenant cost. Capture tenant id in a ContextVar; have the decorator read it. One line, full multi-tenant billing.
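A sketch of the ContextVar approach. The names current_tenant and with_tenant are hypothetical, and where you call set() depends on your web framework (typically a middleware or request hook), which is not shown here.

```python
from contextvars import ContextVar

# "unknown" marks calls made outside any request scope.
current_tenant = ContextVar("current_tenant", default="unknown")


def with_tenant(rec: dict) -> dict:
    """Return a copy of a usage record tagged with the active tenant id."""
    return {**rec, "tenant": current_tenant.get()}


# In request-handling code: set the tenant, do the work, restore the old value.
token = current_tenant.set("acme-corp")
try:
    print(with_tenant({"call": "summarize_email", "cost_usd": 0.00855}))
finally:
    current_tenant.reset(token)
```

Inside the decorator, the one-line change is to pass rec through with_tenant before json.dumps. ContextVar values are tracked per asyncio task, so concurrent requests should not see each other's tenant.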

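Daily rollups. Run jq -s 'group_by(.call) | map({call: .[0].call, cost: (map(.cost_usd) | add)})' /tmp/llm_usage.jsonl from cron once a day. The filter sums over the whole file, so rotate the log daily or filter on ts to make each run cover one day. Cron mails the output if MAILTO is set, which puts a cost-per-feature report in your inbox for free.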
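If you would rather not rotate the file, a Python version can group by day as well as by call. This is a sketch assuming the record format from the decorator above, with ISO 8601 UTC timestamps in "ts". daily_rollup is an illustrative name.

```python
import json
from collections import defaultdict


def daily_rollup(lines):
    """Return {(date, call): total cost in USD} from an iterable of JSONL lines."""
    totals = defaultdict(float)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        rec = json.loads(line)
        day = rec["ts"][:10]  # ISO 8601 timestamps start with YYYY-MM-DD
        totals[(day, rec["call"])] += rec["cost_usd"]
    return dict(totals)
```

Days are UTC days here because the decorator logs UTC. Convert before slicing if your billing day follows a local timezone.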

Alerts. A five-line script that scans the JSONL for any record whose cost_usd exceeds a threshold and posts it to Slack catches runaway loops before they become bills.
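A sketch of that alert script using only the standard library. It assumes a Slack incoming webhook, which accepts a JSON body with a "text" field. The webhook URL is something you supply, and find_expensive and post_to_slack are hypothetical helper names.

```python
import json
import urllib.request


def find_expensive(lines, threshold_usd: float):
    """Yield records whose cost_usd exceeds the threshold."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        rec = json.loads(line)
        if rec.get("cost_usd", 0.0) > threshold_usd:
            yield rec


def post_to_slack(webhook_url: str, rec: dict) -> None:
    """Send one alert to a Slack incoming webhook (URL supplied by the caller)."""
    text = f"LLM call {rec['call']} cost ${rec['cost_usd']:.4f} on {rec.get('model', '?')}"
    req = urllib.request.Request(
        webhook_url,
        data=json.dumps({"text": text}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(req, timeout=10)
```

A per-record threshold catches single oversized calls. A runaway loop of cheap calls needs a sum over a time window instead, which you can build on the same records.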

The Take

One decorator. One JSONL file. One source of truth. Every call writes its own receipt, so you never again wonder which prompt is melting the budget. Add it today, before you ship the next feature that calls the model.

— Mr. Technology

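*Tested June 2026 with the anthropic Python SDK on claude-sonnet-4-5 and claude-haiku-4-5. PRICING is illustrative, so check your provider's current rate card. The decorator works with any SDK whose response exposes .model and a .usage object with input_tokens and output_tokens; for SDKs that name the fields differently, such as prompt_tokens and completion_tokens in OpenAI's Chat Completions API, adjust the two getattr calls.*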