
Let me give you the tl;dr first, because every team I've worked with finds out their LLM bill is 3-4x what they estimated only after the charge hits: spin up Langfuse locally with docker compose, wrap your completion calls in a small decorator that authenticates with two env vars, then query the observations table to render per-feature USD cost. The whole thing works against any model, hosted or local, and the tracing data never leaves your VPC.
You can see token counts in your logs, but you can't answer "what did feature X cost us last week, broken down by model". Langfuse fixes that in about 15 minutes if you skip the SDK zoo and just use the HTTP /api/public/ingestion endpoint.
```bash
git clone https://github.com/langfuse/langfuse.git
cd langfuse
docker compose up -d
curl -fsS http://localhost:3000/api/public/health
```
Open http://localhost:3000, create a project, and copy the Secret Key and Public Key from Settings → API Keys. Export them as LANGFUSE_SECRET_KEY and LANGFUSE_PUBLIC_KEY; those are the two env vars the wrapper reads.
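Drop this wrapper in front of every completion. It works with OpenAI, Anthropic, vLLM, Ollama: anything that takes a model name and returns text. The wrapped function has to receive model as a keyword argument and return the completion as a string, because that is all the wrapper looks at.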
```python
import os, time, requests, tiktoken
from datetime import datetime, timezone
from functools import wraps

LANGFUSE = "http://localhost:3000"
PRICES = {  # USD per 1M tokens, update monthly
    "gpt-4o": {"in": 2.50, "out": 10.00},
    "gpt-4o-mini": {"in": 0.15, "out": 0.60},
    "claude-sonnet-4": {"in": 3.00, "out": 15.00},
    "local-llama-70b": {"in": 0.00, "out": 0.00},
}

def now_iso():
    # ISO 8601 UTC timestamp; perf_counter() values are not wall-clock times
    return datetime.now(timezone.utc).isoformat()

def track(feature: str):
    def deco(fn):
        @wraps(fn)
        def wrap(*a, **kw):
            start = now_iso()
            result = fn(*a, **kw)
            end = now_iso()
            # Rough estimate: the gpt-4o tokenizer is applied to every model
            enc = tiktoken.encoding_for_model("gpt-4o").encode
            tin, tout = len(enc(str(a))), len(enc(str(result)))
            model = kw.get("model")
            p = PRICES.get(model, PRICES["gpt-4o-mini"])  # unknown models use mini rates
            cost = (tin * p["in"] + tout * p["out"]) / 1_000_000
            event = {
                "id": f"{feature}-{int(time.time() * 1000)}",
                "timestamp": end,  # assumed to be required per event; check the OpenAPI spec
                "type": "generation-create",
                "body": {"name": feature, "model": model,
                         "usage": {"input": tin, "output": tout,
                                   "total": tin + tout, "unit": "TOKENS"},
                         "metadata": {"cost_usd": cost},
                         "startTime": start, "endTime": end},
            }
            try:
                requests.post(f"{LANGFUSE}/api/public/ingestion", json={"batch": [event]},
                              auth=(os.environ["LANGFUSE_PUBLIC_KEY"],
                                    os.environ["LANGFUSE_SECRET_KEY"]),
                              timeout=2)
            except requests.RequestException:
                pass  # tracking must never break the LLM call itself
            return result
        return wrap
    return deco
```
Use it like a decorator on any function that calls an LLM. The cost field is a metadata blob — Langfuse doesn't enforce a price schema, which is the whole point.
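To make the cost arithmetic concrete, here is a minimal sketch of the per-call calculation on its own. The helper name estimate_cost is hypothetical (it is not part of Langfuse), and the prices are just two rows from the table in the wrapper, so treat the numbers as illustrative.

```python
# Hypothetical helper; prices are USD per 1M tokens, taken from the post's price table.
PRICES = {
    "gpt-4o": {"in": 2.50, "out": 10.00},
    "gpt-4o-mini": {"in": 0.15, "out": 0.60},
}

def estimate_cost(model: str, tokens_in: int, tokens_out: int) -> float:
    """Estimated USD cost of one call; unknown models fall back to gpt-4o-mini rates."""
    p = PRICES.get(model, PRICES["gpt-4o-mini"])
    return (tokens_in * p["in"] + tokens_out * p["out"]) / 1_000_000

# 1,200 prompt tokens and 350 completion tokens on gpt-4o:
# 1200 * 2.50 / 1e6 + 350 * 10.00 / 1e6 = 0.0030 + 0.0035 = 0.0065 USD
cost = estimate_cost("gpt-4o", 1200, 350)
print(f"{cost:.4f}")  # prints 0.0065
```

Falling back to the cheapest rate for unknown models keeps the wrapper from crashing, but it also means a model missing from the table is silently under-costed, so it is worth noticing when that fallback fires.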
Langfuse's UI is fine, but a real cost report needs SQL. Hit Postgres directly:
```sql
-- per-feature USD cost, last 7 days
-- check column names against the schema of your Langfuse version
SELECT name AS feature,
       sum((metadata->>'cost_usd')::numeric) AS cost_usd,
       count(*) AS calls,
       sum((usage->>'total')::int) AS tokens
FROM observations
WHERE start_time > now() - interval '7 days'
GROUP BY name
ORDER BY cost_usd DESC;
```
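Pipe that into Metabase, Grafana, or just a daily Slack ping. The totals are estimates: the wrapper counts tokens with tiktoken instead of reading the provider's usage field, so check them against the provider invoice before you rely on them.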
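For the daily Slack ping, one possible sketch is a function that turns the query rows into the JSON body a Slack incoming webhook accepts (an object with a text field). The function name format_cost_report and the row shape (feature, cost_usd, calls, tokens) are assumptions for illustration; fetching the rows and posting the payload are left out.

```python
def format_cost_report(rows):
    """Build a Slack-style payload from a list of (feature, cost_usd, calls, tokens) tuples."""
    lines = ["LLM cost, last 7 days"]
    # Most expensive feature first, matching the ORDER BY in the SQL report
    for feature, cost, calls, tokens in sorted(rows, key=lambda r: r[1], reverse=True):
        lines.append(f"{feature}: ${cost:,.2f} ({calls:,} calls, {tokens:,} tokens)")
    lines.append(f"total: ${sum(r[1] for r in rows):,.2f}")
    return {"text": "\n".join(lines)}

# Made-up numbers, only to show the output shape
payload = format_cost_report([
    ("search", 12.5, 1000, 2_000_000),
    ("summarize", 40.0, 200, 500_000),
])
print(payload["text"])
```

Keeping the formatting separate from the HTTP call makes it easy to reuse the same text for email or a log line.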
The requests.post in the wrapper is synchronous, not fire-and-forget: it blocks the caller until Langfuse answers or the 2-second timeout fires, and because the response is never checked, an event that Langfuse rejects is dropped silently. For prod, swap to the official langfuse SDK with batching, or add a fallback log line. Also: the price table is your problem. Set a calendar reminder to refresh it when models change.
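If you stay on raw HTTP, the fallback log line could look roughly like this: run the send on a small thread pool so it stays off the request path, and log the payload when it fails so the event can be replayed later. The names send_with_fallback and llm_cost are hypothetical, and the post argument stands in for whatever callable actually performs the HTTP request.

```python
import logging
from concurrent.futures import ThreadPoolExecutor

log = logging.getLogger("llm_cost")
_pool = ThreadPoolExecutor(max_workers=2)  # small pool; tracking is low volume

def send_with_fallback(post, payload):
    """Run post(payload) in the background; on any error, log the payload instead."""
    def _task():
        try:
            post(payload)
            return True
        except Exception as exc:
            # Fallback log line: the event is not lost, it lands in your normal logs
            log.warning("langfuse ingestion failed (%s); payload=%r", exc, payload)
            return False
    return _pool.submit(_task)  # a Future; the caller does not have to wait on it

def broken_post(payload):
    # Stand-in for a Langfuse outage
    raise RuntimeError("langfuse unreachable")

future = send_with_fallback(broken_post, {"batch": []})
print(future.result(timeout=5))  # prints False, and a warning is logged
```

In the wrapper, post would be a small function that calls requests.post with the same URL, auth, and timeout, followed by raise_for_status() on the response so that rejected events also reach the log.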
*Repo / references: github.com/langfuse/langfuse, self-hosting docs at langfuse.com/self-hosting, schema for /api/public/ingestion is in the OpenAPI spec under inference/.*