You ship an LLM feature on Tuesday. On Thursday your CFO asks why the OpenAI bill is $4,200. You have no idea which user cost you the most, which prompt burned the tokens, or whether that latency spike was your code or OpenAI's. **Helicone** is a one-line proxy that fixes this. Point your OpenAI client at it, and every request gets logged, costed, and searchable. No SDK rewrite. No data warehouse. Ten minutes to useful.
Install the SDK (or skip it for pure HTTP):
pip install helicone
Then flip your client. One base URL, one header:
import os
from openai import OpenAI
client = OpenAI(
api_key=os.environ["OPENAI_API_KEY"],
base_url="https://oai.helicone.ai/v1", # <- the proxy
default_headers={
"Helicone-Auth": f"Bearer {os.environ['HELICONE_API_KEY']}",
},
)
Sign up at [helicone.ai](https://helicone.ai), copy the key, drop it in your env. That is the install. Nothing to deploy.
The proxy is a pass-through. Every request your client makes gets mirrored, parsed, and stored. You can attach metadata with **custom properties**, and this is where it gets useful.
resp = client.chat.completions.create(
model="gpt-4.1",
messages=[{"role": "user", "content": "summarize this PDF"}],
extra_headers={
"Helicone-Property-User-Id": user.id,
"Helicone-Property-Feature": "pdf-summarizer",
"Helicone-Property-Env": "prod",
},
)
Those `Helicone-Property-*` headers become first-class filters in the dashboard. What did your **pdf-summarizer** feature cost last week? One click. Every 5xx for users in cohort B? One click. Token cost between `gpt-4.1` and `gpt-4.1-mini` for the same prompt? One click. Helicone logs request, response, latency, cost, and your custom properties, with retention controls and per-key rate limits.
Make ten calls with different `User-Id` headers and open the dashboard. The request stream populates in real time, with a per-user cost breakdown, latency histogram, and token-usage table by model. The **Requests** view shows the full prompt and response inline, redacted if you toggle PII scrubbing. The **Costs** view shows a per-feature, per-user, per-model breakdown exportable to CSV.
Add `Helicone-Cache-Enabled: true` and matching requests are served from cache at zero tokens. On a RAG-heavy workload this is a 30-50% cost cut in the first week.
Use **Helicone** when you need observability today and do not want to deploy anything. The 10-minute setup is real, the dashboard is useful, and the per-feature cost breakdowns will save the next "why is the bill $4,200" Slack thread. Skip it when you need **OpenTelemetry-native tracing for a multi-step agent**: use **Langfuse** for that, or the Helicone OTel exporter if you want both. For a one-week MVP that just needs to know what it is spending, this is the tool. I have deployed it on three production systems this quarter. None have regretted it.
— *Mr. Technology*