← Back to Payloads
Tutorial

Helicone, in 10 Minutes: Real LLM Observability Without Re-Writing Your Client

Helicone is a one-line OpenAI proxy that gives you request logs, per-user cost breakdowns, latency histograms, and prompt caching in 10 minutes. No SDK rewrite, no deploy. Here is the setup, the gotchas, and when to skip it for Langfuse.
Quick Access
Install command
$ mrt install tutorial
Browse related skills

Helicone, in 10 Minutes: Real LLM Observability Without Re-Writing Your Client

You ship an LLM feature on Tuesday. On Thursday your CFO asks why the OpenAI bill is $4,200. You have no idea which user cost you the most, which prompt burned the tokens, or whether that latency spike was your code or OpenAI's. **Helicone** is a one-line proxy that fixes this. Point your OpenAI client at it, and every request gets logged, costed, and searchable. No SDK rewrite. No data warehouse. Ten minutes to useful.

The Setup

Install the SDK (or skip it for pure HTTP):

pip install helicone

Then flip your client. One base URL, one header:

import os

from openai import OpenAI

client = OpenAI(

api_key=os.environ["OPENAI_API_KEY"],

base_url="https://oai.helicone.ai/v1", # <- the proxy

default_headers={

"Helicone-Auth": f"Bearer {os.environ['HELICONE_API_KEY']}",

},

)

Sign up at [helicone.ai](https://helicone.ai), copy the key, drop it in your env. That is the install. Nothing to deploy.

The Implementation

The proxy is a pass-through. Every request your client makes gets mirrored, parsed, and stored. You can attach metadata with **custom properties**, and this is where it gets useful.

resp = client.chat.completions.create(

model="gpt-4.1",

messages=[{"role": "user", "content": "summarize this PDF"}],

extra_headers={

"Helicone-Property-User-Id": user.id,

"Helicone-Property-Feature": "pdf-summarizer",

"Helicone-Property-Env": "prod",

},

)

Those `Helicone-Property-*` headers become first-class filters in the dashboard. What did your **pdf-summarizer** feature cost last week? One click. Every 5xx for users in cohort B? One click. Token cost between `gpt-4.1` and `gpt-4.1-mini` for the same prompt? One click. Helicone logs request, response, latency, cost, and your custom properties, with retention controls and per-key rate limits.

First Run

Make ten calls with different `User-Id` headers and open the dashboard. The request stream populates in real time, with a per-user cost breakdown, latency histogram, and token-usage table by model. The **Requests** view shows the full prompt and response inline, redacted if you toggle PII scrubbing. The **Costs** view shows a per-feature, per-user, per-model breakdown exportable to CSV.

Add `Helicone-Cache-Enabled: true` and matching requests are served from cache at zero tokens. On a RAG-heavy workload this is a 30-50% cost cut in the first week.

The Gotchas

  • **Streaming needs `stream=True`.** Without it the response buffers at the proxy and your latency quietly doubles. One-line fix, confusing symptom.
  • **Custom properties are case-sensitive and lazy-indexed.** The dashboard only filters on properties it has seen. If you ship `Helicone-Property-Plan` on Friday, you cannot filter old data on it. Forward-plan your property names.
  • **Do not double-wrap.** If you already route through **LiteLLM** or a custom gateway, do not stack Helicone in front. Helicone has a native LiteLLM integration; read the [docs](https://docs.helicone.ai) before layering.
  • **PII is your problem.** Full request and response are stored by default. Toggle `Helicone-Moderations` or scrub upstream if you are in a regulated vertical. HIPAA lives on the enterprise tier, not free.

The Take

Use **Helicone** when you need observability today and do not want to deploy anything. The 10-minute setup is real, the dashboard is useful, and the per-feature cost breakdowns will save the next "why is the bill $4,200" Slack thread. Skip it when you need **OpenTelemetry-native tracing for a multi-step agent**: use **Langfuse** for that, or the Helicone OTel exporter if you want both. For a one-week MVP that just needs to know what it is spending, this is the tool. I have deployed it on three production systems this quarter. None have regretted it.

— *Mr. Technology*