Tutorial · 2026-07-02

Wire OpenTelemetry Tracing Into a Python LLM App, Pointed at Self-Hosted Langfuse, in Five Lines

Stop maintaining two tracing stacks. Five lines of OpenTelemetry setup pipes every LLM span — model, prompt, tokens — into self-hosted Langfuse via the OTLP HTTP endpoint, with auto-instrumentation for OpenAI and Anthropic.

Every self-hosted LLM stack eventually needs traces that span the model call, the retrieval, the tool execution, and the agent loop. Most teams instrument with the Langfuse Python SDK, then realize six months later that their service mesh already speaks OpenTelemetry and they now have two tracing systems. The fix is to point an OTel exporter at self-hosted Langfuse and skip the SDK entirely.

This works because Langfuse v3.22.0+ is OpenTelemetry-compatible: it accepts spans over the standard OTLP HTTP endpoint at /api/public/otel/v1/traces, parses gen_ai.* and openinference.* attributes, and renders them in the same trace view as SDK-instrumented spans. To self-host, run docker compose up -d from a clone of github.com/langfuse/langfuse, create a project, and copy the public and secret keys from Settings → API Keys.

The Setup

```python
# tracing.py
import base64
import os

from opentelemetry import trace
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk.resources import SERVICE_NAME, Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor

# Name the service so its traces are easy to filter in Langfuse.
provider = TracerProvider(resource=Resource.create({SERVICE_NAME: "support-agent"}))

# Langfuse authenticates OTLP with HTTP Basic: base64("public_key:secret_key").
auth = base64.b64encode(
    f"{os.environ['LANGFUSE_PUBLIC_KEY']}:{os.environ['LANGFUSE_SECRET_KEY']}".encode()
).decode()

# Batch spans in the background and export them over OTLP/HTTP.
provider.add_span_processor(
    BatchSpanProcessor(OTLPSpanExporter(
        endpoint="http://localhost:3000/api/public/otel/v1/traces",
        headers={"Authorization": f"Basic {auth}"},
    ))
)
trace.set_tracer_provider(provider)
```
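
If Langfuse does not live on localhost:3000, the endpoint and header can be derived from the host instead of hard-coded. The following is a minimal sketch, not part of the original setup: the helper name `langfuse_otlp_settings` and the demo keys are hypothetical, while the path and the Basic auth scheme are the ones used in tracing.py.

```python
import base64


def langfuse_otlp_settings(host: str, public_key: str, secret_key: str) -> dict:
    """Build OTLPSpanExporter keyword arguments for a Langfuse host (hypothetical helper)."""
    # HTTP Basic auth: base64("public:secret"), the same scheme as in tracing.py.
    token = base64.b64encode(f"{public_key}:{secret_key}".encode()).decode()
    return {
        # Strip a trailing slash so the URL does not end up with "//api".
        "endpoint": f"{host.rstrip('/')}/api/public/otel/v1/traces",
        "headers": {"Authorization": f"Basic {token}"},
    }


# Demo values only; real keys come from Settings → API Keys.
settings = langfuse_otlp_settings("http://localhost:3000/", "pk-lf-demo", "sk-lf-demo")
print(settings["endpoint"])  # http://localhost:3000/api/public/otel/v1/traces
```

The returned dict can be passed straight through as `OTLPSpanExporter(**settings)`, since `endpoint` and `headers` are the same keyword arguments tracing.py uses.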

Pin the beta (b0) releases together: opentelemetry-distro[otlp]==0.48b0, opentelemetry-instrumentation-openai==0.27b0, opentelemetry-instrumentation-anthropic==0.27b0. Mixing them with unpinned stable 1.x packages can cause a Protobuf import mismatch that you will debug for an hour.
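
A quick way to catch version drift before it turns into an import error is to compare what is installed against the pins. This is a sketch using only the standard library; the pin values are copied from the paragraph above, and the function name `pin_mismatches` is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version

# Pins copied from the text above. Extras such as [otlp] are not part of
# the distribution name, so they are dropped here.
PINS = {
    "opentelemetry-distro": "0.48b0",
    "opentelemetry-instrumentation-openai": "0.27b0",
    "opentelemetry-instrumentation-anthropic": "0.27b0",
}


def pin_mismatches(pins: dict[str, str]) -> dict[str, str]:
    """Return {package: installed version or 'not installed'} for every pin that is off."""
    problems = {}
    for name, wanted in pins.items():
        try:
            found = version(name)
        except PackageNotFoundError:
            found = "not installed"
        if found != wanted:
            problems[name] = found
    return problems


if __name__ == "__main__":
    for name, found in pin_mismatches(PINS).items():
        print(f"{name}: want {PINS[name]}, found {found}")
```

Running it in CI or at container build time surfaces a mismatch as one readable line instead of a stack trace at first export.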

Wrapping an LLM Call

Wrap your model call in a span annotated with GenAI semantic conventions. Langfuse parses the attributes and builds the model / token / prompt columns automatically:

```python
import anthropic
from opentelemetry import trace

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
tracer = trace.get_tracer(__name__)


def answer(question: str) -> str:
    with tracer.start_as_current_span("llm.call") as span:
        # Request-side attributes: provider, model, and the truncated prompt.
        span.set_attribute("gen_ai.system", "anthropic")
        span.set_attribute("gen_ai.request.model", "claude-sonnet-4-5")
        span.set_attribute("gen_ai.prompt", question[:4000])
        rsp = client.messages.create(
            model="claude-sonnet-4-5", max_tokens=1024,
            messages=[{"role": "user", "content": question}],
        )
        text = rsp.content[0].text
        # Response-side attributes: truncated completion and token usage.
        span.set_attribute("gen_ai.completion", text[:4000])
        span.set_attribute("gen_ai.usage.input_tokens", rsp.usage.input_tokens)
        span.set_attribute("gen_ai.usage.output_tokens", rsp.usage.output_tokens)
        return text
```
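
Once several call sites set the same attributes, it may be easier to build them in one place. The sketch below collects the gen_ai.* keys used above into a plain dict; the helper name `genai_attributes` and the `max_chars` parameter are hypothetical, and the 4000-character cap mirrors the truncation in the example.

```python
def genai_attributes(system, model, prompt, completion,
                     input_tokens, output_tokens, max_chars=4000):
    """Collect the gen_ai.* attributes for one LLM call into a dict."""
    return {
        "gen_ai.system": system,
        "gen_ai.request.model": model,
        # Truncate long text so a single span stays small.
        "gen_ai.prompt": prompt[:max_chars],
        "gen_ai.completion": completion[:max_chars],
        "gen_ai.usage.input_tokens": int(input_tokens),
        "gen_ai.usage.output_tokens": int(output_tokens),
    }


attrs = genai_attributes("anthropic", "claude-sonnet-4-5",
                         "What is OTLP?", "A wire protocol.", 12, 5)
print(attrs["gen_ai.request.model"])  # claude-sonnet-4-5
```

Inside the span, the whole dict can be applied in one call with `span.set_attributes(attrs)`.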

Auto-Instrumenting OpenAI / Anthropic

If you do not want to wrap calls manually, the opentelemetry-instrumentation-openai and opentelemetry-instrumentation-anthropic packages hook the SDKs for you. Add the following after trace.set_tracer_provider(...):

```python
from opentelemetry.instrumentation.openai import OpenAIInstrumentor
from opentelemetry.instrumentation.anthropic import AnthropicInstrumentor

OpenAIInstrumentor().instrument()
AnthropicInstrumentor().instrument()
```

Restart. Every OpenAI / Anthropic SDK call from that point is a span — model, prompt, completion, tokens, latency — no edits at the call site.
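
If only some services install both instrumentation packages, a hard import at startup will crash the ones that do not. One possible approach, sketched here under that assumption, is to import each instrumentor lazily and skip it when the package is missing. The helper name `instrument_if_installed` is hypothetical; the module paths and class names are the ones imported above.

```python
import importlib


def instrument_if_installed(module_path: str, class_name: str) -> bool:
    """Call <class_name>().instrument() if the module imports; report whether it ran."""
    try:
        module = importlib.import_module(module_path)
    except ImportError:
        # Package not installed: skip tracing for that SDK instead of crashing.
        return False
    getattr(module, class_name)().instrument()
    return True
```

Usage would look like `instrument_if_installed("opentelemetry.instrumentation.openai", "OpenAIInstrumentor")`, with the return value logged so a silently skipped instrumentor is still visible at startup.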

Gotchas

Trace IDs need propagation across services. If support-agent calls retrieval-service over HTTP, install opentelemetry-instrumentation-requests and a server-side instrumentor so the traceparent header rides every call. Without it Langfuse shows two unrelated traces instead of one nested trace, and you lose the link between retrieval and the LLM call.
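
When a trace does show up split in two, it helps to know what the traceparent header should look like in request logs. The instrumentors inject and extract it for you; the sketch below only illustrates the W3C Trace Context format (version 00) with standard-library code. The helper names are hypothetical, and the IDs are example values.

```python
import re

# version "-" trace-id (32 hex) "-" parent span id (16 hex) "-" flags (2 hex)
_TRACEPARENT = re.compile(r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def format_traceparent(trace_id: int, span_id: int, sampled: bool = True) -> str:
    """Render a W3C traceparent header value (version 00)."""
    return f"00-{trace_id:032x}-{span_id:016x}-{'01' if sampled else '00'}"


def parse_traceparent(value: str):
    """Return (trace_id, span_id, sampled), or None if the value is malformed."""
    match = _TRACEPARENT.match(value.strip())
    if match is None:
        return None
    _version, trace_hex, span_hex, flags = match.groups()
    return int(trace_hex, 16), int(span_hex, 16), bool(int(flags, 16) & 0x01)


header = format_traceparent(0x4BF92F3577B34DA6A3CE929D0E0E4736, 0x00F067AA0BA902B7)
print(header)  # 00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01
```

If the header logged by retrieval-service is missing, or carries a different 32-digit trace ID than the caller's span, propagation is broken on one side of the hop.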

SDK spans and OTel spans do not merge. If a downstream library uses the Langfuse Python SDK (not OTel), its spans land on a different trace_id. Migrate the library to OTel, or accept the split and link the traces via langfuse_context.update_current_observation, for example by storing the OTel trace ID in the observation's metadata. This is the migration tax nobody warns you about.

Cost columns need the right attributes. Langfuse's per-model cost math reads gen_ai.usage.input_tokens, gen_ai.usage.output_tokens, and gen_ai.request.model. The auto-instrumentor fills in the tokens but not the model ID for local servers. When you run vLLM or Ollama through the OpenAI SDK, set gen_ai.request.model yourself or the cost column reads $0.00.

Batching drops the last batch on shutdown. BatchSpanProcessor exports on a timer, every 5 seconds by default. If a short script or worker is killed before the queue drains, the final batch dies in the queue. Call trace.get_tracer_provider().force_flush() before exit, or you will debug phantom tracing gaps that are actually shutdown races.
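
The flush can be registered once at startup instead of being remembered at every exit path. This is a minimal sketch, assuming the provider exposes `force_flush(timeout_millis)` as the SDK's TracerProvider does; the helper name `flush_on_exit` and the 5000 ms default are hypothetical choices.

```python
import atexit


def flush_on_exit(provider, timeout_millis: int = 5000):
    """Register an atexit hook that flushes pending spans; return the hook."""
    def _flush() -> bool:
        # If no SDK provider was installed, the object may lack force_flush,
        # so look it up defensively instead of assuming it exists.
        force_flush = getattr(provider, "force_flush", None)
        if force_flush is None:
            return False
        return bool(force_flush(timeout_millis))

    atexit.register(_flush)
    return _flush
```

Call `flush_on_exit(trace.get_tracer_provider())` right after `trace.set_tracer_provider(provider)`. Because atexit hooks do not run after os._exit() or an unhandled kill signal, the returned hook can also be called explicitly at the end of a job or request handler.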

When To Skip

If you use Langfuse as hosted SaaS for one Python service, the SDK is simpler. Use this when you have services in more than one language, when your service mesh already emits OTel, or when you want Langfuse as one of several OTel backends without forking the SDK.

Mr. Technology


*Tested July 2026 with Langfuse 3.36 OSS, opentelemetry-distro[otlp]==0.48b0, opentelemetry-instrumentation-openai==0.27b0, opentelemetry-instrumentation-anthropic==0.27b0. OTLP trace endpoint for self-hosted Langfuse is /api/public/otel/v1/traces, authenticated with HTTP Basic on public:secret keys.*
