
After you ship the Langfuse cost dashboard, the next problem shows up by Wednesday: a 12-step agent fails on a user's run and the trace shows one observation with no way to find the broken step. By the end of this you will wrap any Python agent in @observe, emit a custom span per tool call, and score terminal outcomes so failures are filterable.
```bash
pip install langfuse openai openinference-instrumentation-openai
export LANGFUSE_PUBLIC_KEY="pk-lf-..."
export LANGFUSE_SECRET_KEY="sk-lf-..."
export LANGFUSE_HOST="http://localhost:3000"  # or your self-hosted URL
```
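The openinference-instrumentation-openai package is the part most people skip, and in this setup it is what makes LLM calls show up in the trace automatically. Installing it is not enough; you also have to call OpenAIInstrumentor().instrument() once at startup. Without it, @observe gives you a parent span with no LLM telemetry inside it.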
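@observe and Decorate Tool Calls (5 min)

```python
import json

from langfuse import observe, get_client
from openai import OpenAI
from openinference.instrumentation.openai import OpenAIInstrumentor

langfuse = get_client()
# Installing the package is not enough: enable it so OpenAI calls emit child spans.
OpenAIInstrumentor().instrument()
client = OpenAI()

# TOOL_SCHEMAS (tool definitions) and dispatch (tool router) are your own code.


@observe(name="agent.run")
def run_agent(user_input: str) -> str:
    messages = [{"role": "user", "content": user_input}]
    for step in range(12):
        resp = client.chat.completions.create(
            model="gpt-4o", messages=messages, tools=TOOL_SCHEMAS
        )
        msg = resp.choices[0].message
        messages.append(msg)
        if not msg.tool_calls:
            return msg.content
        for call in msg.tool_calls:
            # One custom span per tool call, nested under agent.run.
            with langfuse.start_as_current_observation(
                name="tool.call",
                as_type="span",
                input={"tool": call.function.name, "args": call.function.arguments},
            ) as span:
                result = dispatch(call.function.name, json.loads(call.function.arguments))
                span.update(output=result)
            messages.append(
                {"role": "tool", "tool_call_id": call.id, "content": json.dumps(result)}
            )
    # Fail loudly instead of silently returning None after the step budget.
    raise RuntimeError("agent did not finish within 12 steps")
```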
Two things to notice. First, @observe decorates the outer function only; openinference auto-instruments the OpenAI client and emits LLM spans as children. Second, langfuse.start_as_current_observation is the API for custom spans. The as_type="span" is the difference between a generic observation and a true span that shows up in the waterfall UI. Use as_type="generation" only for actual model calls.
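The agent loop leans on two names it never defines, TOOL_SCHEMAS and dispatch. Here is a minimal sketch of what they could look like. The cancel_order tool and the TOOLS registry are hypothetical, invented for illustration. The schema uses the function-tool shape the Chat Completions API expects. Returning errors as data is one design choice among several: it keeps the tool.call span's output populated, so the broken step is visible in the trace.

```python
import json
from typing import Any, Callable, Dict


def cancel_order(order_id: str) -> Dict[str, Any]:
    """Hypothetical tool: pretend to cancel an order."""
    return {"order_id": order_id, "status": "cancelled"}


# Hypothetical registry mapping tool names to their implementations.
TOOLS: Dict[str, Callable[..., Dict[str, Any]]] = {"cancel_order": cancel_order}

# Tool definitions passed to the model via the `tools` argument.
TOOL_SCHEMAS = [
    {
        "type": "function",
        "function": {
            "name": "cancel_order",
            "description": "Cancel an order by its ID.",
            "parameters": {
                "type": "object",
                "properties": {"order_id": {"type": "string"}},
                "required": ["order_id"],
            },
        },
    }
]


def dispatch(name: str, args: Dict[str, Any]) -> Dict[str, Any]:
    """Route a tool call to its implementation and return errors as data."""
    tool = TOOLS.get(name)
    if tool is None:
        return {"error": f"unknown tool: {name}"}
    try:
        return tool(**args)
    except Exception as exc:
        # The error lands in the span output instead of killing the loop.
        return {"error": f"{type(exc).__name__}: {exc}"}
```

Because dispatch always returns a JSON-serializable dict, the json.dumps(result) call in the agent loop cannot fail on a tool error.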
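```python
# score_current_span needs an active span, so score inside one.
with langfuse.start_as_current_observation(name="request", as_type="span"):
    run_agent("cancel my order #4471")
    langfuse.score_current_span(name="success", value=1, data_type="BOOLEAN")
```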
Scores are how you stop reading traces by hand. Set a success score on every terminal span, on the failure path as well as the happy path, and filter the trace list for traces whose success score is false to find the failures in seconds.
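A success score only helps if the failing runs get one too. Below is a minimal sketch of a wrapper that scores both outcomes. run_and_score is a hypothetical helper, not part of the Langfuse SDK. The score callable is injected so the logic can be exercised without a Langfuse backend. The keyword arguments (name, value, data_type, comment) are assumed to match what langfuse.score_current_span accepts in your SDK version.

```python
from typing import Any, Callable


def run_and_score(
    agent: Callable[[str], str],
    user_input: str,
    score: Callable[..., Any],
) -> str:
    """Run the agent and record a boolean success score on both outcomes."""
    try:
        answer = agent(user_input)
    except Exception as exc:
        # Failure path: score 0 and keep the error text as the score comment.
        score(name="success", value=0, data_type="BOOLEAN", comment=repr(exc))
        raise
    # Happy path: score 1.
    score(name="success", value=1, data_type="BOOLEAN")
    return answer
```

In real use you would pass score=langfuse.score_current_span and call the helper while a span is active, so the score has something to attach to.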
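The decorator emits a span per call, but parentage rides on the OpenTelemetry context, which in Python lives in contextvars. Tasks started with asyncio.gather or anyio.create_task_group copy that context when they are created, so they parent correctly as long as you create them inside the observed function. The breakage comes from work that does not inherit the context: a ThreadPoolExecutor, loop.run_in_executor, or a task created before the span opened. Those spans land under the wrong parent or start a trace of their own. Fix: capture the context inside the span with contextvars.copy_context() and run the worker through ctx.run, or use asyncio.to_thread, which copies the context for you. LANGFUSE_TRACING_ENABLED only switches tracing on or off; it does not repair parentage. This bug costs a day to find because the UI looks fine; it just has the wrong parents.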
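You can check which concurrency primitives carry the span context without touching Langfuse at all. OpenTelemetry's Python context is backed by contextvars, so a plain ContextVar is a fair stand-in. The sketch below is stdlib-only and assumes a standard CPython build. The current_span variable and the helper names are illustrative, not SDK objects.

```python
import asyncio
import contextvars
from concurrent.futures import ThreadPoolExecutor
from typing import Optional

# Stand-in for the slot where a tracer keeps the current span (illustrative only).
current_span = contextvars.ContextVar("current_span", default=None)


def read_span() -> Optional[str]:
    """Synchronous worker: report which span it sees as its parent."""
    return current_span.get()


async def read_span_async() -> Optional[str]:
    """Async worker: same check from inside an asyncio task."""
    return current_span.get()


async def main() -> dict:
    current_span.set("agent.run")  # like entering the @observe span
    loop = asyncio.get_running_loop()
    with ThreadPoolExecutor(max_workers=1) as pool:
        # Tasks copy the current context when they are created.
        gather_tasks = await asyncio.gather(read_span_async(), read_span_async())
        # A bare executor call runs in the worker thread's own, empty context.
        bare_executor = await loop.run_in_executor(pool, read_span)
        # Fix: copy the context inside the span and run the worker in that copy.
        ctx = contextvars.copy_context()
        executor_with_ctx = await loop.run_in_executor(pool, ctx.run, read_span)
    # asyncio.to_thread copies the context for you.
    to_thread = await asyncio.to_thread(read_span)
    return {
        "gather_tasks": gather_tasks,
        "bare_executor": bare_executor,
        "executor_with_ctx": executor_with_ctx,
        "to_thread": to_thread,
    }


results = asyncio.run(main())
print(results)
```

Only the bare executor call loses the parent. Take a fresh copy_context() per worker call, because a single Context object cannot be entered by two callers at once.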
Wire @observe around the three highest-traffic agents in your repo, add a tool.call span per tool dispatch, and add a success score per terminal call. After 24 hours, the trace list filtered to traces whose success score is false is the only debugging surface you need.
— Mr. Technology