
Here's the dirty secret of every agent framework built in the last two years: they all treat the model as a black box that occasionally hallucinates JSON, and the developer as someone who'll write a try/except and hope for the best. The result is agents that work in demos, fail silently in production, and need an entire observability layer just to figure out why they failed.
Pydantic AI takes a different position: the model is a function whose return type you can actually enforce. And once you internalize that, the shape of how you build agents changes.
The Pydantic team has spent seven years making Python's de facto validation library faster, more correct, and more ergonomic. Pydantic AI is what happens when you let those people build an agent framework. The bet is simple: if your tool inputs, tool outputs, agent outputs, dependencies, and hand-offs between agents are all Pydantic models, the framework can validate every single message that flows through your system, including the ones the model generates.
That sounds small. It isn't.
```python
from typing import Literal

from pydantic import BaseModel
from pydantic_ai import Agent


class SupportTicket(BaseModel):
    category: Literal["billing", "technical", "account", "other"]
    priority: Literal["low", "medium", "high", "critical"]
    summary: str
    next_action: str


# Note: newer Pydantic AI releases rename result_type to output_type
# and result.data to result.output.
agent = Agent(
    "openai:gpt-4o",
    result_type=SupportTicket,
    system_prompt="Classify support emails into structured tickets.",
)

# `user` is assumed to be an application object carrying the raw email text.
result = agent.run_sync(user.email_body)
print(result.data.priority)  # validated: one of the four literals
```
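When the model returns malformed JSON, or well-formed JSON that doesn't match the schema, Pydantic AI doesn't hand you a partial result. It catches the validation error, sends it back to the model as feedback, and asks for a corrected response. The retry happens automatically, up to a configurable limit; if the model still can't produce valid output, the run raises an error instead of returning bad data. Either way, your downstream code never sees garbage. That guarantee is the part that changes everything.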
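To make that feedback loop concrete, here is a minimal sketch of the idea using only Pydantic. It is not Pydantic AI's internal implementation: `run_with_retries`, `fake_model`, and the feedback message format are hypothetical names invented for illustration, and the "model" is a stub that gets the priority wrong on its first attempt.

```python
import json
from typing import Callable, Literal

from pydantic import BaseModel, ValidationError


class SupportTicket(BaseModel):
    category: Literal["billing", "technical", "account", "other"]
    priority: Literal["low", "medium", "high", "critical"]
    summary: str
    next_action: str


def run_with_retries(
    call_model: Callable[[list[str]], str],
    prompt: str,
    max_retries: int = 2,
) -> SupportTicket:
    """Validate the raw reply; on failure, feed the errors back and try again."""
    messages = [prompt]
    for _attempt in range(max_retries + 1):
        raw = call_model(messages)
        try:
            return SupportTicket.model_validate_json(raw)
        except ValidationError as exc:
            # The validation errors become the next message the model sees.
            messages.append(f"Fix these validation errors and answer again:\n{exc}")
    raise RuntimeError(f"no valid output after {max_retries} retries")


# Hypothetical stand-in for an LLM: wrong on the first call, right on the second.
def fake_model(messages: list[str]) -> str:
    ticket = {
        "category": "billing",
        "priority": "urgent" if len(messages) == 1 else "high",
        "summary": "Customer was charged twice",
        "next_action": "Refund the duplicate charge",
    }
    return json.dumps(ticket)


ticket = run_with_retries(fake_model, "Classify: I was charged twice this month.")
print(ticket.priority)
```

The caller only ever receives a validated `SupportTicket` or an exception; the invalid `"urgent"` reply never leaves the loop.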
The other half of the bet is dependency injection. Pydantic AI hands each tool a typed RunContext[Deps], in the same spirit as FastAPI's Depends for request-scoped state, and once you've shipped a real FastAPI service, the agent.run_sync(user_message, deps=db_session) ergonomics feel like home.
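```python
from dataclasses import dataclass

from pydantic_ai import RunContext


# Database, User, Ticket, and Account are application types defined elsewhere.
# Assumes the agent was created with deps_type=SupportDeps.
@dataclass
class SupportDeps:
    db: Database
    current_user: User
    ticket_history: list[Ticket]


@agent.tool
async def lookup_account(ctx: RunContext[SupportDeps], account_id: str) -> Account:
    # Scope the lookup to the current user so the model cannot read other accounts.
    return await ctx.deps.db.get_account(account_id, owner=ctx.deps.current_user.id)
```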
ctx.deps is type-checked. The Database class is type-checked. The account_id the model supplies is validated against the tool's signature before your function runs. Every boundary in your agent is now a typed contract, not a stringly-typed prayer.
The same Pydantic runtime that validates FastAPI requests validates your model's outputs.
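A rough way to see what a typed boundary buys you is Pydantic's own `validate_call` decorator, which enforces a function's annotations at call time. This is a stand-in to illustrate the principle, not the mechanism Pydantic AI uses internally; the `Account` fields and the `include_closed` parameter are assumptions made up for the example.

```python
from pydantic import BaseModel, ValidationError, validate_call


class Account(BaseModel):
    account_id: str
    plan: str


# Hypothetical tool function; validate_call checks arguments and the return value.
@validate_call(validate_return=True)
def lookup_account(account_id: str, include_closed: bool = False) -> Account:
    return Account(account_id=account_id, plan="pro")


# Arguments as a model might supply them in a tool call: plain JSON-like values.
good_args = {"account_id": "acct_123", "include_closed": "false"}
account = lookup_account(**good_args)  # the string "false" is coerced to a bool

# A list where a string is expected is rejected before the function body runs.
bad_args = {"account_id": ["not", "a", "string"]}
error_fields = []
try:
    lookup_account(**bad_args)
except ValidationError as exc:
    error_fields = [err["loc"] for err in exc.errors()]

print(account.account_id, len(error_fields))
```

The function body never has to check its inputs by hand: bad arguments surface as a structured `ValidationError` that can be reported back to whoever made the call.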
The pydantic-graph companion library is where Pydantic AI's opinions about state management really shine. Multi-agent systems in most frameworks degenerate into shared mutable state, string-keyed message passing, and a debugging experience that requires print() statements.
Pydantic Graph treats the agent workflow as a typed state machine. Each node declares the state and dependency types it works with, and its return annotation names which node can run next. The framework handles serialization, checkpointing, and type-safe transitions between agents.
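```python
from __future__ import annotations

from dataclasses import dataclass

from pydantic import BaseModel
from pydantic_graph import BaseNode, GraphRunContext


# Source, search, WriteNode, and SupportDeps are assumed to be defined elsewhere.
class ResearchState(BaseModel):
    query: str
    sources: list[Source] = []
    draft: str | None = None


@dataclass
class ResearchNode(BaseNode[ResearchState, SupportDeps, ResearchState]):
    async def run(self, ctx: GraphRunContext[ResearchState, SupportDeps]) -> WriteNode:
        # Shared state lives on the run context; the return annotation defines the edge.
        ctx.state.sources = await search(ctx.state.query)
        return WriteNode()
```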
It isn't trying to be LangGraph. It isn't a general-purpose DAG engine. It's a Pydantic-flavored way to express agent workflows where the state transitions are validated the same way your API payloads are validated. It's the least magical multi-agent system I've used, and that's a compliment.
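If the typed state machine idea feels abstract, the core of it fits in a few lines of standard-library Python. The sketch below is not the pydantic-graph API: `Done`, `run_graph`, and the node bodies are hypothetical, and the "search" is a stub. It only shows the shape of the pattern, where each node's return annotation says which node comes next and a small loop drives the transitions.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Optional, Union


@dataclass
class ResearchState:
    query: str
    sources: list[str] = field(default_factory=list)
    draft: Optional[str] = None


@dataclass
class Done:
    """Terminal marker carrying the final state."""
    state: ResearchState


class WriteNode:
    async def run(self, state: ResearchState) -> Done:
        state.draft = f"Draft on {state.query!r} citing {len(state.sources)} source(s)"
        return Done(state)


class ResearchNode:
    async def run(self, state: ResearchState) -> WriteNode:
        # Stand-in for a real search call.
        state.sources = [f"https://example.com/search?q={state.query}"]
        return WriteNode()


async def run_graph(
    start: Union[ResearchNode, WriteNode], state: ResearchState
) -> ResearchState:
    """Follow each node's returned successor until a Done marker appears."""
    node: Union[ResearchNode, WriteNode, Done] = start
    while not isinstance(node, Done):
        node = await node.run(state)
    return node.state


final = asyncio.run(run_graph(ResearchNode(), ResearchState(query="typed agents")))
print(final.draft)
```

Because the transitions live in return annotations rather than string keys, a type checker can flag a node that hands off to the wrong successor before anything runs.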
Pydantic AI ships first-class integration with Logfire — Pydantic's own observability platform built on OpenTelemetry. The same framework that validates your agent's outputs can also trace every model call, every tool invocation, every validation retry, and every token spent.
For teams that have been duct-taping Langfuse or Helicone onto existing agent setups, having observability baked into the validation layer means the metrics you care about (validation failures, retry rates, schema mismatches) are first-class events, not scraped logs.
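As a rough illustration of what "first-class events" means here, the sketch below turns validation failures into structured records that can be counted and grouped. It does not use Logfire or OpenTelemetry; `validation_event`, the `Ticket` model, and the event field names are hypothetical, chosen only to show the kind of data a validation layer already has on hand.

```python
from collections import Counter

from pydantic import BaseModel, ValidationError


class Ticket(BaseModel):
    priority: str
    summary: str


def validation_event(exc: ValidationError, attempt: int) -> dict:
    """Turn a validation failure into a structured record instead of a log line."""
    errors = exc.errors()
    return {
        "event": "validation_failed",
        "attempt": attempt,
        "error_count": exc.error_count(),
        "fields": [".".join(str(part) for part in err["loc"]) for err in errors],
        "types": [err["type"] for err in errors],
    }


# Two invalid replies: one is missing a field, one has the wrong type.
raw_replies = ['{"priority": "high"}', '{"priority": 3, "summary": "ok"}']

events = []
for attempt, raw in enumerate(raw_replies, start=1):
    try:
        Ticket.model_validate_json(raw)
    except ValidationError as exc:
        events.append(validation_event(exc, attempt))

# Aggregate by field to see which parts of the schema the model struggles with.
failures_by_field = Counter(f for event in events for f in event["fields"])
print(failures_by_field)
```

Records like these can be shipped to whatever tracing backend you use; the point is that the failing field and error type are known at the moment of failure, so nothing has to be parsed back out of log text.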
Pydantic AI is opinionated in ways that won't fit every team. The dependency on Pydantic V2 means if you're on V1, you're paying a migration tax. The framework is also tightly coupled to the Pydantic ecosystem — Logfire, pydantic-graph, Pydantic AI itself — which is a feature if you're all-in and a constraint if you want to mix and match.
Model support is solid for OpenAI, Anthropic, and Gemini but thinner for some open-source endpoints. If you're running a 70B local model through vLLM and expect the same retry semantics as GPT-4o, you'll need to do some wiring.
Most agent frameworks are layered on top of string parsing, JSON Schema validation, and developer discipline. Pydantic AI inverts the stack — the validation is the framework, and the model is just another function whose return type you specify.
For teams shipping agents to production, that's the architecture that scales. The Pydantic V2 runtime is fast, the type system is comprehensive, and the developer experience is the closest thing to "just write Python" that any agent framework offers.
The real value isn't the type hints — it's the explicit contract between you and the model. Everything else is just hoping the LLM behaves.
Pydantic AI is open source at github.com/pydantic/pydantic-ai. In short: Python type hints drive tool definitions, outputs, and dependencies; Pydantic V2 validation retries automatically on schema failure; pydantic-graph provides typed multi-agent workflows; and Logfire integration gives you OpenTelemetry-native observability. It is MIT licensed and actively developed by the Pydantic team.