Instructor is the de facto Python standard for structured LLM outputs: 3M pip installs a month, Pydantic-native, 15+ providers, and a retry loop that ends the silent-bad-data failure mode in production. The architecture, the code, and the place where it falls short.

Instructor: The 3M-Download Python Library That Deleted Half My LLM Glue Code

Every Python LLM codebase past 5,000 lines has the same tumor. A parse_response() helper. A validate_with_retry() decorator. A try: json.loads(...) wrapped in three layers of error handling. Instructor is the library that actually killed this tumor in 2026. When OpenAI launched its Structured Outputs feature, it credited Jason Liu's library, alongside Outlines, jsonformer, guidance, and lark, as inspiration. Three million pip installs a month, 11k stars, 100+ contributors. The de facto standard, and the standard is correct.

Hey guys, Mr. Technology here.

What It Is
How It Works (With Code)
Where It Fits
The Take

What It Is

Instructor is the Python standard for getting structured, validated data out of an LLM. Built by Jason Liu (jxnl) at 567 Labs, it is a thin wrapper one layer above the OpenAI, Anthropic, Google, Mistral, Cohere, DeepSeek, Groq, Together, Ollama, llama-cpp-python, and vLLM SDKs. You give it a Pydantic BaseModel; it gives you back an instance of that model, populated by the LLM, with the schema sent to the provider and a validation-driven retry loop around the response.

The design choice that matters: it is built on Pydantic, not on a custom DSL. BAML ships its own templating language. DSPy ships its own optimizer. Instructor ships a one-line wrapper around the Pydantic model you were already going to write. That is why it won.

3M+ monthly pip downloads, 11k stars, 100+ contributors, MIT licensed, with ports to TypeScript, Go, Ruby, Elixir, and Rust. It is the only library in this space that stayed opinionated about one thing (use the type system you already have) and grew into a category of its own.

How It Works (With Code)

The full integration across three providers looks like this. Same schema, three backends, zero per-provider code.

```python
import instructor
from enum import Enum
from typing import List, Optional

from pydantic import BaseModel, Field


class Priority(str, Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"
    CRITICAL = "critical"


class Ticket(BaseModel):
    title: str = Field(..., min_length=5, max_length=100)
    priority: Priority
    estimated_hours: Optional[float] = Field(None, gt=0, le=100)


class SupportCase(BaseModel):
    customer_name: str
    tickets: List[Ticket] = Field(..., min_length=1)  # Pydantic v2: min_length, not min_items


# One schema, three backends: only the provider string changes.
openai_client = instructor.from_provider("openai/gpt-4o")
claude_client = instructor.from_provider("anthropic/claude-3-5-sonnet-latest")
ollama_client = instructor.from_provider("ollama/llama3")

case = openai_client.create(
    response_model=SupportCase,
    messages=[{"role": "user", "content":
        "Acme Corp: 2 high-pri tickets, 1 critical, 3.5h est"}],
    max_retries=3,
)
# claude_client.create(...) and ollama_client.create(...) take the same arguments.
print(case.tickets[0].priority)  # e.g. Priority.HIGH: an enum member, not a raw string
```

Three things happen under the hood.

1. Schema over the wire. Instructor serializes the Pydantic model to a JSON schema and hands it to the provider, by default as a tool or function definition. How tightly that schema binds depends on the provider and mode. With OpenAI's strict structured outputs, decoding is constrained at the API tier so the response matches the schema's structure; plain tool calling steers the model toward the schema without guaranteeing it. Self-hosted backends such as Ollama and vLLM rely on their own JSON-mode or guided-decoding support, which for vLLM can mean Outlines-style grammar constraints. Even constrained decoding may not enforce value constraints such as gt=0 or min_length, which is what step 2 is for.

2. Pydantic validation on the way out. The response is run through Model.model_validate() — every Pydantic check applies: Field(gt=0, le=100), validators, enums, recursive models, discriminated unions. If validation fails, the exception is re-fed to the model as a re-ask message with the actual error text.

3. Tenacity-backed retry loop. max_retries caps the total number of attempts, first call included, so max_retries=3 means at most three calls. On each failed validation, the Pydantic ValidationError is included in the next user message, so the model sees what it got wrong and tries again. In my experience, first-call success runs above 95% for a well-typed schema, and the retries make it functionally 100% for everything except adversarial prompts.
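The three steps can be reproduced by hand to see what actually moves between your code and the model. The sketch below assumes Pydantic v2 and is not Instructor's implementation: to_tool, fake_llm, and create_with_reask are hypothetical names, the tool dictionary only mimics the shape of OpenAI's function-tool format, and fake_llm replays two canned replies (one invalid, one valid) instead of calling a provider.

```python
from enum import Enum
from typing import List, Optional

from pydantic import BaseModel, Field, ValidationError


class Priority(str, Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"
    CRITICAL = "critical"


class Ticket(BaseModel):
    title: str = Field(..., min_length=5, max_length=100)
    priority: Priority
    estimated_hours: Optional[float] = Field(None, gt=0, le=100)


class SupportCase(BaseModel):
    """A customer support case with one or more tickets."""

    customer_name: str
    tickets: List[Ticket] = Field(..., min_length=1)


# Step 1 (hypothetical helper): wrap the model's JSON schema in a function-tool definition.
def to_tool(model: type[BaseModel]) -> dict:
    return {
        "type": "function",
        "function": {
            "name": model.__name__,
            "description": model.__doc__ or "",
            "parameters": model.model_json_schema(),
        },
    }


# Hypothetical canned replies standing in for a real model.
REPLIES = iter([
    # Breaks three constraints: title too short, unknown priority, negative hours.
    '{"customer_name": "Acme Corp", "tickets": '
    '[{"title": "Bug", "priority": "urgent", "estimated_hours": -2}]}',
    # Valid on every field.
    '{"customer_name": "Acme Corp", "tickets": '
    '[{"title": "Login page 500s", "priority": "critical", "estimated_hours": 3.5}]}',
])


def fake_llm(messages: list, tools: list) -> str:
    # A real client would send `messages` and `tools` to the provider here.
    return next(REPLIES)


def create_with_reask(model, messages, llm, max_attempts=3):
    """Steps 2 and 3: validate each reply; on failure, re-ask with the error text."""
    tools = [to_tool(model)]
    messages = list(messages)  # leave the caller's list untouched
    for attempt in range(1, max_attempts + 1):
        raw = llm(messages, tools)
        try:
            return model.model_validate_json(raw), attempt, messages
        except ValidationError as exc:
            if attempt == max_attempts:
                raise
            # The model sees its own output and exactly which fields were wrong.
            messages.append({"role": "assistant", "content": raw})
            messages.append({"role": "user", "content": f"Fix these errors:\n{exc}"})


case, attempts, history = create_with_reask(
    SupportCase, [{"role": "user", "content": "Acme Corp: login page is down"}], fake_llm
)
print(attempts, case.tickets[0].priority)
print(history[-1]["content"])  # the re-ask text, generated by Pydantic itself
```

The second attempt succeeds here only because the canned reply was written that way; with a real model the re-ask carries Pydantic's error lines (string_too_short, enum, greater_than) back as context, and the attempt cap decides how many rounds it gets.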

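Bonus: client.create_partial(response_model=SupportCase, messages=[...]) streams a sequence of partially populated SupportCase objects instead of returning one final instance. Fields the model has not generated yet come back as None, so a UI can render the object as it fills in.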
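What the UI side receives can be mimicked without a stream. This sketch assumes Pydantic v2; PartialTicket is a hypothetical hand-written model with every field optional, standing in for the partial model Instructor builds, and the snapshots are hypothetical, already-closed JSON strings rather than real streamed chunks.

```python
from typing import Optional

from pydantic import BaseModel


class PartialTicket(BaseModel):
    # Every field optional, so a snapshot validates before the model has produced it.
    title: Optional[str] = None
    priority: Optional[str] = None  # plain str in this sketch, not the Priority enum
    estimated_hours: Optional[float] = None


# Hypothetical snapshots of one reply as it streams in, each already valid JSON.
snapshots = [
    "{}",
    '{"title": "Login page 500s"}',
    '{"title": "Login page 500s", "priority": "high"}',
    '{"title": "Login page 500s", "priority": "high", "estimated_hours": 3.5}',
]

rendered = []
for raw in snapshots:
    partial = PartialTicket.model_validate_json(raw)
    # Render only what has arrived; None means "not generated yet".
    rendered.append({k: v for k, v in partial.model_dump().items() if v is not None})

for frame in rendered:
    print(frame)
```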

Where It Fits

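Versus Outlines. Outlines compiles the JSON schema into a state machine over the model's token vocabulary and masks invalid tokens at each decoding step, so the model physically cannot pick an invalid next token. That is stronger in the constrained-decoding sense, but it needs access to the model's logits, so it covers fewer providers, and it is Python-only. The two compose rather than compete: Instructor can sit in front of a self-hosted server, such as vLLM, whose guided decoding uses Outlines-style constraints.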
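Token masking is easiest to see on a toy. This sketch is not Outlines' API: it uses a hypothetical character-level vocabulary and a hand-written scoring function that "wants" to spell urgent, and a simple prefix check stands in for the state machine Outlines compiles. The unconstrained decoder writes whatever scores highest; the constrained one drops illegal tokens before the argmax and can only finish on a Priority value.

```python
# Toy vocabulary: single characters plus an end-of-sequence token.
VOCAB = list("abcdefghijklmnopqrstuvwxyz") + ["<eos>"]
ALLOWED = ["low", "medium", "high", "critical"]  # the Priority enum values
TARGET = "urgent"  # what the hypothetical model would like to write


def score(prefix: str, token: str) -> float:
    # Hypothetical model: strongly prefers spelling TARGET, with a mild bias toward "h".
    if len(prefix) < len(TARGET) and token == TARGET[len(prefix)]:
        return 5.0
    if len(prefix) >= len(TARGET) and token == "<eos>":
        return 5.0
    return 1.0 if token == "h" else 0.0


def legal_next(prefix: str) -> set:
    """Tokens that keep `prefix` on a path to some ALLOWED value."""
    legal = set()
    for value in ALLOWED:
        if value == prefix:
            legal.add("<eos>")
        elif value.startswith(prefix):
            legal.add(value[len(prefix)])
    return legal


def greedy_decode(constrained: bool, max_len: int = 20) -> str:
    out = ""
    for _ in range(max_len):
        candidates = VOCAB
        if constrained:
            # Mask: illegal tokens are removed before the argmax, whatever their score.
            legal = legal_next(out)
            candidates = [t for t in VOCAB if t in legal]
        best = max(candidates, key=lambda t: score(out, t))
        if best == "<eos>":
            return out
        out += best
    return out


print(greedy_decode(constrained=False))  # urgent
print(greedy_decode(constrained=True))   # high
```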

Versus BAML. BAML ships its own templating language, schema language, and CLI, and it is more expressive in some ways (first-class union discriminators, image inputs). The cost is that you are writing BAML files in a BAML project with a BAML compiler, and your data model lives in .baml files that reach Python only through generated client code. Instructor wins for pure Python teams because the schema is your Python code.

Versus raw JSON mode. Raw response_format={"type":"json_object"} gets you parseable JSON. It does not get you JSON that validates against your schema, retries, nested validation, type safety, or enum coercion. A try: json.loads(response.choices[0].message.content) block in your codebase in 2026 is a code smell.
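The gap between parseable and valid fits in a few lines. Assuming Pydantic v2, json.loads happily accepts a reply whose priority is "HIGH" (not an enum value) and whose hours arrive as a string, while model validation rejects the bad enum and, on a corrected reply, coerces the string to a float. The reply strings are hypothetical.

```python
import json
from enum import Enum
from typing import Optional

from pydantic import BaseModel, ValidationError


class Priority(str, Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"
    CRITICAL = "critical"


class Ticket(BaseModel):
    title: str
    priority: Priority
    estimated_hours: Optional[float] = None


# Hypothetical replies from a JSON-mode call.
wrong_case = '{"title": "Login page 500s", "priority": "HIGH", "estimated_hours": "3.5"}'
right_case = '{"title": "Login page 500s", "priority": "high", "estimated_hours": "3.5"}'

# JSON mode's guarantee ends here: a dict, with whatever keys and types the model chose.
raw = json.loads(wrong_case)

try:
    Ticket.model_validate_json(wrong_case)
    rejected = False
except ValidationError:
    rejected = True  # "HIGH" is not one of the Priority values

# Lax-mode validation turns "high" into Priority.HIGH and "3.5" into the float 3.5.
ticket = Ticket.model_validate_json(right_case)
print(raw["priority"], rejected, ticket.priority, ticket.estimated_hours)
```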

The Take

Instructor is the structured-output library Python LLM engineering has been waiting for. Pydantic ergonomics, 15+ providers, retry-on-validation, and 3M monthly downloads make it the default. If you are starting a new Python LLM project and not using Instructor, you are writing glue code that already exists.

Where it falls short. The retry loop runs at the Python tier, not the wire tier. Against hosted APIs with strict structured outputs, first-call validation failures are rare, but with Ollama and other self-hosted backends you can burn tokens on retry storms. And Pydantic's all-or-nothing model_validate_json is good for safety and bad for "the model was 90% right": one bad field fails the whole object, and the retry regenerates all of it.
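The retry-storm cost is back-of-envelope arithmetic once you fix how the conversation grows. This sketch assumes each failed attempt appends the previous output plus the error text before the next call and uses hypothetical token counts; it models the worst case, where every attempt fails, and is not a measurement of Instructor.

```python
def worst_case_tokens(prompt: int, output: int, error: int, attempts: int) -> int:
    """Total tokens (input + output) when every one of `attempts` calls fails validation.

    Assumption: each re-ask resends the conversation so far plus the failed output and
    the error text, so attempt k sends prompt + (k - 1) * (output + error) input tokens.
    """
    total = 0
    for k in range(1, attempts + 1):
        total += prompt + (k - 1) * (output + error)  # input tokens for attempt k
        total += output  # the (invalid) object the model generates
    return total


# Hypothetical sizes: 1,000-token prompt, 500-token object, 200-token error message.
for n in (1, 3, 5):
    print(n, worst_case_tokens(1000, 500, 200, n))  # 1 1500, then 3 6600, then 5 14500
```

Under that assumption the total grows quadratically with the number of attempts, because every re-ask resends everything before it, so a tight max_retries cap matters most on the backends that fail most often.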

Who should use it. Anyone shipping a Python LLM application in production where the output is consumed by code. Extraction, classification, structured chat, agent tool calls, evaluation pipelines. One-shot notebook calls can skip it. Anything real, use it.

The verdict: it is the de facto standard, and the de facto standard is correct. The library I would bet on in 2026, the one I would teach a new engineer first, and the one I would defend against a hand-rolled alternative.

— Mr. Technology
