LLMs hallucinate JSON. It's a fact of life. But there's a specific pattern — not a library, not a framework — that dramatically reduces malformed outputs. It's been hiding in plain sight and almost nobody is using it correctly.

The JSON Schema Pattern That Forces LLMs to Give You Valid Structured Output

LLMs hallucinate JSON. Not sometimes, but structurally. They add fields that weren't in your schema, they omit required ones, they return arrays when you asked for objects. Ask a model to return a JSON object without careful engineering, and a meaningful share of the responses will only look like JSON: they parse, or nearly parse, but don't match the structure you asked for.

The industry knows this. The industry has also converged, somewhat silently, on the right solution. Let me show you the pattern.

The Problem in One Example

Here's what happens when you just ask nicely:

python

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4",
    messages=[{
        "role": "user",
        "content": "Return a JSON object with name, age, and email for a user."
    }]
)
# The model might return: {"name": "Alice", "age": "thirty", "email": "alice@example.com"}
# age is a string, not an integer. Other responses may be malformed outright
# (an unquoted value, a missing field).
# This breaks your type validation immediately.

The model doesn't know your schema. It doesn't know your types. It returns what it thinks JSON looks like, and it's often wrong in subtle ways that pass visual inspection but fail at runtime.
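To make "passes visual inspection but fails at runtime" concrete, here is a minimal sketch using only the standard library. The `years_until_retirement` helper is a hypothetical piece of downstream code, not part of any API; it stands in for whatever your application does with the parsed value.

```python
import json

# The response from the example above: syntactically valid JSON
raw = '{"name": "Alice", "age": "thirty", "email": "alice@example.com"}'

# Parsing succeeds, so a naive "is it JSON?" check passes
user = json.loads(raw)

def years_until_retirement(user: dict, retirement_age: int = 65) -> int:
    # Hypothetical downstream code that assumes age is an integer
    return retirement_age - user["age"]

try:
    years_until_retirement(user)
    failed = False
except TypeError as exc:
    # int - str is not defined, so the error surfaces here, far from the LLM call
    failed = True
    print(f"Runtime failure: {exc}")
```

The parse step gives no signal that anything is wrong; the failure shows up later, in code that has nothing to do with the model call.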

The Fix: JSON Schema as a Constraint, Not a Description

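The correct approach is to pass your schema to the model as a constraint in the API call itself. In OpenAI's Chat Completions API, that means the response_format parameter with type json_schema and strict set to true (Structured Outputs, available on gpt-4o and later models). This isn't documentation; it's enforcement.

python

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{
        "role": "user",
        "content": "Extract user data from the following text."
    }],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "user",
            "strict": True,  # enforce the schema during generation
            "schema": {
                "type": "object",
                "required": ["name", "email", "age"],
                "additionalProperties": False,  # strict mode requires this
                "properties": {
                    "name": {"type": "string"},
                    # Support for keywords like format and minimum varies by model and API version
                    "email": {"type": "string", "format": "email"},
                    "age": {"type": "integer", "minimum": 0}
                }
            }
        }
    }
)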

This is the first layer. The schema tells the model what fields are required, what types are expected, and what formats are valid. The model is now constrained, not just informed.
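Schema-constrained modes tend to be picky about the schema itself. As a hedged sketch: OpenAI's strict Structured Outputs mode, as I understand it, expects every object to list all of its properties as required and to set additionalProperties to false. The helpers below (`make_strict` and `to_response_format` are hypothetical names, not part of any SDK) normalize a plain schema into that shape. Other providers may have different rules, so treat this as an illustration, not a reference.

```python
import copy

def make_strict(schema: dict) -> dict:
    """Return a copy of a schema with all properties required and extras forbidden."""
    strict = copy.deepcopy(schema)
    if strict.get("type") == "object":
        props = strict.get("properties", {})
        strict["required"] = list(props)          # every property becomes required
        strict["additionalProperties"] = False    # no fields beyond the schema
        strict["properties"] = {k: make_strict(v) for k, v in props.items()}
    elif strict.get("type") == "array" and isinstance(strict.get("items"), dict):
        strict["items"] = make_strict(strict["items"])  # recurse into array items
    return strict

def to_response_format(name: str, schema: dict) -> dict:
    """Wrap a schema in the json_schema response_format envelope."""
    return {
        "type": "json_schema",
        "json_schema": {"name": name, "strict": True, "schema": make_strict(schema)},
    }

user_schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "email": {"type": "string"},
        "age": {"type": "integer"},
    },
}

response_format = to_response_format("user", user_schema)
```

The resulting dictionary can be passed as the response_format argument of the API call. The original `user_schema` is left untouched because the helper works on a deep copy.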

Layer Two: Pydantic as Your Validation Contract

The schema constraint in the API call is necessary but not sufficient. The model can still hallucinate within the schema — it can return a string that claims to be an email but isn't. You need runtime validation.

Pydantic is the standard solution here, and it's worth using correctly:

python

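from pydantic import BaseModel, ConfigDict, EmailStr, field_validator

class User(BaseModel):
    # Reject unexpected fields; also emits additionalProperties: false in the JSON schema
    model_config = ConfigDict(extra="forbid")

    name: str
    email: EmailStr  # validates email format; requires the email-validator package
    age: int

    @field_validator('age')
    @classmethod
    def age_must_be_non_negative(cls, v: int) -> int:
        # Matches "minimum": 0 in the JSON Schema
        if v < 0:
            raise ValueError('Age must be non-negative')
        return v

# Parse with validation; strict=True rejects coercions such as "30" -> 30
user = User.model_validate_json(response.choices[0].message.content, strict=True)

Now your code fails fast if the model returns malformed data. With strict=True there is no silent type coercion (Pydantic's default lax mode would quietly turn the string "30" into the integer 30), and invalid emails, missing fields, and unexpected extra fields are all rejected.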
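Whether type coercion is actually ruled out depends on Pydantic's validation mode, so it is worth seeing the difference once. The sketch below assumes Pydantic v2 and uses a hypothetical `Person` model (kept separate from `User` so it runs without the email-validator dependency).

```python
from pydantic import BaseModel, ValidationError

class Person(BaseModel):
    # Hypothetical model used only for this illustration
    name: str
    age: int

# age arrives as a JSON string, a common LLM mistake
payload = '{"name": "Alice", "age": "30"}'

# Default (lax) mode: the numeric string is coerced to an integer
lax = Person.model_validate_json(payload)

# Strict mode: the same payload is rejected
try:
    Person.model_validate_json(payload, strict=True)
    strict_rejected = False
except ValidationError:
    strict_rejected = True
```

Lax mode is often what you want for human-entered data. For LLM output, strict mode surfaces type mistakes that would otherwise be hidden, which gives a retry step something concrete to report back.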

Layer Three: The Retry Loop with Structured Error Messages

The two layers above catch most errors. But some still slip through — especially on edge cases, unusual inputs, or when the model is uncertain. The pattern that handles this is a retry loop with specific error feedback:

python

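from openai import OpenAI
from pydantic import ValidationError

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def extract_user(text: str, max_retries: int = 3) -> User:
    # Keep the conversation in a list so each retry sees the earlier errors
    messages = [{"role": "user", "content": f"Extract user data as JSON: {text}"}]
    # Strict mode expects additionalProperties: false in the schema
    # (for example, extra="forbid" on the Pydantic model)
    response_format = {
        "type": "json_schema",
        "json_schema": {"name": "user", "strict": True, "schema": User.model_json_schema()},
    }
    for attempt in range(max_retries):
        response = client.chat.completions.create(
            model="gpt-4o", messages=messages, response_format=response_format
        )
        content = response.choices[0].message.content or ""
        try:
            return User.model_validate_json(content)
        except ValidationError as e:
            # Pass the failed output and the error back to the model for correction
            messages.append({"role": "assistant", "content": content})
            messages.append({"role": "user", "content": f"Validation failed: {e}. Fix these errors and retry."})
    raise RuntimeError(f"Failed after {max_retries} attempts")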

The key detail: you're not just retrying blindly. You're passing the validation error back to the model so it can correct on the next attempt. This is dramatically more effective than retrying with the same prompt.
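The feedback mechanism is easy to verify without calling any API. The sketch below is an illustration under stated assumptions, not the production loop: `call_model` is a hypothetical callable standing in for the LLM request, `validate_user` is a small hand-written stand-in for Pydantic validation, and `fake_model` simulates a model that gets it wrong once and then corrects itself.

```python
import json
from typing import Callable

Message = dict[str, str]

def validate_user(raw: str) -> dict:
    """Stand-in validator: raises ValueError with a specific, actionable message."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("top-level value must be an object")
    if not isinstance(data.get("name"), str):
        raise ValueError("name must be a string")
    age = data.get("age")
    # bool is a subclass of int in Python, so exclude it explicitly
    if not isinstance(age, int) or isinstance(age, bool):
        raise ValueError("age must be an integer")
    return data

def extract_with_feedback(
    call_model: Callable[[list[Message]], str], prompt: str, max_retries: int = 3
) -> dict:
    messages: list[Message] = [{"role": "user", "content": prompt}]
    for _ in range(max_retries):
        raw = call_model(messages)
        try:
            return validate_user(raw)
        except ValueError as exc:
            # Append the failed output and the error so the next call sees both
            messages.append({"role": "assistant", "content": raw})
            messages.append(
                {"role": "user", "content": f"Validation failed: {exc}. Return corrected JSON only."}
            )
    raise RuntimeError(f"Failed after {max_retries} attempts")

# Simulated model: wrong type on the first call, corrected on the second
replies = iter(['{"name": "Alice", "age": "thirty"}', '{"name": "Alice", "age": 30}'])
seen_lengths: list[int] = []

def fake_model(messages: list[Message]) -> str:
    seen_lengths.append(len(messages))  # record how much context each call received
    return next(replies)

result = extract_with_feedback(fake_model, "Extract: Alice is thirty years old.")
```

On the second call the simulated model receives three messages instead of one: the original prompt, its own failed output, and the validation error. That extra context is what separates this from a blind retry.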

Why This Outperforms JSON Mode Alone

JSON mode (response_format set to type json_object, with no schema) is better than nothing, but it's not enough. It tells the model to output valid JSON, but it doesn't tell it what valid means for your use case. You still get type errors, missing fields, and unexpected structures.

The three-layer approach (schema constraint at the API level, Pydantic validation at the runtime level, and error-feedback retries as the fallback) covers the failure modes you are most likely to hit.

The schema constraint prevents structurally wrong outputs. The Pydantic validation catches anything that slips through. The retry loop gives the model a chance to correct whatever validation rejects, and it raises a clear error when the model can't.

The Pattern in Practice

This pattern is used by every serious production system I've seen in the last twelve months. It shows up in instructor, in Outlines, in Guidance, but the core insight isn't any of those libraries. It's just: treat JSON Schema as a constraint, validate at runtime, and retry with error feedback.

The libraries automate the wiring. You need to understand the pattern first.

If you're building anything that relies on structured LLM output — classification, extraction, structured generation — and you're not using this pattern, you're leaving correctness on the table.

Three-layer structured output pattern: JSON Schema constraint at API level, Pydantic validation at runtime, error-feedback retry loop for edge cases. Works with OpenAI, Anthropic, and open-source models that support schema-constrained generation.