
LLMs hallucinate JSON. Not sometimes — structurally. They add fields that weren't in your schema, they omit required ones, they return arrays when you asked for objects. Ask a model to return a JSON object and it will return something that looks like JSON approximately 60-70% of the time without careful engineering.
The industry knows this. The industry has also converged, somewhat silently, on the right solution. Let me show you the pattern.
Here's what happens when you just ask nicely:
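```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# No schema, no format constraint: the prompt is the only guidance.
response = client.chat.completions.create(
    model="gpt-4",
    messages=[{
        "role": "user",
        "content": "Return a JSON object with name, age, and email for a user."
    }]
)
```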
The model doesn't know your schema. It doesn't know your types. It returns what it thinks JSON looks like, and it's often wrong in subtle ways that pass visual inspection but fail at runtime.
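To make "passes visual inspection but fails at runtime" concrete, here is a minimal sketch using only the standard library. The three replies are written by hand for illustration, not captured from a model, and `parse_error` is a hypothetical helper. Each reply looks fine to a human reader and each is rejected by a strict JSON parser.

```python
import json
from typing import Optional

FENCE = "`" * 3  # a Markdown code fence, built here to keep the sample readable

# Hand-written replies for illustration only (not real model output).
SAMPLE_REPLIES = {
    "markdown_wrapped": f'{FENCE}json\n{{"name": "Ada", "age": 36}}\n{FENCE}',
    "trailing_comma": '{"name": "Ada", "age": 36,}',
    "single_quotes": "{'name': 'Ada', 'age': 36}",
}


def parse_error(raw: str) -> Optional[str]:
    """Return the parser's complaint, or None if the text is valid JSON."""
    try:
        json.loads(raw)
    except json.JSONDecodeError as exc:
        return exc.msg
    return None


for label, raw in SAMPLE_REPLIES.items():
    print(f"{label}: {parse_error(raw)}")
```

None of these failures is exotic; they are ordinary habits from Markdown, JavaScript, and Python leaking into the output.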
The correct approach is to pass your schema to the model as a constraint through the `response_format` parameter of the API call (OpenAI calls this Structured Outputs; it needs a recent model and SDK version). This isn't documentation. It's enforcement.
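```python
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{
        "role": "user",
        "content": "Extract user data from the following text."
    }],
    # A schema is only accepted with type "json_schema";
    # "json_object" is plain JSON mode and takes no schema.
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "user",
            "strict": True,
            "schema": {
                "type": "object",
                "required": ["name", "email", "age"],
                "additionalProperties": False,  # needed when strict is True
                "properties": {
                    "name": {"type": "string"},
                    # Support for "format" and "minimum" varies by model and API version.
                    "email": {"type": "string", "format": "email"},
                    "age": {"type": "integer", "minimum": 0}
                }
            }
        }
    }
)
```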
This is the first layer. The schema tells the model what fields are required, what types are expected, and what formats are valid. The model is now constrained, not just informed.
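A small helper can keep the schema in one place and wrap it in the envelope the API expects. The sketch below assumes the OpenAI Structured Outputs payload shape (`"type": "json_schema"` with a nested `json_schema` object); `build_response_format` and `USER_SCHEMA` are hypothetical names, and other providers use different envelopes. It only builds a dictionary and makes no network call.

```python
import copy
import json

USER_SCHEMA = {
    "type": "object",
    "required": ["name", "email", "age"],
    "properties": {
        "name": {"type": "string"},
        "email": {"type": "string"},
        "age": {"type": "integer"},
    },
}


def build_response_format(name: str, schema: dict, strict: bool = True) -> dict:
    """Wrap a JSON Schema in a Structured Outputs style payload (hypothetical helper)."""
    wrapped = copy.deepcopy(schema)  # never mutate the caller's schema
    if strict:
        # Assumption: strict mode wants closed objects with every property required.
        # Only the top level is handled; nested objects would need the same treatment.
        wrapped["additionalProperties"] = False
        wrapped["required"] = list(wrapped.get("properties", {}))
    return {
        "type": "json_schema",
        "json_schema": {"name": name, "schema": wrapped, "strict": strict},
    }


print(json.dumps(build_response_format("user", USER_SCHEMA), indent=2))
```

The result can be passed as the `response_format` argument of a chat completion call, which keeps the schema definition out of the call site.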
The schema constraint in the API call is necessary but not sufficient. The model can still hallucinate within the schema — it can return a string that claims to be an email but isn't. You need runtime validation.
Pydantic is the standard solution here, and it's worth using correctly:
```python
from pydantic import BaseModel, EmailStr, field_validator


class User(BaseModel):
    name: str
    email: EmailStr  # validates email syntax; needs the email-validator package
    age: int

    @field_validator("age")
    @classmethod
    def age_must_be_non_negative(cls, v: int) -> int:
        # Mirrors "minimum": 0 in the JSON Schema, so both layers agree on age 0.
        if v < 0:
            raise ValueError("Age must be non-negative")
        return v


# `response` is the API response from the previous call.
user = User.model_validate_json(response.choices[0].message.content)
```
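Now your code fails fast if the model returns malformed data: invalid emails and missing fields raise a `ValidationError` instead of flowing downstream. One caveat: Pydantic's default mode still coerces compatible types (the string "42" becomes the integer 42), so enable strict mode if you want to rule that out.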
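Pydantic's coercion behaviour is easy to check directly. The sketch below assumes Pydantic v2 and uses two hypothetical models, `LaxUser` and `StrictUser`, that differ only in whether strict mode is switched on through `ConfigDict`. The payload is hand-written, with `age` deliberately sent as a string.

```python
from pydantic import BaseModel, ConfigDict, ValidationError


class LaxUser(BaseModel):
    # Default (lax) mode: compatible values are converted to the declared type.
    name: str
    age: int


class StrictUser(BaseModel):
    # Strict mode: values must already have the declared type.
    model_config = ConfigDict(strict=True)
    name: str
    age: int


payload = '{"name": "Ada", "age": "36"}'  # age arrives as a string

lax = LaxUser.model_validate_json(payload)
print("lax mode accepted age as:", type(lax.age).__name__)

try:
    StrictUser.model_validate_json(payload)
    strict_rejected = False
except ValidationError:
    strict_rejected = True
print("strict mode rejected payload:", strict_rejected)
```

Whether coercion is acceptable depends on the use case; the point is to choose deliberately rather than assume the validator is stricter than it is.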
The two layers above catch most errors. But some still slip through — especially on edge cases, unusual inputs, or when the model is uncertain. The pattern that handles this is a retry loop with specific error feedback:
```python
from openai import OpenAI
from pydantic import ValidationError

client = OpenAI()


def extract_user(text: str, max_retries: int = 3) -> User:
    # Keep one running conversation so each retry sees the previous failure.
    messages = [{"role": "user", "content": f"Extract: {text}"}]
    for attempt in range(max_retries):
        response = client.chat.completions.create(
            model="gpt-4o",
            messages=messages,
            response_format={
                "type": "json_schema",
                "json_schema": {"name": "user", "schema": User.model_json_schema()},
            },
        )
        content = response.choices[0].message.content
        try:
            return User.model_validate_json(content)
        except ValidationError as e:
            # Feed the bad output and the specific errors back to the model.
            error_msg = f"Validation failed: {e}. Fix these errors and retry."
            messages.append({"role": "assistant", "content": content})
            messages.append({"role": "user", "content": error_msg})
    raise RuntimeError(f"Failed after {max_retries} attempts")
```
The key detail: you're not just retrying blindly. You're passing the validation error back to the model so it can correct on the next attempt. This is dramatically more effective than retrying with the same prompt.
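The feedback loop can be exercised without a network call. The sketch below swaps the API for a hypothetical `fake_model` function that answers badly until it sees an error message, and uses a hand-rolled `validate_user` check in place of Pydantic. Both are stand-ins for illustration only; the loop structure is the part that carries over.

```python
import json
from typing import Callable, Dict, List

Message = Dict[str, str]


def validate_user(raw: str) -> dict:
    """Parse raw JSON and enforce a tiny hand-written contract (stand-in for Pydantic)."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"invalid JSON: {exc.msg}") from exc
    if not isinstance(data, dict):
        raise ValueError("top-level value must be an object")
    if not isinstance(data.get("name"), str):
        raise ValueError("field 'name' must be a string")
    age = data.get("age")
    if isinstance(age, bool) or not isinstance(age, int):
        raise ValueError("field 'age' must be an integer")
    return data


def extract_with_feedback(
    text: str,
    call_model: Callable[[List[Message]], str],
    max_retries: int = 3,
) -> dict:
    """Retry loop that appends each failed reply and its error to the conversation."""
    messages: List[Message] = [{"role": "user", "content": f"Extract: {text}"}]
    for _ in range(max_retries):
        reply = call_model(messages)
        try:
            return validate_user(reply)
        except ValueError as exc:
            messages.append({"role": "assistant", "content": reply})
            messages.append(
                {"role": "user", "content": f"Validation failed: {exc}. Fix and resend."}
            )
    raise RuntimeError(f"Failed after {max_retries} attempts")


def fake_model(messages: List[Message]) -> str:
    """Hypothetical stand-in for an API call: correct only after seeing feedback."""
    saw_feedback = any(
        m["role"] == "user" and "Validation failed" in m["content"] for m in messages
    )
    if saw_feedback:
        return '{"name": "Ada", "age": 36}'
    return '{"name": "Ada", "age": "thirty-six"}'


user = extract_with_feedback("Ada is 36.", fake_model)
print(user)
```

Injecting the model call as a parameter also makes the loop straightforward to unit test, since a test can pass in a function that always fails and assert that the retry budget is respected.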
JSON mode (`response_format={"type": "json_object"}`, with no schema) is better than nothing, but it's not enough. It tells the model "output valid JSON" but not what valid means for your use case. You still get type errors, missing fields, and unexpected structures.
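Here is a small illustration of that gap, using only the standard library. The reply string is invented for the example, and `shape_problems` and `EXPECTED_TYPES` are hypothetical names rather than part of any SDK. The reply parses cleanly, which is all JSON mode promises, yet it breaks the contract the caller had in mind in three different ways.

```python
import json
from typing import List

# The contract the caller actually wants (illustrative).
EXPECTED_TYPES = {"name": str, "email": str, "age": int}


def shape_problems(raw: str) -> List[str]:
    """List contract violations in a reply that is already syntactically valid JSON."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        return [f"expected an object, got {type(data).__name__}"]
    problems = []
    for field, expected in EXPECTED_TYPES.items():
        if field not in data:
            problems.append(f"missing field: {field}")
        elif not isinstance(data[field], expected):
            problems.append(f"wrong type for {field}: {type(data[field]).__name__}")
    for extra in sorted(set(data) - set(EXPECTED_TYPES)):
        problems.append(f"unexpected field: {extra}")
    return problems


# Valid JSON, wrong shape: no email, age as a string, one extra key.
reply = '{"name": "Ada", "age": "36", "nickname": "Countess"}'
for problem in shape_problems(reply):
    print(problem)
```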
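The three-layer approach (schema constraint at the API level, Pydantic validation at the runtime level, and error-feedback retries as the fallback) covers the structural failure modes: wrong shape, wrong types, missing fields. What it cannot catch on its own is a value that is well-formed but wrong, such as a plausible email address that never appeared in the source text.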
The schema constraint prevents structurally wrong outputs. The Pydantic validation catches anything that slips through. The retry loop handles edge cases that neither of the first two layers caught.
This pattern is used by every serious production system I've seen in the last twelve months. It shows up in instructor, in Outlines, in Guidance, but the core insight isn't any of those libraries. It's just: treat JSON Schema as a constraint, validate at runtime, and retry with error feedback.
The libraries automate the wiring. You need to understand the pattern first.
If you're building anything that relies on structured LLM output — classification, extraction, structured generation — and you're not using this pattern, you're leaving correctness on the table.
Three-layer structured output pattern: JSON Schema constraint at API level, Pydantic validation at runtime, error-feedback retry loop for edge cases. Works with OpenAI, Anthropic, and open-source models that support schema-constrained generation.