Tutorial · 2026-05-28

Structured Output Without the Hallucination Hangover: JSON Schema Modes Compared

Every major LLM now has a 'give me valid JSON' mode. They're not created equal. A practical breakdown of how Claude, GPT, and Gemini handle structured output — with real code and the gotchas nobody puts in the docs.
Every major LLM provider now has a dedicated mode for getting structured, schema-valid JSON out of a model. They all sound the same in marketing copy. They are absolutely not the same in practice. Here's what actually works, what breaks in production, and how to pick the right one for your use case.

The Problem With 'Just Prompt It'

Asking an LLM for JSON in a plain prompt is a gamble. The model can hallucinate field names, miss required fields, and occasionally just... output a code block with a short story inside it. For prototypes this is fine. For anything going near a schema validator, a payment processor, or a UI component that expects specific fields, it's a liability.
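To see why this is a liability, here is the kind of defensive parsing a prompt-only approach pushes onto your side. It's a minimal sketch using only the standard library; `parse_prompted_json` is a hypothetical helper, and it only catches the failure modes it knows about (a markdown fence around the payload, broken syntax, missing keys).

```python
import json
import re

# Matches a markdown code fence (three backticks, optional "json" tag) around the payload.
FENCE = re.compile(r"`{3}(?:json)?\s*(.*?)`{3}", re.DOTALL)


def parse_prompted_json(raw: str, required: list[str]) -> dict:
    """Best-effort JSON extraction from a plain-prompt reply; raises ValueError on failure."""
    match = FENCE.search(raw)
    body = match.group(1) if match else raw  # strip the fence if there is one
    try:
        data = json.loads(body)
    except json.JSONDecodeError as exc:  # trailing commas, stray prose, truncated output
        raise ValueError(f"not valid JSON: {exc.msg}") from exc
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = [key for key in required if key not in data]
    if missing:
        raise ValueError(f"missing required fields: {missing}")
    return data
```

Note what it can't catch: a hallucinated role such as "superuser", or an email field that holds a phone number. Closing that gap is what the schema modes are for.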

The providers know this. They've each shipped a structured output mode. The implementations differ meaningfully.

Claude (Anthropic) — forced tool call with input_schema

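Claude doesn't have a dedicated "JSON mode" toggle in the classic Messages API, and there is no extra_body input_schema parameter. The established pattern is to define a single tool whose input_schema is your JSON Schema, then force the model to call it with tool_choice. The structured result comes back as the input of a tool_use content block.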

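```python
from anthropic import Anthropic

client = Anthropic()

profile_schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string", "description": "Full name"},
        "email": {"type": "string", "format": "email"},
        "role": {"type": "string", "enum": ["admin", "member", "viewer"]},
        "metadata": {"type": "object", "additionalProperties": True},
    },
    "required": ["name", "email", "role"],
}

response = client.messages.create(
    model="claude-opus-4-5",
    max_tokens=1024,
    # "record_profile" is an arbitrary tool name; the schema goes in input_schema
    tools=[{
        "name": "record_profile",
        "description": "Record the extracted user profile.",
        "input_schema": profile_schema,
    }],
    # Force the model to answer by calling this one tool
    tool_choice={"type": "tool", "name": "record_profile"},
    messages=[{"role": "user", "content": "Extract the user profile from this text..."}],
)

# response.content is a list of content blocks; the data is the tool_use block's input (a dict)
profile = next(block.input for block in response.content if block.type == "tool_use")
```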

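The upside: Forcing the tool call makes the output reliably structured, and you get a parsed dict back, with no json.loads() step and no stray markdown. Be aware that a plain tool schema guides the model rather than acting as a hard grammar, so a missing field is unlikely but not impossible. For token-level enforcement (constrained decoding), Anthropic has added a strict structured-outputs mode on newer models; check the current docs for the exact parameters and which models support it.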

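The gotcha: Because the schema rides on a tool definition, forcing that tool means the model won't call any of your other tools in that turn, and forced tool_choice isn't compatible with extended thinking. If you need real tool use plus a structured final answer, let the model work with tool_choice set to auto, then force the extraction tool in a final turn. Either way, validate the input on your side unless you're using strict mode.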
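In the Messages API, tool calls come back as content blocks of type tool_use, each carrying name and input fields, so it pays to pull out the right block by name instead of assuming the first block is your data. This is a minimal sketch: `extract_tool_input` and the tool name `record_profile` are hypothetical, and the helper accepts both SDK block objects and raw JSON dicts (for example, responses replayed from logs).

```python
from typing import Any


def _field(block: Any, key: str) -> Any:
    # SDK responses expose blocks as objects; raw JSON uses dicts.
    if isinstance(block, dict):
        return block.get(key)
    return getattr(block, key, None)


def extract_tool_input(content: list[Any], tool_name: str) -> dict[str, Any]:
    """Return the input of the first tool_use block called tool_name."""
    for block in content:
        if _field(block, "type") == "tool_use" and _field(block, "name") == tool_name:
            return _field(block, "input")
    raise LookupError(f"no tool_use block named {tool_name!r} in response")
```

With the SDK you would call it as `extract_tool_input(response.content, "record_profile")`; raising LookupError makes a missing tool call fail loudly instead of surfacing later as a None.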

GPT (OpenAI) — response_format with json_schema

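OpenAI's approach is explicit and well-named. In the Chat Completions API you pass response_format={"type": "json_schema", "json_schema": {...}}. In the newer Responses API, used below, the same definition lives under text.format instead.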

```python
import json

from openai import OpenAI

client = OpenAI()

response = client.responses.create(
    model="gpt-4o",
    input="Extract the order details from this text...",
    # Responses API: structured output is configured under text.format, not response_format
    text={
        "format": {
            "type": "json_schema",
            "name": "order_details",
            "schema": {
                "type": "object",
                "properties": {
                    "order_id": {"type": "string"},
                    "items": {"type": "array", "items": {"type": "string"}},
                    "total": {"type": "number", "minimum": 0},
                    "currency": {"type": "string", "enum": ["USD", "EUR", "GBP"]},
                },
                "required": ["order_id", "items", "total", "currency"],
                "additionalProperties": False,
            },
            "strict": True,
        }
    },
    max_output_tokens=1024,
)

# output_text is a JSON string, so parse it yourself
order = json.loads(response.output_text)
```

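The upside: The strict: true flag enforces the schema with constrained decoding, so the output is guaranteed to match it. OpenAI's schema support is broad, though it is still a documented subset of JSON Schema: it covers $defs/$ref and recursive schemas, plus keywords such as minimum, maximum, and maxItems. In strict mode, every property must be listed in required and every object must set additionalProperties: false.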

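The gotcha: The model still returns a JSON string, not a parsed object, so you need json.loads() (or the SDK's parse() helpers with a Pydantic model). Strict mode also works for function tools via strict: true, so structured output and tools are not mutually exclusive, but strict function calling isn't compatible with parallel tool calls; set parallel_tool_calls to false when you rely on it. Finally, the first request with a new schema carries extra latency while OpenAI processes and caches it.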
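Strict mode's "every property required, no additional properties" rules are easy to violate when you reuse an existing schema. OpenAI's guidance for optional fields is to keep them in required but make their type a union with null. A minimal sketch of that rewrite follows; `to_strict_schema` is a hypothetical helper that only handles plain "type" strings (not anyOf), and it leaves free-form objects without declared properties untouched, which strict mode will still reject.

```python
import copy
from typing import Any


def to_strict_schema(schema: dict[str, Any]) -> dict[str, Any]:
    """Return a copy adjusted to strict-mode rules: all keys required, no extras,
    and formerly optional properties made nullable instead."""
    out = copy.deepcopy(schema)  # never mutate the caller's schema
    _tighten(out)
    return out


def _tighten(node: Any) -> None:
    if isinstance(node, dict):
        if node.get("type") == "object" and "properties" in node:
            originally_required = set(node.get("required", []))
            for name, prop in node["properties"].items():
                t = prop.get("type")
                if name not in originally_required and isinstance(t, str) and t != "null":
                    prop["type"] = [t, "null"]  # emulate "optional" with a null union
                    if "enum" in prop and None not in prop["enum"]:
                        prop["enum"] = [*prop["enum"], None]
            node["required"] = list(node["properties"])
            node["additionalProperties"] = False
        for value in node.values():  # recurse into properties, items, $defs, ...
            _tighten(value)
    elif isinstance(node, list):
        for item in node:
            _tighten(item)
```

The null union keeps the model honest: it must emit every key, and "not present in the text" becomes an explicit null rather than a silently missing field.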

Gemini (Google) — response_schema in the API

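Gemini takes a different approach. Instead of a wrapper object, you set response_mime_type and response_schema in the request config. The schema is a subset of the OpenAPI 3.0 schema object rather than full JSON Schema. Besides application/json, response_mime_type also accepts text/x.enum, which constrains the answer to one value from an enum and suits single-label classification.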

```python
from google import genai

client = genai.Client()

response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="Extract the invoice data from this text...",
    config={
        "response_mime_type": "application/json",
        "response_schema": {
            "type": "object",
            "properties": {
                "invoice_number": {"type": "string"},
                "vendor": {"type": "string"},
                "amount": {"type": "number"},
                "line_items": {
                    "type": "array",
                    "items": {
                        "type": "object",
                        "properties": {
                            "description": {"type": "string"},
                            "quantity": {"type": "integer"},
                            "unit_price": {"type": "number"},
                        },
                    },
                },
            },
            "required": ["invoice_number", "vendor", "amount"],
        },
    },
)

invoice = response.parsed
```

The upside: response.parsed is already a native object, so there's no manual json.loads() (pass a Pydantic model as response_schema and you get an instance of that model). Gemini's schema support is solid, the text/x.enum mode is genuinely handy for classification tasks, and Gemini 2.5 Flash in particular is fast and cheap.

The gotcha: Gemini's schema enforcement is good but not as strict as the others for complex nested schemas, and you may still get unexpected additional fields in edge cases. Properties are also optional unless listed in required, which the nested line_items objects above don't do, so those fields can be omitted. Finally, response_schema is not available on every model version; check that yours supports it.
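If you'd rather not let undeclared keys leak downstream, one option is to prune the parsed result back to the schema before using it. This is a defensive sketch, not a Gemini SDK feature; `prune_to_schema` is hypothetical and only understands type, properties, and items (it accepts "object" or "OBJECT"-style type names and leaves objects without declared properties alone).

```python
from typing import Any


def prune_to_schema(value: Any, schema: dict[str, Any]) -> Any:
    """Recursively drop object keys the schema doesn't declare; other values pass through."""
    kind = str(schema.get("type", "")).lower()
    if kind == "object" and isinstance(value, dict) and "properties" in schema:
        props = schema["properties"]
        return {k: prune_to_schema(v, props[k]) for k, v in value.items() if k in props}
    if kind == "array" and isinstance(value, list):
        item_schema = schema.get("items", {})
        return [prune_to_schema(v, item_schema) for v in value]
    return value
```

Pruning only removes extras; it doesn't fill in missing fields, so pair it with a required-field check (or full validation) if omissions matter for your use case.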

The Practical Decision Framework

Here's how to pick in practice:

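| Provider | Best for | Watch out for |
|---|---|---|
| Claude | Forced tool call returns a parsed dict; strict structured-outputs mode on newer models | Forcing the tool blocks other tools in that turn and extended thinking; validate unless strict |
| GPT | Broad JSON Schema support with strict: true enforcement | Output is a string (needs parsing); strict-mode schema rules; extra latency on a schema's first request |
| Gemini | Speed, cost, response.parsed convenience, text/x.enum mode | Slightly looser schema enforcement on edge cases; fields optional unless required |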

One Pattern That Works Everywhere

If you need schema-validated output from any provider, and you're willing to accept one extra step, this pattern is the most reliable:

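```python
import json
from typing import Any, Callable

import jsonschema


class ProviderError(Exception):
    """Raise this from your own SDK wrappers when an API call fails."""


def extract_with_fallback(text: str, schema: dict, providers: dict[str, Callable[[str, dict], Any]]) -> dict:
    # providers maps a name ("claude", "openai", "gemini") to your wrapper; dict order is priority
    for call in providers.values():
        try:
            result = call(text, schema)  # call the appropriate API
            if isinstance(result, str):  # e.g. OpenAI's output_text is a JSON string
                result = json.loads(result)
            jsonschema.validate(instance=result, schema=schema)  # raises on mismatch; returns None
            return result
        except (jsonschema.ValidationError, json.JSONDecodeError, ProviderError):
            continue  # log the failure and try the next provider
    raise ValueError("All providers failed schema validation")
```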

Schema validation as a fallback is not ideal; you want constrained decoding doing the work. But when you're in a multi-provider setup and one model has a bad day, this gives you a clean failure mode instead of corrupted data sliding through.
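The difference between validating afterwards and constraining during generation is easiest to see in a toy. Real constrained decoding runs over the model's full token vocabulary with a grammar compiled from your schema, and each provider's implementation isn't public. This sketch shrinks the idea to one enum field: at every step, mask out tokens that can't lead to an allowed value, then pick the best survivor. `toy_model` and its scores are invented for illustration.

```python
from typing import Callable

EOS = "<eos>"  # end-of-sequence marker in this toy vocabulary


def toy_model(prefix: str) -> dict[str, float]:
    """Invented next-token scores: likes "super", and wants to stop once it has said anything."""
    return {"super": 0.9, "ad": 0.5, "min": 0.4, "mem": 0.2, "ber": 0.2,
            "view": 0.1, "er": 0.1, EOS: 0.95 if prefix else 0.0}


def greedy_decode(score: Callable[[str], dict[str, float]], max_steps: int = 10) -> str:
    """Unconstrained: always take the top-scoring token."""
    prefix = ""
    for _ in range(max_steps):
        scores = score(prefix)
        best = max(scores, key=scores.get)
        if best == EOS:
            return prefix
        prefix += best
    return prefix


def constrained_decode(score: Callable[[str], dict[str, float]], allowed: list[str],
                       max_steps: int = 10) -> str:
    """Mask out tokens that cannot lead to an allowed value, then take the best survivor."""
    prefix = ""
    for _ in range(max_steps):
        legal = {
            tok: s for tok, s in score(prefix).items()
            if (tok == EOS and prefix in allowed)
            or (tok != EOS and any(v.startswith(prefix + tok) for v in allowed))
        }
        if not legal:
            raise RuntimeError(f"no legal continuation after {prefix!r}")
        best = max(legal, key=legal.get)
        if best == EOS:
            return prefix
        prefix += best
    raise RuntimeError("max_steps reached without completing a value")


roles = ["admin", "member", "viewer"]
print(greedy_decode(toy_model))              # super  (not a valid role)
print(constrained_decode(toy_model, roles))  # admin
```

The same "model" produces an invalid value when left alone and a valid one when masked, which is why constrained decoding makes the after-the-fact validator a safety net rather than the main line of defense.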

The Actual Recommendation

For new projects: start with Gemini 2.5 Flash for cost and speed, and use response_schema + response.parsed. If you're in an Anthropic-heavy stack: use Claude with a forced input_schema tool call, keep your schemas flat, and validate the result unless you're on strict mode. If you need the broadest JSON Schema keyword support: go **GPT-4o with json_schema and strict: true**.

None of them are perfect. All of them are significantly better than prompt-based JSON extraction. The hallucination hangover? Largely gone — but know your provider's edge cases before you ship.


*Claude: forced tool call with input_schema. OpenAI: json_schema with strict: true (response_format in Chat Completions, text.format in Responses). Gemini: response_schema + response.parsed. Constrained decoding beats prompt engineering for structured output. Pick based on your schema complexity, provider lock-in tolerance, and cost sensitivity.*
