The One Pattern That Actually Works for Structured Outputs Every Time

After two years of watching teams struggle with getting LLMs to output consistent structured data, I've found the combination that works. It's not a fancy prompt technique. It's just being explicit about what you want in a way the model can't misunderstand.

Let me save you two years of frustration: the structured output problem is not hard once you understand what actually causes the failures. The techniques that work are not the ones that get shared in blog posts about "advanced prompting." They're the boring ones that require you to be specific.

Here's the pattern that works. Every time. Without fail.

Why Structured Outputs Fail in the First Place

Before I give you the pattern, you need to understand why structured outputs fail. Most failures fall into three categories:

**Category 1: Ambiguous schema** — You told the model what fields you want but not what they mean. The model has to guess, and it guesses wrong.

**Category 2: Invalid edge cases** — Your schema allows values that don't make sense in practice, and the model generates those invalid values.

**Category 3: Implicit type coercion** — The model outputs a number as a string, or a boolean as a string, and your parsing code breaks.

The fix for all three categories is the same: be explicit. Not more elaborate. Not more clever. Just explicit.
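
To make Category 3 concrete, here is a minimal sketch of how it shows up in parsing code. The payload is a made-up example of model output, not taken from a real run, and `add_one_year` is a hypothetical piece of application code.

```python
import json

# Hypothetical model output: syntactically valid JSON, but every value is a string.
raw = '{"age": "42", "is_active": "false"}'
data = json.loads(raw)

# json.loads preserves the types the model emitted; it does not coerce them.
age_is_int = isinstance(data["age"], int)  # "42" is a str, not an int

# Any non-empty string is truthy, so the string "false" passes a naive check.
naive_active = bool(data["is_active"])


def add_one_year(age):
    """Application code that assumes `age` is a number."""
    return age + 1


try:
    add_one_year(data["age"])
    arithmetic_failed = False
except TypeError:
    # Adding an int to a str raises TypeError.
    arithmetic_failed = True
```

The JSON parses cleanly, which is what makes this category easy to miss: the failure only appears later, in code that trusted the types.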

The Pattern: Schema + Validation + Retry

Here's the exact structure I use for every structured output task:

    import json

    # `llm` stands in for whatever model client you use. `llm.call` is assumed
    # to accept `system` and `user` strings and return the raw response text.

    def get_structured_output(prompt: str, schema: dict) -> dict:
        """The pattern that actually works."""
        # Step 1: Generate with explicit schema
        system = f"""You are a data extraction system. Output ONLY valid JSON that conforms exactly to this schema:

    {json.dumps(schema, indent=2)}

    Rules:

      • Output valid JSON only, no markdown, no explanation
      • All required fields must be present
      • String fields must be actual strings, not null
      • Number fields must be actual numbers, not strings
      • Boolean fields must be actual booleans (true/false), not strings"""
        response = llm.call(system=system, user=prompt)

        # Step 2: Parse and validate
        try:
            data = json.loads(response)
        except json.JSONDecodeError:
            # Retry with stricter formatting, keeping the schema in the prompt
            response = llm.call(
                system=system + "\n\nOutput valid JSON only. No markdown fences. No text before or after.",
                user=prompt,
            )
            data = json.loads(response)

        # Step 3: Validate against schema
        validated = validate_against_schema(data, schema)
        return validated

This looks simple because it is simple. The complexity that makes this work is in the `validate_against_schema` function.

The Validation Function That Closes the Loop

    class ValidationError(Exception):
        """Raised when the model output cannot be made to fit the schema."""

    def validate_against_schema(data: dict, schema: dict) -> dict:
        """Validate and coerce data against schema. This is where the magic happens."""
        if not isinstance(data, dict):
            raise ValidationError(f"Expected a JSON object, got {type(data).__name__}")

        result = {}
        # JSON Schema lists required fields at the top level, not on each property
        required = set(schema.get("required", []))

        for field_name, field_schema in schema.get("properties", {}).items():
            value = data.get(field_name)
            field_type = field_schema.get("type")

            # Handle missing fields: required ones are errors, optional ones are skipped
            if value is None:
                if field_name in required:
                    raise ValidationError(f"Required field '{field_name}' is missing")
                continue

            # Type coercion that actually works; failures surface as ValidationError
            try:
                if field_type == "string":
                    result[field_name] = str(value)
                elif field_type == "number":
                    result[field_name] = float(value)
                elif field_type == "integer":
                    result[field_name] = int(value)
                elif field_type == "boolean":
                    if isinstance(value, bool):
                        result[field_name] = value
                    elif isinstance(value, str):
                        result[field_name] = value.lower() in ("true", "1", "yes")
                    else:
                        result[field_name] = bool(value)
                elif field_type == "array":
                    result[field_name] = list(value)
                else:
                    result[field_name] = value
            except (TypeError, ValueError) as exc:
                raise ValidationError(
                    f"Field '{field_name}' value {value!r} is not a valid {field_type}"
                ) from exc

        # Validate enum values
        for field_name, field_schema in schema.get("properties", {}).items():
            if "enum" in field_schema and field_name in result:
                if result[field_name] not in field_schema["enum"]:
                    raise ValidationError(
                        f"Field '{field_name}' value '{result[field_name]}' "
                        f"not in allowed values: {field_schema['enum']}"
                    )

        return result

The key insight: the model will make mistakes. Your validation function catches those mistakes and corrects them before they reach your application code.
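
One detail worth spelling out is why the boolean branch checks for strings before falling back to `bool()`. Here is a small standalone sketch of that branch; the helper name `coerce_bool` is mine, used only for illustration.

```python
def coerce_bool(value):
    """Same logic as the boolean branch of the validator: real booleans pass
    through, strings are matched against a small set of truthy spellings,
    and anything else falls back to Python truthiness."""
    if isinstance(value, bool):
        return value
    if isinstance(value, str):
        return value.lower() in ("true", "1", "yes")
    return bool(value)


# Plain bool() treats every non-empty string as True, including "false".
naive = bool("false")
# The explicit string check gives the answer the model actually meant.
coerced = coerce_bool("false")
```

The trade-off is that an unrecognized string such as "maybe" silently becomes `False`. If that is too lenient for your data, raising an error for unknown spellings is the stricter choice.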

The Schema Definition That Prevents Ambiguity

    # The schema that prevents ambiguity
    user_schema = {
        "type": "object",
        "properties": {
            "name": {
                "type": "string",
                "description": "Full name as it appears in the document, e.g. 'John Smith'"
            },
            "age": {
                "type": "integer",
                "description": "Age in years, must be a non-negative integer",
                "minimum": 0,
                "maximum": 150
            },
            "status": {
                "type": "string",
                "enum": ["active", "inactive", "pending"],
                "description": "Must be exactly one of: active, inactive, pending"
            },
            "email": {
                "type": "string",
                "format": "email",
                "description": "Valid email address"
            }
        },
        "required": ["name", "status"]
    }

Notice what's in the schema: descriptions that explain what the field means, not just what it's called. Constraints (minimum, maximum, enum) that eliminate invalid values. Format specifications that make validation exact.

This is what "be explicit" means in practice. Every field has a clear definition. Every constraint is specified. No room for the model to guess.
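
The validation function shown earlier coerces types and checks enums, but `minimum`, `maximum`, and `format` only help downstream if something enforces them. A minimal sketch of such a check follows. The helper name `check_constraints` and the deliberately loose email pattern are my assumptions, not part of the original pattern.

```python
import re

# Loose pattern (something@something.tld). An assumption for illustration,
# not a full email address validator.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def check_constraints(field_name, value, field_schema):
    """Return a list of human-readable constraint violations for one field."""
    errors = []
    if "minimum" in field_schema and value < field_schema["minimum"]:
        errors.append(
            f"'{field_name}' is {value}, below minimum {field_schema['minimum']}"
        )
    if "maximum" in field_schema and value > field_schema["maximum"]:
        errors.append(
            f"'{field_name}' is {value}, above maximum {field_schema['maximum']}"
        )
    if field_schema.get("format") == "email" and not EMAIL_RE.match(str(value)):
        errors.append(f"'{field_name}' is not a valid email address: {value!r}")
    return errors


age_schema = {"type": "integer", "minimum": 0, "maximum": 150}
email_schema = {"type": "string", "format": "email"}

age_errors = check_constraints("age", 200, age_schema)
email_errors = check_constraints("email", "not-an-email", email_schema)
```

Returning a list of messages rather than raising on the first problem means every violation can be reported back to the model in a single retry.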

The One-Shot Retry Pattern

    def extract_with_retry(prompt: str, schema: dict, max_retries: int = 2) -> dict:
        """Extract structured data with automatic retry on validation failure."""
        current_prompt = prompt
        for attempt in range(max_retries):
            try:
                return get_structured_output(current_prompt, schema)
            except (ValidationError, json.JSONDecodeError) as e:
                if attempt == max_retries - 1:
                    raise
                # Retry with error context, always built from the original
                # prompt so retries do not nest inside each other
                current_prompt = f"""Previous extraction failed: {e}

    Original prompt: {prompt}

    Please extract again, strictly following the schema."""

        # Only reachable when max_retries is less than 1
        raise RuntimeError("max_retries must be at least 1")

The retry with error context is critical. When the first attempt fails, the model gets feedback about what went wrong, and the second attempt is more likely to succeed.
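
To see the feedback loop in isolation, here is a self-contained sketch that replaces the model with a stub. `fake_model` is hypothetical: it returns malformed output unless the prompt contains the error feedback, which is enough to show what the second attempt receives.

```python
import json


def fake_model(prompt):
    """Stand-in for a real model call (hypothetical). Returns broken JSON
    unless the prompt carries feedback from a failed attempt."""
    if "Previous extraction failed" in prompt:
        return '{"name": "John Smith", "status": "active"}'
    return "Sure! Here is the JSON: {name: John Smith}"


def extract(prompt, max_retries=2):
    """Retry loop that feeds the parse error back into the next prompt.
    Returns the parsed data and the list of prompts that were sent."""
    prompts_sent = []
    current = prompt
    for attempt in range(max_retries):
        prompts_sent.append(current)
        try:
            return json.loads(fake_model(current)), prompts_sent
        except json.JSONDecodeError as e:
            if attempt == max_retries - 1:
                raise
            current = (
                f"Previous extraction failed: {e}\n\n"
                f"Original prompt: {prompt}\n\n"
                "Please extract again, strictly following the schema."
            )
    raise ValueError("max_retries must be at least 1")


result, prompts_sent = extract("Extract the user from: John Smith, active")
```

Here the first prompt goes out unchanged, the parse fails, and the second prompt opens with the parser's error message followed by the original request. A real model will not flip from wrong to right this deterministically, so treat the stub as a way to inspect the prompts, not as evidence of how well the retry works.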

Why This Works When Other Approaches Fail

The pattern works because it addresses the actual failure modes:

1. **Ambiguous schema** — Fixed by detailed field descriptions and constraints

2. **Invalid edge cases** — Fixed by validation and type coercion in the validation function

3. **Implicit type coercion** — Fixed by explicit type conversion in the validation function

Most "advanced prompting" techniques for structured outputs (chain-of-thought, few-shot examples, role assignment) don't address these failure modes directly. They hope the model generates correct output. This pattern doesn't rely on hope: it checks every response, corrects what can be safely coerced, and rejects the rest so the retry can try again.

The Practical Result

Using this pattern consistently, I see extraction success rates go from 60-70% with naive prompting to 95%+ with validation and retry. The remaining failures are genuinely ambiguous inputs, not model confusion.
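
Numbers like these only mean something when measured on your own data. A minimal sketch of such a measurement, assuming you have a list of representative inputs and an extraction function that raises on failure; `success_rate` and `toy_extract` are hypothetical names for illustration.

```python
def success_rate(extract_fn, inputs):
    """Fraction of inputs for which extract_fn returns without raising."""
    if not inputs:
        return 0.0
    successes = 0
    for text in inputs:
        try:
            extract_fn(text)
            successes += 1
        except Exception:
            # Any exception counts as a failed extraction in this tally.
            pass
    return successes / len(inputs)


def toy_extract(text):
    """Toy stand-in for a real extractor: fails on empty input."""
    if not text.strip():
        raise ValueError("nothing to extract")
    return {"name": text}


rate = success_rate(toy_extract, ["Ann", "", "Bob", "Cy"])
```

Note that this counts an extraction as successful when it returns without raising. It says nothing about whether the extracted values are correct, which needs labeled examples to check.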

If you're struggling with structured outputs, the problem isn't your prompt engineering skill. It's that you're relying on the model to be perfect when you should be building systems that catch and correct inevitable failures.

*The pattern is simple: explicit schema, validation function, retry on failure. Everything else is details.*