
If you have ever shipped an LLM feature, you have hit the same wall. The model returns JSON 95 percent of the time, then occasionally drops a quote, adds a trailing comma, or wraps the whole payload in a chatty preamble. Your downstream code crashes, your retry budget burns, and your on-call engineer wakes up at 3 AM. OpenAI's structured outputs feature solves this exact problem by constraining the model's tokens at decode time, so the output is guaranteed to match your JSON schema.
This tutorial is for backend engineers, data extraction teams, and anyone wiring GPT-4o or GPT-5 into a production pipeline. By the end, you will know how to define schemas with Pydantic, parse responses safely, handle refusals, combine structured outputs with tool calling, and avoid the schema-design mistakes that silently degrade quality.
## What Are OpenAI Structured Outputs?
OpenAI structured outputs are a feature in the Chat Completions API and Responses API that forces the model to emit JSON conforming exactly to a schema you provide. Unlike the older JSON mode, structured outputs use constrained decoding under the hood, so the model literally cannot produce a token that would break the schema. Therefore you skip retry loops, regex repair, and defensive parsing.
The feature is available on gpt-4o-2024-08-06 and later, all GPT-4.1 variants, GPT-5, and the o-series reasoning models. As a result, schema-strict responses are now the default expectation for any new integration.
## How Structured Outputs Differ From JSON Mode
Both features ask for JSON, but only one of them guarantees you get valid JSON that matches your contract.
| Feature | JSON Mode | Structured Outputs |
|---|---|---|
| Valid JSON guarantee | Yes | Yes |
| Schema conformance | No | Yes (100%) |
| Required fields enforced | No | Yes |
| Enum values respected | Best effort | Guaranteed |
| Refusal handling | Manual | Built-in refusal field |
| Models supported | Most chat models | GPT-4o-2024-08-06+, GPT-4.1, GPT-5, o-series |
| Latency overhead | Negligible | Small first-call schema compile |
In practice, JSON mode means "the response will parse." Structured outputs mean "the response will parse and match your TypeScript or Pydantic types." The second guarantee is the one that actually unblocks production code.
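To make the contrast concrete, here is what the two `response_format` payloads look like at the raw Chat Completions API level. The schema itself is a made-up example for illustration:

```python
# JSON mode: the model promises only that the reply parses as JSON
json_mode = {"type": "json_object"}

# Structured outputs: the reply must conform to this exact schema
structured = {
    "type": "json_schema",
    "json_schema": {
        "name": "customer_profile",
        "strict": True,
        "schema": {
            "type": "object",
            "additionalProperties": False,
            "required": ["full_name", "risk_tier"],
            "properties": {
                "full_name": {"type": "string"},
                "risk_tier": {"type": "string", "enum": ["low", "medium", "high"]},
            },
        },
    },
}
```

The `strict: true` flag plus `additionalProperties: false` is what upgrades "valid JSON" to "valid JSON matching your contract."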
## Prerequisites
You will need:
- Python 3.10 or later
- An OpenAI API key with access to `gpt-4o-2024-08-06` or newer
- The official SDK installed via `pip install openai pydantic`
If you are completely new to the OpenAI ecosystem, start with Building Apps With the OpenAI API for setup and authentication basics.
## Setting Up the OpenAI SDK
First, install the modern SDK and set your key. Use environment variables in production; never hardcode credentials.
```python
# requirements.txt
# openai>=1.50.0
# pydantic>=2.7.0

import os

from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
```
The SDK auto-loads `OPENAI_API_KEY` if it is set, so the explicit argument is optional. However, being explicit makes the code easier to test with mocked clients.
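For example, a handler that receives the client as a parameter can be unit-tested with a plain mock and never touch the network. The `extract_greeting` helper here is a hypothetical sketch, not part of the SDK:

```python
from unittest.mock import MagicMock

def extract_greeting(client) -> str:
    # Hypothetical helper: depends only on the injected client
    resp = client.chat.completions.create(
        model="gpt-4o-2024-08-06",
        messages=[{"role": "user", "content": "Say hi"}],
    )
    return resp.choices[0].message.content

# A MagicMock stands in for the real OpenAI client in tests
fake = MagicMock()
fake.chat.completions.create.return_value.choices = [
    MagicMock(message=MagicMock(content="hello"))
]

assert extract_greeting(fake) == "hello"
```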
## Defining Schemas With Pydantic
Pydantic is the recommended way to describe schemas because the SDK can convert a Pydantic model directly into a strict JSON schema. As a result, you keep one source of truth for both the prompt contract and the runtime validation.
```python
from typing import Literal

from pydantic import BaseModel, Field

class Address(BaseModel):
    street: str
    city: str
    postal_code: str = Field(description="Postal or ZIP code as a string")
    country: Literal["US", "UK", "DE", "FR", "ES", "IT", "OTHER"]

class CustomerProfile(BaseModel):
    full_name: str
    email: str
    phone: str | None
    address: Address
    risk_tier: Literal["low", "medium", "high"]
```
A few rules govern what the schema can contain. Every field must be required (use `None`-able types instead of optional fields if you need a "missing" representation), `additionalProperties` must be `false`, and the schema cannot use `oneOf` at the root. The SDK enforces these constraints automatically when you pass a Pydantic model.
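As a quick check of those rules, Pydantic's `extra="forbid"` config is what maps to `additionalProperties: false` in the generated JSON schema. The SDK's parse helper applies this conversion for you, so the explicit config below is shown only for illustration:

```python
from pydantic import BaseModel, ConfigDict

class StrictAddress(BaseModel):
    # extra="forbid" emits additionalProperties: false in the JSON schema
    model_config = ConfigDict(extra="forbid")
    street: str
    city: str

schema = StrictAddress.model_json_schema()
print(schema["additionalProperties"])  # False
print(schema["required"])              # ['street', 'city']
```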
If you want a deeper look at Pydantic itself, the post on advanced Pydantic validation in FastAPI covers the validator patterns you will reuse here.
## Using Structured Outputs With `response_format`
The cleanest entry point is `client.beta.chat.completions.parse()`. This helper accepts a Pydantic class as the `response_format`, validates the reply, and returns a typed object.
```python
from typing import Literal

from openai import OpenAI
from pydantic import BaseModel

client = OpenAI()

class CustomerProfile(BaseModel):
    full_name: str
    email: str
    phone: str | None
    risk_tier: Literal["low", "medium", "high"]

def extract_profile(unstructured_text: str) -> CustomerProfile:
    completion = client.beta.chat.completions.parse(
        model="gpt-4o-2024-08-06",
        messages=[
            {
                "role": "system",
                "content": (
                    "You extract customer profiles from CRM notes. "
                    "Use null for missing fields. Never invent data."
                ),
            },
            {"role": "user", "content": unstructured_text},
        ],
        response_format=CustomerProfile,
    )
    profile = completion.choices[0].message.parsed
    if profile is None:
        raise ValueError("Model refused to extract profile")
    return profile
```
Why this matters: the call returns a real `CustomerProfile` instance. Your IDE autocompletes fields, your type checker catches typos, and your tests can mock the client without losing type safety.
## Handling Refusals and Edge Cases
Even with structured outputs, the model can refuse a request. For example, if you ask it to extract personal data from content that violates the safety policy, it will populate a `refusal` field instead of `parsed`. You must check both before treating the response as success.
```python
import logging

from openai import OpenAI

log = logging.getLogger(__name__)
client = OpenAI()

# user_input is the untrusted text you want to extract from
response = client.beta.chat.completions.parse(
    model="gpt-4o-2024-08-06",
    messages=[
        {"role": "system", "content": "Extract structured data only."},
        {"role": "user", "content": user_input},
    ],
    response_format=CustomerProfile,
)

message = response.choices[0].message
if message.refusal:
    # Log the refusal reason and surface a safe error to the user
    log.warning("Model refused request: %s", message.refusal)
    raise PermissionError(message.refusal)
if message.parsed is None:
    # Rare, but the model returned null — usually a prompt design problem
    raise ValueError("No structured data returned")

profile: CustomerProfile = message.parsed
```
In addition, watch for these failure modes:
- **Length truncation.** If the model hits `max_tokens` mid-object, the response is invalid and `parsed` will be `None`. Set generous limits for extraction tasks.
- **Schema too complex.** Schemas with deeply nested objects or many union types compile slowly on the first request. OpenAI then caches the compiled schema for around an hour, so warm clients are fast.
- **Unsupported types.** `datetime` objects must be expressed as ISO strings; arbitrary `dict[str, Any]` is not allowed.
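One defensive pattern worth keeping is a truncation guard that inspects `finish_reason` before trusting `parsed`. This is a sketch; the `SimpleNamespace` stubs below stand in for real API responses:

```python
from types import SimpleNamespace

def ensure_complete(completion):
    """Raise if the model stopped because it ran out of tokens."""
    choice = completion.choices[0]
    if choice.finish_reason == "length":
        raise ValueError("Output truncated by max_tokens; raise the limit and retry")
    return choice.message

# Stub responses standing in for real completions
truncated = SimpleNamespace(
    choices=[SimpleNamespace(finish_reason="length", message=None)]
)
complete = SimpleNamespace(
    choices=[SimpleNamespace(finish_reason="stop", message="msg")]
)
```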
## Working With Function Calling and Strict Mode
Structured outputs also apply to tool calls. When you set `strict: true` on a function definition, the arguments the model returns are guaranteed to match the function's schema, so you can stop writing argument validators in your tool handlers.
```python
from openai import OpenAI

client = OpenAI()

tools = [
    {
        "type": "function",
        "function": {
            "name": "create_calendar_event",
            "description": "Create a new calendar event",
            "strict": True,
            "parameters": {
                "type": "object",
                "additionalProperties": False,
                "required": ["title", "start_iso", "end_iso", "attendees"],
                "properties": {
                    "title": {"type": "string"},
                    "start_iso": {
                        "type": "string",
                        "description": "ISO 8601 start timestamp with timezone",
                    },
                    "end_iso": {
                        "type": "string",
                        "description": "ISO 8601 end timestamp with timezone",
                    },
                    "attendees": {
                        "type": "array",
                        "items": {"type": "string"},
                        "description": "List of attendee emails",
                    },
                },
            },
        },
    }
]

response = client.chat.completions.create(
    model="gpt-4o-2024-08-06",
    messages=[
        {"role": "system", "content": "You schedule meetings."},
        {"role": "user", "content": "Set up a 30 min sync with anna@x.com tomorrow at 10am PT"},
    ],
    tools=tools,
)

tool_call = response.choices[0].message.tool_calls[0]
# function.arguments is a JSON string guaranteed to match the schema above
args = tool_call.function.arguments
```
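Since `function.arguments` arrives as a JSON string, a typical handler loads it once and passes the fields straight through. The Python `create_calendar_event` function below is a placeholder stub, not part of the SDK:

```python
import json

def create_calendar_event(title, start_iso, end_iso, attendees):
    # Placeholder: a real implementation would call your calendar backend
    return {"title": title, "start": start_iso, "end": end_iso, "attendees": attendees}

def handle_tool_call(tool_call):
    # With strict mode, json.loads plus direct key access is safe:
    # every required key is present and correctly typed
    if tool_call.function.name == "create_calendar_event":
        args = json.loads(tool_call.function.arguments)
        return create_calendar_event(
            args["title"], args["start_iso"], args["end_iso"], args["attendees"]
        )
    raise ValueError(f"Unknown tool: {tool_call.function.name}")
```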
For broader patterns on agent-style orchestration with tools, see building AI agents with tools, planning, and execution.
## Real-World Scenario: Extracting Invoice Data
Consider a fintech team building an invoice processing pipeline. The input is OCR text from PDFs of varying quality, and the downstream system writes line items into Postgres. Without structured outputs, the team typically maintains two separate things: a prompt that asks for JSON and a Pydantic validator that re-checks every field. Then they discover at runtime that the model dropped a required `tax_amount` once every few hundred documents.
After switching to structured outputs, the team collapses both concerns into a single `Invoice` Pydantic model. The prompt shrinks to a one-line instruction, the validator goes away, and the only failure path becomes "model refused" or "OCR was too noisy" — both of which are real signals worth logging. In a mid-sized pipeline running tens of thousands of invoices per week, this removes an entire class of midnight pages while making the code shorter.
```python
from typing import Literal

from openai import OpenAI
from pydantic import BaseModel, Field

client = OpenAI()

class LineItem(BaseModel):
    description: str
    quantity: int
    unit_price_cents: int = Field(description="Unit price in cents, integer")
    total_cents: int

class Invoice(BaseModel):
    vendor_name: str
    invoice_number: str
    issue_date: str = Field(description="ISO date string YYYY-MM-DD")
    due_date: str = Field(description="ISO date string YYYY-MM-DD")
    currency: Literal["USD", "EUR", "GBP", "CAD"]
    line_items: list[LineItem]
    subtotal_cents: int
    tax_cents: int
    total_cents: int

def extract_invoice(ocr_text: str) -> Invoice:
    completion = client.beta.chat.completions.parse(
        model="gpt-4o-2024-08-06",
        messages=[
            {
                "role": "system",
                "content": (
                    "Extract invoice fields from OCR text. "
                    "Convert all monetary values to integer cents. "
                    "If a field is unreadable, refuse the request."
                ),
            },
            {"role": "user", "content": ocr_text},
        ],
        response_format=Invoice,
        max_tokens=1500,
    )
    msg = completion.choices[0].message
    if msg.refusal:
        raise ValueError(f"OCR too noisy: {msg.refusal}")
    if msg.parsed is None:
        raise ValueError("No invoice extracted")
    return msg.parsed
```
Notice that monetary values are integers in cents, not floats. This is a deliberate schema choice: float arithmetic on currency causes rounding bugs in production accounting code, and constraining the field to `int` at the schema level prevents the model from emitting `19.99` when you expected `1999`.
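Schema constraints stop at shape, so arithmetic consistency still belongs in a validator. A sketch using Pydantic's `model_validator`; the `Totals` model is illustrative, not taken from the pipeline above:

```python
from pydantic import BaseModel, ValidationError, model_validator

class Totals(BaseModel):
    subtotal_cents: int
    tax_cents: int
    total_cents: int

    @model_validator(mode="after")
    def totals_add_up(self):
        # Reject structurally valid JSON whose numbers do not reconcile
        if self.subtotal_cents + self.tax_cents != self.total_cents:
            raise ValueError("subtotal + tax must equal total")
        return self

Totals(subtotal_cents=10000, tax_cents=1999, total_cents=11999)  # passes
try:
    Totals(subtotal_cents=10000, tax_cents=1999, total_cents=12000)
except ValidationError:
    print("rejected: totals do not reconcile")
```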
## Streaming Structured Outputs
For interactive UIs, you often want to stream the response so the user sees progress. The SDK supports streaming with the same parsing helpers, and partial parses become available as fields fill in.
```python
# notes is the free-form text to extract from
with client.beta.chat.completions.stream(
    model="gpt-4o-2024-08-06",
    messages=[
        {"role": "system", "content": "Extract a customer profile."},
        {"role": "user", "content": notes},
    ],
    response_format=CustomerProfile,
) as stream:
    for event in stream:
        if event.type == "content.delta":
            # Partial parsed object so far
            print(event.parsed)
        elif event.type == "content.done":
            final: CustomerProfile = event.parsed
    final_completion = stream.get_final_completion()
```
For a deeper dive on streaming UX patterns and chunking strategies, see AI chatbot streaming responses.
## When to Use OpenAI Structured Outputs
Reach for structured outputs when:
- The downstream consumer is code, not a human (databases, queues, APIs)
- Your schema has more than three or four fields and clear types
- You currently retry on JSON parse errors or run regex cleanup
- You need enums to be respected exactly (status fields, country codes, risk tiers)
- The task is extraction, classification, or routing rather than open-ended generation
## When NOT to Use OpenAI Structured Outputs
There are real cases where free-form text is the right tool. Skip structured outputs when:
- You want a long-form chat reply, summary, or rewrite
- The output is markdown intended for direct rendering to users
- The schema would have to be regenerated on every request because it depends on user input
- You target older models like GPT-3.5 Turbo where the feature is not available
- You need partial JSON tolerance — for example, asking the model to “fill in what you can” with optional fields scattered throughout
In these cases, plain prompting plus a permissive parser is simpler and cheaper.
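If you do go the free-form route, a small permissive parser often suffices. This sketch strips a markdown code fence, grabs the outermost braces, and hands off to `json.loads`:

```python
import json
import re

FENCE = "`" * 3  # a literal markdown code fence

FENCED_BLOCK = re.compile(FENCE + r"(?:json)?\s*(.*?)" + FENCE, re.DOTALL)

def permissive_parse(text: str):
    """Best-effort JSON extraction from a chatty model reply; None on failure."""
    match = FENCED_BLOCK.search(text)
    candidate = match.group(1) if match else text
    start, end = candidate.find("{"), candidate.rfind("}")
    if start == -1 or end == -1:
        return None
    try:
        return json.loads(candidate[start:end + 1])
    except json.JSONDecodeError:
        return None
```

This tolerates preambles and fences but makes no schema guarantee, which is exactly the trade you are choosing here.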
## Common Mistakes With OpenAI Structured Outputs
A few patterns trip up almost every team adopting the feature.
**Marking fields optional instead of nullable.** The schema requires every field to be present. If a field can be missing, type it as `str | None` and instruct the model to use `null`; do not make it optional via `Optional[...]` with a default or by omitting it from `required`.
**Using `Any` or untyped dicts.** Structured outputs need a closed schema, so fields like `metadata: dict[str, Any]` will be rejected. Instead, define the metadata as a nested model or, if it really is dynamic, accept it as a JSON string and parse it yourself.
**Confusing strict with "the model will obey instructions."** Strict mode constrains the JSON shape, not the content. For instance, the model can still produce a syntactically valid email that is not a real email address, or fill `quantity: 0` when you expected at least one item. Add semantic validation in your Pydantic model with `@field_validator` for that.
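A minimal sketch of that semantic layer, assuming you only need a plausibility check rather than full RFC-compliant email validation:

```python
from pydantic import BaseModel, ValidationError, field_validator

class Contact(BaseModel):
    email: str
    quantity: int

    @field_validator("email")
    @classmethod
    def plausible_email(cls, v: str) -> str:
        # '@' must appear strictly inside the string, not at either end
        if "@" not in v[1:-1]:
            raise ValueError("does not look like an email address")
        return v

    @field_validator("quantity")
    @classmethod
    def at_least_one(cls, v: int) -> int:
        if v < 1:
            raise ValueError("quantity must be at least 1")
        return v
```

Because `parse` runs these validators on the model's reply, semantic failures surface as a `ValidationError` instead of silently reaching your database.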
**Schema bloat.** Every union, every deeply nested object, and every enum costs you in latency and increases the chance the model refuses or truncates. As a rule of thumb, keep the schema under 30 fields total and split larger extractions into multiple calls.
**Ignoring refusals.** A null `parsed` with a populated `refusal` is the model's way of saying "I will not do this." Treat it like a 403, not a transient error to retry on.
**Forgetting model version pinning.** Structured outputs require `gpt-4o-2024-08-06` or later. Calls that route to an older snapshot via an alias will silently fall back to JSON mode behavior, so pin model strings explicitly.
## Comparing With Other Providers
OpenAI is not the only API offering this guarantee. Anthropic supports tool-call schemas with similar strict semantics — see getting started with Claude API for the equivalent pattern. Open models served via vLLM or Ollama can use the `outlines` or `lm-format-enforcer` libraries to constrain decoding the same way. Provider portability is real, but the developer experience is not yet identical, so most production teams pick one provider per service.
If you are weighing Pydantic models against plain dataclasses for the schema layer, dataclasses vs Pydantic models walks through the trade-offs.
## Conclusion
OpenAI structured outputs turn LLM responses into a typed contract you can rely on. By defining a Pydantic schema, calling `parse`, checking for refusals, and pinning your model version, you eliminate an entire category of production bugs and shrink the code you have to maintain. Start by converting one extraction endpoint that currently retries on JSON errors — then watch how much defensive parsing you can delete.
Next, pair structured outputs with strong prompts. Read prompt engineering best practices to tighten the instructions that drive your schemas, and building apps with the OpenAI API for the surrounding patterns around retries, rate limits, and observability.