
If you have ever shipped an LLM feature, you have hit the same wall. The model returns JSON 95 percent of the time, then occasionally drops a quote, adds a trailing comma, or wraps the whole payload in a chatty preamble. Your downstream code crashes, your retry budget burns, and your on-call engineer wakes up at 3 AM. OpenAI's structured outputs feature solves this exact problem by constraining the model's tokens at decode time, so the output is guaranteed to match your JSON schema.
This tutorial is for backend engineers, data extraction teams, and anyone wiring GPT-4o or GPT-5 into a production pipeline. By the end, you will know how to define schemas with Pydantic, parse responses safely, handle refusals, combine structured outputs with tool calling, and avoid the schema-design mistakes that silently degrade quality.
## What Are OpenAI Structured Outputs?
OpenAI structured outputs are a feature in the Chat Completions API and Responses API that forces the model to emit JSON conforming exactly to a schema you provide. Unlike the older JSON mode, structured outputs use constrained decoding under the hood, so the model literally cannot produce a token that would break the schema. Therefore you skip retry loops, regex repair, and defensive parsing.
The feature is available on gpt-4o-2024-08-06 and later, all GPT-4.1 variants, GPT-5, and the o-series reasoning models. As a result, schema-strict responses are now the default expectation for any new integration.
## How Structured Outputs Differ From JSON Mode
Both features ask for JSON, but only one of them guarantees you get valid JSON that matches your contract.
| Feature | JSON Mode | Structured Outputs |
|---|---|---|
| Valid JSON guarantee | Yes | Yes |
| Schema conformance | No | Yes (100%) |
| Required fields enforced | No | Yes |
| Enum values respected | Best effort | Guaranteed |
| Refusal handling | Manual | Built-in refusal field |
| Models supported | Most chat models | GPT-4o-2024-08-06+, GPT-4.1, GPT-5, o-series |
| Latency overhead | Negligible | Small first-call schema compile |
In practice, JSON mode means "the response will parse." Structured outputs mean "the response will parse and match your TypeScript or Pydantic types." The second guarantee is the one that actually unblocks production code.
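To make the contrast concrete, here is what the two `response_format` payloads look like at the raw Chat Completions API level. The schema itself is a made-up example for illustration:

```python
# JSON mode: the model promises only that the reply parses as JSON
json_mode = {"type": "json_object"}

# Structured outputs: the reply must conform to this exact schema
structured = {
    "type": "json_schema",
    "json_schema": {
        "name": "customer_profile",
        "strict": True,
        "schema": {
            "type": "object",
            "additionalProperties": False,
            "required": ["full_name", "risk_tier"],
            "properties": {
                "full_name": {"type": "string"},
                "risk_tier": {"type": "string", "enum": ["low", "medium", "high"]},
            },
        },
    },
}
```

The `strict: true` flag plus `additionalProperties: false` is what upgrades "valid JSON" to "valid JSON matching your contract."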
## Prerequisites
You will need:
- Python 3.10 or later
- An OpenAI API key with access to `gpt-4o-2024-08-06` or newer
- The official SDK installed via `pip install openai pydantic`
If you are completely new to the OpenAI ecosystem, start with Building Apps With the OpenAI API for setup and authentication basics.
## Setting Up the OpenAI SDK
First, install the modern SDK and set your key. Use environment variables in production; never hardcode credentials.
```python
# requirements.txt
# openai>=1.50.0
# pydantic>=2.7.0

import os

from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
```
The SDK auto-loads `OPENAI_API_KEY` if it is set, so the explicit argument is optional. However, being explicit makes the code easier to test with mocked clients.
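For example, a handler that receives the client as a parameter can be unit-tested with a plain mock and never touch the network. The `extract_greeting` helper here is a hypothetical sketch, not part of the SDK:

```python
from unittest.mock import MagicMock

def extract_greeting(client) -> str:
    # Hypothetical helper: depends only on the injected client
    resp = client.chat.completions.create(
        model="gpt-4o-2024-08-06",
        messages=[{"role": "user", "content": "Say hi"}],
    )
    return resp.choices[0].message.content

# A MagicMock stands in for the real OpenAI client in tests
fake = MagicMock()
fake.chat.completions.create.return_value.choices = [
    MagicMock(message=MagicMock(content="hello"))
]

assert extract_greeting(fake) == "hello"
```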
## Defining Schemas With Pydantic
Pydantic is the recommended way to describe schemas because the SDK can convert a Pydantic model directly into a strict JSON schema. As a result, you keep one source of truth for both the prompt contract and the runtime validation.
```python
from typing import Literal

from pydantic import BaseModel, Field

class Address(BaseModel):
    street: str
    city: str
    postal_code: str = Field(description="Postal or ZIP code as a string")
    country: Literal["US", "UK", "DE", "FR", "ES", "IT", "OTHER"]

class CustomerProfile(BaseModel):
    full_name: str
    email: str
    phone: str | None
    address: Address
    risk_tier: Literal["low", "medium", "high"]
```
A few rules govern what the schema can contain. Every field must be required (use `None`-able types instead of optional fields if you need a "missing" representation), `additionalProperties` must be `false`, and the schema cannot use `oneOf` at the root. The SDK enforces these constraints automatically when you pass a Pydantic model.
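As a quick check of those rules, Pydantic's `extra="forbid"` config is what maps to `additionalProperties: false` in the generated JSON schema. The SDK's parse helper applies this conversion for you, so the explicit config below is shown only for illustration:

```python
from pydantic import BaseModel, ConfigDict

class StrictAddress(BaseModel):
    # extra="forbid" emits additionalProperties: false in the JSON schema
    model_config = ConfigDict(extra="forbid")
    street: str
    city: str

schema = StrictAddress.model_json_schema()
print(schema["additionalProperties"])  # False
print(schema["required"])              # ['street', 'city']
```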
If you want a deeper look at Pydantic itself, the post on advanced Pydantic validation in FastAPI covers the validator patterns you will reuse here.
## Using Structured Outputs With `response_format`
The cleanest entry point is `client.beta.chat.completions.parse()`. This helper accepts a Pydantic class as the `response_format`, validates the reply, and returns a typed object.
```python
from typing import Literal

from openai import OpenAI
from pydantic import BaseModel

client = OpenAI()

class CustomerProfile(BaseModel):
    full_name: str
    email: str
    phone: str | None
    risk_tier: Literal["low", "medium", "high"]

def extract_profile(unstructured_text: str) -> CustomerProfile:
    completion = client.beta.chat.completions.parse(
        model="gpt-4o-2024-08-06",
        messages=[
            {
                "role": "system",
                "content": (
                    "You extract customer profiles from CRM notes. "
                    "Use null for missing fields. Never invent data."
                ),
            },
            {"role": "user", "content": unstructured_text},
        ],
        response_format=CustomerProfile,
    )
    profile = completion.choices[0].message.parsed
    if profile is None:
        raise ValueError("Model refused to extract profile")
    return profile
```
Why this matters: the call returns a real `CustomerProfile` instance. Your IDE autocompletes fields, your type checker catches typos, and your tests can mock the client without losing type safety.
## Handling Refusals and Edge Cases
Even with structured outputs, the model can refuse a request. For example, if you ask it to extract personal data from content that violates the safety policy, it will populate a `refusal` field instead of `parsed`. You must check both before treating the response as success.
```python
import logging

from openai import OpenAI

log = logging.getLogger(__name__)
client = OpenAI()

# user_input is the untrusted text you want to extract from
response = client.beta.chat.completions.parse(
    model="gpt-4o-2024-08-06",
    messages=[
        {"role": "system", "content": "Extract structured data only."},
        {"role": "user", "content": user_input},
    ],
    response_format=CustomerProfile,
)

message = response.choices[0].message
if message.refusal:
    # Log the refusal reason and surface a safe error to the user
    log.warning("Model refused request: %s", message.refusal)
    raise PermissionError(message.refusal)
if message.parsed is None:
    # Rare, but the model returned null — usually a prompt design problem
    raise ValueError("No structured data returned")

profile: CustomerProfile = message.parsed
```
In addition, watch for these failure modes:
- **Length truncation.** If the model hits `max_tokens` mid-object, the response is invalid and `parsed` will be `None`. Set generous limits for extraction tasks.
- **Schema too complex.** Schemas with deeply nested objects or many union types compile slowly on the first request. OpenAI then caches the compiled schema for around an hour, so warm clients are fast.
- **Unsupported types.** `datetime` objects must be expressed as ISO strings; arbitrary `dict[str, Any]` is not allowed.
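One defensive pattern worth keeping is a truncation guard that inspects `finish_reason` before trusting `parsed`. This is a sketch; the `SimpleNamespace` stubs below stand in for real API responses:

```python
from types import SimpleNamespace

def ensure_complete(completion):
    """Raise if the model stopped because it ran out of tokens."""
    choice = completion.choices[0]
    if choice.finish_reason == "length":
        raise ValueError("Output truncated by max_tokens; raise the limit and retry")
    return choice.message

# Stub responses standing in for real completions
truncated = SimpleNamespace(
    choices=[SimpleNamespace(finish_reason="length", message=None)]
)
complete = SimpleNamespace(
    choices=[SimpleNamespace(finish_reason="stop", message="msg")]
)
```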
## Working With Function Calling and Strict Mode
Structured outputs also apply to tool calls. When you set `strict: true` on a function definition, the arguments the model returns are guaranteed to match the function's schema, so you can stop writing argument validators in your tool handlers.
```python
from openai import OpenAI

client = OpenAI()

tools = [
    {
        "type": "function",
        "function": {
            "name": "create_calendar_event",
            "description": "Create a new calendar event",
            "strict": True,
            "parameters": {
                "type": "object",
                "additionalProperties": False,
                "required": ["title", "start_iso", "end_iso", "attendees"],
                "properties": {
                    "title": {"type": "string"},
                    "start_iso": {
                        "type": "string",
                        "description": "ISO 8601 start timestamp with timezone",
                    },
                    "end_iso": {
                        "type": "string",
                        "description": "ISO 8601 end timestamp with timezone",
                    },
                    "attendees": {
                        "type": "array",
                        "items": {"type": "string"},
                        "description": "List of attendee emails",
                    },
                },
            },
        },
    }
]

response = client.chat.completions.create(
    model="gpt-4o-2024-08-06",
    messages=[
        {"role": "system", "content": "You schedule meetings."},
        {"role": "user", "content": "Set up a 30 min sync with anna@x.com tomorrow at 10am PT"},
    ],
    tools=tools,
)

tool_call = response.choices[0].message.tool_calls[0]
# function.arguments is a JSON string guaranteed to match the schema above
args = tool_call.function.arguments
```
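Since `function.arguments` arrives as a JSON string, a typical handler loads it once and passes the fields straight through. The Python `create_calendar_event` function below is a placeholder stub, not part of the SDK:

```python
import json

def create_calendar_event(title, start_iso, end_iso, attendees):
    # Placeholder: a real implementation would call your calendar backend
    return {"title": title, "start": start_iso, "end": end_iso, "attendees": attendees}

def handle_tool_call(tool_call):
    # With strict mode, json.loads plus direct key access is safe:
    # every required key is present and correctly typed
    if tool_call.function.name == "create_calendar_event":
        args = json.loads(tool_call.function.arguments)
        return create_calendar_event(
            args["title"], args["start_iso"], args["end_iso"], args["attendees"]
        )
    raise ValueError(f"Unknown tool: {tool_call.function.name}")
```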
For broader patterns on agent-style orchestration with tools, see building AI agents with tools, planning, and execution.
## Real-World Scenario: Extracting Invoice Data
Consider a fintech team building an invoice processing pipeline. The input is OCR text from PDFs of varying quality, and the downstream system writes line items into Postgres. Without structured outputs, the team typically maintains two separate things: a prompt that asks for JSON and a Pydantic validator that re-checks every field. Then they discover at runtime that the model dropped a required `tax_amount` once every few hundred documents.
After switching to structured outputs, the team collapses both concerns into a single `Invoice` Pydantic model. The prompt shrinks to a one-line instruction, the validator goes away, and the only failure path becomes "model refused" or "OCR was too noisy" — both of which are real signals worth logging. In a mid-sized pipeline running tens of thousands of invoices per week, this removes an entire class of midnight pages while making the code shorter.
```python
from typing import Literal

from openai import OpenAI
from pydantic import BaseModel, Field

client = OpenAI()

class LineItem(BaseModel):
    description: str
    quantity: int
    unit_price_cents: int = Field(description="Unit price in cents, integer")
    total_cents: int

class Invoice(BaseModel):
    vendor_name: str
    invoice_number: str
    issue_date: str = Field(description="ISO date string YYYY-MM-DD")
    due_date: str = Field(description="ISO date string YYYY-MM-DD")
    currency: Literal["USD", "EUR", "GBP", "CAD"]
    line_items: list[LineItem]
    subtotal_cents: int
    tax_cents: int
    total_cents: int

def extract_invoice(ocr_text: str) -> Invoice:
    completion = client.beta.chat.completions.parse(
        model="gpt-4o-2024-08-06",
        messages=[
            {
                "role": "system",
                "content": (
                    "Extract invoice fields from OCR text. "
                    "Convert all monetary values to integer cents. "
                    "If a field is unreadable, refuse the request."
                ),
            },
            {"role": "user", "content": ocr_text},
        ],
        response_format=Invoice,
        max_tokens=1500,
    )
    msg = completion.choices[0].message
    if msg.refusal:
        raise ValueError(f"OCR too noisy: {msg.refusal}")
    if msg.parsed is None:
        raise ValueError("No invoice extracted")
    return msg.parsed
```
Notice that monetary values are integers in cents, not floats. This is a deliberate schema choice: float arithmetic on currency causes rounding bugs in production accounting code, and constraining the field to `int` at the schema level prevents the model from emitting `19.99` when you expected `1999`.
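Schema constraints stop at shape, so arithmetic consistency still belongs in a validator. A sketch using Pydantic's `model_validator`; the `Totals` model is illustrative, not taken from the pipeline above:

```python
from pydantic import BaseModel, ValidationError, model_validator

class Totals(BaseModel):
    subtotal_cents: int
    tax_cents: int
    total_cents: int

    @model_validator(mode="after")
    def totals_add_up(self):
        # Reject structurally valid JSON whose numbers do not reconcile
        if self.subtotal_cents + self.tax_cents != self.total_cents:
            raise ValueError("subtotal + tax must equal total")
        return self

Totals(subtotal_cents=10000, tax_cents=1999, total_cents=11999)  # passes
try:
    Totals(subtotal_cents=10000, tax_cents=1999, total_cents=12000)
except ValidationError:
    print("rejected: totals do not reconcile")
```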
## Streaming Structured Outputs
For interactive UIs, you often want to stream the response so the user sees progress. The SDK supports streaming with the same parsing helpers, and partial parses become available as fields fill in.
```python
# notes is the free-form text to extract from
with client.beta.chat.completions.stream(
    model="gpt-4o-2024-08-06",
    messages=[
        {"role": "system", "content": "Extract a customer profile."},
        {"role": "user", "content": notes},
    ],
    response_format=CustomerProfile,
) as stream:
    for event in stream:
        if event.type == "content.delta":
            # Partial parsed object so far
            print(event.parsed)
        elif event.type == "content.done":
            final: CustomerProfile = event.parsed
    final_completion = stream.get_final_completion()
```
For a deeper dive on streaming UX patterns and chunking strategies, see AI chatbot streaming responses.
## When to Use OpenAI Structured Outputs
Reach for structured outputs when:
- The downstream consumer is code, not a human (databases, queues, APIs)
- Your schema has more than three or four fields and clear types
- You currently retry on JSON parse errors or run regex cleanup
- You need enums to be respected exactly (status fields, country codes, risk tiers)
- The task is extraction, classification, or routing rather than open-ended generation
## When NOT to Use OpenAI Structured Outputs
There are real cases where free-form text is the right tool. Skip structured outputs when:
- You want a long-form chat reply, summary, or rewrite
- The output is markdown intended for direct rendering to users
- The schema would have to be regenerated on every request because it depends on user input
- You target older models like GPT-3.5 Turbo where the feature is not available
- You need partial JSON tolerance — for example, asking the model to “fill in what you can” with optional fields scattered throughout
In these cases, plain prompting plus a permissive parser is simpler and cheaper.
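If you do go the free-form route, a small permissive parser often suffices. This sketch strips a markdown code fence, grabs the outermost braces, and hands off to `json.loads`:

```python
import json
import re

FENCE = "`" * 3  # a literal markdown code fence

FENCED_BLOCK = re.compile(FENCE + r"(?:json)?\s*(.*?)" + FENCE, re.DOTALL)

def permissive_parse(text: str):
    """Best-effort JSON extraction from a chatty model reply; None on failure."""
    match = FENCED_BLOCK.search(text)
    candidate = match.group(1) if match else text
    start, end = candidate.find("{"), candidate.rfind("}")
    if start == -1 or end == -1:
        return None
    try:
        return json.loads(candidate[start:end + 1])
    except json.JSONDecodeError:
        return None
```

This tolerates preambles and fences but makes no schema guarantee, which is exactly the trade you are choosing here.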
## Common Mistakes With OpenAI Structured Outputs
A few patterns trip up almost every team adopting the feature.
**Marking fields optional instead of nullable.** The schema requires every field to be present. If a field can be missing, type it as `str | None` and instruct the model to use `null`; do not make it optional via `Optional[...]` with a default or by omitting it from `required`.
**Using `Any` or untyped dicts.** Structured outputs need a closed schema, so fields like `metadata: dict[str, Any]` will be rejected. Instead, define the metadata as a nested model or, if it really is dynamic, accept it as a JSON string and parse it yourself.
**Confusing strict with "the model will obey instructions."** Strict mode constrains the JSON shape, not the content. For instance, the model can still produce a syntactically valid email that is not a real email address, or fill `quantity: 0` when you expected at least one item. Add semantic validation in your Pydantic model with `@field_validator` for that.
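A minimal sketch of that semantic layer, assuming you only need a plausibility check rather than full RFC-compliant email validation:

```python
from pydantic import BaseModel, ValidationError, field_validator

class Contact(BaseModel):
    email: str
    quantity: int

    @field_validator("email")
    @classmethod
    def plausible_email(cls, v: str) -> str:
        # '@' must appear strictly inside the string, not at either end
        if "@" not in v[1:-1]:
            raise ValueError("does not look like an email address")
        return v

    @field_validator("quantity")
    @classmethod
    def at_least_one(cls, v: int) -> int:
        if v < 1:
            raise ValueError("quantity must be at least 1")
        return v
```

Because `parse` runs these validators on the model's reply, semantic failures surface as a `ValidationError` instead of silently reaching your database.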
**Schema bloat.** Every union, every deeply nested object, and every enum costs you in latency and increases the chance the model refuses or truncates. As a rule of thumb, keep the schema under 30 fields total and split larger extractions into multiple calls.
**Ignoring refusals.** A null `parsed` with a populated `refusal` is the model's way of saying "I will not do this." Treat it like a 403, not a transient error to retry on.
**Forgetting model version pinning.** Structured outputs require `gpt-4o-2024-08-06` or later. Calls that route to an older snapshot via an alias will silently fall back to JSON mode behavior, so pin model strings explicitly.
## Comparing With Other Providers
OpenAI is not the only API offering this guarantee. Anthropic supports tool-call schemas with similar strict semantics — see getting started with Claude API for the equivalent pattern. Open models served via vLLM or Ollama can use the `outlines` or `lm-format-enforcer` libraries to constrain decoding the same way. Provider portability is real, but the developer experience is not yet identical, so most production teams pick one provider per service.
If you are weighing Pydantic models against plain dataclasses for the schema layer, dataclasses vs Pydantic models walks through the trade-offs.
## Conclusion
OpenAI structured outputs turn LLM responses into a typed contract you can rely on. By defining a Pydantic schema, calling `parse`, checking for refusals, and pinning your model version, you eliminate an entire category of production bugs and shrink the code you have to maintain. Start by converting one extraction endpoint that currently retries on JSON errors — then watch how much defensive parsing you can delete.
Next, pair structured outputs with strong prompts. Read prompt engineering best practices to tighten the instructions that drive your schemas, and building apps with the OpenAI API for the surrounding patterns around retries, rate limits, and observability.