
If you have ever shipped an LLM-powered feature in Python, you have probably hit the same wall: the model returns a JSON blob, you parse it, something is missing or the wrong type, and your app crashes in production. Pydantic AI is a Python agent framework built by the team behind Pydantic that treats this problem as a first-class concern. It uses Pydantic models to enforce LLM outputs, validates tool calls before they hit your code, and stays close to the FastAPI-style developer experience most Python backend engineers already know.
This guide walks intermediate Python developers through building production-grade agents with Pydantic AI. By the end, you will know how to set it up, define typed agents with structured outputs and tools, handle streaming and validation errors, and decide whether Pydantic AI is the right pick for your project.
What Is Pydantic AI?
Pydantic AI is an open-source Python framework for building LLM agents that enforces strict type safety on inputs, outputs, and tool calls using Pydantic v2 models. Released by the Pydantic team in late 2024, it wraps providers like OpenAI, Anthropic, Gemini, Groq, and Ollama behind a single typed interface, so you write your business logic against Pydantic schemas instead of raw JSON. The result is fewer runtime surprises and an agent layer that integrates cleanly with FastAPI, SQLAlchemy, and standard observability tools.
The framework deliberately avoids the heavy abstractions of larger orchestration libraries. Instead, it focuses on three concrete guarantees: outputs match a declared Pydantic model, tools are validated before execution, and dependency injection makes testing painless.
Why Pydantic AI Over LangChain or Plain SDKs?
Most Python AI stacks fall into two camps. On one side, you have raw provider SDKs where you parse JSON by hand and pray. On the other side, you have heavy frameworks like LangChain that bring orchestration but also bring chains, callbacks, and abstractions that often outlast their usefulness. Pydantic AI sits between them.
Specifically, Pydantic AI gives you:
- Compile-time-ish safety through Pydantic models: your IDE and mypy know what an agent returns
- Provider-agnostic code that works across OpenAI, Anthropic, Gemini, Groq, Mistral, Ollama, and others
- Native async support that plays well with FastAPI, anyio, and standard Python async patterns
- Built-in observability via Logfire (the Pydantic team’s tracing tool) with OpenTelemetry compatibility
- Streaming with type validation so partial responses still respect your schema
If you already use Pydantic for request validation in FastAPI (see advanced Pydantic validation in FastAPI), the mental model carries over directly.
Prerequisites
Before installing Pydantic AI, make sure you have:
- Python 3.10 or newer (Pydantic AI uses modern type syntax)
- An API key for at least one model provider (OpenAI, Anthropic, or similar)
- Working knowledge of Pydantic v2 — if you need a refresher, see dataclasses vs Pydantic models
- A virtual environment manager you like (uv, poetry, or venv)
Installing Pydantic AI
Installation is straightforward. The full pydantic-ai package bundles support for the major providers. If you prefer a smaller install, use pydantic-ai-slim and pick only the provider extras you need.
# Using uv (fastest)
uv add "pydantic-ai"
# Or with pip
pip install "pydantic-ai"
# Provider-specific installs (lighter than the meta-package)
pip install "pydantic-ai-slim[openai,anthropic]"
Then, set your provider API key as an environment variable. For OpenAI, export OPENAI_API_KEY. For Anthropic, export ANTHROPIC_API_KEY. Pydantic AI reads these automatically when you reference a model by name.
export OPENAI_API_KEY="sk-..."
export ANTHROPIC_API_KEY="sk-ant-..."
Your First Pydantic AI Agent
Here is the minimum viable Pydantic AI agent. Notice how the return type is declared with a Pydantic model rather than free-form text.
import asyncio

from pydantic import BaseModel
from pydantic_ai import Agent


class CityInfo(BaseModel):
    name: str
    country: str
    population_estimate: int
    famous_for: list[str]


agent = Agent(
    "openai:gpt-4o-mini",
    output_type=CityInfo,
    system_prompt=(
        "You are a geography assistant. Return concise, accurate facts. "
        "If unsure about the population, give your best estimate."
    ),
)


async def main() -> None:
    result = await agent.run("Tell me about Lisbon, Portugal.")
    print(result.output)
    # Example output (actual values vary by model and run):
    # CityInfo(name='Lisbon', country='Portugal', population_estimate=550000,
    #          famous_for=['Fado music', 'Pastel de nata', 'Tram 28'])


if __name__ == "__main__":
    asyncio.run(main())
Here is why this works: under the hood, Pydantic AI converts your CityInfo model into a JSON schema, passes it to the model (by default as a function-calling tool schema, with the provider's native structured output mode available as an alternative), and validates the response against the model before returning it. If validation fails, Pydantic AI sends the validation error back to the model and asks it to try again, up to the configured retry limit.
For a comparison of structured output approaches across providers, the OpenAI structured outputs guide covers the underlying mechanism in more detail.
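To make the schema-conversion step described above more concrete, here is a minimal standard-library sketch of how type hints can be turned into a JSON schema. It is not how Pydantic or Pydantic AI implement it; the helper names `type_to_schema` and `dataclass_to_schema` are hypothetical, a dataclass stands in for the Pydantic model, and only a handful of types are supported.

```python
from dataclasses import dataclass, fields
from typing import get_args, get_origin

# Hypothetical mapping from Python types to JSON Schema type names.
_JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}


def type_to_schema(tp) -> dict:
    """Translate a small subset of type hints into JSON Schema fragments."""
    if tp in _JSON_TYPES:
        return {"type": _JSON_TYPES[tp]}
    if get_origin(tp) is list:
        (item_type,) = get_args(tp)
        return {"type": "array", "items": type_to_schema(item_type)}
    raise TypeError(f"unsupported type: {tp!r}")


def dataclass_to_schema(cls) -> dict:
    """Build an object schema whose properties mirror the dataclass fields."""
    props = {f.name: type_to_schema(f.type) for f in fields(cls)}
    return {"type": "object", "properties": props, "required": list(props)}


@dataclass
class CityInfo:
    # Stand-in for the Pydantic model used in the article.
    name: str
    country: str
    population_estimate: int
    famous_for: list[str]


schema = dataclass_to_schema(CityInfo)
print(schema)
```

With a real Pydantic model you would not write any of this yourself; `CityInfo.model_json_schema()` returns the schema directly.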
Adding Tools to a Pydantic AI Agent
Real agents need to interact with external systems. In Pydantic AI, tools are just decorated async functions whose parameters become the tool schema automatically.
import httpx
from pydantic import BaseModel, Field
from pydantic_ai import Agent, RunContext


class WeatherDeps:
    """Dependencies injected into every agent run."""

    def __init__(self, http: httpx.AsyncClient, api_key: str) -> None:
        self.http = http
        self.api_key = api_key


class WeatherReport(BaseModel):
    location: str
    temperature_celsius: float
    conditions: str
    advice: str = Field(description="One sentence of practical advice.")


weather_agent = Agent(
    "anthropic:claude-sonnet-4-5",
    deps_type=WeatherDeps,
    output_type=WeatherReport,
    system_prompt="You are a helpful weather assistant. Always call the weather tool.",
)


@weather_agent.tool
async def get_current_weather(
    ctx: RunContext[WeatherDeps],
    city: str,
    country_code: str = "US",
) -> dict:
    """Fetch current weather for a city. Uses OpenWeatherMap."""
    response = await ctx.deps.http.get(
        "https://api.openweathermap.org/data/2.5/weather",
        params={
            "q": f"{city},{country_code}",
            "appid": ctx.deps.api_key,
            "units": "metric",
        },
        timeout=10.0,
    )
    response.raise_for_status()
    return response.json()


async def run() -> None:
    # One HTTP client per run; it is closed when the block exits.
    async with httpx.AsyncClient() as http:
        deps = WeatherDeps(http=http, api_key="your-key")
        result = await weather_agent.run(
            "What's the weather in Porto today, and should I take a jacket?",
            deps=deps,
        )
        print(result.output)
Why this design matters: dependencies are injected per run, not globally. As a result, tests can swap in mock HTTP clients without monkey-patching. The tool’s docstring becomes the LLM-facing description, and type hints become the tool schema. Pydantic AI validates the LLM’s chosen arguments against those hints before your function is even called.
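The idea that type hints double as a tool schema, and that arguments are checked before the function runs, can be sketched with the standard library alone. This is an illustration under simplifying assumptions, not Pydantic AI's implementation: `validate_call_args` is a hypothetical helper, it only handles plain classes such as `str` and `int`, and the tool is a synchronous stand-in that returns canned data.

```python
import inspect
from typing import Any, get_type_hints


def validate_call_args(func, raw_args: dict[str, Any]) -> dict[str, Any]:
    """Check model-proposed arguments against func's signature and type hints."""
    sig = inspect.signature(func)
    hints = get_type_hints(func)
    unknown = set(raw_args) - set(sig.parameters)
    if unknown:
        raise ValueError(f"unexpected arguments: {sorted(unknown)}")
    checked: dict[str, Any] = {}
    for name, param in sig.parameters.items():
        if name in raw_args:
            value = raw_args[name]
        elif param.default is not inspect.Parameter.empty:
            value = param.default  # fall back to the declared default
        else:
            raise ValueError(f"missing required argument: {name}")
        expected = hints.get(name)
        if expected is not None and not isinstance(value, expected):
            raise TypeError(
                f"{name} must be {expected.__name__}, got {type(value).__name__}"
            )
        checked[name] = value
    return checked


def get_current_weather(city: str, country_code: str = "US") -> dict:
    """Stand-in tool that returns canned data instead of calling an API."""
    return {"city": city, "country_code": country_code}


# Arguments as a model might propose them, validated before the call.
args = validate_call_args(get_current_weather, {"city": "Porto", "country_code": "PT"})
result = get_current_weather(**args)
print(result)
```

A bad call such as `validate_call_args(get_current_weather, {"city": 5})` raises `TypeError` before the tool body ever runs, which is the property the paragraph above describes.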
Structured Outputs with Validation
Structured outputs are where Pydantic AI earns its name. Beyond simple BaseModel schemas, you can use validators to enforce business rules.
from pydantic import BaseModel, Field, ValidationInfo, field_validator
from pydantic_ai import Agent


class InvoiceLineItem(BaseModel):
    description: str
    quantity: int = Field(gt=0)
    unit_price_cents: int = Field(ge=0)
    total_cents: int = Field(ge=0)

    @field_validator("total_cents")
    @classmethod
    def total_matches_qty_times_price(cls, v: int, info: ValidationInfo) -> int:
        # info.data holds the fields declared (and validated) before total_cents.
        qty = info.data.get("quantity")
        price = info.data.get("unit_price_cents")
        if qty is not None and price is not None and v != qty * price:
            raise ValueError(
                f"total_cents ({v}) must equal quantity * unit_price_cents ({qty * price})"
            )
        return v


class ExtractedInvoice(BaseModel):
    vendor_name: str
    invoice_number: str
    line_items: list[InvoiceLineItem]


invoice_agent = Agent(
    "openai:gpt-4o",
    output_type=ExtractedInvoice,
    retries=2,
    system_prompt="Extract structured invoice data from the provided text exactly.",
)
When the LLM returns a line item where total_cents does not equal quantity * unit_price_cents, Pydantic AI catches the validation error, feeds it back to the model, and asks for a corrected response. The retries=2 setting above allows up to two such retries (the library default is one). Simple arithmetic mismatches often resolve on the first retry.
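The validate, feed back, and retry cycle described above can be sketched without any framework. This is a simplified model of the idea, not Pydantic AI's internals: `run_with_retries` and `fake_model` are hypothetical names, messages are plain strings, and the fake model is scripted to be wrong once and then correct.

```python
import json


def validate_line_item(data: dict) -> dict:
    """Reject line items whose total disagrees with quantity * unit price."""
    expected = data["quantity"] * data["unit_price_cents"]
    if data["total_cents"] != expected:
        raise ValueError(f"total_cents ({data['total_cents']}) must equal {expected}")
    return data


def run_with_retries(model, prompt: str, retries: int = 2) -> dict:
    """Call the model; on validation failure, re-ask with the error appended."""
    messages = [prompt]
    for _ in range(retries + 1):  # one initial attempt plus `retries` retries
        raw = model(messages)
        try:
            return validate_line_item(json.loads(raw))
        except (ValueError, KeyError) as exc:
            # json.JSONDecodeError is a ValueError, so malformed JSON lands here too.
            messages.append(f"Validation failed: {exc}. Please correct the output.")
    raise RuntimeError(f"no valid output after {retries} retries")


def fake_model(messages: list[str]) -> str:
    """Hypothetical stand-in: wrong on the first call, right once it sees feedback."""
    total = 500 if len(messages) == 1 else 600
    return json.dumps({"quantity": 3, "unit_price_cents": 200, "total_cents": total})


item = run_with_retries(fake_model, "Extract the line item.")
print(item)
```

The retry budget is what keeps an unsatisfiable validator from looping forever: once it is spent, the caller gets an exception instead of another model call.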
Streaming Responses
For chat interfaces, streaming is non-negotiable. Pydantic AI streams structured outputs incrementally, validating each partial chunk against the target schema.
from pydantic import BaseModel
from pydantic_ai import Agent


class ArticleDraft(BaseModel):
    title: str
    summary: str
    sections: list[str]


writer = Agent("openai:gpt-4o", output_type=ArticleDraft)


async def stream_draft(topic: str) -> None:
    async with writer.run_stream(f"Write a draft article about: {topic}") as result:
        async for partial in result.stream(debounce_by=0.05):
            # `partial` is a partial ArticleDraft validated as far as possible
            print(partial)
        final: ArticleDraft = await result.get_output()
        print(f"Summary word count: {len(final.summary.split())}")
The debounce_by parameter throttles validation frequency, which matters when you push partial updates to a websocket. Furthermore, the framework only yields chunks that produce a valid partial state, so your UI never receives malformed JSON.
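To show why throttling partial updates matters, here is a small standard-library sketch that limits how often items from an async stream are forwarded. It is a simple throttle that approximates the effect described above, not the library's debounce implementation; `throttle_latest` and the injectable `clock` parameter are hypothetical names introduced so the demo is deterministic.

```python
import asyncio
import time
from typing import AsyncIterator, Callable, TypeVar

T = TypeVar("T")


async def throttle_latest(
    source: AsyncIterator[T],
    interval: float,
    clock: Callable[[], float] = time.monotonic,
) -> AsyncIterator[T]:
    """Yield at most one item per `interval` seconds, always ending with the last item."""
    last_emit = None
    pending = None
    has_pending = False
    async for item in source:
        now = clock()
        if last_emit is None or now - last_emit >= interval:
            last_emit = now
            has_pending = False
            yield item
        else:
            # Too soon: remember only the newest item and drop older ones.
            pending, has_pending = item, True
    if has_pending:
        yield pending  # make sure the final state is never lost


async def demo() -> list[str]:
    # Fake timestamps, one per partial, so the result does not depend on real time.
    ticks = iter([0.0, 0.01, 0.02, 0.06, 0.07])

    async def partials():
        for text in ["T", "Ti", "Tit", "Titl", "Title"]:
            yield text

    return [p async for p in throttle_latest(partials(), 0.05, clock=lambda: next(ticks))]


emitted = asyncio.run(demo())
print(emitted)
```

Five partials come in but only three go out, and the last one is always delivered, which is the behaviour you want before pushing updates to a websocket.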
Production Pattern: FastAPI Integration
In a real backend, an agent is one endpoint among many. Here is a tested pattern for serving a Pydantic AI agent from FastAPI with dependency injection and proper lifecycle management.
from contextlib import asynccontextmanager

import httpx
from fastapi import Depends, FastAPI
from pydantic import BaseModel
from pydantic_ai import Agent


class SupportQuery(BaseModel):
    user_id: str
    message: str


class SupportResponse(BaseModel):
    answer: str
    escalate_to_human: bool
    suggested_articles: list[str]


class AppState:
    http: httpx.AsyncClient


state = AppState()


@asynccontextmanager
async def lifespan(app: FastAPI):
    # Create the shared client at startup and close it at shutdown.
    state.http = httpx.AsyncClient(timeout=30.0)
    yield
    await state.http.aclose()


app = FastAPI(lifespan=lifespan)

support_agent = Agent(
    "anthropic:claude-sonnet-4-5",
    output_type=SupportResponse,
    system_prompt=(
        "You are a support agent. Escalate when unsure or when the user is frustrated."
    ),
)


def get_http() -> httpx.AsyncClient:
    return state.http


@app.post("/support/answer", response_model=SupportResponse)
async def answer(
    query: SupportQuery,
    http: httpx.AsyncClient = Depends(get_http),
) -> SupportResponse:
    # `http` is the shared client; this minimal agent has no tools, so it is
    # not used yet. Pass it through `deps` once the agent declares a deps_type.
    result = await support_agent.run(
        f"User {query.user_id} asks: {query.message}",
    )
    return result.output
This pattern keeps the agent definition module-level (so the agent and its output schema are built once rather than per request), makes a single shared HTTP client available across requests, and lets FastAPI's existing dependency system handle scope. The response_model parameter means FastAPI re-validates the agent output before returning it. That is belt and braces, but cheap.
Production Scenario: Migrating a Brittle JSON Parser
Consider a hypothetical mid-sized SaaS team running an LLM-powered classification endpoint that handles roughly 100,000 requests per day. Initially, the team uses the OpenAI SDK directly with a manually written JSON parser. Over several months, the on-call rotation reports recurring 500 errors when the model returns slightly malformed JSON: trailing commas, missing fields, occasionally an extra explanatory paragraph before the JSON.
In a scenario like this, migrating to Pydantic AI often surfaces a hidden problem: a small share of requests (on the order of 0.3 to 0.8 percent in this illustration) were silently being retried or falling back to a default response. With Pydantic AI's automatic validation retries, many of those failures convert into successful responses, while truly invalid attempts now raise typed exceptions that observability tooling can route correctly. The migration itself is mostly mechanical: replace the SDK call with an Agent instance, declare the Pydantic schema you were already trying to enforce, and remove the hand-rolled parser.
The trade-off is one extra dependency and a small per-request overhead from validating each response against the schema. For most teams, the reduction in on-call noise more than pays for it.
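The failure modes in this scenario are easy to reproduce with the standard library. The sketch below uses a deliberately naive `parse_strict` helper (a hypothetical stand-in for the team's hand-rolled parser) and made-up sample responses; it shows which inputs fail to parse at all and which parse fine but are still missing a required field.

```python
import json
from typing import Optional


def parse_strict(raw: str) -> Optional[dict]:
    """Naive parser: returns the decoded object, or None if the JSON is malformed."""
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        return None


# Made-up model responses covering the cases from the scenario.
samples = {
    "clean": '{"label": "spam"}',
    "trailing_comma": '{"label": "spam",}',
    "prose_prefix": 'Sure! Here is the JSON: {"label": "spam"}',
    "missing_field": "{}",
}

parsed_ok = {name: parse_strict(raw) is not None for name, raw in samples.items()}
has_label = {name: "label" in (parse_strict(raw) or {}) for name, raw in samples.items()}
print(parsed_ok)
print(has_label)
```

The last case is the dangerous one: it parses without error, so only a schema check catches that the label is absent. That gap is what a declared output model closes.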
Error Handling and Retries
Pydantic AI raises a small set of typed exceptions you should handle explicitly in production.
from pydantic_ai import Agent
from pydantic_ai.exceptions import (
    UnexpectedModelBehavior,
    UsageLimitExceeded,
)
from pydantic_ai.usage import UsageLimits


async def safe_run(agent: Agent, prompt: str) -> dict:
    try:
        # usage_limits expects a UsageLimits object, not a plain dict.
        result = await agent.run(prompt, usage_limits=UsageLimits(request_limit=5))
        # Assumes the agent's output_type is a Pydantic model.
        return {"ok": True, "data": result.output.model_dump()}
    except UsageLimitExceeded as e:
        # Agent exceeded the request/token budget you configured
        return {"ok": False, "error": "limit_exceeded", "detail": str(e)}
    except UnexpectedModelBehavior as e:
        # Model returned something that could not be validated after retries
        return {"ok": False, "error": "model_misbehaved", "detail": str(e)}
A common production setup pairs this with a circuit breaker on the provider call and a fallback model. For instance, you can configure a fallback from openai:gpt-4o to openai:gpt-4o-mini when the primary times out, so user-facing latency stays bounded.
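The timeout-then-fallback idea can be sketched in a provider-agnostic way with asyncio. This is only an illustration of the control flow: `run_with_fallback`, `slow_primary`, and `fast_fallback` are hypothetical names, the "models" are plain coroutines, and a production setup would add a circuit breaker and proper error classification on top.

```python
import asyncio
from typing import Awaitable, Callable

ModelCall = Callable[[str], Awaitable[str]]


async def run_with_fallback(
    prompt: str, primary: ModelCall, fallback: ModelCall, timeout: float
) -> str:
    """Try `primary` within `timeout` seconds; on timeout, answer with `fallback`."""
    try:
        return await asyncio.wait_for(primary(prompt), timeout=timeout)
    except asyncio.TimeoutError:
        # wait_for cancels the primary call before raising, so it does not linger.
        return await fallback(prompt)


async def slow_primary(prompt: str) -> str:
    await asyncio.sleep(1.0)  # simulates a stalled provider
    return f"primary: {prompt}"


async def fast_fallback(prompt: str) -> str:
    return f"fallback: {prompt}"


answer = asyncio.run(run_with_fallback("hi", slow_primary, fast_fallback, timeout=0.05))
print(answer)
```

Worst-case latency is then roughly the timeout plus the fallback's own response time, which is what keeps user-facing latency bounded.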
Observability with Logfire
Tracing matters once you have more than one agent. Pydantic AI integrates natively with Logfire, which the Pydantic team maintains, and emits OpenTelemetry traces that any OTel-compatible backend (Honeycomb, Datadog, Grafana Tempo) can ingest.
import logfire
from pydantic_ai import Agent
logfire.configure(token="your-logfire-token")
logfire.instrument_pydantic_ai()
logfire.instrument_httpx() # Captures provider HTTP calls
agent = Agent("openai:gpt-4o", system_prompt="...")
# Every run is now traced: prompts, tool calls, validation retries, latency.
For teams that already self-host observability infrastructure, the OTel exporter sends the same data anywhere. The traces include token usage, retry counts, and tool call arguments, which makes debugging “why did the agent loop three times” straightforward.
When to Use Pydantic AI
- You build Python backends and already lean on Pydantic for request and response models
- You need typed, validated LLM outputs (extraction, classification, structured generation)
- You want provider portability without a heavy orchestration framework
- You serve agents from FastAPI, Litestar, or another async Python framework
- You care about observability and want OpenTelemetry traces out of the box
When NOT to Use Pydantic AI
- You need complex multi-agent orchestration with branching workflows — look at LangGraph for stateful cyclic agents instead
- Your team is on TypeScript or Node.js — Pydantic AI is Python-only
- You need a no-code or low-code visual builder (Pydantic AI is code-first)
- You require deep integration with a specific framework’s chain abstractions that Pydantic AI does not model
Common Mistakes with Pydantic AI
- Defining the agent inside a request handler, which reloads schemas on every call and slows things down — define agents at module scope
- Forgetting to set usage_limits in production, leading to runaway token spend during retry loops
- Using overly strict validators that the model cannot satisfy, causing repeated retries that exhaust the limit
- Mixing sync and async incorrectly: Pydantic AI is async-first, so call run, not run_sync, inside FastAPI endpoints
- Treating tool docstrings as documentation rather than as part of the LLM prompt: bad docstrings produce bad tool calls
Pydantic AI vs Other Python AI Frameworks
| Feature | Pydantic AI | LangChain | LlamaIndex | Plain SDK |
|---|---|---|---|---|
| Type-safe outputs | Native | Add-on | Limited | None |
| Provider portability | Yes | Yes | Yes | No |
| Async-first | Yes | Partial | Partial | Varies |
| Learning curve | Low | High | Medium | Lowest |
| Multi-agent orchestration | Basic | Strong (LangGraph) | Medium | Manual |
| Built-in tracing | Logfire/OTel | LangSmith | Custom | None |
For broader context on multi-agent options, see the rundown on building AI agents with tools, planning, and execution and the Microsoft AutoGen multi-agent framework guide.
Testing Pydantic AI Agents
Because dependencies inject cleanly, testing is mostly about swapping the model for a test double. Pydantic AI ships a TestModel and FunctionModel for exactly this.
from pydantic import BaseModel
from pydantic_ai import Agent
from pydantic_ai.models.test import TestModel


class Sentiment(BaseModel):
    label: str
    confidence: float


sentiment_agent = Agent("openai:gpt-4o", output_type=Sentiment)


# Async test: run it with an async-capable runner (for example pytest with
# the anyio or pytest-asyncio plugin).
async def test_returns_sentiment():
    # Override the live model with a deterministic test model.
    with sentiment_agent.override(model=TestModel()):
        result = await sentiment_agent.run("I love this product.")
        assert isinstance(result.output, Sentiment)
        assert 0.0 <= result.output.confidence <= 1.0
TestModel introspects your output schema and generates a plausible response — useful for fast CI tests without burning API credits. For deterministic assertions about specific values, use FunctionModel to return a fixed response.
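The other half of testing is the dependency side: because tools receive their HTTP client through injected deps, a test can hand them a fake. The sketch below shows that idea with the standard library only. `FakeHTTP`, `FakeResponse`, and `fetch_weather` are hypothetical stand-ins that mirror the shape of the weather tool from earlier, not Pydantic AI or httpx classes.

```python
import asyncio
from dataclasses import dataclass
from typing import Any


class FakeResponse:
    """Mimics the two response methods the tool relies on."""

    def __init__(self, payload: dict) -> None:
        self._payload = payload

    def raise_for_status(self) -> None:
        return None  # the fake always succeeds

    def json(self) -> dict:
        return self._payload


class FakeHTTP:
    """Records requests and returns a canned response instead of using the network."""

    def __init__(self, payload: dict) -> None:
        self.payload = payload
        self.calls: list[tuple[str, Any]] = []

    async def get(self, url: str, params: Any = None, timeout: Any = None) -> FakeResponse:
        self.calls.append((url, params))
        return FakeResponse(self.payload)


@dataclass
class WeatherDeps:
    http: Any
    api_key: str


async def fetch_weather(deps: WeatherDeps, city: str, country_code: str = "US") -> dict:
    """Same request logic as the weather tool, minus the agent wiring."""
    response = await deps.http.get(
        "https://api.openweathermap.org/data/2.5/weather",
        params={"q": f"{city},{country_code}", "appid": deps.api_key, "units": "metric"},
        timeout=10.0,
    )
    response.raise_for_status()
    return response.json()


fake = FakeHTTP({"main": {"temp": 18.5}})
data = asyncio.run(fetch_weather(WeatherDeps(http=fake, api_key="test-key"), "Porto", "PT"))
print(data, fake.calls)
```

No monkey-patching is needed: the test constructs the deps it wants and can assert on both the returned data and the recorded request parameters.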
Conclusion
Pydantic AI is the right choice when you want LLM agents that behave like the rest of your Python codebase — typed, validated, and testable — without adopting a heavy orchestration framework. Its tight integration with Pydantic v2, async-first design, and provider portability make it especially attractive for FastAPI-based backends.
To get started today, pick one existing endpoint where you parse LLM JSON by hand, replace the parser with a Pydantic AI agent and a typed output schema, and watch your error logs go quiet. From there, explore LangGraph for stateful cyclic agents when you need multi-step workflows, or compare against the broader landscape in building AI agents with tools, planning, and execution.