Pydantic AI: Type-Safe Agents for Production Python

If you have ever shipped an LLM-powered feature in Python, you have probably hit the same wall: the model returns a JSON blob, you parse it, something is missing or the wrong type, and your app crashes in production. Pydantic AI is a Python agent framework built by the team behind Pydantic that treats this problem as a first-class concern. It validates LLM outputs against Pydantic models, validates tool-call arguments before they reach your code, and stays close to the FastAPI-style developer experience most Python backend engineers already know.

This guide walks intermediate Python developers through building production-grade agents with Pydantic AI. By the end, you will know how to set it up, define typed agents with structured outputs and tools, handle streaming and validation errors, and decide whether Pydantic AI is the right pick for your project.

What Is Pydantic AI?

Pydantic AI is an open-source Python framework for building LLM agents that enforces strict type safety on inputs, outputs, and tool calls using Pydantic v2 models. Released by the Pydantic team in late 2024, it wraps providers like OpenAI, Anthropic, Gemini, Groq, and Ollama behind a single typed interface, so you write your business logic against Pydantic schemas instead of raw JSON. The result is fewer runtime surprises and an agent layer that integrates cleanly with FastAPI, SQLAlchemy, and standard observability tools.

The framework deliberately avoids the heavy abstractions of larger orchestration libraries. Instead, it focuses on three concrete guarantees: outputs match a declared Pydantic model, tools are validated before execution, and dependency injection makes testing painless.

Why Pydantic AI Over LangChain or Plain SDKs?

Most Python AI stacks fall into two camps. On one side, you have raw provider SDKs where you parse JSON by hand and pray. On the other side, you have heavy frameworks like LangChain that bring orchestration but also bring chains, callbacks, and abstractions that often outlast their usefulness. Pydantic AI sits between them.

Specifically, Pydantic AI gives you:

Compile-time-ish safety through Pydantic models — your IDE and mypy know what an agent returns
Provider-agnostic code that works across OpenAI, Anthropic, Gemini, Groq, Mistral, Ollama, and others
Native async support that plays well with FastAPI, anyio, and standard Python async patterns
Built-in observability via Logfire (the Pydantic team’s tracing tool) with OpenTelemetry compatibility
Streaming with type validation so partial responses still respect your schema

If you already use Pydantic for request validation in FastAPI (see advanced Pydantic validation in FastAPI), the mental model carries over directly.

Prerequisites

Before installing Pydantic AI, make sure you have:

Python 3.10 or newer (Pydantic AI uses modern type syntax)
An API key for at least one model provider (OpenAI, Anthropic, or similar)
Working knowledge of Pydantic v2 (for a refresher, see dataclasses vs Pydantic models)
A virtual environment manager you like (uv, poetry, or venv)

Installing Pydantic AI

Installation is straightforward. The full pydantic-ai package pulls in the SDKs for every supported provider; if you want a lighter install, use pydantic-ai-slim and pick only the provider extras you need.

# Using uv (fastest)
uv add "pydantic-ai"

# Or with pip
pip install "pydantic-ai"

# Provider-specific installs (lighter than the meta-package)
pip install "pydantic-ai-slim[openai,anthropic]"

Then, set your provider API key as an environment variable. For OpenAI, export OPENAI_API_KEY. For Anthropic, export ANTHROPIC_API_KEY. Pydantic AI reads these automatically when you reference a model by name.

export OPENAI_API_KEY="sk-..."
export ANTHROPIC_API_KEY="sk-ant-..."
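A missing or misspelled key variable is an easy mistake to ship. Below is a minimal startup check, assuming you know which providers your agents use; the REQUIRED_KEYS mapping and the missing_api_keys helper are hypothetical names, not part of Pydantic AI.

```python
import os
from typing import Dict, List, Mapping, Optional

# Hypothetical mapping from the provider prefix in a model name such as
# "openai:gpt-4o-mini" to the environment variable holding that provider's key.
REQUIRED_KEYS: Dict[str, str] = {
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
}


def missing_api_keys(
    model_names: List[str], env: Optional[Mapping[str, str]] = None
) -> List[str]:
    """Return key variables that are unset or empty for the given model names."""
    env = os.environ if env is None else env
    missing: List[str] = []
    for name in model_names:
        provider = name.split(":", 1)[0]
        var = REQUIRED_KEYS.get(provider)  # unknown providers (e.g. local ones) are skipped
        if var and not env.get(var) and var not in missing:
            missing.append(var)
    return missing
```

At startup you might call missing_api_keys(["openai:gpt-4o-mini", "anthropic:claude-sonnet-4-5"]) and exit with a clear message if the result is non-empty.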

Your First Pydantic AI Agent

Here is the minimum viable Pydantic AI agent. Notice how the return type is declared with a Pydantic model rather than free-form text.

import asyncio

from pydantic import BaseModel
from pydantic_ai import Agent


class CityInfo(BaseModel):
    name: str
    country: str
    population_estimate: int
    famous_for: list[str]


agent = Agent(
    "openai:gpt-4o-mini",
    output_type=CityInfo,
    system_prompt=(
        "You are a geography assistant. Return concise, accurate facts. "
        "If unsure about the population, give your best estimate."
    ),
)


async def main() -> None:
    result = await agent.run("Tell me about Lisbon, Portugal.")
    # result.output is a validated CityInfo instance, not a raw JSON string.
    print(repr(result.output))
    # Example output (exact values vary between runs and models):
    # CityInfo(name='Lisbon', country='Portugal', population_estimate=550000,
    #          famous_for=['Fado music', 'Pastel de nata', 'Tram 28'])


if __name__ == "__main__":
    asyncio.run(main())

Here is why this works: by default, Pydantic AI converts your CityInfo model into a JSON schema and exposes it to the model as a special output tool. The model "answers" by calling that tool, and Pydantic AI validates the tool arguments against CityInfo before returning them. Where a provider supports it, you can also opt into the provider's native structured-output mode instead. If validation fails, Pydantic AI sends the validation errors back to the model and asks it to try again, up to the agent's retry limit.

For background on how provider-native structured outputs work, the OpenAI structured outputs guide covers the underlying mechanism in more detail.
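To make the schema step concrete, here is a toy, stdlib-only sketch of the idea: it derives a JSON schema from a dataclass stand-in for CityInfo and wraps it in a tool definition. It is not Pydantic's schema generator (which also handles nested models, optional fields, constraints, and titles), and the tool name and description are illustrative placeholders.

```python
import json
import typing
from dataclasses import dataclass, fields

# Toy mapping from Python annotations to JSON Schema types. Pydantic's real
# generator covers far more (nested models, Optional, constraints, titles, $defs).
_SCALARS = {str: "string", int: "integer", float: "number", bool: "boolean"}


def toy_json_schema(cls: type) -> dict:
    """Derive a minimal JSON schema from a dataclass's type hints."""
    hints = typing.get_type_hints(cls)
    properties = {}
    for f in fields(cls):
        tp = hints[f.name]
        if typing.get_origin(tp) is list:
            (item_tp,) = typing.get_args(tp)
            properties[f.name] = {"type": "array", "items": {"type": _SCALARS[item_tp]}}
        else:
            properties[f.name] = {"type": _SCALARS[tp]}
    return {
        "title": cls.__name__,
        "type": "object",
        "properties": properties,
        "required": [f.name for f in fields(cls)],  # no defaults, so all required
    }


@dataclass
class CityInfo:  # dataclass stand-in for the Pydantic model above
    name: str
    country: str
    population_estimate: int
    famous_for: list[str]


# Roughly how an output schema can be offered to a model as a callable tool.
# The name and description here are illustrative placeholders.
tool_definition = {
    "name": "final_result",
    "description": "Return the final structured answer.",
    "parameters": toy_json_schema(CityInfo),
}
print(json.dumps(tool_definition, indent=2))
```

For comparison, calling CityInfo.model_json_schema() on the real Pydantic model produces the same overall shape, plus a title for each field.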

Adding Tools to a Pydantic AI Agent

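Real agents need to interact with external systems. In Pydantic AI, tools are ordinary Python functions (sync or async) registered with a decorator: use @agent.tool when the function needs the run context, as below, and @agent.tool_plain when it does not. The function's parameters become the tool schema automatically.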

from dataclasses import dataclass

import httpx
from pydantic import BaseModel, Field
from pydantic_ai import Agent, RunContext


@dataclass
class WeatherDeps:
    """Dependencies injected into every agent run."""

    http: httpx.AsyncClient
    api_key: str


class WeatherReport(BaseModel):
    location: str
    temperature_celsius: float
    conditions: str
    advice: str = Field(description="One sentence of practical advice.")


weather_agent = Agent(
    "anthropic:claude-sonnet-4-5",
    deps_type=WeatherDeps,
    output_type=WeatherReport,
    system_prompt="You are a helpful weather assistant. Always call the weather tool.",
)


@weather_agent.tool
async def get_current_weather(
    ctx: RunContext[WeatherDeps],
    city: str,
    country_code: str = "US",
) -> dict:
    """Fetch current weather for a city. Uses OpenWeatherMap."""
    response = await ctx.deps.http.get(
        "https://api.openweathermap.org/data/2.5/weather",
        params={
            "q": f"{city},{country_code}",
            "appid": ctx.deps.api_key,
            "units": "metric",
        },
        timeout=10.0,
    )
    response.raise_for_status()
    return response.json()


async def run() -> None:
    async with httpx.AsyncClient() as http:
        deps = WeatherDeps(http=http, api_key="your-key")
        result = await weather_agent.run(
            "What's the weather in Porto today, and should I take a jacket?",
            deps=deps,
        )
        print(result.output)

Why this design matters: dependencies are injected per run, not globally. As a result, tests can swap in mock HTTP clients without monkey-patching. The tool’s docstring becomes the LLM-facing description, and type hints become the tool schema. Pydantic AI validates the LLM’s chosen arguments against those hints before your function is even called.
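The same injection idea, stripped of the framework: a stdlib-only sketch in which a hypothetical WeatherClient protocol stands in for httpx.AsyncClient and the tool receives its dependencies as an argument (Pydantic AI passes them through ctx.deps instead). The test swaps in a fake client rather than patching a module-level global; the canned payload is only loosely shaped like an OpenWeatherMap response.

```python
import asyncio
from dataclasses import dataclass
from typing import Any, Dict, List, Protocol, Tuple


class WeatherClient(Protocol):
    """Hypothetical interface the tool needs (stands in for httpx.AsyncClient)."""

    async def fetch(self, city: str, country_code: str) -> Dict[str, Any]: ...


@dataclass
class Deps:
    client: WeatherClient
    api_key: str


async def get_current_weather(deps: Deps, city: str, country_code: str = "US") -> Dict[str, Any]:
    # The tool talks only to whatever client was injected for this run.
    return await deps.client.fetch(city, country_code)


class FakeWeatherClient:
    """Test double: records calls and returns canned data without any network I/O."""

    def __init__(self, payload: Dict[str, Any]) -> None:
        self.payload = payload
        self.calls: List[Tuple[str, str]] = []

    async def fetch(self, city: str, country_code: str) -> Dict[str, Any]:
        self.calls.append((city, country_code))
        return self.payload


fake = FakeWeatherClient({"main": {"temp": 14.0}, "weather": [{"main": "Rain"}]})
report = asyncio.run(get_current_weather(Deps(client=fake, api_key="test"), "Porto", "PT"))
```

Because the fake records its calls, a test can assert both on what the tool returned and on the arguments the tool passed downstream.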

Structured Outputs with Validation

Structured outputs are where Pydantic AI earns its name. Beyond simple BaseModel schemas, you can use validators to enforce business rules.

from pydantic import BaseModel, Field, ValidationInfo, field_validator
from pydantic_ai import Agent


class InvoiceLineItem(BaseModel):
    description: str
    quantity: int = Field(gt=0)
    unit_price_cents: int = Field(ge=0)
    total_cents: int = Field(ge=0)

    @field_validator("total_cents")
    @classmethod
    def total_matches_qty_times_price(cls, v: int, info: ValidationInfo) -> int:
        # info.data holds only the fields declared (and successfully validated)
        # before total_cents, so declaration order matters for this check.
        qty = info.data.get("quantity")
        price = info.data.get("unit_price_cents")
        if qty is not None and price is not None and v != qty * price:
            raise ValueError(
                f"total_cents ({v}) must equal quantity * unit_price_cents ({qty * price})"
            )
        return v


class ExtractedInvoice(BaseModel):
    vendor_name: str
    invoice_number: str
    line_items: list[InvoiceLineItem]


invoice_agent = Agent(
    "openai:gpt-4o",
    output_type=ExtractedInvoice,
    retries=2,
    system_prompt="Extract structured invoice data from the provided text exactly.",
)

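When the LLM returns a line item where total_cents does not equal quantity * unit_price_cents, the validator raises, and Pydantic AI feeds the validation error back to the model and asks for a corrected response. With retries=2 as configured above, the agent allows up to two retries before giving up with an UnexpectedModelBehavior error; without that argument, the default is a single retry. Because the error message states the expected total, the model has what it needs to fix the value on the next attempt.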
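The validate-and-retry loop is easy to picture in plain Python. This is a stdlib-only sketch, not Pydantic AI's internals: call_model is a hypothetical stand-in for a model request, OutputValidationFailed is a made-up exception name, and the business rule mirrors the validator above. With retries=2 there are at most three attempts, and each retry carries the previous validation errors back to the model. The worked numbers: 3 × 1,250 = 3,750 cents, so a reported total of 3,500 fails.

```python
import json
from typing import Callable, List


class OutputValidationFailed(Exception):
    """Raised when the retry budget is exhausted (hypothetical name)."""


def check_line_item(item: dict) -> List[str]:
    """Same business rule as the validator above, as plain Python."""
    errors = []
    expected = item["quantity"] * item["unit_price_cents"]
    if item["total_cents"] != expected:
        errors.append(
            f"total_cents ({item['total_cents']}) must equal "
            f"quantity * unit_price_cents ({expected})"
        )
    return errors


def run_with_retries(call_model: Callable[[List[str]], str], retries: int = 2) -> dict:
    """Call the model, validate, and feed errors back: at most retries + 1 attempts."""
    feedback: List[str] = []
    for _attempt in range(retries + 1):
        item = json.loads(call_model(feedback))
        errors = check_line_item(item)
        if not errors:
            return item
        feedback = errors  # the next attempt sees what went wrong
    raise OutputValidationFailed(f"still invalid after {retries} retries: {feedback}")


# Scripted "model": wrong total first, corrected total once it sees the feedback.
responses = iter([
    '{"quantity": 3, "unit_price_cents": 1250, "total_cents": 3500}',
    '{"quantity": 3, "unit_price_cents": 1250, "total_cents": 3750}',
])
seen_feedback: List[List[str]] = []


def scripted_model(feedback: List[str]) -> str:
    seen_feedback.append(feedback)
    return next(responses)


item = run_with_retries(scripted_model, retries=2)
```

Each retry is another full model request, which is why retry settings and usage limits need to be tuned together.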

Streaming Responses

For chat interfaces, streaming is non-negotiable. Pydantic AI streams structured outputs incrementally, validating each partial chunk against the target schema.

from pydantic import BaseModel
from pydantic_ai import Agent


class ArticleDraft(BaseModel):
    title: str
    summary: str
    sections: list[str]


writer = Agent("openai:gpt-4o", output_type=ArticleDraft)


async def stream_draft(topic: str) -> None:
    async with writer.run_stream(f"Write a draft article about: {topic}") as result:
        # stream_output() (named stream() in older releases) yields progressively
        # more complete ArticleDraft objects, validated as far as possible.
        async for partial in result.stream_output(debounce_by=0.05):
            print(partial)
        final: ArticleDraft = await result.get_output()
        print(f"Summary word count: {len(final.summary.split())}")

The debounce_by parameter throttles validation frequency, which matters when you push partial updates to a websocket. Furthermore, the framework only yields chunks that produce a valid partial state, so your UI never receives malformed JSON.
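To see what debouncing does to a stream, here is a stdlib-only sketch over timestamped chunks; timestamps are passed in so the behavior is deterministic. It illustrates the general technique only and is not Pydantic AI's implementation, whose exact grouping rules may differ. In this sketch a window closes when the next chunk arrives after the interval has elapsed; a real implementation would also flush on a timer.

```python
from typing import Iterable, Iterator, Optional, Tuple, TypeVar

T = TypeVar("T")


def debounce(events: Iterable[Tuple[float, T]], interval: float) -> Iterator[T]:
    """Group timestamped values into windows and yield the newest value per window.

    A window opens at the first event after the previous flush and closes once an
    event arrives `interval` seconds or more after it opened. The final value is
    always yielded, so the consumer ends with the complete result.
    """
    window_start: Optional[float] = None
    latest: Optional[T] = None
    for ts, value in events:
        if window_start is not None and ts - window_start >= interval:
            yield latest  # window closed: emit only the newest value it saw
            window_start = None
        if window_start is None:
            window_start = ts
        latest = value
    if window_start is not None:
        yield latest  # flush the last window so nothing is lost at the end
```

For example, six chunks arriving at 0.00, 0.01, 0.03, 0.06, 0.07, and 0.20 seconds with interval 0.05 produce three emissions instead of six. Fewer emissions means fewer validation passes and fewer websocket messages, while the final, complete value is never dropped.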

Production Pattern: FastAPI Integration

In a real backend, an agent is one endpoint among many. Here is one common pattern for serving a Pydantic AI agent from FastAPI with dependency injection and proper lifecycle management.

from contextlib import asynccontextmanager
from fastapi import FastAPI, Depends
from pydantic import BaseModel
from pydantic_ai import Agent
import httpx


class SupportQuery(BaseModel):
    user_id: str
    message: str


class SupportResponse(BaseModel):
    answer: str
    escalate_to_human: bool
    suggested_articles: list[str]


class AppState:
    http: httpx.AsyncClient


state = AppState()


@asynccontextmanager
async def lifespan(app: FastAPI):
    state.http = httpx.AsyncClient(timeout=30.0)
    yield
    await state.http.aclose()


app = FastAPI(lifespan=lifespan)

support_agent = Agent(
    "anthropic:claude-sonnet-4-5",
    deps_type=httpx.AsyncClient,  # tools can reach the shared client via ctx.deps
    output_type=SupportResponse,
    system_prompt=(
        "You are a support agent. Escalate when unsure or when the user is frustrated."
    ),
)


def get_http() -> httpx.AsyncClient:
    return state.http


@app.post("/support/answer", response_model=SupportResponse)
async def answer(
    query: SupportQuery,
    http: httpx.AsyncClient = Depends(get_http),
) -> SupportResponse:
    # Await the async run() inside an async endpoint and pass the shared client as deps.
    result = await support_agent.run(
        f"User {query.user_id} asks: {query.message}",
        deps=http,
    )
    return result.output

This pattern keeps the agent definition at module level, so the agent and its schemas are built once rather than on every request, shares the HTTP client across requests, and lets FastAPI's existing dependency system handle scope. The response_model parameter means FastAPI re-validates the agent output before returning it: belt and braces, but cheap.

Production Scenario: Migrating a Brittle JSON Parser

Consider a mid-sized SaaS team running an LLM-powered classification endpoint that handles roughly 100,000 requests per day. Initially, the team used the OpenAI SDK directly with a manually written JSON parser. Over several months, the on-call rotation reported recurring 500 errors whenever the model returned slightly malformed JSON: trailing commas, missing fields, occasionally an extra explanatory paragraph before the JSON.

Migrating to Pydantic AI often surfaces a hidden truth: a small fraction of requests (say, 0.3 to 0.8 percent) were silently being retried or falling back to a default response. With Pydantic AI's automatic validation retries, many of those failures become successful responses, while attempts that remain invalid raise typed exceptions that observability tooling can route correctly. The migration itself is mostly mechanical: replace the SDK call with an Agent instance, declare the Pydantic schema you were already trying to enforce, and remove the hand-rolled parser.

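The trade-off is one extra dependency and a modest per-request cost: the output schema is sent with each request as a tool definition (extra input tokens), and every validation retry is another model call. For a team in this position, the reduction in on-call noise usually outweighs that cost.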
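The failure modes from this scenario are easy to reproduce. The stdlib-only sketch below uses a hypothetical hand-rolled parser (naive_parse, with made-up field names label and confidence) and shows each sample being rejected. A bare json.loads call would raise instead of returning None, which is how such parsers tend to turn into 500 errors.

```python
import json
from typing import Optional

# The three failure modes described above, as raw model output.
SAMPLES = {
    "trailing_comma": '{"label": "billing", "confidence": 0.9,}',
    "missing_field": '{"label": "billing"}',
    "leading_prose": 'Sure! Here is the JSON:\n{"label": "billing", "confidence": 0.9}',
}


def naive_parse(raw: str) -> Optional[dict]:
    """Hypothetical hand-rolled parser: strict json.loads plus a required-field check."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None  # trailing commas and leading prose both end up here
    if not isinstance(data, dict) or not {"label", "confidence"}.issubset(data):
        return None  # well-formed JSON, but a required field is missing
    return data


results = {name: naive_parse(raw) for name, raw in SAMPLES.items()}
```

With a declared schema, the missing-field case becomes a validation error that is fed back to the model for a retry rather than a silent None.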

Error Handling and Retries

Pydantic AI raises a small set of typed exceptions you should handle explicitly in production.

from pydantic_ai import Agent
from pydantic_ai.exceptions import UnexpectedModelBehavior, UsageLimitExceeded
from pydantic_ai.usage import UsageLimits

# (ModelRetry is raised inside tools or validators to ask the model to try
# again; callers do not catch it, so it is not imported here.)


async def safe_run(agent: Agent, prompt: str) -> dict:
    try:
        # usage_limits takes a UsageLimits object; each retry counts as a request.
        result = await agent.run(prompt, usage_limits=UsageLimits(request_limit=5))
        return {"ok": True, "data": result.output.model_dump()}
    except UsageLimitExceeded as e:
        # Agent exceeded the request/token budget you configured
        return {"ok": False, "error": "limit_exceeded", "detail": str(e)}
    except UnexpectedModelBehavior as e:
        # Model returned something that could not be validated after retries
        return {"ok": False, "error": "model_misbehaved", "detail": str(e)}

A common production setup pairs this with a circuit breaker on the provider call and a fallback model. Pydantic AI's FallbackModel, for instance, can try openai:gpt-4o first and move on to openai:gpt-4o-mini when the primary request fails; check its documentation for which errors trigger the fallback, and set explicit timeouts if user-facing latency must stay bounded.
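The timeout half of this is easy to sketch without any framework. The example below assumes your model calls are plain zero-argument coroutine factories (in practice, something like a call to agent.run for each model); run_with_fallback is a hypothetical helper, not a Pydantic AI API. It uses asyncio.wait_for for the deadline. A full circuit breaker would also track recent failures and skip the primary for a cool-down period, which is omitted here.

```python
import asyncio
from typing import Awaitable, Callable, Tuple, TypeVar

T = TypeVar("T")


async def run_with_fallback(
    primary: Callable[[], Awaitable[T]],
    fallback: Callable[[], Awaitable[T]],
    timeout_s: float,
) -> Tuple[str, T]:
    """Run `primary` under a deadline; on timeout or connection error, run `fallback`."""
    try:
        # wait_for cancels the primary call if it exceeds the deadline.
        return "primary", await asyncio.wait_for(primary(), timeout=timeout_s)
    except (asyncio.TimeoutError, ConnectionError):
        return "fallback", await fallback()


async def _demo() -> list:
    async def slow_primary() -> str:
        await asyncio.sleep(0.5)  # simulates a provider that is too slow
        return "primary answer"

    async def quick_primary() -> str:
        return "primary answer"

    async def backup() -> str:
        return "fallback answer"

    return [
        await run_with_fallback(slow_primary, backup, timeout_s=0.05),
        await run_with_fallback(quick_primary, backup, timeout_s=0.05),
    ]


outcomes = asyncio.run(_demo())
```

Returning which path answered makes it easy to emit a metric for how often the fallback fires, which is the signal you would use to tune the timeout.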

Observability with Logfire

Tracing matters once you have more than one agent. Pydantic AI integrates natively with Logfire, which the Pydantic team maintains, and emits OpenTelemetry traces that any OTel-compatible backend (Honeycomb, Datadog, Grafana Tempo) can ingest.

import logfire
from pydantic_ai import Agent


logfire.configure(token="your-logfire-token")
logfire.instrument_pydantic_ai()
logfire.instrument_httpx()  # Captures provider HTTP calls

agent = Agent("openai:gpt-4o", system_prompt="...")
# Every run is now traced: prompts, tool calls, validation retries, latency.

If you already run your own observability stack, you can skip Logfire's hosted backend and point the OpenTelemetry exporter at your own collector instead (for example via the standard OTEL_EXPORTER_OTLP_ENDPOINT environment variable). The traces include token usage, retry counts, and tool call arguments, which makes debugging “why did the agent loop three times” straightforward.

When to Use Pydantic AI

You build Python backends and already lean on Pydantic for request and response models
You need typed, validated LLM outputs (extraction, classification, structured generation)
You want provider portability without a heavy orchestration framework
You serve agents from FastAPI, Litestar, or another async Python framework
You care about observability and want OpenTelemetry traces out of the box

When NOT to Use Pydantic AI

You need complex multi-agent orchestration with branching workflows — look at LangGraph for stateful cyclic agents instead
Your team is on TypeScript or Node.js — Pydantic AI is Python-only
You need a no-code or low-code visual builder (Pydantic AI is code-first)
You require deep integration with a specific framework’s chain abstractions that Pydantic AI does not model

Common Mistakes with Pydantic AI

Defining the agent inside a request handler, which reloads schemas on every call and slows things down — define agents at module scope
Forgetting to set usage_limits in production, leading to runaway token spend during retry loops
Using overly strict validators that the model cannot satisfy, causing repeated retries that exhaust the limit
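Mixing sync and async incorrectly: Pydantic AI is async-first, so inside async FastAPI endpoints await agent.run(...) rather than calling agent.run_sync(...), which drives its own event loop and fails when one is already running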
Treating tool docstrings as documentation rather than as part of the LLM prompt — bad docstrings produce bad tool calls
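Why the run versus run_sync mistake bites: a sync wrapper that drives its own event loop cannot run inside a loop that is already running, which is exactly the situation inside an async FastAPI endpoint. This is a stdlib-only sketch with hypothetical stand-ins for the agent calls, not Pydantic AI's code.

```python
import asyncio


async def fake_agent_run(prompt: str) -> str:
    """Stand-in for `await agent.run(prompt)` (hypothetical)."""
    await asyncio.sleep(0)
    return f"answer to {prompt!r}"


def fake_run_sync(prompt: str) -> str:
    """Stand-in for a sync wrapper that drives its own event loop (hypothetical)."""
    coro = fake_agent_run(prompt)
    try:
        return asyncio.run(coro)
    except RuntimeError:
        coro.close()  # avoid a "coroutine was never awaited" warning, then re-raise
        raise


async def endpoint_wrong(prompt: str) -> str:
    # Starting a new event loop from inside a running one raises RuntimeError.
    return fake_run_sync(prompt)


async def endpoint_right(prompt: str) -> str:
    # Awaiting the async API cooperates with the server's event loop.
    return await fake_agent_run(prompt)
```

Calling asyncio.run(endpoint_right("hi")) returns the answer, while asyncio.run(endpoint_wrong("hi")) fails with a RuntimeError; the sync wrapper remains fine in scripts and notebooks-free CLI code where no loop is running yet.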

Pydantic AI vs Other Python AI Frameworks

Feature	Pydantic AI	LangChain	LlamaIndex	Plain SDK
Type-safe outputs	Native	Add-on	Limited	None
Provider portability	Yes	Yes	Yes	No
Async-first	Yes	Partial	Partial	Varies
Learning curve	Low	High	Medium	Lowest
Multi-agent orchestration	Basic	Strong (LangGraph)	Medium	Manual
Built-in tracing	Logfire/OTel	LangSmith	Custom	None

For broader context on multi-agent options, see the rundown on building AI agents with tools, planning, and execution and the Microsoft AutoGen multi-agent framework guide.

Testing Pydantic AI Agents

Because dependencies inject cleanly, testing is mostly about swapping the model for a test double. Pydantic AI ships a TestModel and FunctionModel for exactly this.

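from pydantic import BaseModel
from pydantic_ai import Agent
from pydantic_ai.models.test import TestModel


class Sentiment(BaseModel):
    label: str
    confidence: float


# defer_model_check=True skips resolving the OpenAI provider (and its API key)
# at import time, so this module can load in CI without OPENAI_API_KEY.
sentiment_agent = Agent("openai:gpt-4o", output_type=Sentiment, defer_model_check=True)


def test_returns_sentiment() -> None:
    # Override the live model with TestModel: no network calls, no API credits.
    with sentiment_agent.override(model=TestModel()):
        result = sentiment_agent.run_sync("I love this product.")
    # TestModel fills fields with schema-valid placeholder data, so assert on
    # shape rather than on specific values.
    assert isinstance(result.output, Sentiment)
    assert isinstance(result.output.confidence, float)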

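TestModel introspects your output schema and generates data that satisfies it (placeholder values, not realistic ones), which is useful for fast CI tests without burning API credits; assert on structure rather than on specific values. For deterministic assertions about specific values, use FunctionModel, which lets you write a function that decides exactly what the model returns.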

Conclusion

Pydantic AI is the right choice when you want LLM agents that behave like the rest of your Python codebase — typed, validated, and testable — without adopting a heavy orchestration framework. Its tight integration with Pydantic v2, async-first design, and provider portability make it especially attractive for FastAPI-based backends.

To get started today, pick one existing endpoint where you parse LLM JSON by hand, replace the parser with a Pydantic AI agent and a typed output schema, and watch your error logs go quiet. From there, explore LangGraph for stateful cyclic agents when you need multi-step workflows, or compare against the broader landscape in building AI agents with tools, planning, and execution.