Vercel AI SDK: Build Streaming Chat UIs in Next.js 15

If you have ever built a chatbot that waits ten seconds, then dumps a wall of text, you already know why streaming matters. The Vercel AI SDK fixes that for Next.js apps with a small, opinionated API that handles token-by-token streaming, tool calls, abort signals, and React state for you. This guide walks through Vercel AI SDK streaming end-to-end, from the API route to a production-ready chat UI, with code you can paste into a fresh Next.js 15 project today.

This is a tutorial for intermediate React and Next.js developers who have built a feature with the App Router but have not yet wired up a real LLM. By the end, you will have a working streaming chat, tool calls that hit your own data, error handling that does not break the UI, and a clear picture of when the SDK is the right choice versus calling a provider directly.

What Is the Vercel AI SDK?

The Vercel AI SDK is a TypeScript toolkit for building AI features in React, Next.js, Svelte, Vue, and Node. It has two layers. The ai package gives you provider-agnostic functions like streamText, generateText, and streamObject that work the same whether you call OpenAI, Anthropic, Google, or a self-hosted model. The @ai-sdk/react package gives you hooks like useChat and useCompletion that wire those streams into React state without you touching ReadableStream or EventSource.

The SDK is maintained by Vercel and is the default integration for AI features on the Vercel platform. It is not, however, tied to Vercel. The runtime works anywhere Node.js or an edge runtime works, so you do not need to deploy to Vercel to use it.

Internally, the SDK uses a streaming protocol called the AI SDK Data Stream Protocol. Tokens, tool calls, tool results, and finish events all travel over a single HTTP response. React reads them and updates state incrementally. That is what gives you the ChatGPT-style typing effect without managing low-level transport.
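To make the idea of incremental reads concrete, here is a minimal sketch of consuming a streamed body chunk by chunk. It does not implement the SDK's actual wire format; fakeResponseBody and readIncrementally are hypothetical names, and the fake stream simply stands in for an HTTP response body.

```typescript
// Turn a list of text chunks into a ReadableStream, standing in for an HTTP response body.
function fakeResponseBody(chunks: string[]): ReadableStream<Uint8Array> {
  const encoder = new TextEncoder();
  return new ReadableStream<Uint8Array>({
    start(controller) {
      for (const chunk of chunks) controller.enqueue(encoder.encode(chunk));
      controller.close();
    },
  });
}

// Read the body chunk by chunk, calling onUpdate with the text accumulated so far.
async function readIncrementally(
  body: ReadableStream<Uint8Array>,
  onUpdate: (textSoFar: string) => void,
): Promise<string> {
  const reader = body.getReader();
  const decoder = new TextDecoder();
  let text = '';
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    // stream: true keeps multi-byte characters intact across chunk boundaries.
    text += decoder.decode(value, { stream: true });
    onUpdate(text);
  }
  text += decoder.decode(); // flush any bytes still buffered in the decoder
  return text;
}
```

In a React component, onUpdate would be a state setter, which is roughly the job useChat takes off your hands.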

Why Streaming Matters for Chat UIs

Time to first token usually sits between 200ms and 1.5s. Time to last token, for a 500-token response, can easily reach 8 to 15 seconds. If you wait for the full response, your user stares at a spinner for ten seconds. If you stream, they see the first words in under a second and read along while the model finishes. Perceived latency drops by an order of magnitude even though total latency is identical.
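The arithmetic behind that claim is simple enough to sketch. The numbers below are illustrative values picked from the ranges above, not measurements, and waitBeforeFirstText is a hypothetical helper.

```typescript
interface LatencyProfile {
  timeToFirstTokenMs: number;
  timeToLastTokenMs: number;
}

// How long the user waits before seeing any text at all.
function waitBeforeFirstText(profile: LatencyProfile, streaming: boolean): number {
  return streaming ? profile.timeToFirstTokenMs : profile.timeToLastTokenMs;
}

// Illustrative values: 800 ms to the first token, 10 s to the last.
const example: LatencyProfile = { timeToFirstTokenMs: 800, timeToLastTokenMs: 10_000 };

const blockingWait = waitBeforeFirstText(example, false); // 10000 ms of spinner
const streamingWait = waitBeforeFirstText(example, true); // 800 ms of spinner
const speedup = blockingWait / streamingWait; // 12.5x shorter wait for first text
```

Total latency is the same 10 seconds in both cases; only the wait before the first visible text changes.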

Streaming also lets you abort early. A user who reads the first sentence and realizes the answer is wrong can hit stop. You cancel the underlying request, stop billing for unused tokens, and free a slot for the next message. None of that is possible with a blocking call.
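The mechanism behind early cancellation is the standard AbortController. The sketch below uses a plain generator as a stand-in for a model; generateTokens and readUntilStopped are hypothetical names, and a real provider call would receive the signal through its own options.

```typescript
// Yield tokens one at a time, stopping as soon as the signal is aborted.
function* generateTokens(tokens: string[], signal: AbortSignal): Generator<string> {
  for (const token of tokens) {
    if (signal.aborted) return; // stop producing once the consumer has cancelled
    yield token;
  }
}

// Consume tokens and abort once the user has seen `stopAfter` of them.
function readUntilStopped(tokens: string[], stopAfter: number): string[] {
  const controller = new AbortController();
  const seen: string[] = [];
  for (const token of generateTokens(tokens, controller.signal)) {
    seen.push(token);
    if (seen.length >= stopAfter) controller.abort(); // the user hit "stop"
  }
  return seen;
}
```

Calling readUntilStopped(['a', 'b', 'c', 'd', 'e'], 2) returns only the first two tokens; the remaining three are never produced.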

For a deeper look at the underlying transport choices, see our guide on streaming LLM responses with SSE vs WebSockets, which compares the two transport options in detail.

Prerequisites

Before you start, make sure you have:

  • Node.js 18.18 or newer (Next.js 15 requires it)
  • A Next.js 15 project using the App Router
  • An OpenAI API key, an Anthropic API key, or a Google AI Studio key
  • Basic familiarity with React Server Components and Route Handlers

If you are still on the Pages Router, the SDK works there too, but the route file structure differs. The walkthrough below uses the App Router.

Step 1: Install the Vercel AI SDK

Install the core SDK, the React bindings, and a provider. This example uses OpenAI, but swapping providers is a one-line change later.

npm install ai @ai-sdk/react @ai-sdk/openai zod

The zod package is optional for plain chat, but you will need it the moment you add tool calls or structured output. Install it now to avoid a second dependency step later.

Add your API key to .env.local:

OPENAI_API_KEY=sk-...

Next.js auto-loads .env.local into process.env. Never check this file into git. Add it to .gitignore if your project template did not already do that.
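A missing key usually surfaces as a confusing provider error at request time. One option is to fail fast with a small guard like the sketch below. requireEnv is a hypothetical helper, not part of Next.js or the SDK, and it takes the environment as a parameter so it can be tested without touching process.env.

```typescript
type Env = Record<string, string | undefined>;

// Return the variable's value, or throw a descriptive error if it is missing or blank.
function requireEnv(env: Env, name: string): string {
  const value = env[name];
  if (value === undefined || value.trim() === '') {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}
```

In a Route Handler you would call requireEnv(process.env, 'OPENAI_API_KEY') once at module load, so a misconfigured deployment fails with a clear message.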

Step 2: Create the API Route With streamText

The API route runs server-side. It receives messages from the client, calls the model with streaming enabled, and pipes the response back. With the SDK, this is about ten lines.

Create app/api/chat/route.ts:

import { openai } from '@ai-sdk/openai';
import { streamText, convertToCoreMessages } from 'ai';

export const maxDuration = 30;

export async function POST(req: Request) {
  const { messages } = await req.json();

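  const result = streamText({
    model: openai('gpt-4o-mini'),
    system: 'You are a helpful assistant for a developer documentation site.',
    messages: convertToCoreMessages(messages),
    // Forward client disconnects (for example, the stop button) to the provider call.
    abortSignal: req.signal,
  });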

  return result.toDataStreamResponse();
}

A few things are happening here that are worth understanding before you copy this into a real app.

The streamText function returns immediately with a result object. The model call is in flight, but tokens have not arrived yet. Calling toDataStreamResponse() returns a Response whose body is a ReadableStream. Next.js sends the response headers right away, then streams chunks as the SDK reads them from the provider. The client never sees a buffered response.

The convertToCoreMessages helper normalizes the message format the client hook sends into the shape streamText expects. Without it, you will hit type errors the moment you add attachments or tool calls.

The maxDuration = 30 export tells Vercel’s serverless platform to allow up to 30 seconds. On the Edge runtime, the default cap is lower. If your responses can run longer, raise this value. On other platforms, this constant is ignored.

You will hit two common errors at this step. If you see OPENAI_API_KEY is not defined, restart the dev server after editing .env.local. Next.js only reads env files at startup. If you see model is not a function, check that you imported openai from @ai-sdk/openai and not from the openai package itself. They are different.
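The route above trusts whatever JSON the client sends. Before going to production you may want a guard in front of the model call. The sketch below is one way to do that under simplifying assumptions: ChatMessageInput, the allowed role list, and parseChatBody are all hypothetical, and real messages sent by useChat carry extra fields that this check simply ignores.

```typescript
interface ChatMessageInput {
  role: 'user' | 'assistant' | 'system';
  content: string;
}

// Assumed set of roles; adjust to whatever your client actually sends.
const ALLOWED_ROLES = ['user', 'assistant', 'system'];

function isChatMessageInput(value: unknown): value is ChatMessageInput {
  if (typeof value !== 'object' || value === null) return false;
  const candidate = value as Record<string, unknown>;
  return (
    typeof candidate.role === 'string' &&
    ALLOWED_ROLES.includes(candidate.role) &&
    typeof candidate.content === 'string'
  );
}

// Returns the messages array, or null if the body is not shaped as expected.
function parseChatBody(body: unknown): ChatMessageInput[] | null {
  if (typeof body !== 'object' || body === null) return null;
  const messages = (body as Record<string, unknown>).messages;
  if (!Array.isArray(messages) || messages.length === 0) return null;
  return messages.every(isChatMessageInput) ? messages : null;
}
```

A handler could return a 400 response when parseChatBody returns null and only call streamText otherwise.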

Step 3: Build the Chat UI With useChat

The useChat hook handles everything on the client: message state, input state, submission, streaming updates, and abort. You write the markup and the hook does the rest.

Create app/chat/page.tsx:

'use client';

import { useChat } from '@ai-sdk/react';

export default function ChatPage() {
  const { messages, input, handleInputChange, handleSubmit, status, stop } =
    useChat({ api: '/api/chat' });

  return (
    <main className="mx-auto flex h-screen max-w-2xl flex-col p-4">
      <div className="flex-1 space-y-4 overflow-y-auto">
        {messages.map((m) => (
          <div key={m.id} className="rounded-lg border p-3">
            <div className="text-xs uppercase text-gray-500">{m.role}</div>
            <div className="whitespace-pre-wrap">{m.content}</div>
          </div>
        ))}
      </div>

      <form onSubmit={handleSubmit} className="mt-4 flex gap-2">
        <input
          value={input}
          onChange={handleInputChange}
          placeholder="Ask anything..."
          className="flex-1 rounded-lg border px-3 py-2"
          disabled={status !== 'ready'}
        />
        {status === 'streaming' ? (
          <button type="button" onClick={stop} className="rounded-lg bg-red-500 px-4 py-2 text-white">
            Stop
          </button>
        ) : (
          <button type="submit" className="rounded-lg bg-blue-500 px-4 py-2 text-white">
            Send
          </button>
        )}
      </form>
    </main>
  );
}

The status field replaced the older isLoading boolean in recent SDK versions. It has four states: ready, submitted, streaming, and error. Using status lets you show different UI for “request sent, waiting for first token” versus “tokens arriving now,” which matters for a polished feel. A simple boolean cannot express that distinction.
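As a small illustration of what the four states buy you, the sketch below maps each one to a line of UI copy. The ChatStatus type mirrors the states listed above; statusLabel and the label strings are hypothetical.

```typescript
type ChatStatus = 'ready' | 'submitted' | 'streaming' | 'error';

// Map each status to the hint shown under the message list.
function statusLabel(status: ChatStatus): string {
  switch (status) {
    case 'submitted':
      return 'Thinking...'; // request sent, no tokens yet
    case 'streaming':
      return 'Typing...'; // tokens are arriving
    case 'error':
      return 'Something went wrong.';
    case 'ready':
      return ''; // idle, nothing to show
  }
}
```

Rendering statusLabel(status) in the page gives users a distinct cue for the waiting phase, which a single boolean cannot do.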

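The stop function aborts the in-flight fetch on the client. For the cancellation to reach the provider, the Route Handler has to forward it, typically by passing the request's signal to streamText as abortSignal: req.signal. With that in place, the upstream request is cancelled and you pay only for tokens already generated. That is the kind of detail that takes half a day to build manually.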

Run npm run dev, open http://localhost:3000/chat, and type a question. You should see tokens stream in within a second.

Step 4: Add Tool Calling With Structured Inputs

Plain chat is rarely enough in production. Real assistants need to look up data, hit internal APIs, or perform actions. Tool calling lets the model decide when to call a function you defined, with arguments it generates from the conversation.

The SDK uses Zod schemas to describe tools. The model receives the schema as JSON Schema, decides whether to call the tool, returns arguments, and your code runs the function. The result goes back to the model, which then produces the final user-facing answer.

Update app/api/chat/route.ts:

import { openai } from '@ai-sdk/openai';
import { streamText, convertToCoreMessages, tool } from 'ai';
import { z } from 'zod';

export const maxDuration = 30;

export async function POST(req: Request) {
  const { messages } = await req.json();

  const result = streamText({
    model: openai('gpt-4o-mini'),
    system: 'You help users find blog posts on TeachMeIDEA.',
    messages: convertToCoreMessages(messages),
    tools: {
      searchPosts: tool({
        description: 'Search published blog posts by keyword',
        parameters: z.object({
          query: z.string().describe('Search query, 1-5 words'),
          limit: z.number().min(1).max(10).default(5),
        }),
        execute: async ({ query, limit }) => {
          const results = await searchPostsInDb(query, limit);
          return results.map((p) => ({
            title: p.title,
            url: p.url,
            excerpt: p.excerpt,
          }));
        },
      }),
    },
    maxSteps: 3,
  });

  return result.toDataStreamResponse();
}

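// Placeholder: replace the body with a real database query.
interface Post { title: string; url: string; excerpt: string }

async function searchPostsInDb(query: string, limit: number): Promise<Post[]> {
  return [];
}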

The maxSteps option is critical and easy to miss. Without it, the model calls your tool, gets the result, and then stops. The user sees the tool call but no final answer because the model never produced one. Setting maxSteps: 3 tells the SDK to loop: model calls tool, tool returns, model generates answer. For multi-step agent flows, raise this number.
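The loop is easier to see in a toy simulation. The sketch below does not use the SDK; fakeModel is a stub that asks for a tool first and answers once a tool result exists, and runSteps is a hypothetical stand-in for the step loop described above.

```typescript
type ModelOutput =
  | { type: 'tool-call'; toolName: string; args: { query: string } }
  | { type: 'text'; text: string };

type Transcript = Array<ModelOutput | { type: 'tool-result'; result: unknown }>;

// Stub model: requests a tool first, then answers in text once a tool result is available.
function fakeModel(transcript: Transcript): ModelOutput {
  const hasResult = transcript.some((entry) => entry.type === 'tool-result');
  return hasResult
    ? { type: 'text', text: 'Here are the posts I found.' }
    : { type: 'tool-call', toolName: 'searchPosts', args: { query: 'streaming' } };
}

// Run up to maxSteps model calls, executing the (stubbed) tool between steps.
function runSteps(maxSteps: number): { steps: number; finalText: string | null } {
  const transcript: Transcript = [];
  for (let step = 1; step <= maxSteps; step++) {
    const output = fakeModel(transcript);
    transcript.push(output);
    if (output.type === 'text') return { steps: step, finalText: output.text };
    // The tool still runs on the last step, but no further model call reads its result.
    transcript.push({ type: 'tool-result', result: [] });
  }
  return { steps: maxSteps, finalText: null };
}
```

With this stub, runSteps(1) ends with finalText set to null, while runSteps(3) finishes on the second step with a text answer.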

Why does the model decide to call the tool? Because the tool's description and the describe() text on each parameter are sent to the model with every request as part of the tool definition. The clearer your descriptions, the more reliably the model picks the right tool. Treat tool descriptions like docstrings for production code, not throwaway labels.
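To make that concrete, here is roughly what the searchPosts tool might look like once its Zod schema has been converted to JSON Schema. This is a hand-written approximation for illustration, not captured SDK output, and the exact shape each provider receives may differ.

```typescript
// Approximate JSON Schema for the searchPosts parameters (an illustration, not SDK output).
const searchPostsParameters = {
  type: 'object',
  properties: {
    // The describe() text travels with the parameter it documents.
    query: { type: 'string', description: 'Search query, 1-5 words' },
    // The min, max, and default constraints become schema keywords.
    limit: { type: 'number', minimum: 1, maximum: 10, default: 5 },
  },
  // Assumption: a field with a default is treated as optional.
  required: ['query'],
};

// The tool definition pairs that schema with the tool's name and description.
const toolDefinition = {
  name: 'searchPosts',
  description: 'Search published blog posts by keyword',
  parameters: searchPostsParameters,
};
```

Everything the model knows about the tool is in that object, which is why vague descriptions lead to missed or malformed calls.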

The execute function runs server-side, so you can hit your database, call internal APIs, or read files without exposing credentials to the browser. If execute throws, the SDK catches the error and reports it to the model, which can decide to retry, apologize, or try a different tool. You do not need to wrap it in try/catch unless you want custom error handling.
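If you do want custom error handling, one possible pattern is to wrap execute so that failures come back as a structured result with a message you control. ToolResult and withErrorResult are hypothetical names used for this sketch.

```typescript
type ToolResult<T> = { ok: true; data: T } | { ok: false; error: string };

// Wrap a tool's execute function so failures become a structured result instead of a thrown error.
function withErrorResult<Args, T>(
  execute: (args: Args) => Promise<T>,
): (args: Args) => Promise<ToolResult<T>> {
  return async (args) => {
    try {
      return { ok: true, data: await execute(args) };
    } catch (err) {
      // Keep the message short; avoid leaking stack traces or credentials to the model.
      const message = err instanceof Error ? err.message : String(err);
      return { ok: false, error: message };
    }
  };
}
```

You would pass withErrorResult(yourExecute) as the tool's execute option, so the model always receives either data or a short error string.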

For more on schema-driven LLM patterns, see our OpenAI structured outputs guide, which covers the same idea applied to the response itself rather than tool arguments.

Step 5: Handle Errors, Aborts, and Rate Limits

A chat UI that breaks on the first rate limit error feels amateurish. The SDK gives you hooks to handle every failure cleanly, but you have to wire them up.

On the client, the useChat hook exposes error, reload, and lifecycle callbacks. Update the chat page:

'use client';

import { useChat } from '@ai-sdk/react';

export default function ChatPage() {
  const {
    messages, input, handleInputChange, handleSubmit,
    status, stop, error, reload,
  } = useChat({
    api: '/api/chat',
    onError: (err) => {
      console.error('Chat error:', err);
    },
    onFinish: (message) => {
      // Persist to your database here
    },
  });

  return (
    <main className="mx-auto flex h-screen max-w-2xl flex-col p-4">
      <div className="flex-1 space-y-4 overflow-y-auto">
        {messages.map((m) => (
          <div key={m.id} className="rounded-lg border p-3">
            <div className="text-xs uppercase text-gray-500">{m.role}</div>
            <div className="whitespace-pre-wrap">{m.content}</div>
          </div>
        ))}

        {error && (
          <div className="rounded-lg border border-red-300 bg-red-50 p-3">
            <div className="text-sm text-red-700">Something went wrong.</div>
            <button onClick={() => reload()} className="mt-2 text-sm underline">
              Retry
            </button>
          </div>
        )}
      </div>

      {/* form unchanged */}
    </main>
  );
}

The reload function re-runs the last assistant message generation using the same messages array. It is the right call for transient failures: rate limits, network blips, and 500 errors. Do not call it automatically in onError, or you will hammer the provider during real outages.

On the server, errors from the provider become exceptions in the stream. The SDK does not crash the route; it sends a final error event the client can read. If you need to log structured errors, pass callbacks to streamText:

const result = streamText({
  // ...
  onError: ({ error }) => {
    console.error('streamText error:', error);
  },
  onFinish: ({ usage, finishReason }) => {
    console.log('usage:', usage, 'reason:', finishReason);
  },
});

The onFinish callback gives you token counts. Use this to record cost per conversation. The usage object includes promptTokens and completionTokens. Multiply by your provider’s pricing and you have per-message cost tracking with no extra dependencies.
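The cost calculation can be sketched in a few lines. The prices below are placeholders chosen for easy arithmetic, not any provider's real rates, and estimateCostUsd is a hypothetical helper.

```typescript
interface Usage {
  promptTokens: number;
  completionTokens: number;
}

interface Pricing {
  inputPerMillionUsd: number;
  outputPerMillionUsd: number;
}

// Convert token counts into an estimated dollar cost for one message.
function estimateCostUsd(usage: Usage, pricing: Pricing): number {
  const inputCost = (usage.promptTokens / 1_000_000) * pricing.inputPerMillionUsd;
  const outputCost = (usage.completionTokens / 1_000_000) * pricing.outputPerMillionUsd;
  return inputCost + outputCost;
}

// Placeholder prices for illustration only; look up your provider's current rates.
const placeholderPricing: Pricing = { inputPerMillionUsd: 1, outputPerMillionUsd: 2 };

// 1,200 prompt tokens and 500 completion tokens come to roughly $0.0022 at these rates.
const cost = estimateCostUsd({ promptTokens: 1_200, completionTokens: 500 }, placeholderPricing);
```

Inside onFinish you would call estimateCostUsd(usage, yourPricing) and store the result alongside the conversation.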

For rate limit protection, add a middleware or a server-side check before calling streamText. The SDK does not retry rate-limited requests automatically. If you want exponential backoff, wrap the call yourself or use a provider client that supports it. Our guide on API rate limiting strategies covers the patterns that apply here.
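If you do wrap the call yourself, exponential backoff might look like the sketch below. withBackoff and RetryOptions are hypothetical names, the sleep function is injectable so tests run instantly, and the pattern only helps for failures that happen before any tokens have been streamed to the client.

```typescript
interface RetryOptions {
  maxAttempts: number;
  baseDelayMs: number;
  shouldRetry: (error: unknown) => boolean;
  sleep?: (ms: number) => Promise<void>;
}

const defaultSleep = (ms: number) =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Retry an async operation, doubling the delay after each failed attempt.
async function withBackoff<T>(operation: () => Promise<T>, options: RetryOptions): Promise<T> {
  const sleep = options.sleep ?? defaultSleep;
  for (let attempt = 1; ; attempt++) {
    try {
      return await operation();
    } catch (error) {
      // Give up when attempts are exhausted or the error is not retryable.
      if (attempt >= options.maxAttempts || !options.shouldRetry(error)) throw error;
      await sleep(options.baseDelayMs * 2 ** (attempt - 1)); // 1x, 2x, 4x, ...
    }
  }
}
```

A shouldRetry that only accepts rate-limit and transient server errors keeps the wrapper from retrying requests that can never succeed.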

When to Use the Vercel AI SDK

  • You are building in React, Next.js, Svelte, Vue, or Nuxt and want a streaming chat UI without writing transport code
  • You want to swap LLM providers (OpenAI, Anthropic, Google, Mistral, Bedrock) without rewriting the app
  • You need tool calling with type-safe Zod schemas instead of hand-written JSON Schema
  • You want client-side hooks that handle abort, retry, and state out of the box
  • You are deploying to Vercel, Cloudflare Workers, or any edge runtime and need a streaming-friendly library

When NOT to Use the Vercel AI SDK

  • You are building a Python or Go backend and have no React frontend — use the provider SDK directly or LangChain
  • You need fine-grained control over the streaming protocol (custom event names, custom framing) — the SDK’s protocol is opinionated
  • Your use case is a single non-streaming call (one-shot generation, batch processing) — the simpler provider SDK is enough
  • You need multi-agent orchestration with persistent state across turns — look at LangGraph or Mastra instead
  • You depend on a provider not yet supported by the SDK and need bleeding-edge features the official client exposes

Common Mistakes With the Vercel AI SDK

  • Forgetting convertToCoreMessages — the client sends Message objects, but streamText expects CoreMessage. Skip the conversion and tool calls silently break.
  • Setting maxSteps: 1 with tools — the model calls the tool, gets nothing back, and the user sees a blank response. Always set maxSteps to at least 2 when tools are involved.
  • Hardcoding the API key in client code — the SDK has server and client packages for a reason. Never import @ai-sdk/openai in a 'use client' file. Always call the model from a Route Handler.
  • Ignoring status and only checking isLoading — the deprecated boolean does not distinguish between “submitted” and “streaming.” Your UI will flicker. Use status instead.
  • Streaming without rate limits — every public chat endpoint should sit behind an IP-based rate limiter. Without one, a single bad actor can bankrupt your OpenAI account in an afternoon.
  • Persisting messages only onFinish — if the user closes the tab mid-stream, you lose the partial response. Save the user message immediately and update the assistant message on finish.
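For the rate-limiting mistake in particular, a fixed-window counter keyed by IP is the simplest starting point. The sketch below keeps state in memory, so it only works for a single long-lived server process; on serverless platforms you would back it with a shared store. createRateLimiter is a hypothetical helper, and the clock is injectable for testing.

```typescript
interface WindowState {
  windowStart: number;
  count: number;
}

// Build an allow(key) function that permits `limit` requests per `windowMs` for each key.
function createRateLimiter(
  limit: number,
  windowMs: number,
  now: () => number = () => Date.now(),
): (key: string) => boolean {
  const windows = new Map<string, WindowState>();

  return function allow(key: string): boolean {
    const current = now();
    const state = windows.get(key);
    // Start a fresh window on first sight of the key or once the old window has expired.
    if (!state || current - state.windowStart >= windowMs) {
      windows.set(key, { windowStart: current, count: 1 });
      return true;
    }
    if (state.count >= limit) return false; // over the limit for this window
    state.count += 1;
    return true;
  };
}
```

In the Route Handler you would call allow(clientIp) before streamText and return a 429 response when it returns false.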

Real-World Scenario: A Developer Documentation Assistant

A small SaaS team builds an in-app help bot for their developer docs. They have around 200 markdown pages indexed in Postgres with pgvector. They want users to ask questions and get answers in under three seconds with source links.

The first version used a Python FastAPI backend that buffered the full response before returning. Time to first byte averaged 6 seconds. Users assumed the app was broken and hit refresh, doubling their token spend on retries.

The team rebuilt the endpoint in Next.js using the Vercel AI SDK. The Route Handler ran streamText with a searchDocs tool that queried pgvector and returned the top 5 chunks. They set maxSteps: 3 so the model could retrieve, reason, and answer. Time to first token dropped to 800ms. Users got a visible response in under a second and stopped retrying.

The trade-offs they accepted: the SDK locks them into a TypeScript backend (a constraint, since most of the team writes Go for everything else), and the Data Stream Protocol is Vercel-flavored, so if they ever swap to a non-React frontend, they need a translation layer. For a feature scoped to one product surface, they judged that acceptable.

The biggest lesson: the streaming UX, more than any model upgrade, changed how users perceived the assistant. The same gpt-4o-mini that felt slow in v1 felt fast in v2 because tokens were visible immediately. Few libraries make that pattern as easy to ship as the Vercel AI SDK does.

Conclusion

The Vercel AI SDK collapses what used to be a week of streaming infrastructure into a single afternoon. You write a Route Handler with streamText, drop useChat into a client component, and the SDK handles transport, state, abort, and tools. Pair it with proper error handling, rate limits, and tool design, and you have a chat UI that holds up in production.

Start by getting the basic streaming example running. Once that works, add one tool, then add error handling, then add rate limiting. Build incrementally and you will avoid the failure mode of trying to wire everything at once. Next, take a look at our Claude tool use guide to compare how the same tool-calling pattern looks against Anthropic’s API directly, and our LangGraph stateful agents post for cases where a single SDK call is not enough.
