
If you have built an LLM agent with a basic while loop and a list of tools, you have probably hit the same wall everyone hits: the loop becomes a tangle of conditionals, you cannot pause execution, retries lose context, and debugging means re-running the entire flow. LangGraph solves this by modeling your agent as a graph of nodes, edges, and shared state. As a result, you get checkpointing, cycles, streaming, and human-in-the-loop control without writing scaffolding for any of it. This tutorial walks through LangGraph in Python from first install to a production-shaped agent. By the end, you will know when to reach for it and when a simpler approach beats it.
What Is LangGraph?
LangGraph is a Python library from the LangChain team for building stateful, cyclic AI agents as directed graphs. Each node is a function that reads a shared state object and returns an update to it, edges define the control flow between nodes, and conditional edges let the graph loop back on itself until a stop condition is met. Unlike a linear chain, the graph supports cycles, persistence, and partial replays out of the box.
In practice, the value shows up in three places. First, state is a single Python object (typically a TypedDict) that every node reads and writes, which removes the need to thread variables through function calls. Next, cycles allow an agent to call a tool, evaluate the result, and call another tool repeatedly without unbounded recursion. Finally, checkpointers persist state after every node, so a failed run resumes from the last successful step instead of starting over.
LangGraph sits on top of LangChain rather than replacing it. If you already use LangChain for retrievers, document loaders, or prompt templates, those still work. The graph layer is purely about orchestration. For background on the broader ecosystem, see our LangChain fundamentals guide before going deeper here.
Core Concepts: State, Nodes, and Edges
LangGraph has only three primitives to learn. Once these click, the rest of the API is a thin layer of helpers.
State
State is a TypedDict (or a Pydantic model) that defines the shape of the data flowing through the graph. Each node receives the current state and returns a partial update. Importantly, LangGraph merges that update into the state key by key instead of replacing the whole object. By default a returned key overwrites the old value, and you can change that per field by declaring a reducer function:
from typing import TypedDict, Annotated
from operator import add

class AgentState(TypedDict):
    messages: Annotated[list, add]  # Append, do not overwrite
    iteration: int
    final_answer: str
The Annotated[list, add] pattern is what makes message history work cleanly. Each node returns {"messages": [new_message]}, and LangGraph appends instead of replacing. Without the reducer, every node would clobber the prior messages.
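To make the merge rule concrete, here is a plain-Python sketch of how a partial update could be folded into state. It is a simplified model of the behavior described above, not LangGraph's actual implementation: `merge_update` is a hypothetical helper, and it assumes the reducer is the first metadata item inside `Annotated`.

```python
from operator import add
from typing import Annotated, TypedDict, get_type_hints


class AgentState(TypedDict):
    messages: Annotated[list, add]  # reducer: old + new
    iteration: int                  # no reducer: last write wins


def merge_update(schema: type, state: dict, update: dict) -> dict:
    """Fold a node's partial update into state (hypothetical helper)."""
    hints = get_type_hints(schema, include_extras=True)
    merged = dict(state)  # copy so the input state is left untouched
    for key, value in update.items():
        metadata = getattr(hints[key], "__metadata__", ())
        reducer = metadata[0] if metadata and callable(metadata[0]) else None
        if reducer is not None and key in merged:
            merged[key] = reducer(merged[key], value)  # combine old and new
        else:
            merged[key] = value  # plain overwrite
    return merged


before = {"messages": ["question"], "iteration": 0}
after = merge_update(AgentState, before, {"messages": ["answer"], "iteration": 1})
print(after)
```

The `messages` field grows because it carries a reducer, while `iteration` is simply replaced.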
Nodes
A node is a Python function (sync or async) that takes the state and returns a dict. Nothing is special about it — you can call an LLM, hit a database, run a tool, or apply a transformation:
def call_model(state: AgentState) -> dict:
    response = llm.invoke(state["messages"])
    # .get() avoids a KeyError when the caller did not seed "iteration"
    return {"messages": [response], "iteration": state.get("iteration", 0) + 1}
Edges
Edges connect nodes. A static edge always routes from A to B. A conditional edge picks the next node based on the current state, which is where cycles come from. The graph also has two special markers: START (the entry point) and END (terminate execution).
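The difference between static and conditional edges is easier to see in a toy runner. The sketch below is plain Python and only models the idea: `run`, the node names, and the string sentinels standing in for START and END are all assumptions made for illustration, not LangGraph code.

```python
START, END = "__start__", "__end__"  # stand-in sentinels for this sketch


def run(nodes, static_edges, conditional_edges, state):
    """Walk the graph from START until a route resolves to END."""
    current = static_edges[START]
    visited = []
    while current != END:
        visited.append(current)
        state = {**state, **nodes[current](state)}  # apply the partial update
        if current in conditional_edges:
            current = conditional_edges[current](state)  # router picks the next node
        else:
            current = static_edges[current]  # fixed destination
    return state, visited


def draft(state):
    return {"attempts": state["attempts"] + 1}


def review(state):
    return {"approved": state["attempts"] >= 2}


def after_review(state):
    # Conditional edge: loop back to "draft" until the review passes
    return END if state["approved"] else "draft"


final_state, visited = run(
    nodes={"draft": draft, "review": review},
    static_edges={START: "draft", "draft": "review"},
    conditional_edges={"review": after_review},
    state={"attempts": 0, "approved": False},
)
print(visited)
```

The static edge from `draft` to `review` never changes, while the conditional edge after `review` is what creates the cycle.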
Installing LangGraph and Building Your First Graph
LangGraph installs as a standard Python package. Additionally, you will want an LLM provider library — this tutorial uses Anthropic, but OpenAI or any LangChain-compatible model works identically.
pip install langgraph langchain-anthropic
export ANTHROPIC_API_KEY="your-key-here"
A minimal graph with one node and one edge looks like this:
from typing import TypedDict
from langgraph.graph import StateGraph, START, END
from langchain_anthropic import ChatAnthropic
class State(TypedDict):
    question: str
    answer: str

llm = ChatAnthropic(model="claude-sonnet-4-6")

def answer_node(state: State) -> dict:
    response = llm.invoke(state["question"])
    return {"answer": response.content}

workflow = StateGraph(State)
workflow.add_node("answer", answer_node)
workflow.add_edge(START, "answer")
workflow.add_edge("answer", END)
graph = workflow.compile()
result = graph.invoke({"question": "Explain backpressure in 2 sentences."})
print(result["answer"])
This is the LangGraph equivalent of a “hello world” — a single-node graph that calls an LLM. The real power, however, only shows up when you add tools, branching, and cycles. Furthermore, the same graph.invoke interface scales from this trivial example to a 30-node production agent.
Building a Stateful Research Agent
Now let us build something useful: an agent that answers research questions by calling a search tool, optionally refining its query, and stopping when it has enough information. This is the canonical “ReAct” pattern (reason + act), and LangGraph models it cleanly with two nodes and a conditional edge.
from typing import TypedDict, Annotated
from operator import add
from langchain_core.messages import HumanMessage, AIMessage, ToolMessage
from langchain_core.tools import tool
from langgraph.graph import StateGraph, START, END
@tool
def web_search(query: str) -> str:
    """Search the web and return top results."""
    # Replace with Tavily, SerpAPI, or your search backend
    return f"Top results for: {query}"

tools = [web_search]
tools_by_name = {t.name: t for t in tools}
# llm is the ChatAnthropic instance created in the first example
llm_with_tools = llm.bind_tools(tools)

class AgentState(TypedDict):
    messages: Annotated[list, add]

def agent_node(state: AgentState) -> dict:
    response = llm_with_tools.invoke(state["messages"])
    return {"messages": [response]}

def tool_node(state: AgentState) -> dict:
    last_message = state["messages"][-1]
    outputs = []
    # Run every tool the model requested and answer each call by its id
    for tool_call in last_message.tool_calls:
        result = tools_by_name[tool_call["name"]].invoke(tool_call["args"])
        outputs.append(ToolMessage(content=str(result), tool_call_id=tool_call["id"]))
    return {"messages": outputs}

def should_continue(state: AgentState) -> str:
    last_message = state["messages"][-1]
    if last_message.tool_calls:
        return "tools"
    return END
This sets up two nodes (agent and tools) and a router function. Next, you wire them into a graph with a conditional edge:
workflow = StateGraph(AgentState)
workflow.add_node("agent", agent_node)
workflow.add_node("tools", tool_node)
workflow.add_edge(START, "agent")
workflow.add_conditional_edges("agent", should_continue, {"tools": "tools", END: END})
workflow.add_edge("tools", "agent") # Loop back
graph = workflow.compile()
The cycle is the critical line: workflow.add_edge("tools", "agent") sends control back to the LLM after every tool call. Consequently, the agent can call search, read the result, call search again with a refined query, and only return a final answer when it stops requesting tools. For a deeper look at how tools work in modern LLMs, see Claude tool use patterns.
To invoke this agent:
result = graph.invoke({
    "messages": [HumanMessage(content="Compare Postgres and MySQL for analytics workloads.")]
})
for msg in result["messages"]:
    # content can be a string or a list of blocks, so coerce before slicing
    print(f"{type(msg).__name__}: {str(msg.content)[:200]}")
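Depending on the provider, `msg.content` may be a plain string or a list of content blocks (Anthropic models typically return blocks when a reply includes tool calls). The helper below is a hypothetical `message_text` function that normalizes both shapes. It assumes text blocks are dicts of the form `{"type": "text", "text": ...}` and ignores every other block type.

```python
from typing import Any


def message_text(content: Any) -> str:
    """Return only the human-readable text from a message's content."""
    if isinstance(content, str):
        return content
    parts = []
    for block in content:
        if isinstance(block, str):
            parts.append(block)
        elif isinstance(block, dict) and block.get("type") == "text":
            parts.append(block.get("text", ""))
        # Other block types (for example tool requests) carry no prose
    return "".join(parts)


plain = message_text("plain answer")
mixed = message_text([
    {"type": "text", "text": "Let me search."},
    {"type": "tool_use", "id": "call_1", "name": "web_search", "input": {"query": "x"}},
])
print(plain)
print(mixed)
```

Calling `message_text(msg.content)` in the loop above would give you clean text for both intermediate and final messages.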
Adding Cycles With Conditional Edges
The previous example uses one conditional edge to choose between “call tools” and “stop”. In production, you often want richer branching. For instance, an agent might route to different specialist nodes based on the question type, or it might evaluate its own output and loop back to refine it.
def router(state: AgentState) -> str:
    last_message = state["messages"][-1]
    # Pending tool calls come first so every call receives a ToolMessage
    if last_message.tool_calls:
        return "tools"
    # content may be a list of blocks for some providers, so coerce to text
    text = str(last_message.content).lower()
    if "summarize" in text:
        return "summarizer"
    if "calculate" in text:
        return "calculator"
    return END

workflow.add_conditional_edges(
    "agent",
    router,
    {
        "summarizer": "summarizer_node",
        "calculator": "calculator_node",
        "tools": "tools",
        END: END,
    },
)
Conditional edges accept any Python function that returns a string matching one of the destination keys. As a result, you can implement complex routing logic — including ML-based routing or external policy checks — without hacking the graph itself.
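As one way to keep routing logic testable, you can express it as an ordered table of predicates and unit-test it without building a graph. This is a sketch under simplifying assumptions: the state is a plain dict with a `tool_calls` key, `BLOCKED_TOOLS` and the `human_review` destination are hypothetical, and `END` is a local stand-in for the LangGraph constant.

```python
from typing import Callable

END = "__end__"  # stand-in for langgraph.graph.END in this sketch

BLOCKED_TOOLS = {"delete_records"}  # hypothetical policy list


def has_blocked_call(state: dict) -> bool:
    return any(call["name"] in BLOCKED_TOOLS for call in state.get("tool_calls", []))


def has_tool_calls(state: dict) -> bool:
    return bool(state.get("tool_calls"))


# Ordered rules: the first predicate that matches decides the route.
RULES: list[tuple[Callable[[dict], bool], str]] = [
    (has_blocked_call, "human_review"),
    (has_tool_calls, "tools"),
]


def route(state: dict) -> str:
    for predicate, destination in RULES:
        if predicate(state):
            return destination
    return END


print(route({"tool_calls": [{"name": "web_search"}]}))
```

Because `route` is an ordinary function, a policy change means editing the rule table rather than rewiring nodes.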
To prevent runaway loops, LangGraph supports a recursion_limit config:
graph.invoke({"messages": [...]}, config={"recursion_limit": 25})
When the graph exceeds the limit, it raises GraphRecursionError. This is your safety net against infinite tool-calling loops. For more on building bounded agent loops in general, see our guide to building AI agents with tools, planning, and execution.
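The idea behind a step limit can be sketched in a few lines of plain Python. Everything here is illustrative: `StepLimitExceeded` and `run_bounded` are hypothetical names, and the default of 25 simply mirrors the value used in the call above.

```python
class StepLimitExceeded(RuntimeError):
    """Raised by this sketch when the loop runs too long (hypothetical name)."""


def run_bounded(step, state: dict, limit: int = 25) -> dict:
    """Call step(state) until it reports done, or raise after `limit` steps."""
    for _ in range(limit):
        state = step(state)
        if state.get("done"):
            return state
    raise StepLimitExceeded(f"no stop condition reached after {limit} steps")


def stuck_tool(state: dict) -> dict:
    # Simulates a tool that always reports "incomplete"
    return {**state, "calls": state["calls"] + 1, "done": False}


try:
    run_bounded(stuck_tool, {"calls": 0}, limit=5)
    outcome = "finished"
except StepLimitExceeded as exc:
    outcome = str(exc)
print(outcome)
```

In a real deployment you would catch the library's own error at the call site in the same way and record it as a failed run.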
Persistence With Checkpointers
The killer feature for production work is the checkpointer. A checkpointer saves the full state after every node, keyed by a thread_id. Crucially, this turns the graph into a resumable workflow.
import sqlite3
from langgraph.checkpoint.sqlite import SqliteSaver  # pip install langgraph-checkpoint-sqlite

# In recent releases from_conn_string() is a context manager, so build the saver from a connection
conn = sqlite3.connect(":memory:", check_same_thread=False)
checkpointer = SqliteSaver(conn)
graph = workflow.compile(checkpointer=checkpointer)

config = {"configurable": {"thread_id": "user-42-conv-1"}}
graph.invoke({"messages": [HumanMessage(content="What is Kafka used for?")]}, config)
graph.invoke({"messages": [HumanMessage(content="And how does it compare to RabbitMQ?")]}, config)
Two things happen here. First, the second invoke automatically loads the prior state from the checkpointer using the thread_id, so the agent already knows about the Kafka conversation. Second, if the second call crashes mid-execution, the next attempt resumes from the last successful node — not from the start.
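For production, swap SqliteSaver for PostgresSaver or RedisSaver. The checkpointer backends ship as separate packages (for example langgraph-checkpoint-sqlite and langgraph-checkpoint-postgres), and each implements the same interface. Therefore, you can develop with SQLite locally and switch to Postgres in production by changing the line that constructs the checkpointer.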
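If the resume behavior feels abstract, the following toy model shows the mechanism: save the state and the next step index after every step, keyed by thread ID, and start from that index on the next attempt. `InMemoryCheckpoints` and `run_pipeline` are hypothetical names, and real checkpointers store considerably more than this.

```python
class InMemoryCheckpoints:
    """Toy store: latest (next_step, state) per thread_id (hypothetical class)."""

    def __init__(self):
        self._saved = {}

    def load(self, thread_id, default_state):
        return self._saved.get(thread_id, (0, default_state))

    def save(self, thread_id, next_step, state):
        self._saved[thread_id] = (next_step, dict(state))


def run_pipeline(steps, store, thread_id, initial_state):
    next_step, state = store.load(thread_id, initial_state)
    for index in range(next_step, len(steps)):
        state = {**state, **steps[index](state)}
        store.save(thread_id, index + 1, state)  # checkpoint after each step
    return state


calls = {"extract": 0, "parse": 0}


def extract(state):
    calls["extract"] += 1
    return {"text": "invoice text"}


def flaky_parse(state):
    calls["parse"] += 1
    if calls["parse"] == 1:
        raise TimeoutError("simulated outage")  # fails on the first attempt only
    return {"parsed": state["text"].upper()}


store = InMemoryCheckpoints()
steps = [extract, flaky_parse]

try:
    run_pipeline(steps, store, "thread-1", {})
except TimeoutError:
    pass  # the first attempt fails after `extract` was checkpointed

result = run_pipeline(steps, store, "thread-1", {})
print(result, calls)
```

The second attempt skips `extract` entirely because its output was already saved under the same thread ID.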
Streaming and Human-in-the-Loop
LangGraph streams events at three levels: node-by-node updates, message tokens from the LLM, and individual state values. Streaming is a simple call to graph.stream instead of graph.invoke:
for event in graph.stream({"messages": [HumanMessage(content="...")]}, config, stream_mode="updates"):
    for node_name, update in event.items():
        print(f"[{node_name}] {update}")
The stream_mode options include "updates" (state diffs per node), "values" (full state after each node), and "messages" (LLM tokens as they generate). For building chat UIs, see our guide on streaming AI chatbot responses.
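To see how "updates" and "values" differ, here is a small generator that imitates the two shapes in plain Python. It is a simplified model with a hypothetical `stream` function. The real library's event sequence may differ in details, such as whether the initial state is emitted first.

```python
def stream(steps, state, stream_mode="updates"):
    """Yield after each step; `steps` is a list of (name, function) pairs."""
    for name, step in steps:
        update = step(state)
        state = {**state, **update}
        if stream_mode == "updates":
            yield {name: update}  # only what this step changed
        elif stream_mode == "values":
            yield dict(state)     # full state snapshot
        else:
            raise ValueError(f"unsupported stream_mode: {stream_mode}")


steps = [
    ("extract", lambda s: {"text": "hello"}),
    ("shout", lambda s: {"text": s["text"].upper(), "done": True}),
]

updates = list(stream(steps, {"text": ""}, stream_mode="updates"))
values = list(stream(steps, {"text": ""}, stream_mode="values"))
print(updates)
print(values)
```

A progress dashboard usually wants the "updates" shape, while a UI that re-renders the whole conversation is simpler to drive from "values".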
Human-in-the-loop uses the interrupt_before parameter at compile time:
graph = workflow.compile(checkpointer=checkpointer, interrupt_before=["tools"])
With this, execution pauses before every tool call. The thread state is checkpointed automatically. Your application then inspects the pending tool call and does one of three things: approves it by calling graph.invoke(None, config) to resume, modifies the state with graph.update_state(config, ...), or cancels the run. This pattern is essential for high-stakes agents: anything that writes to a database, sends email, or moves money should pause for approval first.
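The approval decision itself is ordinary application code. Here is a minimal sketch of what that review step might look like, assuming tool calls are dicts with a `name` key. `SAFE_TOOLS`, `Decision`, and `review_tool_calls` are hypothetical names rather than part of LangGraph.

```python
from dataclasses import dataclass


@dataclass
class Decision:
    action: str  # "approve" or "reject"
    reason: str


SAFE_TOOLS = {"web_search"}  # hypothetical allowlist of read-only tools


def review_tool_calls(tool_calls: list[dict], human_approved: bool) -> Decision:
    """Decide whether a paused run may continue."""
    risky = [call["name"] for call in tool_calls if call["name"] not in SAFE_TOOLS]
    if not risky:
        return Decision("approve", "all requested tools are read-only")
    if human_approved:
        return Decision("approve", f"operator approved: {', '.join(risky)}")
    return Decision("reject", f"needs operator approval: {', '.join(risky)}")


decision = review_tool_calls([{"name": "send_email"}], human_approved=False)
print(decision)
```

On "approve" the application would resume the thread as described above. On "reject" it would leave the run paused or cancel it.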
When to Use LangGraph
- You need cycles in your control flow (ReAct-style agents, retry loops, self-critique)
- Your agent must survive crashes and resume from the last checkpoint
- You want first-class human-in-the-loop approval gates
- You need to stream intermediate state to a frontend (not just final tokens)
- You are already using LangChain components and want them to compose cleanly
- Your workflow has 5+ distinct steps with conditional branching
When NOT to Use LangGraph
- Your “agent” is a single LLM call followed by a single tool call (just use the LLM SDK directly)
- You need strict TypeScript typing (consider Mastra or Vercel AI SDK instead)
- The team has zero LangChain experience and the task is a one-off script (the abstractions are overkill)
- You need millisecond-latency routing (the graph compile and dispatch overhead is small but real)
- Your workflow is purely linear with no branching — a regular function with sequential calls is clearer
Common Mistakes with LangGraph
Several mistakes show up repeatedly in LangGraph code, even from experienced developers.
Forgetting the reducer on list state. Without Annotated[list, add], each node overwrites the messages list instead of appending. The agent then loses all prior context after the first tool call, and you spend an hour wondering why the LLM keeps repeating itself.
Returning the full state from a node. Nodes should return partial updates — only the fields they changed. Returning the entire state works but wastes memory and obscures what each node actually does. Likewise, never mutate the input state object directly; always return a fresh dict.
Skipping the recursion limit. A misbehaving tool that always returns “incomplete” can trap an agent in an infinite loop, consuming tokens at full LLM pricing. Set recursion_limit explicitly in every production deployment, and monitor for GraphRecursionError events.
Conflating state with conversation history. State is for everything the graph needs — pending tool calls, retry counters, intermediate retrieval results, user preferences. Stuffing all of this into the messages field makes prompts huge and confuses the model. Add dedicated fields for non-message state.
Using a shared thread_id across users. The checkpointer keys on thread_id. If two users hit the same thread, they will see each other’s conversation history. Generate per-user, per-conversation thread IDs and treat them as you would session tokens.
Wrapping every tool in a try/except inside the node. LangGraph already isolates node failures and lets you retry via the checkpointer. Adding manual exception handling inside tool nodes often swallows errors that should surface as failed runs.
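For the thread_id mistake above, one possible approach is to combine the user ID with a random component and to verify ownership before loading a thread. Both helpers are hypothetical, and the `user:random` format is just one convention you might choose.

```python
import uuid


def new_thread_id(user_id: str) -> str:
    """Build a per-user, per-conversation thread ID (hypothetical helper)."""
    return f"{user_id}:{uuid.uuid4().hex}"


def thread_belongs_to(thread_id: str, user_id: str) -> bool:
    """Check ownership before loading a thread on behalf of a user."""
    owner, _, _ = thread_id.rpartition(":")  # split on the last colon
    return owner == user_id


thread_id = new_thread_id("user-42")
config = {"configurable": {"thread_id": thread_id}}
print(thread_belongs_to(thread_id, "user-42"))
```

Store the generated ID alongside the user's session so that later requests reuse the same conversation.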
Real-World Scenario: Multi-Step Document Processing
A mid-sized SaaS team building an invoice-processing pipeline commonly hits the limits of linear chains. The naive design — extract text, run an LLM, write to a database — breaks down when invoices have edge cases that need clarification, when the LLM hallucinates a vendor name that does not exist in the customer table, or when an extraction call times out halfway through a batch.
A LangGraph implementation handles this naturally. One node extracts text from the PDF, another calls the LLM to parse fields, a third validates the parsed vendor against the customer database, and a conditional edge routes invalid extractions back to a clarification node that re-prompts with the database matches. The checkpointer ensures that a Postgres outage mid-batch does not lose work — the pipeline resumes from the last successful invoice when the database returns.
Furthermore, the human-in-the-loop interrupt lets operations staff approve any invoice above a configurable dollar threshold before it writes to the accounting system. Streaming updates flow to a dashboard that shows which step each invoice is on, so the team can spot bottlenecks in real time. This same pattern applies to any multi-stage LLM workflow where individual steps can fail independently — moderation pipelines, research agents, and code generation flows all benefit from the same structure.
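The routing decision at the heart of this scenario could be sketched as a single function. The destination names (`clarify`, `human_approval`, `write_to_accounting`), the invoice fields, and the threshold value are all assumptions chosen for illustration.

```python
def route_invoice(invoice: dict, known_vendors: set[str], approval_threshold: float) -> str:
    """Pick the next step for a parsed invoice."""
    if invoice["vendor"] not in known_vendors:
        return "clarify"  # vendor not in the customer table: re-prompt
    if invoice["amount"] > approval_threshold:
        return "human_approval"  # large amounts wait for operations staff
    return "write_to_accounting"


vendors = {"Acme Corp", "Globex"}
print(route_invoice({"vendor": "Acme Corp", "amount": 120.0}, vendors, 5000.0))
```

Used as a conditional edge after the validation node, a function like this keeps the business rules in one place that is easy to test.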
For deeper context on building production-grade RAG pipelines that integrate with this kind of agent, see our guides on agentic RAG architectures and LlamaIndex vs LangChain for RAG.
Conclusion
LangGraph fills a real gap in the Python AI stack: it gives you state, cycles, persistence, and streaming without forcing you to build that scaffolding yourself. For simple LLM calls, it is overkill. However, for any agent that loops, retries, branches, or needs human approval, it removes more code than it adds. Start with the two-node ReAct pattern in this tutorial, then add a checkpointer the moment your graph survives long enough to need persistence. From there, layer in conditional edges, interrupts, and streaming as your use case demands. Next, explore building AI agents with tools, planning, and execution for patterns that apply across LangGraph and other agent frameworks.