
If you’re building retrieval-augmented generation in Python, the LlamaIndex vs LangChain question shows up early. Both frameworks ingest documents, embed them, store them, and stitch retrieval into LLM calls. However, they were built with different design priorities, and that difference shows up in production. This guide compares LlamaIndex and LangChain for RAG specifically, with code from both frameworks, a decision matrix, and the trade-offs nobody mentions in the quickstart docs.
This post is written for intermediate Python engineers picking a framework for a real RAG application: internal search, support chatbots, document Q&A, or knowledge-base assistants. It assumes you already understand the basics of RAG architecture. By the end, you’ll know which framework fits your use case and why.
What Is LlamaIndex?
LlamaIndex is a Python framework built specifically for connecting LLMs to external data. Originally released as GPT Index in late 2022, it focuses on the indexing, retrieval, and query layers of RAG. In practice, LlamaIndex treats your data as the first-class citizen and the LLM as a query engine over that data.
The framework ships with structured abstractions: Document, Node, Index, Retriever, QueryEngine, and ResponseSynthesizer. Each abstraction maps to one stage of the RAG pipeline. As a result, the mental model stays consistent whether you’re indexing 100 documents or millions.
LlamaIndex also offers higher-level constructs for advanced retrieval patterns: hierarchical indices, recursive retrieval, query routers, and sub-question decomposition. These exist as first-party features rather than community recipes. Consequently, if your RAG pipeline needs more than top-k vector search, LlamaIndex gives you sharper tools out of the box.
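To make query routing concrete, here is a framework-agnostic sketch in plain Python. It is not LlamaIndex's API: the `Route` dataclass, the `route_query` function, and the keyword-overlap scoring are all hypothetical stand-ins. LlamaIndex's own routers typically rely on an LLM-based selector over engine descriptions rather than word matching.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Route:
    """A named query handler plus a description used to select it (hypothetical type)."""
    name: str
    description: str
    handler: Callable[[str], str]


def _tokens(text: str) -> set[str]:
    return set(text.lower().split())


def route_query(query: str, routes: list[Route]) -> Route:
    """Pick the route whose description shares the most words with the query.

    Toy selector: ties resolve to the earliest route in the list.
    """
    query_tokens = _tokens(query)
    return max(routes, key=lambda r: len(query_tokens & _tokens(r.description)))


# Two hypothetical indices, each wrapped as a route
routes = [
    Route("billing_docs", "refund policy pricing invoices billing plans",
          lambda q: f"[billing index] {q}"),
    Route("runbooks", "deploy rollback incident on-call runbook outage",
          lambda q: f"[runbook index] {q}"),
]

chosen = route_query("what is the refund policy for enterprise plans", routes)
print(chosen.name)
```

The point of the pattern is that each index stays small and specialized, and the selection step decides which one a query should hit before any retrieval happens.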
What Is LangChain?
LangChain is a general-purpose framework for LLM applications. Launched in October 2022, it grew into a broad toolkit covering chains, agents, tools, memory, retrievers, output parsers, and integrations with hundreds of providers. RAG is one capability inside LangChain, not its entire purpose.
The framework’s mental model centers on composition. You compose Runnable objects with the LangChain Expression Language (LCEL), wire retrievers into chains, and add memory or tool-calling on top. For background on the core abstractions, see our LangChain fundamentals guide.
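The composition idea is easier to see in miniature. The sketch below is a toy, not LangChain's implementation: `Step` is a hypothetical class that overloads `|` so that steps chain left to right, which is the same surface syntax LCEL uses. The retriever and model are fakes that return canned strings.

```python
from typing import Any, Callable


class Step:
    """Toy stand-in for a composable pipeline step (not LangChain's Runnable)."""

    def __init__(self, fn: Callable[[Any], Any]):
        self.fn = fn

    def invoke(self, value: Any) -> Any:
        return self.fn(value)

    def __or__(self, other: "Step") -> "Step":
        # a | b builds a new step that runs a, then feeds its output to b
        return Step(lambda value: other.invoke(self.invoke(value)))


# Hypothetical stages: a fake retriever, a prompt formatter, and a fake model
retrieve = Step(lambda question: {"question": question, "context": "Refunds within 30 days."})
format_prompt = Step(lambda d: f"Context: {d['context']}\nQuestion: {d['question']}")
fake_llm = Step(lambda prompt: f"ANSWER based on -> {prompt.splitlines()[0]}")

chain = retrieve | format_prompt | fake_llm
result = chain.invoke("What is the refund window?")
print(result)
```

Because every stage exposes the same `invoke` interface, swapping the retriever or the model means replacing one step rather than rewriting the pipeline.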
Since LangChain covers so much surface area, its API has churned more than LlamaIndex’s. The 0.1 → 0.2 → 0.3 transitions split packages (langchain-core, langchain-community, langchain-openai) and deprecated older patterns. That said, this breadth is also LangChain’s biggest strength: if you need agents, tool calling, structured output, and RAG in one app, LangChain probably already has a primitive for it.
LlamaIndex vs LangChain: Key Differences
The two frameworks overlap on the basics but diverge on philosophy and depth. Here’s a side-by-side view of the dimensions that actually matter for RAG.
| Dimension | LlamaIndex | LangChain |
|---|---|---|
| Primary focus | RAG, indexing, retrieval | General LLM apps (RAG, agents, tools, memory) |
| Core abstraction | Index → Retriever → QueryEngine | Runnable chain (LCEL) |
| Advanced retrieval | Built-in (recursive, router, sub-question) | Community or custom code |
| Agent support | Solid but secondary | First-class with extensive tool ecosystem |
| Document loaders | 300+ via LlamaHub | 100+ via langchain-community |
| Vector store integrations | 40+ | 80+ |
| Streaming responses | Supported, simpler API | Supported, more flexible via LCEL |
| Observability | LlamaTrace, OpenLLMetry, Arize | LangSmith (first-party, paid) |
| API stability | Steadier, fewer breaking changes | Higher churn across minor versions |
| Learning curve | Lower for pure RAG | Steeper, but generalizes further |
Importantly, neither framework locks you in completely. LlamaIndex retrievers can be wrapped as LangChain retrievers, and vice versa. Therefore, the choice is usually about which framework owns the primary control flow of your app, not which one you’re allowed to use.
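The wrapping mentioned above is an ordinary adapter. The sketch below shows its shape in plain Python with stand-in types, so it runs without either library installed. `ScoredNode`, `Doc`, `RetrieverBridge`, and `FakeNodeRetriever` are hypothetical names, not classes from LlamaIndex or LangChain.

```python
from dataclasses import dataclass, field
from typing import Any, Protocol


@dataclass
class ScoredNode:
    """Stand-in for a LlamaIndex-style result: text, metadata, similarity score."""
    text: str
    score: float
    metadata: dict[str, Any] = field(default_factory=dict)


@dataclass
class Doc:
    """Stand-in for a LangChain-style document: page_content plus metadata."""
    page_content: str
    metadata: dict[str, Any] = field(default_factory=dict)


class NodeRetriever(Protocol):
    def retrieve(self, query: str) -> list[ScoredNode]: ...


class RetrieverBridge:
    """Adapts a node-returning retriever to a document-returning interface."""

    def __init__(self, inner: NodeRetriever):
        self.inner = inner

    def invoke(self, query: str) -> list[Doc]:
        # Carry the score across in metadata so callers can still show confidence
        return [
            Doc(page_content=n.text, metadata={**n.metadata, "score": n.score})
            for n in self.inner.retrieve(query)
        ]


class FakeNodeRetriever:
    """Hypothetical retriever that returns one canned result."""

    def retrieve(self, query: str) -> list[ScoredNode]:
        return [ScoredNode("Refunds are issued within 30 days.", 0.87, {"file_name": "refunds.md"})]


docs = RetrieverBridge(FakeNodeRetriever()).invoke("refund policy")
print(docs[0].page_content, docs[0].metadata)
```

In real code the bridge would subclass LangChain's `BaseRetriever` and implement `_get_relevant_documents`, wrapping the retriever returned by the LlamaIndex index's `as_retriever()`. Check both libraries' current docs for exact signatures before relying on this shape.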
Indexing and Retrieval Depth
LlamaIndex separates ingestion into explicit phases: SimpleDirectoryReader loads files, SentenceSplitter (or similar) creates Node objects, and VectorStoreIndex.from_documents() handles embedding and storage. Moreover, each stage has clear hooks. You can swap chunking strategies — see our breakdown of RAG chunking strategies — without touching the rest of the pipeline.
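For intuition about what a chunking stage does, here is a deliberately naive character-window splitter with overlap. It is a sketch, not what either framework ships: `chunk_text` is a hypothetical helper, and the real splitters (SentenceSplitter, RecursiveCharacterTextSplitter) also try to respect sentence or separator boundaries.

```python
def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 100) -> list[str]:
    """Split text into fixed-size character windows that overlap by `overlap` chars."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
chunks = chunk_text(sample, chunk_size=10, overlap=3)
print(chunks)
```

Each chunk repeats the last few characters of the previous one, so a sentence that straddles a boundary still appears whole in at least one chunk when the overlap is large enough.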
LangChain handles the same stages through DocumentLoader, TextSplitter, Embeddings, and VectorStore classes. The pieces compose cleanly, but advanced patterns like recursive retrieval over hierarchical summaries take more glue code. For instance, building a parent-document retriever in LangChain requires explicit setup of ParentDocumentRetriever plus a docstore. In LlamaIndex, the same pattern is a first-party feature: RecursiveRetriever over an IndexNode graph.
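The parent-document pattern itself is simple: match on small chunks, return the larger document they came from. The sketch below shows that logic in plain Python under loud assumptions. Word overlap stands in for vector similarity, the `parents` dict plays the role of the docstore, and every name and document is hypothetical.

```python
from collections import Counter


def score(query: str, text: str) -> int:
    """Toy relevance score: count of shared lowercase words (stands in for vector similarity)."""
    q, t = Counter(query.lower().split()), Counter(text.lower().split())
    return sum((q & t).values())


# Parent documents (hypothetical content), keyed by id: this plays the role of the docstore
parents = {
    "refunds": "Enterprise refunds are prorated. Requests must arrive within 30 days. Contact billing to start.",
    "sso": "SSO is available on enterprise plans. Configure SAML in the admin console. Okta is supported.",
}

# Child index: one small chunk per sentence, each remembering its parent id
children = [
    (parent_id, sentence.strip())
    for parent_id, text in parents.items()
    for sentence in text.split(".")
    if sentence.strip()
]


def retrieve_parents(query: str, k: int = 1) -> list[str]:
    """Match on small child chunks, then return the deduplicated parent documents."""
    ranked = sorted(children, key=lambda c: score(query, c[1]), reverse=True)
    parent_ids = list(dict.fromkeys(pid for pid, _ in ranked[:k]))
    return [parents[pid] for pid in parent_ids]


hits = retrieve_parents("how many days do refund requests have")
print(hits[0])
```

Small chunks keep matching precise, while returning the parent gives the LLM enough surrounding text to answer. The deduplication step matters because several top-ranked children often share one parent.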
Agents and Tool Use
LangChain’s agent story is more mature. Its create_openai_tools_agent, AgentExecutor, and LangGraph integration give you tool calling, planning loops, and stateful workflows out of the box. For agentic apps where RAG is one tool among many, LangChain is the more natural fit. If you’re new to agents generally, our overview of building AI agents with tools, planning, and execution covers the core patterns.
LlamaIndex has agents too — FunctionCallingAgent, ReActAgent, and the newer Workflow API — but they feel like an extension of the query engine rather than a separate runtime. For pure retrieval, that’s fine. However, for complex multi-tool agents with branching logic, LangGraph (LangChain’s stateful agent framework) has more momentum and a larger community.
Code Comparison: The Same RAG Pipeline in Both
To make the differences concrete, here’s the same minimal RAG pipeline — load a directory of markdown files, embed them, and answer a query — written in both frameworks.
LlamaIndex Version
from llama_index.core import (
    SimpleDirectoryReader,
    VectorStoreIndex,
    StorageContext,
    Settings,
)
from llama_index.llms.openai import OpenAI
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.vector_stores.chroma import ChromaVectorStore
import chromadb

# Configure global defaults: LLM and embedding model
Settings.llm = OpenAI(model="gpt-4o-mini", temperature=0)
Settings.embed_model = OpenAIEmbedding(model="text-embedding-3-small")

# Set up a persistent Chroma collection so embeddings survive restarts
client = chromadb.PersistentClient(path="./chroma_db")
collection = client.get_or_create_collection("docs")
vector_store = ChromaVectorStore(chroma_collection=collection)
storage_context = StorageContext.from_defaults(vector_store=vector_store)

# Load and index documents. from_documents embeds everything it is given,
# so on later runs reload the existing collection with
# VectorStoreIndex.from_vector_store(vector_store) instead of re-indexing.
documents = SimpleDirectoryReader("./data").load_data()
index = VectorStoreIndex.from_documents(
    documents,
    storage_context=storage_context,
    show_progress=True,
)

# Query
query_engine = index.as_query_engine(similarity_top_k=4)
response = query_engine.query("What is our refund policy for enterprise plans?")
print(response)

# Source nodes come back attached to the response object
print("\nSources:")
for node in response.source_nodes:
    print(f"  - {node.metadata.get('file_name')} (score: {node.score:.3f})")
The structure mirrors the conceptual pipeline almost line for line. Specifically, VectorStoreIndex.from_documents handles chunking, embedding, and writing to Chroma in a single call. Source citations come back attached to the response object without extra wiring.
LangChain Version
from langchain_community.document_loaders import DirectoryLoader, TextLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_openai import OpenAIEmbeddings, ChatOpenAI
from langchain_chroma import Chroma
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser

# Load and split documents
loader = DirectoryLoader(
    "./data",
    glob="**/*.md",
    loader_cls=TextLoader,
)
documents = loader.load()
splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=100,
)
chunks = splitter.split_documents(documents)

# Embed and store in Chroma. from_documents adds the chunks on every run,
# so on later runs open the existing store with
# Chroma(persist_directory="./chroma_db", embedding_function=embeddings) instead.
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
vector_store = Chroma.from_documents(
    documents=chunks,
    embedding=embeddings,
    persist_directory="./chroma_db",
)
retriever = vector_store.as_retriever(search_kwargs={"k": 4})

# Build the RAG chain with LCEL
prompt = ChatPromptTemplate.from_template(
    """Answer using only the context below. If the answer is not present, say so.
Context:
{context}
Question: {question}"""
)
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)


def format_docs(docs):
    # Join the retrieved chunks into one context string for the prompt
    return "\n\n".join(d.page_content for d in docs)


rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)
answer = rag_chain.invoke("What is our refund policy for enterprise plans?")
print(answer)
The LangChain version is more explicit. You see the prompt template, the chunking parameters, and the chain composition directly. The cost is more code. For source citations, you’d add a second chain that returns documents alongside the answer.
The takeaway: LlamaIndex hides the RAG plumbing and exposes domain concepts; LangChain exposes the plumbing and lets you assemble it. Neither approach is wrong — they fit different teams and different problems.
Choosing Between LlamaIndex and LangChain
Both frameworks can ship the same RAG app. The decision is really about what else you’ll build on top, and how much retrieval depth you need.
Pick LlamaIndex If
- Your application is primarily document Q&A, semantic search, or knowledge-base retrieval
- Advanced retrieval patterns (recursive retrieval, query routing, sub-question decomposition, hierarchical indices) need to work without custom plumbing
- A stable, RAG-focused API surface matters more to your team than a sprawling general toolkit
- LlamaParse handles your complex PDFs and structured documents better than most third-party loaders
- Structured data sources (SQL, Pandas, graph databases) plug in through LlamaIndex’s query engines
- Source citation and provenance tracking are critical, and you don’t want to wire them yourself
Pick LangChain If
- Your app is an agent that uses retrieval as one tool among several (search, code execution, API calls, function calling)
- The integration ecosystem matters — Slack, Notion, Discord, and a broader vector-store lineup all ship as first-party connectors
- Stateful multi-step workflows fit LangGraph’s graph-based control flow more naturally than a query engine
- LangSmith already powers your tracing and evaluation, and you want one vendor for both layers
- Fine-grained control over every prompt and every chain step is non-negotiable
- Your RAG path is straightforward (top-k retrieval over a flat index) and the bulk of complexity lives elsewhere
Combine the Two When
- Indexing and retrieval live in LlamaIndex while agent orchestration stays in LangChain (LlamaIndexRetriever bridges the two)
- LlamaIndex’s advanced retrieval shows up as a single tool inside a LangChain or LangGraph agent
- One framework already owns part of the codebase and a full rewrite isn’t worth the cost
Common Mistakes With Both Frameworks
Both ecosystems make it easy to ship a demo and hard to ship production. Watch for these patterns regardless of which framework you pick.
- Treating the default chunker as production-ready. Typical settings (fixed-size chunks of roughly 1,000 characters or tokens, with 100 to 200 of overlap) work for prose but mangle code, tables, and structured documents. Always test retrieval quality on your real corpus before deploying.
- Skipping reranking. Top-k vector search alone leaves a lot of accuracy on the table. Adding a reranker (Cohere Rerank, BGE-reranker, or a cross-encoder) typically lifts answer quality noticeably. See our deep dive on hybrid search combining BM25 and vector retrieval for the patterns that compound here.
- Leaving dependency versions loosely pinned. Both frameworks have shipped breaking changes inside minor releases, so a major-version constraint alone won’t protect you. Pin exact versions, read the release notes, and update deliberately, especially for LangChain.
- Ignoring evaluation. Spinning up an index and asking it questions is not evaluation. Build a small labeled set of queries and expected answers, then measure retrieval recall and answer faithfulness before and after every change.
- Re-embedding on every restart. Persist your vector store. Embedding is the slow, expensive step; you should pay for it once per document version, not once per container start.
- Choosing the framework before choosing the vector store. The vector database is usually the bigger architectural decision. Our comparison of vector databases for production RAG helps narrow that choice first.
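The evaluation advice in the list above can start very small. The sketch below computes recall@k over a hand-labeled query set and compares two retriever runs, for example before and after adding a reranker. Everything here is hypothetical: the chunk ids, the labels, and the two result sets are made up for illustration, and this covers retrieval recall only, not answer faithfulness.

```python
def recall_at_k(retrieved_ids: list[str], relevant_ids: set[str], k: int) -> float:
    """Fraction of the relevant documents that appear in the top-k retrieved ids."""
    if not relevant_ids:
        return 0.0
    top_k = set(retrieved_ids[:k])
    return len(top_k & relevant_ids) / len(relevant_ids)


def mean_recall(results: dict[str, list[str]], labels: dict[str, set[str]], k: int) -> float:
    """Average recall@k over every labeled query."""
    return sum(recall_at_k(results[q], labels[q], k) for q in labels) / len(labels)


# Hypothetical labeled set: query -> ids of chunks that contain the answer
labels = {
    "refund window": {"refunds#2"},
    "sso providers": {"sso#1", "sso#3"},
}

# Hypothetical retriever output before and after a change (for example, adding a reranker)
baseline = {
    "refund window": ["pricing#1", "refunds#1", "refunds#2"],
    "sso providers": ["sso#1", "pricing#4", "sso#2"],
}
candidate = {
    "refund window": ["refunds#2", "refunds#1", "pricing#1"],
    "sso providers": ["sso#1", "sso#3", "sso#2"],
}

print(mean_recall(baseline, labels, k=2), mean_recall(candidate, labels, k=2))
```

Running the same labeled set after every chunking, embedding, or reranking change turns "it feels better" into a number you can compare across commits.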
A Realistic Migration Scenario
Consider a mid-sized SaaS company building an internal knowledge-base assistant over 5,000 product docs, runbooks, and onboarding pages. The first version typically ships in LangChain because the team already uses it for an agent that handles ticket triage. After a few weeks of beta, two patterns emerge: answer quality is uneven on long runbooks, and users want source links with confidence scores.
In that situation, swapping the retrieval layer to LlamaIndex (while keeping the LangChain agent shell) usually pays off. LlamaIndex’s SentenceWindowNodeParser keeps surrounding context attached to each chunk, which helps the LLM ground answers in the right paragraph. Furthermore, MetadataReplacementPostProcessor returns the wider window only at synthesis time, so retrieval precision stays high without losing context. The bridge is a thin wrapper that exposes the LlamaIndex retriever as a LangChain BaseRetriever, so the agent code barely changes.
This is the most common production pattern: LangChain (or LangGraph) owns orchestration, LlamaIndex owns indexing and retrieval. Neither team has to commit fully, and the rewrite stays contained to the retrieval boundary.
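The sentence-window idea from the scenario above can be sketched without any framework. This is a toy under stated assumptions, not LlamaIndex's implementation: `build_sentence_windows`, `overlap`, and `retrieve_with_window` are hypothetical helpers, the runbook text is invented, and word overlap stands in for embedding similarity.

```python
import re


def build_sentence_windows(text: str, window: int = 1) -> list[dict]:
    """Create one record per sentence, storing its neighbors as a wider 'window'."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    records = []
    for i, sentence in enumerate(sentences):
        lo, hi = max(0, i - window), min(len(sentences), i + window + 1)
        records.append({"sentence": sentence, "window": " ".join(sentences[lo:hi])})
    return records


def overlap(query: str, text: str) -> int:
    """Toy relevance score: shared lowercase words (stands in for embedding similarity)."""
    return len(set(query.lower().split()) & set(re.findall(r"\w+", text.lower())))


def retrieve_with_window(query: str, records: list[dict]) -> str:
    """Match against single sentences, but return the surrounding window for synthesis."""
    best = max(records, key=lambda r: overlap(query, r["sentence"]))
    return best["window"]


runbook = (
    "Check the dashboard first. Restart the worker if the queue is stalled. "
    "Page the on-call engineer if the restart fails. Close the incident afterwards."
)
records = build_sentence_windows(runbook, window=1)
context = retrieve_with_window("queue is stalled", records)
print(context)
```

Matching happens on a single sentence, so scores stay sharp, while the text handed to the LLM includes the neighboring steps of the runbook. That split between what is matched and what is returned is the design choice the scenario relies on.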
If you’re starting greenfield and the app is only RAG, skip the bridge — just pick LlamaIndex and avoid the extra dependency. Production examples that pair LlamaIndex with a managed vector store work well; see our walkthrough on Pinecone Serverless for production RAG for one end-to-end setup.
Final Recommendation
For pure RAG — document Q&A, semantic search, knowledge bases — LlamaIndex is the better default in 2026. It gives you advanced retrieval patterns for free, stable APIs, and a mental model that matches the problem. For agent-heavy apps where retrieval is one tool of many, LangChain (especially with LangGraph) is the stronger choice. When in doubt, prototype the riskiest part of your app in both frameworks for an afternoon. The right answer becomes obvious quickly.
The LlamaIndex vs LangChain debate isn’t really a winner-takes-all question. Both frameworks are well-maintained, both have large communities, and both can ship production RAG. Decide which framework owns your control flow, pick the vector store that fits your scale, and move on to the work that actually moves quality: chunking, reranking, and evaluation.