LangChain vs LlamaIndex vs Haystack 2026: Which to Use

raglangchainllamaindexhaystackpythonopensource

TL;DR: LangGraph (LangChain’s agent layer) handles multi-step agents with tool calls and persistent memory better than the alternatives. LlamaIndex ships a working RAG pipeline faster with lower overhead. Haystack wins when you need every pipeline step to be serializable, testable, and auditable. All three work with Ollama for fully local deployments.

LangGraph / LangChainLlamaIndexHaystack
Best forMulti-step agents, tool calls, persistent memoryRAG-first projects, multi-modal retrievalProduction teams, auditable pipelines
Current versionLangChain 1.3.2 / LangGraph 1.2.20.14.22 (May 2026)2.29.0 (May 2026)
LicenseMITMITApache 2.0
Framework overhead~10 ms (LangChain) / ~14 ms (LangGraph)~6 ms~5.9 ms
The catchHigher latency, largest abstraction surfaceAgent support is secondarySmaller community, verbose wiring

Honest take: Start with LlamaIndex for a new RAG project — you’ll have something working in 30 minutes. Move to LangGraph if you outgrow it, or use them together.


What changed in 2026

Three years ago, choosing between these frameworks was mostly a style preference. Today they’ve diverged into genuinely different tools making different primary bets.

LangChain hit its v1.0 stable milestone in October 2025 with LangGraph positioned as the primary interface for any non-trivial workflow. LangChain 1.3.2 + LangGraph 1.2.2 is the current production combo. LangGraph adds checkpointing (agents survive server restarts), fine-grained node execution control with per-node timeouts and error recovery, and a content-block-centric streaming API. The ecosystem has over 100K GitHub stars and the largest community of the three by a significant margin. The flip side: LangGraph adds another abstraction layer on top of an already-layered stack. If you’re building a semantic search API without agents, that complexity is unnecessary.

LlamaIndex is at v0.14.22 (released May 14, 2026) and has repositioned itself as an “agentic document and OCR platform” rather than just a query library. The core remains the best-in-class retrieval pipeline — VectorStoreIndex, HybridRetriever, and SubQuestionQueryEngine are production-grade with minimal setup. Multi-modal retrieval (text and images in the same query pipeline) works properly now; it was research-grade in v0.10. The framework has ~40K GitHub stars and a mature integration ecosystem. The Ollama packages (llama-index-llms-ollama, llama-index-embeddings-ollama) have been stable for over a year.

Haystack is at v2.29.0 (released May 12, 2026) and is a complete architectural rewrite from v1. Every pipeline is a typed directed acyclic graph (DAG) where each component declares its inputs and outputs explicitly, pipelines serialize to YAML for version control and deployment, and the observability stack (OpenTelemetry, Langfuse, MLflow) is first-class rather than bolted on. v2.29.0 builds on the State injection feature from v2.28.0, which lets components access and modify live agent state at invocation time. Haystack has ~15K GitHub stars — smaller community means harder debugging when you hit edge cases.


Minimal RAG pipeline in each

Same task for all three: index a folder of PDFs, run a semantic query, return an answer.

LlamaIndex — fewest lines to a working pipeline

pip install llama-index llama-index-llms-ollama llama-index-embeddings-ollama
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader, Settings
from llama_index.llms.ollama import Ollama
from llama_index.embeddings.ollama import OllamaEmbedding

Settings.llm = Ollama(model="llama3.2", request_timeout=60.0)
Settings.embed_model = OllamaEmbedding(model_name="nomic-embed-text")

documents = SimpleDirectoryReader("./docs").load_data()
index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine()

response = query_engine.query("What are the key findings?")
print(response)

SimpleDirectoryReader handles PDFs, DOCX, TXT, and HTML out of the box. VectorStoreIndex defaults to in-memory storage; swapping in Chroma, Qdrant, or Weaviate requires one line. The global Settings object propagates your LLM and embedding choices to every component downstream — you set it once.

For a deeper look at how chunking and retrieval choices affect output quality, see RAG Architecture Deep Dive 2026.

LangChain — more explicit, more imports

pip install langchain langchain-ollama langchain-community chromadb pypdf
from langchain_ollama import OllamaLLM, OllamaEmbeddings
from langchain_community.document_loaders import PyPDFDirectoryLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import Chroma
from langchain.chains import RetrievalQA

llm = OllamaLLM(model="llama3.2")
embeddings = OllamaEmbeddings(model="nomic-embed-text")

loader = PyPDFDirectoryLoader("./docs")
docs = loader.load()

splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
chunks = splitter.split_documents(docs)

vectorstore = Chroma.from_documents(chunks, embeddings)
qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    retriever=vectorstore.as_retriever()
)

result = qa_chain.invoke({"query": "What are the key findings?"})
print(result["result"])

More lines, more imports, same result. The explicit text-splitter step is a real difference — LangChain doesn’t auto-chunk on ingest. That’s explicit control, not a bug, but it’s friction you don’t hit with LlamaIndex until you need to tune chunk sizes for production.

Haystack — maximum explicitness

pip install haystack-ai ollama-haystack
from haystack import Pipeline
from haystack.components.converters import PyPDFToDocument
from haystack.components.preprocessors import DocumentSplitter
from haystack.components.writers import DocumentWriter
from haystack.components.builders import PromptBuilder
from haystack.components.retrievers.in_memory import InMemoryEmbeddingRetriever
from haystack.document_stores.in_memory import InMemoryDocumentStore
from haystack_integrations.components.embedders.ollama import (
    OllamaDocumentEmbedder, OllamaTextEmbedder
)
from haystack_integrations.components.generators.ollama import OllamaGenerator

document_store = InMemoryDocumentStore()

# Step 1: indexing pipeline
indexing = Pipeline()
indexing.add_component("converter", PyPDFToDocument())
indexing.add_component("splitter", DocumentSplitter(split_by="sentence", split_length=10))
indexing.add_component("embedder", OllamaDocumentEmbedder(model="nomic-embed-text"))
indexing.add_component("writer", DocumentWriter(document_store=document_store))
indexing.connect("converter", "splitter")
indexing.connect("splitter", "embedder")
indexing.connect("embedder", "writer")
indexing.run({"converter": {"sources": ["./docs/report.pdf"]}})

# Step 2: query pipeline
TEMPLATE = """
Answer based on the documents below.
Documents: {% for doc in documents %}{{ doc.content }}{% endfor %}
Question: {{ question }}
"""
querying = Pipeline()
querying.add_component("embedder", OllamaTextEmbedder(model="nomic-embed-text"))
querying.add_component("retriever", InMemoryEmbeddingRetriever(document_store))
querying.add_component("prompt_builder", PromptBuilder(template=TEMPLATE))
querying.add_component("generator", OllamaGenerator(model="llama3.2"))
querying.connect("embedder.embedding", "retriever.query_embedding")
querying.connect("retriever.documents", "prompt_builder.documents")
querying.connect("prompt_builder", "generator.prompt")

result = querying.run({
    "embedder": {"text": "What are the key findings?"},
    "prompt_builder": {"question": "What are the key findings?"}
})
print(result["generator"]["replies"][0])

Verbose? Yes. But every connection is visible and traceable. If the pipeline fails, you know which component and which output failed. The pipeline serializes to YAML with indexing.to_dict(), so you can store it in version control, deploy it to a different environment, and diff changes over time. That’s the capability teams running LLM pipelines in regulated industries want.


Agent and workflow support

This is where the three frameworks diverge most sharply.

LangGraph is purpose-built for agents. Nodes are Python functions; edges are transitions between them; channels carry shared state. A LangGraph agent that calls tools, waits for a human approval step, and resumes after a server restart takes roughly 30 lines of Python to wire up correctly. The v1.2.2 checkpoint system persists agent state to disk or a database between invocations — your agent doesn’t lose context when the process dies. This is the single feature that put LangGraph ahead of competitors for production multi-step agents.

LlamaIndex Workflows (introduced in v0.11, significantly matured by v0.14) support step-based agents with typed events between steps and explicit state. They work well for document-processing pipelines that occasionally need LLM reasoning — parse, classify, extract, summarize — where each step can call a model or a tool. For a single-tool retrieval agent, LlamaIndex Workflows handles it cleanly. For a 5-tool, memory-enabled, conditional-logic agent, LangGraph is the more appropriate tool.

Haystack Agents matured considerably with v2.28.0’s State injection. Before that release, building anything agent-shaped in Haystack required manual pipeline chaining. The State API now lets tool components read from and write to a shared state object that persists between tool calls. It closes the most painful gap, but LangGraph’s agent primitives are still more flexible for complex workflows.


Local LLM support (Ollama)

All three integrate with Ollama. The developer experience differs.

LlamaIndex has the cleanest integration. Set Settings.llm and Settings.embed_model once at the top of your script; every component respects those settings automatically. Streaming, JSON mode (structured output via Ollama’s response format), and async support all work correctly.

LangChain uses the langchain-ollama package, which wraps Ollama’s OpenAI-compatible endpoint. It works, but langchain-ollama has its own release cycle and occasionally lags a week or two behind new Ollama versions. The workaround — pointing ChatOpenAI at http://localhost:11434/v1 — bypasses the package entirely and is often more reliable when you hit version mismatches:

from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    model="llama3.2",
    base_url="http://localhost:11434/v1",
    api_key="ollama"  # placeholder, Ollama ignores it
)

Haystack uses the ollama-haystack integration package (released March 9, 2026), which provides OllamaGenerator, OllamaChatGenerator, OllamaDocumentEmbedder, and OllamaTextEmbedder. One difference from LlamaIndex: you must explicitly attach the embedder to each pipeline component — there’s no global settings object that propagates automatically. Minor friction, but it means one more thing to remember when wiring a pipeline.


Performance

Independent benchmark (axiomlogica.com, 2026) running identical components — same model, same embeddings (BGE-small), same retriever (Qdrant), same tools — isolating only framework orchestration overhead:

FrameworkOverhead per queryToken usage per query
Haystack 2.x~5.9 ms~1,570 tokens
LlamaIndex 0.14~6.0 ms~1,600 tokens
LangChain 1.x~10.0 ms~2,400 tokens
LangGraph 1.x~14.0 ms~2,030 tokens

LangChain’s higher token count isn’t arbitrary — it’s the default system prompts and chain-of-thought scaffolding the framework adds. You can reduce it significantly with custom prompts, but you have to know to do that.

Separate end-to-end comparison: LangChain shows ~220 ms median latency versus LlamaIndex’s ~185 ms for equivalent RAG pipelines. LangChain’s LCEL query rewriting adds 15–20% overhead versus LlamaIndex’s direct retrieval path.

At 50 queries/minute the difference is imperceptible. At 50,000 queries/minute it starts to matter. Most self-hosted RAG deployments sit closer to the former.


When NOT to use each

Skip LangChain/LangGraph if:

  • Your application is straightforward semantic search with no agents or tool calls — the abstraction cost isn’t justified
  • You need the lowest possible latency per query
  • You want to minimize the number of packages your app depends on

Skip LlamaIndex if:

  • Agents with multi-step reasoning and persistent memory are the core feature, not a nice-to-have
  • You need fully serializable pipeline configs for CI/CD deployment and diff-based review
  • Your team is TypeScript-first (the TS SDK exists but lags the Python version by several months)

Skip Haystack if:

  • You’re prototyping and requirements change daily — explicit DAG wiring is friction when you’re still discovering the shape of the problem
  • You need a large community and extensive StackOverflow coverage for fast debugging
  • Your agent requirements involve complex branching, conditional loops, or memory across sessions (as of v2.29.0, that’s still LangGraph’s territory)

Quick decision guide

SituationUse this
New RAG project, fastest time-to-working-demoLlamaIndex
Multi-step agent with tool calls and memoryLangGraph
Already on LangChain, want to add agentsLangGraph (extend, don’t migrate)
Production RAG with auditable pipeline configsHaystack
Multi-modal retrieval (text + images in same index)LlamaIndex
Regulated industry, OTEL observability requiredHaystack
RAG over structured data (SQL, tables)LlamaIndex (NLSQLTableQueryEngine)
Mixed RAG + agentic workflowLlamaIndex (retrieval) + LangGraph (orchestration)

The last row is underused. The frameworks aren’t mutually exclusive — LlamaIndex exposes its query engine as a LangChain tool, which means you can use LlamaIndex’s retrieval primitives inside a LangGraph agent. This combination covers most production use cases without committing fully to either ecosystem.

For context on how these frameworks fit into a complete local AI deployment, see The Open-Source AI Stack in 2026.


Frequently Asked Questions

Can I switch frameworks after I’ve already built my RAG pipeline? Yes, but it’s painful. LlamaIndex and LangChain each have distinct document, node, and index abstractions that don’t map directly to each other. If you’re past 10,000 lines of framework-specific code, the migration cost typically outweighs the gains unless you’re hitting a hard architectural ceiling. Haystack’s YAML serialization makes it easier to port pipeline logic since the component graph is explicit, but you’re still rewriting integration code.

Which framework handles a 7B local model on CPU best? LlamaIndex degrades most gracefully. Its separation between the retrieval step (deterministic) and the generation step (LLM-dependent) means a weaker model only hurts output quality, not pipeline stability. LangChain’s agentic flows tend to break harder when a model can’t reliably follow tool-call syntax. Haystack’s explicit pipeline structure at least tells you exactly which component produced a bad output.

Do all three support streaming responses with Ollama? Yes. LlamaIndex and LangChain have had solid streaming support for over a year. Haystack added proper streaming in v2.20 and it works with the Ollama integration. Enable it with stream=True on the generator component in Haystack; LlamaIndex uses query_engine.stream_chat().

Which is best for RAG over structured data like SQL or CSV tables? LlamaIndex leads clearly here. Its NLSQLTableQueryEngine and PandasQueryEngine are production-grade. LangChain has SQL chain support, but it’s less actively maintained and requires more manual configuration. Haystack focuses on unstructured document retrieval and has limited native structured-query support as of v2.29.0.

Is Haystack only worth learning for enterprise teams? No. Pipeline serialization to YAML is useful even at small team size if you’re running any CI/CD testing on pipeline configs. The learning curve is steeper than LlamaIndex for the first pipeline, but Haystack’s explicitness pays off once you need to modify, version, or debug a pipeline that’s been running in production for several months.


Sources

Was this article helpful?