Chroma vs Qdrant vs Weaviate 2026: RAG Database Compared


The three most commonly recommended open-source vector databases for RAG — Chroma, Qdrant, and Weaviate — are not interchangeable. Chroma is a prototyping tool that grew into a real product. Qdrant is a production workhorse written in Rust with the best filtering performance of the three. Weaviate is an enterprise-grade platform with hybrid search and the most built-in integrations. Using Weaviate when you need Chroma adds unnecessary ops overhead. Using Chroma when you need Qdrant means migrating under pressure when your collection outgrows it.

Versions covered: ChromaDB v1.5.9 (May 2026), Qdrant v1.17.1 (March 2026), Weaviate v1.37 (May 2026).


The quick answer

Situation | Best choice
Local prototyping, notebooks, under 100K vectors | Chroma
Embedded in a Python process, no separate service | Chroma
Production RAG with filtering-heavy queries | Qdrant
Multi-user deployment, concurrent queries | Qdrant
Memory-constrained deployment at millions of vectors | Qdrant
Hybrid search (BM25 + vector in one query) | Weaviate
Multi-modal retrieval (text + images + audio) | Weaviate
Built-in re-ranking or generative AI modules | Weaviate
Kubernetes, team-operated, agentic MCP workflows | Weaviate
Getting from zero to working RAG in 10 minutes | Chroma

What each tool actually is

ChromaDB (Apache 2.0, chroma-core/chroma) started as a pure-Python embedded database and was rebuilt in Rust for the v1.0 release. The Rust core eliminates Python’s GIL bottlenecks and delivers roughly 4× faster writes and queries compared to the pre-1.0 implementation — write throughput went from ~10K to ~40K+ vectors/second in server mode. Chroma’s design priority is developer ergonomics: pip install chromadb, three lines of Python, and you have a working local vector store. The default mode runs in-process — no Docker, no service to start, no YAML. You can run it in server mode for multi-client access when you’re ready.

Qdrant (Apache 2.0, qdrant/qdrant) is written entirely in Rust and optimized for production-grade vector similarity search. It runs as a standalone service via Docker with REST and gRPC APIs. Qdrant’s main differentiators are its payload filtering system — which combines vector similarity with structured metadata filters inside the HNSW traversal rather than as a post-filter — and its quantization stack (Scalar, Binary, Product, TurboQuant), which lets you compress large collections by up to 32× to stay within affordable RAM budgets. The Qdrant team publishes transparent benchmarks and consistently posts among the lowest latency at the highest recall in the ANN-benchmarks suite.

Weaviate (BSD-3-Clause, weaviate/weaviate) is the most feature-complete of the three. Written in Go, it combines HNSW vector search with BM25 keyword search in a single unified query — what Weaviate calls hybrid search. Pure vector similarity fails on exact string matches like model names, product codes, and proper nouns; BM25 fills those gaps. Weaviate v1.37 added a built-in MCP (Model Context Protocol) server, meaning Claude Code, Cursor, and any MCP-compatible agent can read and write to your database natively without glue code. It also added Diversity Search (MMR) and query profiling with per-shard timing breakdowns.


Versions, licensing, and architecture

Property | ChromaDB | Qdrant | Weaviate
Current version | v1.5.9 (May 2026) | v1.17.1 (Mar 2026) | v1.37 (May 2026)
License | Apache 2.0 | Apache 2.0 | BSD-3-Clause
Core language | Rust (Python + JS clients) | Rust | Go
API surface | REST, Python, JS | REST + gRPC | REST + GraphQL + gRPC
Self-hostable | Yes | Yes | Yes
Managed cloud | Chroma Cloud | Qdrant Cloud | Weaviate Cloud

Hardware requirements

All three are CPU-capable for small workloads. RAM is the real constraint — vector indexes need to live in memory for fast queries.

Requirement | ChromaDB v1.5.9 | Qdrant v1.17.1 | Weaviate v1.37
Development minimum | 2 GB RAM | 2 GB RAM | 8 GB RAM (recommended)
Production minimum | 8 GB RAM | 4 GB RAM | 16 GB RAM
GPU required | No | No | No
Python required | Yes (client) | No (Docker binary) | No (Docker)
OS support | Linux, macOS, Windows | Linux, macOS, Windows | Linux, macOS, Windows

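Qdrant’s numbers are worth digging into. One million vectors at 1536 dimensions (the default output size of OpenAI text-embedding-3-small) stored as float32 take about 6.1 GB for the raw vectors alone (1,000,000 × 1536 × 4 bytes). The rule of thumb in the Qdrant memory consumption documentation multiplies that by 1.5 to cover index and metadata overhead, giving roughly 9 GB. Scalar quantization stores each dimension in one byte, which cuts the in-memory vectors to about 1.5 GB, and the full-precision originals can be kept on disk for rescoring. This is the reason Qdrant wins for memory-constrained production setups.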
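A rough way to sanity-check vector memory figures is to do the multiplication yourself. The sketch below assumes 4 bytes per dimension for float32, 1 byte per dimension after scalar quantization, 1 bit per dimension after binary quantization, and an optional 1.5× overhead factor for index and metadata; the function name and the decimal-GB convention are illustrative choices, not part of any Qdrant API.

```python
def vector_memory_gb(num_vectors: int, dims: int, bytes_per_dim: float,
                     overhead: float = 1.0) -> float:
    """Estimate memory in GB (10**9 bytes) for a vector collection."""
    return num_vectors * dims * bytes_per_dim * overhead / 1e9


n, d = 1_000_000, 1536

raw_float32 = vector_memory_gb(n, d, 4)                 # raw float32 vectors
with_overhead = vector_memory_gb(n, d, 4, overhead=1.5) # plus index/metadata rule of thumb
scalar_quantized = vector_memory_gb(n, d, 1)            # one byte per dimension
binary_quantized = vector_memory_gb(n, d, 1 / 8)        # one bit per dimension

for label, gb in [
    ("float32 raw", raw_float32),
    ("float32 + 1.5x overhead", with_overhead),
    ("scalar quantized", scalar_quantized),
    ("binary quantized", binary_quantized),
]:
    print(f"{label:>24}: {gb:.2f} GB")
```

Treat the result as a planning estimate only: real usage also depends on payload size, HNSW parameters, and whether the original vectors stay in RAM or on disk.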

Weaviate’s higher baseline memory usage comes from its module system. Each built-in vectorizer, re-ranker, or generative model you enable loads additional components into the container. For raw vector storage with external embeddings, Weaviate is comparable to Qdrant; the gap appears when you start enabling modules.

ChromaDB’s storage overhead runs 2–4× your data size on disk (data + HNSW index + WAL). For development, the in-process client needs only enough RAM to load the collection.


Installation

Chroma — in-process or server

pip install chromadb

import chromadb

# Ephemeral (in-memory, lost on restart)
client = chromadb.Client()

# Persistent (saved to disk, no separate process)
client = chromadb.PersistentClient(path="./chroma_data")

collection = client.get_or_create_collection("my_docs")
collection.add(
    documents=["doc one", "doc two"],
    ids=["id1", "id2"]
)
results = collection.query(query_texts=["example query"], n_results=2)

For multi-client server mode:

chroma run --path ./chroma_data --port 8000

That’s the complete setup. No Docker required for development.

Qdrant — Docker-first

docker run -p 6333:6333 -p 6334:6334 \
  -v "$(pwd)/qdrant_storage:/qdrant/storage" \
  qdrant/qdrant

Or with Docker Compose (recommended for persistence across restarts):

services:
  qdrant:
    image: qdrant/qdrant:latest
    ports:
      - "6333:6333"   # REST API
      - "6334:6334"   # gRPC
    volumes:
      - ./qdrant_storage:/qdrant/storage
    restart: unless-stopped

pip install qdrant-client

from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams

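client = QdrantClient("localhost", port=6333)

# create_collection raises if the collection already exists, so check first
if not client.collection_exists("my_docs"):
    client.create_collection(
        collection_name="my_docs",
        # size must equal your embedding model's output dimension
        vectors_config=VectorParams(size=1536, distance=Distance.COSINE),
    )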

Weaviate — Docker Compose with config

services:
  weaviate:
    image: cr.weaviate.io/semitechnologies/weaviate:latest
    ports:
      - "8080:8080"
      - "50051:50051"
    volumes:
      - weaviate_data:/var/lib/weaviate
    environment:
      QUERY_DEFAULTS_LIMIT: 25
      AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED: "true"
      PERSISTENCE_DATA_PATH: "/var/lib/weaviate"   # must match the volume mount so data persists
      DEFAULT_VECTORIZER_MODULE: "none"
      ENABLE_API_BASED_MODULES: "true"
      CLUSTER_HOSTNAME: "node1"

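volumes:
  weaviate_data:

docker compose up -d
pip install weaviate-client

import weaviate

# Connects to REST on localhost:8080 and gRPC on localhost:50051
client = weaviate.connect_to_local()
print(client.is_ready())  # check that the server is accepting requests
client.close()            # release the connection when finished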

Weaviate is available at localhost:8080. If you enable built-in vectorizers (text2vec-openai, text2vec-cohere, etc.), the Compose file gets more involved — see docs.weaviate.io for the full module setup.


Feature comparison

Filtering

Filtering is the most practically important difference in this comparison. Real RAG queries almost always combine vector similarity with metadata constraints — find the top-5 chunks similar to my query that are from documents in the legal category uploaded after 2025-01-01. How each database handles this has a direct impact on recall.

Capability | Chroma | Qdrant | Weaviate
Metadata filtering | Yes, where clauses | Yes, rich payload filtering | Yes, structured filters
Filter integration | Post-filter (can drop recall) | Pre-filter via HNSW | Pre-filter via ACORN (v1.34+)
Filter on nested/array fields | Limited | Yes | Yes
Geo filtering | No | Yes | Yes
Full-text filter | Limited | Separate keyword index | Yes (native BM25)

Qdrant’s filtering is its standout feature. You can filter on any payload field — integers, keywords, geo-coordinates, datetimes, boolean flags — and Qdrant integrates the filter into the HNSW traversal rather than running vector search and discarding results afterward. High-recall filtered search without the post-filter penalty.

Weaviate has defaulted to its own filter-aware strategy, ACORN, since v1.34, which likewise maintains recall under tight filters. What sets Weaviate apart is hybrid mode: it runs BM25 and vector search in one query, weighted by an alpha parameter (0 = pure BM25, 1 = pure vector, 0.5 = equal weight).

Chroma filters are post-filters: it fetches a larger candidate set then narrows by metadata. Fine for small collections; recall degrades on large collections with highly selective filters.
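The recall difference between the two strategies is easy to reproduce on toy data. The sketch below uses brute-force cosine search over five hand-made vectors rather than HNSW, so it illustrates only the order in which filtering and ranking happen, not how any of the three databases implements it. All names and data are made up for the example.

```python
import math

DOCS = [
    {"id": "a", "vec": [1.0, 0.0], "category": "news"},
    {"id": "b", "vec": [0.9, 0.1], "category": "news"},
    {"id": "c", "vec": [0.8, 0.2], "category": "news"},
    {"id": "d", "vec": [0.5, 0.5], "category": "legal"},
    {"id": "e", "vec": [0.1, 0.9], "category": "legal"},
]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def post_filter_search(query, k, category, fetch=None):
    """Rank everything, keep the top `fetch` candidates, then drop non-matching ones."""
    fetch = fetch or k
    ranked = sorted(DOCS, key=lambda d: cosine(query, d["vec"]), reverse=True)[:fetch]
    return [d["id"] for d in ranked if d["category"] == category][:k]


def pre_filter_search(query, k, category):
    """Restrict to matching documents first, then rank only those."""
    allowed = [d for d in DOCS if d["category"] == category]
    ranked = sorted(allowed, key=lambda d: cosine(query, d["vec"]), reverse=True)
    return [d["id"] for d in ranked[:k]]


query = [1.0, 0.0]
post_small = post_filter_search(query, k=2, category="legal")           # top-2 are both "news"
post_large = post_filter_search(query, k=2, category="legal", fetch=4)  # over-fetching helps a bit
pre = pre_filter_search(query, k=2, category="legal")

print(post_small, post_large, pre)
```

With a selective filter, the post-filter variant returns fewer than k results unless the candidate set is enlarged, which is the recall loss described above.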

Hybrid search

Capability | Chroma | Qdrant | Weaviate
BM25 keyword search | No | Via sparse vectors (SPLADE) | Native, built-in
Dense + sparse hybrid | No | Yes (manual sparse vector gen) | Yes (alpha-weighted fusion)
Re-ranking | No | No (external) | Yes (built-in re-ranker modules)

For production RAG, hybrid search typically beats pure vector retrieval, especially for queries containing exact identifiers, names, or technical terms. Weaviate is the straightforward choice here: hybrid queries work out of the box, with no separate sparse embedding pipeline needed. Qdrant supports it through SPLADE-generated sparse vectors, which works well but adds pipeline complexity. Chroma has no native BM25.
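To make the alpha weighting concrete, here is a minimal sketch of score-based hybrid fusion: each result list is min-max normalized, then combined as alpha × vector score + (1 − alpha) × BM25 score. This is a generic illustration under those assumptions, not Weaviate's internal implementation, and the document ids and scores are invented.

```python
def min_max(scores):
    """Scale scores to the 0..1 range so BM25 and cosine values are comparable."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}


def hybrid_fuse(bm25_scores, vector_scores, alpha=0.5):
    """Return doc ids ordered by alpha * vector + (1 - alpha) * bm25."""
    bm25, vec = min_max(bm25_scores), min_max(vector_scores)
    fused = {
        doc_id: alpha * vec.get(doc_id, 0.0) + (1 - alpha) * bm25.get(doc_id, 0.0)
        for doc_id in set(bm25) | set(vec)
    }
    return sorted(fused, key=fused.get, reverse=True)


# Query: "SKU-123 return window". BM25 favors the exact product code,
# the embedding favors the semantically related policy pages.
bm25_scores = {"manual": 12.0, "returns": 3.0, "shipping": 1.0}
vector_scores = {"returns": 0.82, "shipping": 0.80, "manual": 0.60}

keyword_only = hybrid_fuse(bm25_scores, vector_scores, alpha=0.0)
vector_only = hybrid_fuse(bm25_scores, vector_scores, alpha=1.0)
balanced = hybrid_fuse(bm25_scores, vector_scores, alpha=0.5)

print(keyword_only, vector_only, balanced)
```

Moving alpha toward 0 pulls the exact-match document to the top, while moving it toward 1 favors semantic neighbors; in practice the value is tuned against a labeled query set.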

Quantization and memory management

Capability | Chroma | Qdrant | Weaviate
Scalar quantization | No | Yes (float32 → uint8, 4× compression) | Yes (SQ)
Binary quantization | No | Yes (1 bit per dimension, up to 32× compression) | Yes (BQ)
Product quantization | No | Yes | Yes (PQ)
TurboQuant | No | Yes (v1.15+, up to 32× compression) | No
Memory-mapped disk storage | Yes (Apache Arrow) | Yes (mmap) | Yes

Qdrant’s quantization options are the most mature. For datasets above 5 million vectors, Binary quantization plus on-disk HNSW lets you run very large collections on hardware that would run out of memory holding uncompressed float32 vectors. This makes Qdrant the best fit for cost-constrained production deployments at scale.
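The compression ratios follow from how many bits each dimension keeps. The sketch below shows the idea in plain Python, assuming a simple per-vector min/max range for scalar quantization and a sign test for binary quantization. Real engines calibrate ranges across the whole collection and pack bits tightly, so treat this as an illustration of the principle rather than Qdrant's algorithm.

```python
def scalar_quantize(vec):
    """Map each float to one byte (0..255) using the vector's own min/max range."""
    lo, hi = min(vec), max(vec)
    scale = (hi - lo) / 255 or 1.0  # avoid division by zero for constant vectors
    codes = bytes(min(255, max(0, round((x - lo) / scale))) for x in vec)
    return codes, lo, scale


def dequantize(codes, lo, scale):
    """Approximate reconstruction of the original floats."""
    return [lo + c * scale for c in codes]


def binary_quantize(vec):
    """Keep only the sign of each dimension: 1 bit instead of 32."""
    return [1 if x > 0 else 0 for x in vec]


vec = [-1.0, -0.5, 0.0, 0.5, 1.0]
codes, lo, scale = scalar_quantize(vec)
restored = dequantize(codes, lo, scale)
bits = binary_quantize(vec)

float32_bytes = len(vec) * 4
scalar_ratio = float32_bytes / len(codes)   # 4 bytes per dimension down to 1
max_error = max(abs(a - b) for a, b in zip(vec, restored))

print(list(codes), bits, scalar_ratio, max_error)
```

The reconstruction error is bounded by half a quantization step, which is why quantized search is usually paired with rescoring against the original vectors for the final top-k.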


Performance at scale

Latency comparisons between vector databases only mean something when recall is held constant: a system that answers faster at a lower recall@10 is not faster at the same result quality. With that caveat in place:

In community benchmarks on 1–10M vector collections at 768–1536 dimensions, Qdrant consistently achieves sub-10ms median query latency at recall@10 > 0.95 with Scalar quantization enabled on standard hardware (4 vCPU, 8 GB RAM). Qdrant’s Rust runtime and custom HNSW implementation keep it near the top of the ANN-benchmarks leaderboard.

Weaviate performs close to Qdrant on pure vector queries. The overhead appears with hybrid (BM25 + vector) queries, which require maintaining additional inverted index structures. For teams that need hybrid, the latency tradeoff is worth it; for pure vector workloads, Qdrant has the edge.

Chroma after the 1.0 Rust rewrite posts competitive write throughput (~40K+ vectors/second in server mode). Query performance at scale is solid for most RAG pipelines. At collections above 10M vectors with concurrent load, Qdrant and Weaviate handle it more gracefully — Chroma’s scaling story is still maturing compared to the other two.

For most local RAG deployments — document collections under 1M vectors at typical query rates — all three are fast enough that the choice should come down to features and ops complexity, not milliseconds.
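Recall@k, the metric these comparisons hold constant, is simple to measure on your own data: run an exact (brute-force) search to get the true top-k, run the approximate index, and count the overlap. A minimal sketch, with a hypothetical function name and made-up ids:

```python
def recall_at_k(approx_ids, exact_ids, k=10):
    """Fraction of the true top-k neighbors that the approximate search returned."""
    truth = set(exact_ids[:k])
    if not truth:
        return 0.0
    hits = sum(1 for doc_id in approx_ids[:k] if doc_id in truth)
    return hits / len(truth)


# Ground truth from an exact search vs. results from an ANN index for one query.
exact_top10 = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
approx_top10 = [0, 1, 2, 3, 4, 5, 6, 7, 8, 42]  # one true neighbor missed

score = recall_at_k(approx_top10, exact_top10, k=10)
print(score)
```

Averaging this over a few hundred representative queries, with your real filters applied, gives a recall figure you can pin before comparing latency across databases.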


When NOT to use each

Don’t use Chroma when:

  • Your collection is heading toward 2M+ vectors with complex filters. Chroma scales, but query performance degrades faster than Qdrant at high cardinality with selective filters.
  • You need hybrid BM25 + vector search. Chroma has no native BM25, so use Weaviate, or Qdrant with a sparse-vector pipeline.
  • You’re running multi-tenant RAG with hundreds of isolated namespaces under production load. Chroma’s multi-tenancy is newer and less battle-tested than Weaviate’s.
  • You need production metrics, distributed replication, or Kubernetes-native deployment. Chroma’s ops story is lighter by design.

Don’t use Qdrant when:

  • You need hybrid BM25 + vector without building your own sparse vector pipeline. You can do it with SPLADE, but it’s extra work.
  • You need built-in re-ranking, multi-modal retrieval, or generative AI integration out of the box. Weaviate’s module system makes these friction-free.
  • You want MCP-native database access for agentic workflows. Weaviate v1.37 has this built in; Qdrant does not.
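  • Your team is allergic to Docker. Production Qdrant runs as a separate service. The Python client does offer an in-process local mode (QdrantClient(":memory:") or QdrantClient(path=...)), but it is intended for tests and small experiments, not as an embedded store like Chroma.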

Don’t use Weaviate when:

  • You’re prototyping or teaching RAG concepts. The Docker Compose configuration and module decisions are more than you need for a notebook. Start with Chroma.
  • RAM is severely constrained. Weaviate’s baseline image plus its module system consumes more memory than Qdrant for equivalent collection sizes. Qdrant with quantization wins for low-RAM setups.
  • You want the lowest possible query latency. Qdrant’s Rust implementation consistently edges out Weaviate’s Go implementation in raw throughput benchmarks.
  • You’re a solo developer building something small. Weaviate’s feature set is powerful, but it’s built for team-operated infrastructure.

The migration path most teams follow

Most teams that end up on Qdrant or Weaviate start on Chroma. Abstract your vector database interface early — wrap it in a VectorStore class or use LangChain/LlamaIndex adapters — and migration is a data re-ingest, not a code rewrite.
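One way to set up that abstraction is a small interface that application code depends on, with one adapter per backend. The sketch below is a hypothetical minimal design: the VectorStore protocol, its method names, and the brute-force InMemoryStore are all illustrative, and a real Chroma, Qdrant, or Weaviate adapter would implement the same two methods on top of its own client.

```python
import math
from typing import Optional, Protocol, Sequence


class VectorStore(Protocol):
    """Hypothetical minimal interface; application code depends only on this."""

    def upsert(self, ids: Sequence[str], vectors: Sequence[Sequence[float]],
               metadatas: Sequence[dict]) -> None: ...

    def query(self, vector: Sequence[float], k: int,
              where: Optional[dict] = None) -> list: ...


class InMemoryStore:
    """Brute-force stand-in, useful for unit tests."""

    def __init__(self) -> None:
        self._rows = {}

    def upsert(self, ids, vectors, metadatas) -> None:
        for id_, vec, meta in zip(ids, vectors, metadatas):
            self._rows[id_] = (list(vec), dict(meta))

    def query(self, vector, k, where=None):
        def cosine(a, b):
            dot = sum(x * y for x, y in zip(a, b))
            norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
            return dot / norm if norm else 0.0

        # Exact-match metadata filter, applied before ranking.
        candidates = [
            (id_, vec) for id_, (vec, meta) in self._rows.items()
            if not where or all(meta.get(key) == val for key, val in where.items())
        ]
        candidates.sort(key=lambda row: cosine(vector, row[1]), reverse=True)
        return [id_ for id_, _ in candidates[:k]]


def retrieve(store: VectorStore, query_vector, k=3, where=None):
    """Application-level retrieval that never names a concrete backend."""
    return store.query(query_vector, k, where)


store = InMemoryStore()
store.upsert(
    ids=["a", "b", "c"],
    vectors=[[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]],
    metadatas=[{"category": "legal"}, {"category": "legal"}, {"category": "news"}],
)
top = retrieve(store, [1.0, 0.1], k=2)
legal_only = retrieve(store, [1.0, 0.1], k=2, where={"category": "legal"})
print(top, legal_only)
```

Keeping the interface this narrow is a deliberate trade-off: backend-specific features such as hybrid search or quantization settings live in the adapter's constructor, so swapping backends touches one class.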

The typical trajectory:

  1. Prototype on Chroma embedded — no infrastructure overhead, fast iteration, works in a notebook
  2. Stage on Qdrant Docker — run real workloads, validate recall and filter accuracy, confirm hardware requirements
  3. Scale to Qdrant (most teams) or Weaviate (teams requiring hybrid search or multi-modal)

If “hybrid search” appears anywhere in your requirements from day one, skip Chroma and start on Weaviate. The initial Docker Compose setup takes 20 minutes. The migration you avoid later is worth more.


How these fit into the local AI stack

All three databases integrate with local RAG platforms like AnythingLLM, which supports Chroma, Qdrant, and Weaviate as vector store backends. For custom RAG pipelines backed by vLLM or Ollama, all three expose Python clients with LangChain and LlamaIndex integrations.

For high-throughput embedding ingestion — indexing large document collections or running GPU-accelerated embedding models — RunPod offers on-demand GPU instances that handle the batch embedding job while your vector database stays local. The database itself is cheap to run: a self-hosted Qdrant instance on a 4 vCPU / 8 GB RAM VPS handles millions of vectors at production query rates for around $30–50/month.

For an overview of how these databases slot into a broader local AI stack alongside inference servers and RAG front-ends, see The Open-Source AI Stack in 2026.


Frequently Asked Questions

Can Chroma handle production workloads? Yes, for the right scale. Chroma in server mode with the v1.5.x Rust backend handles production RAG pipelines with document collections under 1–2 million vectors. The Chroma team also runs Chroma Cloud as a managed service, which reflects confidence in its production readiness. For collections beyond 2M vectors with heavy filtering, Qdrant is the better fit.

Does Qdrant require a GPU? No. Qdrant runs on CPU using RAM-backed HNSW indexes. Optional GPU acceleration for index building arrived in v1.13 to speed up ingestion on large datasets. You do not need a GPU to run Qdrant at production query rates.

Is Weaviate really open-source? The core Weaviate database is BSD-3-Clause, a permissive open-source license. You can self-host it at no cost with the full feature set. Weaviate Cloud is a separate commercial managed service. Some enterprise add-ons have different terms, but the self-hosted Docker image is fully open-source.

Which vector database works best with LangChain or LlamaIndex? All three have official integrations in both frameworks. Qdrant and Weaviate have slightly more complete support for advanced filtering and hybrid search pass-through, but all three work for standard retrieval pipelines. The abstraction layer is thin enough that switching backends is mostly a configuration change plus a data re-ingest, not an application rewrite.

How does Weaviate hybrid search actually work? Weaviate runs BM25 (keyword) and HNSW (vector) searches in parallel, then fuses the two result lists with one of two algorithms: ranked fusion, which scores each result by its reciprocal rank in each list, or relative score fusion, which normalizes the raw scores before combining them. In both cases the alpha parameter sets the weighting: alpha=0 is pure BM25; alpha=1 is pure vector; alpha=0.5 weights both equally. For queries containing exact product codes, proper nouns, or technical identifiers, BM25 recovers matches that pure vector search misses.
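Rank-based fusion can be sketched in a few lines. The example below is a generic Reciprocal Rank Fusion implementation using the conventional constant k=60; it illustrates the technique in general and is not a copy of Weaviate's code. The document ids are invented.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked id lists; each list contributes 1 / (k + rank) per document."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)


bm25_ranking = ["manual", "returns", "shipping"]    # keyword search order
vector_ranking = ["returns", "shipping", "manual"]  # vector search order

fused = reciprocal_rank_fusion([bm25_ranking, vector_ranking])
print(fused)
```

Because only ranks are used, this approach needs no score normalization, at the cost of ignoring how far apart the underlying scores were.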

