May 31, 2026

Khoj Review 2026: Your Self-Hosted AI Second Brain

By AIFoss · 12 min read

TL;DR: Khoj is the only serious open-source AI assistant that indexes your personal files — Obsidian vaults, org-mode, PDFs — and runs on your own hardware. Setup takes about 30 minutes via Docker Compose. Answer quality depends on which LLM you configure, but even a mid-range local model beats having your notes on someone else’s server.

	Khoj (self-hosted)	Notion AI	Mem.ai
Best for	Obsidian/org-mode power users	Teams already in Notion	Solo creators wanting auto-organization
Privacy	Full with a local LLM; a cloud LLM API sees the note excerpts you query	Notion’s servers	Mem.ai’s servers
Personal knowledge integration	Obsidian, org-mode, PDF, GitHub, Notion API	Notion pages only	Mem.ai notes only
Cost/month	~$0–$5 (electricity), plus any cloud LLM API usage	$10+/user	$14.99/user
The catch	Docker setup required, AGPL-3.0 license	Can’t index local files	No self-host option

Honest take: If you have an Obsidian vault with hundreds of notes and want to query it with AI that stays on your machine, Khoj is the only FOSS option worth running. Notion AI and Mem.ai send your knowledge base to their clouds and don’t touch local filesystems.

What Khoj actually is

Khoj is an open-source personal AI assistant that does two things most alternatives skip: it indexes your own files, and it lets you bring your own LLM. You run it as a server — on a local machine, a home lab box, or a cheap VPS — point it at your documents, configure an LLM backend (local or cloud), and get a chat interface that answers from your actual knowledge base.

The GitHub repository (khoj-ai/khoj) has been actively maintained since 2022. The stable release on PyPI is v1.42.10, published July 2025. A v2.0 series has been in beta since early 2025 — beta.28 was pushed in late May 2026 — but the stable PyPI channel stays on the 1.42.x series for now. If you want the latest features, pull the beta image; if you want something predictable, stick with the stable pip package.

License is AGPL-3.0. For personal or internal-team use, this is a non-issue. If you’re thinking about building a customer-facing product on top of Khoj, AGPL means you must open-source your modifications when you expose the service over a network. Talk to your legal team before going that route.

What it indexes

Khoj stores vector embeddings in PostgreSQL with the pgvector extension. The supported source types as of v1.42.10:

Markdown files — your entire Obsidian vault, any folder of .md files
Org-mode files — one of the few tools with native org support, a real differentiator for Emacs users
PDFs — text extraction via poppler
Word documents — .docx parsing
Plain text — .txt files
Notion pages — via Notion API integration (needs an integration token configured in your workspace)
GitHub repositories — can index Markdown and code files from a repo
Web pages — add URLs manually, or let the SearxNG-backed web search pull them at query time
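Before pointing Khoj at a folder, it can help to see how much of it falls into the file-based source types above. The bash sketch below tallies files by extension. It is an illustration under stated assumptions: the extension list is taken from this article rather than from Khoj's code, and `count_indexable` is a hypothetical helper name.

```bash
# Sketch: tally files by the file-based source types listed above.
# The extension list comes from this article, not from Khoj's code.
count_indexable() {
  local dir="$1" ext n
  for ext in md org pdf docx txt; do
    # -iname matches case-insensitively, so Notes.MD is counted too
    n=$(find "$dir" -type f -iname "*.${ext}" | wc -l)
    # $((n)) strips the leading spaces some wc implementations print
    printf '%s %d\n' "$ext" "$((n))"
  done
}
```

Run it as `count_indexable ~/Obsidian/MyVault`; it prints one `<type> <count>` line per extension. Notion, GitHub and web sources are not files on disk, so they are not counted.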

Indexing is incremental. The first sync embeds everything; subsequent syncs only re-embed files that changed. For a 500-note Obsidian vault on a modern machine, the initial index runs in 3–8 minutes with a cloud embedding model.
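The incremental idea is easy to see in miniature: hash every file, compare against the hashes from the previous run, and re-embed only what differs. The bash sketch below illustrates that idea under stated assumptions. It is not Khoj's actual change-tracking mechanism, it assumes GNU coreutils and findutils (`sha256sum`, `sort -z`, `xargs -r`), and `changed_since_last_sync` and its manifest file are hypothetical.

```bash
# Sketch of incremental indexing: print only files that are new or changed
# since the previous run, using content hashes stored in a manifest file.
# Illustrates the idea only; Khoj's own change tracking may work differently.
# Assumes GNU coreutils/findutils (sha256sum, sort -z, xargs -r).
changed_since_last_sync() {
  local dir="$1" manifest="$2" current
  current=$(mktemp)
  # One "<64-char hash>  <path>" line per Markdown file, in a stable order
  find "$dir" -type f -name '*.md' -print0 | sort -z | xargs -0 -r sha256sum > "$current"
  if [ -f "$manifest" ]; then
    # Hash lines absent from the old manifest belong to new or edited files
    grep -Fxv -f "$manifest" "$current" | cut -c67-
  else
    # First run: everything needs embedding
    cut -c67- "$current"
  fi
  # The current hashes become the baseline for the next run
  mv "$current" "$manifest"
}
```

Call it once per sync, e.g. `changed_since_last_sync ~/vault ~/.vault.sha256`. The first call lists every note; later calls list only edited or added ones. Deleted files are not reported, and a real indexer also has to drop their embeddings.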

Setup: Docker Compose in about 30 minutes

Self-hosting Khoj means running five containers: the main server, a PostgreSQL + pgvector database, a Terrarium Python sandbox (for code execution), a SearxNG instance (for web search), and an optional computer-control service. The docker-compose.yml in the repo orchestrates all of it.

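# Download the compose file (-f fails on an HTTP error instead of saving an error page)
curl -fL -o docker-compose.yml \
  https://raw.githubusercontent.com/khoj-ai/khoj/master/docker-compose.yml

# Set the admin login and a Django secret key. These exports only take effect
# if docker-compose.yml references them as ${VAR}; if the file hard-codes
# values in the server's environment: section, edit them there instead.
export KHOJ_ADMIN_EMAIL=you@example.com
export KHOJ_ADMIN_PASSWORD=changeme   # replace with a strong password
export KHOJ_DJANGO_SECRET_KEY=$(openssl rand -hex 32)

# Optional: add a cloud LLM API key if you don't want to run local models
export OPENAI_API_KEY=sk-...
# or
export ANTHROPIC_API_KEY=sk-ant-...

# Start everything in the background
docker compose up -d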

The Khoj server listens on port 42110. Open http://localhost:42110 and log in with the admin credentials you set. The PostgreSQL service uses the pgvector/pgvector:pg15 image with a health check so Khoj won’t start until the database is fully ready — this prevents the connection errors that tripped up earlier versions.
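Because the server only comes up after the database health check passes, a scripted install usually needs to wait before opening the UI or running the next step. A generic bash retry helper covers that. This is a sketch: `retry` is a hypothetical name, and treating a successful `curl` of the web UI on port 42110 as "ready" is an assumption, not a Khoj-provided health endpoint.

```bash
# Sketch: run a command until it succeeds or the attempts run out.
# retry ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
retry() {
  local attempts="$1" delay="$2" i
  shift 2
  for ((i = 1; i <= attempts; i++)); do
    "$@" && return 0
    # Sleep between attempts, but not after the final one
    (( i < attempts )) && sleep "$delay"
  done
  return 1
}
```

For example, `retry 24 5 curl -fsS -o /dev/null http://localhost:42110` polls for up to about two minutes and returns non-zero if the server never answers.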

Hardware requirements, per project documentation: minimum 4 GB RAM if you’re using cloud LLM APIs. For local model inference via Ollama, plan for 8–16 GB RAM plus a GPU. Beyond that, GPU memory decides which models you can run. At 4-bit quantization, weights take roughly half a gigabyte per billion parameters, before any context. A 30B-class model therefore needs about 16 GB for weights alone, which makes a 24 GB card like the RTX 3090 or RTX 4090 the comfortable floor. A 72B model such as Qwen2.5-72B needs about 36 GB at 4 bits, more than a single RTX 4090 holds, so it calls for two GPUs or slow CPU offloading. If you’d rather not buy GPU hardware, RunPod lets you rent a GPU instance by the hour to run your Khoj server instead.
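The GPU sizing rule of thumb is simple arithmetic: weight memory in gigabytes is roughly parameters (in billions) times bits per weight divided by 8. The bash sketch below encodes it. Treat the result as a lower bound, since the KV cache, context length and runtime overhead come on top, and real quantization formats use slightly more than their nominal bit width. The 4-bit default and the `weights_gb` name are assumptions for illustration.

```bash
# Back-of-envelope sketch: GB for model weights alone
#   = parameters (billions) x bits per weight / 8
# Lower bound only: KV cache, context and runtime overhead are not included.
# weights_gb PARAMS_IN_BILLIONS [BITS_PER_WEIGHT, default 4]
weights_gb() {
  local params_b="$1" bits="${2:-4}"
  awk -v p="$params_b" -v b="$bits" 'BEGIN { printf "%.1f\n", p * b / 8 }'
}
```

`weights_gb 72 4` prints `36.0` and `weights_gb 32` prints `16.0`. Compare those figures with your card's VRAM and leave headroom for context.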

Connecting a local LLM via Ollama

If you’d rather keep your notes entirely off cloud APIs, Khoj integrates with Ollama cleanly. Once Ollama is running on the same host (or reachable over the network), configure it in the Khoj admin panel at http://localhost:42110/server/admin/:

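Under AI Model APIs, add an entry for Ollama with the API base URL http://host.docker.internal:11434/v1/ and any placeholder string as the API key (Khoj talks to Ollama through its OpenAI-compatible endpoint, hence the /v1/ suffix; on Linux, host.docker.internal only resolves if the compose file maps it to the host gateway)
Under Chat Models, add a model whose name matches exactly what you’ve pulled in Ollama, e.g. qwen2.5:32b, llama3.1:8b, mistral:7b, and link it to the Ollama API entry
Save and set it as the default chat model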
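Khoj only finds the model if the name entered in the admin panel matches what Ollama has pulled, tag included. A small bash check against `ollama list` output catches typos. It assumes `ollama list` keeps its current layout (a header row, then the model name in the first column), and `has_ollama_model` is a hypothetical helper.

```bash
# Sketch: succeed if the exact model name (with tag) appears in `ollama list`
# output read from stdin. Assumes a header row and the name in column 1.
has_ollama_model() {
  # Skip the header, keep the NAME column, then require an exact line match
  awk 'NR > 1 { print $1 }' | grep -Fxq -- "$1"
}
```

Usage: `ollama list | has_ollama_model qwen2.5:32b && echo pulled`. A model pulled without an explicit tag shows up as `name:latest`, so check for that form.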

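The embedding model is a separate setting. Khoj defaults to text-embedding-3-small from OpenAI. To go fully local, swap it for nomic-embed-text via Ollama, configured the same way in the admin panel’s embedding model settings, then trigger a full reindex. The reindex is unavoidable: vectors from different embedding models live in different spaces (text-embedding-3-small produces 1,536 dimensions, nomic-embed-text 768), so old and new embeddings can’t be compared. You only need to do this once.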

Practical note: with an 8B local model, Khoj answers simple note retrieval questions well. Multi-hop reasoning (“what’s the connection between the ideas in my Kubernetes notes and my database optimization notes?”) is where smaller models start to struggle. A 32B-class model is where you stop noticing the gap compared to GPT-4o for most knowledge work.

The Obsidian workflow

The Khoj Obsidian plugin is in the community plugins directory. Install and enable it, then open its settings and point it at http://localhost:42110, pasting in an API key generated in the Khoj web app’s settings if your server requires one. The plugin syncs your vault automatically on a periodic schedule; hit Force Sync in the plugin settings to kick off an immediate reindex.

After sync, you get a chat panel inside Obsidian. Ask natural language questions — “What did I write about TCP congestion control?”, “Summarize my notes on the Q4 project” — and Khoj retrieves the semantically relevant notes, passes them as context to your LLM, and returns an answer with citations linking back to the source notes.

The beta.25 release added progress tracking during batch sync, which matters once your vault grows past a few hundred notes. Large vaults (1,000+ notes) take 5–15 minutes on first index, shorter on subsequent syncs.

One thing that surprises people: the chat context persists through a session. You can ask a follow-up question and Khoj maintains the thread. This is handled by the server, not the plugin, so it works the same way in the browser interface.

Emacs, mobile, and other clients

For org-mode users, Khoj ships an Emacs package. Authenticate it to your self-hosted server, and you can call M-x khoj-chat or M-x khoj-search without leaving your editor. For anyone who lives in org-roam or uses org files as their primary knowledge store, this is the direct path — no browser required.

Other clients that work with your self-hosted Khoj:

Browser — the web interface at localhost:42110 has the full feature set: chat, search, agents, automations panel, and research mode
Desktop app — Electron wrapper available for macOS, Windows, and Linux; connects to a configured server
Mobile — iOS and Android apps that authenticate against your self-hosted instance
WhatsApp — this one only works with app.khoj.dev; connecting WhatsApp to a self-hosted instance requires additional setup that isn’t officially supported

Custom agents and automations

Khoj agents are LLM personas with a specific system prompt, a scoped knowledge base (a subset of your indexed files), and tool access (web search, code execution, or both). The web interface gives you a simple form to create them. This is the same mental model as OpenAI’s GPTs, but running on your hardware.

Automations let you schedule recurring tasks against any agent. Useful patterns: daily news summary for a topic you’re researching, weekly digest of notes tagged with a particular project, or a morning briefing that pulls the latest on a few keywords from SearxNG. The scheduling is cron-style, configured in the automations panel.
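When you write or review a schedule by hand, labeling the five cron fields makes mistakes obvious. The bash sketch below assumes standard five-field cron syntax (minute, hour, day of month, month, day of week); whether Khoj’s automations panel displays schedules in exactly this form is an assumption, and `describe_cron` is a hypothetical helper.

```bash
# Sketch: label the five fields of a standard cron expression.
# Assumes classic five-field syntax; Khoj's panel may present schedules differently.
describe_cron() {
  local -a f
  # read does not glob-expand, so "*" fields stay literal
  read -r -a f <<< "$1"
  if [ "${#f[@]}" -ne 5 ]; then
    echo "expected 5 fields, got ${#f[@]}" >&2
    return 1
  fi
  printf 'minute=%s hour=%s day-of-month=%s month=%s day-of-week=%s\n' \
    "${f[0]}" "${f[1]}" "${f[2]}" "${f[3]}" "${f[4]}"
}
```

For a weekday morning briefing, `describe_cron "0 7 * * 1-5"` prints `minute=0 hour=7 day-of-month=* month=* day-of-week=1-5`, i.e. 07:00 Monday through Friday.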

The code execution sandbox (Terrarium) is a locked-down Python runtime. Agents can write and run Python code to process data, do calculations, or generate charts. It’s isolated from your host filesystem — the sandbox can’t reach files outside the Khoj data volume.

Deep research mode

Research mode (added in v1.38) does multi-hop web searches: it plans a research strategy, fires multiple SearxNG queries, reads the top results, synthesizes them, and returns a response with inline citations. Output quality is meaningfully better than a single-shot web query. The tradeoff is speed: expect 30–60 seconds for a thorough research task.

This requires the SearxNG container from the Docker Compose setup. If you don’t need web search and want a leaner setup, you can disable the SearxNG service and use Khoj solely as a document Q&A tool.

When NOT to use Khoj

Your notes live in Notion. If Notion is your primary workspace, Notion AI is more seamless. Khoj can index Notion pages via the API, but it doesn’t have real-time sync — changes take a sync cycle to propagate. For a mixed workflow (some notes local, some in Notion) it’s workable; for all-Notion users, the native AI is less friction.

You need team access. Self-hosted Khoj is effectively single-user. The cloud version (app.khoj.dev) has multi-user support, but running your team’s notes through someone else’s cloud is the opposite of why you’d self-host. If you need shared knowledge bases with proper access control, AnythingLLM has multi-user workspaces designed for teams.

The AGPL-3.0 license is a problem. Building a commercial product that exposes Khoj over a network means you must open-source your modifications. If that’s a problem for your use case, check alternatives with permissive licenses.

You want a no-maintenance setup. Khoj moves fast. A three-month-old install will miss meaningful features and, occasionally, security fixes. Budget 1–2 hours per month for updates. If that’s too much overhead, app.khoj.dev handles it for you — at the cost of data sovereignty.

The cloud gap is real. The hosted version at app.khoj.dev consistently gets features before the self-hosted Docker release. WhatsApp integration, certain automation triggers, and the computer-control agent are more polished on the cloud. The gap shrinks as beta features graduate to stable, but if you want the absolute latest capabilities, you’re choosing between the self-hosted privacy benefit and cloud-hosted convenience. There’s no version that gives you both fully.

How it compares to other local document tools

For pure document RAG, AnythingLLM offers a more polished interface with multi-user workspace isolation and a better onboarding experience for non-developers. Khoj’s advantages are the native Obsidian and Emacs integrations, a wider range of source types, and a more developed agent and automation system.

LocalGPT is simpler — single-command setup, one folder of documents, no accounts — but it has no persistent memory between sessions, no agents, and no cross-platform clients. For a quick one-off document Q&A, LocalGPT gets you there faster. For an ongoing personal knowledge assistant, Khoj is the more capable tool.

For the vector database mechanics underlying Khoj’s search, the RAG Architecture Deep Dive covers chunking strategies, embedding models, and retrieval patterns in detail — Khoj’s pgvector-based approach maps directly to the concepts there.

Frequently Asked Questions

Is Khoj free to self-host indefinitely? Yes. The self-hosted version has no usage limits, no paywalled features, and no telemetry you can’t disable. You pay for your server hardware and electricity and, optionally, for cloud LLM API calls. With a local Ollama backend, hardware and electricity are the only running costs.

Can Khoj run completely offline? Almost entirely. With Ollama as the LLM and a local embedding model (e.g., nomic-embed-text), document search and chat work offline once the Docker images and models have been downloaded. Web research requires SearxNG, which itself needs internet to query search engines. Disable the SearxNG service in the Docker Compose file for a fully air-gapped setup.

How does the Obsidian plugin handle note privacy compared to other AI Obsidian plugins? Most AI-powered Obsidian plugins send your note content to a cloud API at query time. Khoj’s plugin routes everything through your self-hosted server, and with a local LLM and a local embedding model, the note content never leaves your machine or local network. If you configure a cloud LLM instead, the note excerpts retrieved for each question are sent to that provider as context. Khoj also indexes your entire vault (with pgvector embeddings), enabling semantic search across all notes, not just the currently open file.

What’s the difference between self-hosted Khoj and app.khoj.dev? Both use the same codebase. The cloud version runs updated images faster, has better WhatsApp integration, and offers multi-user support. The self-hosted version keeps your data local. You can export your chat history and documents from the cloud version and import them to a self-hosted instance — no lock-in.

Does Khoj support Apple Silicon for fully local inference? Yes. The Docker images support ARM64 and the Ollama integration works natively on Apple Silicon. A Mac with 32 GB unified memory (Mac Mini M4 Pro or Mac Studio M4 Max) runs 30B models comfortably and handles Khoj’s pgvector indexing without issue. It’s one of the better self-hosted setups if you want quiet, low-power operation.


Recommended Gear

RTX 4090: 24 GB VRAM; runs 30B-class quantized models with room for context, while 70B-class models need a second GPU or CPU offloading
RTX 3090: 24 GB VRAM; handles 30B-class models; solid choice for a dedicated Khoj home server
