May 21, 2026

Ollama + Open WebUI on Linux: 15-Minute Setup Guide

By AIFoss · 8 min read

Tags: ollama, ai, selfhosted, llm, opensource

By the end of this guide you’ll have a local LLM running under a full browser-based chat interface — no internet required after setup, no API keys, nothing leaving your machine.

The stack: Ollama handles model downloading, management, and inference via a local REST API. Open WebUI sits on top as the chat frontend. Together they replicate the core ChatGPT experience on your own hardware.

Versions used: Ollama v0.24.0, Open WebUI v0.9.5 (May 2026 — check their GitHub pages for newer releases before starting).

What you need

Hardware:

Setup	Minimum RAM	GPU	What you can run
CPU-only	8 GB	Not required	3B models at 3–6 tok/s
CPU-only	16 GB	Not required	7B models at 3–5 tok/s
GPU (8 GB VRAM)	16 GB	CUDA / ROCm / Vulkan	7B–13B at 20–40 tok/s
GPU (16 GB VRAM)	32 GB	CUDA / ROCm	30B–34B at 15–25 tok/s

CPU inference is functional for experiments. For daily interactive use, a GPU makes the difference.
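To see which CPU-only row of the table a machine falls into, a sketch like the one below can read total RAM from /proc/meminfo. The function names are made up for illustration, and the cutoffs simply mirror the 8 GB and 16 GB rows above; they are rough guidance, not limits enforced by Ollama.

```shell
# Total RAM rounded to the nearest GiB, read from a meminfo-style file.
# Rounding matters: a "16 GB" machine usually reports about 15.5 GiB.
total_ram_gib() {
  awk '/^MemTotal:/ { printf "%d\n", int($2 / 1048576 + 0.5) }' "${1:-/proc/meminfo}"
}

# Map RAM (GiB) to the CPU-only rows of the table above.
# The thresholds are assumptions copied from that table.
suggest_cpu_tier() {
  if [ "$1" -ge 16 ]; then
    echo "7B models"
  elif [ "$1" -ge 8 ]; then
    echo "3B models"
  else
    echo "below the 8 GB minimum"
  fi
}
```

Running suggest_cpu_tier "$(total_ram_gib)" prints the matching tier for the current machine.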

Software prerequisites:

64-bit Linux (Ubuntu 22.04+ or Debian 12 recommended)
curl for the Ollama installer
Docker (preferred) or Python 3.11+ for Open WebUI
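A quick way to check the software prerequisites before starting is to ask the shell which commands are missing. This is a minimal sketch; missing_tools is a hypothetical helper name, not part of Ollama or Docker.

```shell
# Print each required command that is not on PATH, one per line.
# Prints nothing when everything is present.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}
```

For the Docker path, missing_tools curl docker should print nothing; for the pip path, check curl and python3 instead.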

Step 1: Install Ollama

The official installer handles the binary, the ollama system user, and a systemd service:

curl -fsSL https://ollama.com/install.sh | sh

After it completes, Ollama starts automatically and listens on http://localhost:11434.

Verify the install:

ollama --version
# ollama version is 0.24.0

curl http://localhost:11434
# Ollama is running
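If you script the install, the service may need a moment before it answers. The sketch below polls the same endpoint as the curl check above until it returns "Ollama is running" or a timeout passes. The function name and the 30-second default are assumptions for illustration.

```shell
# Poll the Ollama root endpoint until it answers or the timeout expires.
# $1 = timeout in seconds (default 30)
# $2 = base URL (default http://localhost:11434)
wait_for_ollama() {
  timeout_s="${1:-30}"
  url="${2:-http://localhost:11434}"
  elapsed=0
  while [ "$elapsed" -lt "$timeout_s" ]; do
    if curl -fsS --max-time 2 "$url" 2>/dev/null | grep -q "Ollama is running"; then
      return 0
    fi
    sleep 1
    elapsed=$((elapsed + 1))
  done
  return 1
}
```

Usage: wait_for_ollama 60 && echo "ready" gives the service up to a minute to come up.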

NVIDIA GPU: The installer auto-detects CUDA. Nothing extra required.

AMD GPU: The installer attempts ROCm. If GPU layers don’t engage, set the override for your GPU architecture:

export HSA_OVERRIDE_GFX_VERSION=10.3.0

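The value 10.3.0 is only an example (it targets RDNA2-class cards); use the one that matches your GPU. An export in your shell affects only processes started from that shell, not the systemd service, so add the variable to the Ollama service environment for it to take effect (see the model storage section below for how to edit the service environment).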

CPU-only: Nothing extra — Ollama falls back to CPU automatically.

Change the model storage location

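With the systemd service set up by the installer, models are stored under the ollama user’s home directory, /usr/share/ollama/.ollama/models (~/.ollama/models applies only when you run ollama serve as your own user). For a different drive: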

sudo systemctl edit ollama

In the override file that opens:

[Service]
Environment="OLLAMA_MODELS=/mnt/storage/ollama"

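Save, make sure the ollama user can read and write the new directory (for example, sudo chown -R ollama:ollama /mnt/storage/ollama), then reload: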

sudo systemctl daemon-reload && sudo systemctl restart ollama
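To confirm that an override actually reached the service, you can inspect the unit's environment with systemctl show. The helper below is a sketch that pulls one variable out of that output; get_service_env is a hypothetical name, and the parsing assumes values without spaces.

```shell
# Extract one variable from a systemd "Environment=" property line.
# $1 = variable name; stdin = output of:
#   systemctl show ollama --property=Environment
get_service_env() {
  sed 's/^Environment=//' | tr ' ' '\n' | sed -n "s/^$1=//p"
}
```

Usage: systemctl show ollama --property=Environment | get_service_env OLLAMA_MODELS should print the path you configured.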

Service management

sudo systemctl status ollama       # check it's running
sudo systemctl restart ollama      # restart
sudo journalctl -u ollama -f       # live logs

Step 2: Pull a model and verify inference

Before touching Open WebUI, confirm Ollama itself works end-to-end.

ollama pull llama3.2:3b    # 2.0 GB — fast to download for testing
ollama run llama3.2:3b

Type a prompt and press Enter. If you get a coherent response, the inference stack is good. Exit with /bye.
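The same check can be made against the REST API, which is the interface Open WebUI talks to. This sketch builds a request body for Ollama's /api/generate endpoint and sends it with curl. The helper names are illustrative, and the escaping covers only backslashes and double quotes, so keep prompts to a single line.

```shell
# Escape backslashes and double quotes so a string can sit inside JSON.
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# Build a non-streaming /api/generate request body.
# $1 = model name, $2 = single-line prompt
generate_payload() {
  printf '{"model":"%s","prompt":"%s","stream":false}' \
    "$(json_escape "$1")" "$(json_escape "$2")"
}

# Send the request to a local Ollama instance; the reply is a JSON object
# whose "response" field holds the generated text.
ask_ollama() {
  curl -s http://localhost:11434/api/generate -d "$(generate_payload "$1" "$2")"
}
```

Usage: ask_ollama llama3.2:3b "Why is the sky blue?" prints the raw JSON reply.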

For 7B models with 16 GB RAM or a GPU:

ollama pull gemma:7b        # 5.0 GB
ollama pull mistral:7b      # 4.1 GB

Check what’s downloaded:

ollama list
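In a setup script it helps to skip downloads that are already present. The sketch below checks the first column of ollama list for an exact tag match; has_model is a hypothetical helper name.

```shell
# Succeed if the given model tag appears in `ollama list`.
# Skips the header row and compares the NAME column as a fixed string.
has_model() {
  ollama list | awk 'NR > 1 { print $1 }' | grep -qxF "$1"
}
```

Usage: has_model llama3.2:3b || ollama pull llama3.2:3b pulls the model only when it is missing.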

Step 3: Install Open WebUI

Two paths — Docker is simpler to maintain; pip avoids the Docker dependency.

Option A: Docker (recommended)

Install Docker if you don’t have it:

curl -fsSL https://get.docker.com | sh
sudo usermod -aG docker $USER
# Log out and back in for the group change

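Run Open WebUI with Ollama as the backend. On Linux, host.docker.internal doesn’t always resolve, so point the container at your LAN IP. Ollama listens only on 127.0.0.1 by default, so for this approach also add Environment="OLLAMA_HOST=0.0.0.0" to the Ollama service (the same systemctl edit procedure as for OLLAMA_MODELS above) and restart it. Then look up the address: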

# Find your LAN IP
ip addr show | grep 'inet ' | grep -v 127.0.0.1
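On machines with several interfaces (Docker bridges, VPNs), the command above can list more than one address. An alternative sketch is to ask the kernel which source address it would use for outbound traffic and extract it. The helper name is made up, and 1.1.1.1 is just an arbitrary public address; no packet is sent.

```shell
# Extract the source address from `ip route get` output read on stdin.
route_src_ip() {
  awk '{ for (i = 1; i < NF; i++) if ($i == "src") { print $(i + 1); exit } }'
}
```

Usage: ip -4 route get 1.1.1.1 | route_src_ip prints the single address your default route uses.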

Then start the container:

docker run -d \
  --name open-webui \
  --restart always \
  -p 3000:8080 \
  -v open-webui:/app/backend/data \
  -e OLLAMA_BASE_URL=http://192.168.1.X:11434 \
  ghcr.io/open-webui/open-webui:main

Replace 192.168.1.X with your actual IP.
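The LAN IP only works if Ollama accepts connections on that address, and its default bind is 127.0.0.1:11434. This sketch classifies the listener from ss output. The function name and the three labels are illustrative choices.

```shell
# Classify where port 11434 is listening, from `ss -ltn` output on stdin.
# Prints "loopback", "reachable", or "not listening".
ollama_listen_scope() {
  awk '
    $4 ~ /:11434$/ {
      scope = ($4 ~ /^(127\.|\[::1\])/) ? "loopback" : "reachable"
      print scope
      found = 1
      exit
    }
    END { if (!found) print "not listening" }
  '
}
```

Usage: ss -ltn | ollama_listen_scope. If it prints "loopback", a container on the default bridge network cannot reach Ollama until OLLAMA_HOST is set in the service environment.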

Simpler alternative — if Ollama and Open WebUI are on the same host, use host networking and skip the IP lookup:

docker run -d \
  --name open-webui \
  --restart always \
  --network host \
  -v open-webui:/app/backend/data \
  -e OLLAMA_BASE_URL=http://localhost:11434 \
  ghcr.io/open-webui/open-webui:main

With --network host, Open WebUI runs on port 8080, so open http://localhost:8080.

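NVIDIA GPU passthrough (requires the NVIDIA Container Toolkit; it accelerates Open WebUI’s own embedding and speech models, since Ollama on the host already uses the GPU directly):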

docker run -d \
  --name open-webui \
  --gpus all \
  --network host \
  -v open-webui:/app/backend/data \
  -e OLLAMA_BASE_URL=http://localhost:11434 \
  ghcr.io/open-webui/open-webui:cuda

Option B: pip

pip install open-webui
open-webui serve

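First startup takes a minute because it installs frontend dependencies on the initial run. Open WebUI starts on port 8080. On Debian 12 and recent Ubuntu releases, pip refuses system-wide installs (the externally-managed-environment error), so run both commands inside a virtual environment, for example one created with python3 -m venv.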

Step 4: First login

Navigate to http://localhost:3000 (Docker with -p 3000:8080) or http://localhost:8080 (pip or --network host).

The signup screen appears — create an admin account. This is local-only, no email verification.
Open WebUI auto-detects your Ollama models. Click the model selector at the top of the chat interface.
Select a model and start chatting.

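If no models appear, go to Settings → Connections and confirm the Ollama URL matches the OLLAMA_BASE_URL you started with (http://localhost:11434 for pip or --network host, your LAN IP for the bridged Docker setup). You can also pull models directly through the UI at Settings → Models: type a model name and click Pull.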

Step 5: Configuration worth knowing

System prompts per model:
Settings → Models → select model → System Prompt. Set a persistent persona, coding language preference, or output format rule that applies to every conversation with that model.

Multi-user access:
The first account becomes admin. Subsequent signups are standard users. Open WebUI has full user management — conversations are scoped per user, admin can see usage stats. Good for a shared home server.

Built-in RAG:
Open WebUI includes document upload with RAG. For simple use — upload a PDF, ask questions about it — it works. For a large corpus or persistent document collections, AnythingLLM handles that workflow better.

LAN access from other devices:
Open WebUI listens on 0.0.0.0 by default, so it is already reachable from the LAN once the firewall allows it. Open the port that matches your setup:

sudo ufw allow 3000/tcp    # or 8080 depending on your setup

Then access it from any device on the same network at http://192.168.1.X:3000 (or port 8080 for pip or --network host).
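If you would rather not open the port to every source address, ufw can restrict the rule to your LAN subnet. The sketch below only prints the command so you can review it first. The helper name and the 192.168.1.0/24 subnet are assumptions; substitute your own network. Note that ports published with docker -p are handled by Docker's own firewall rules, so this mainly matters for the pip and --network host setups.

```shell
# Print (not run) a ufw rule that admits one subnet to one TCP port.
# $1 = source subnet in CIDR form, $2 = port
ufw_lan_rule() {
  printf 'sudo ufw allow from %s to any port %s proto tcp\n' "$1" "$2"
}
```

Usage: ufw_lan_rule 192.168.1.0/24 8080 prints the rule; paste it into the terminal once it looks right.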

A note on Open WebUI’s license

Ollama is MIT-licensed: a permissive license whose only real condition is keeping the copyright notice.

Open WebUI has changed licenses several times. As of mid-2026, it uses a custom BSD-3-clause variant with a CLA requirement. For personal and small-team use (under 50 users) it’s free to self-host and rebrand. Larger or commercial deployments should read the official license page before deploying. It’s not OSI-certified open source, which matters if that distinction is important to your context.

When NOT to use this setup

You need concurrent users or high throughput. Ollama can handle a few requests in parallel (tunable with OLLAMA_NUM_PARALLEL), but it is not built for high-throughput batched serving. That’s fine for a household or a few developers sharing a machine, but it falls apart under real load. For multi-user production serving, vLLM is the right tool. If you don’t want to manage inference hardware at all, cloud GPU on RunPod runs an A40 for around $0.20/hr, which is cheaper than buying a GPU if inference is occasional.

You’re on Windows or macOS and want a desktop app. LM Studio installs in two clicks, has a native UI, and doesn’t require Docker or a browser tab. Open WebUI’s browser-based approach is a feature on a Linux server; on a personal laptop it’s an extra step.

You primarily need a coding assistant. Open WebUI is a general-purpose chat interface. If you’re routing LLM calls through your editor, Continue.dev or Aider integrates directly into your workflow rather than making you switch to a browser tab.

Your GPU has less than 6 GB VRAM. You can run CPU-only, but if you expected GPU acceleration and it’s not engaging for the model size you want, the experience will disappoint. Run ollama ps during inference: its PROCESSOR column shows how the loaded model is split between GPU and CPU. Offloading too much of the model to system RAM tanks throughput.

Troubleshooting

“Unable to connect to Ollama” in Open WebUI:
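Run curl http://localhost:11434 on the host. If that fails, Ollama isn’t running; check sudo systemctl status ollama. If Ollama responds on the host but Open WebUI can’t reach it, the container is either pointed at the wrong address or Ollama is listening only on 127.0.0.1 (its default). Use --network host with OLLAMA_BASE_URL=http://localhost:11434, or keep the LAN IP approach and set OLLAMA_HOST=0.0.0.0 in the Ollama service environment.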

Models are slow despite having a GPU:
Run ollama ps while a prompt is generating. The PROCESSOR column shows the split, for example 100% GPU or a CPU/GPU percentage pair. If most of the model is on CPU, your VRAM is too small for the loaded model; try a smaller model or a lower-bit quantization.
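For logging or a quick scripted check, the split can be pulled out of the ollama ps output with a pattern match instead of counting columns. This is a sketch: processor_split is a hypothetical name, and the pattern assumes the PROCESSOR field looks like "100% GPU" or "48%/52% CPU/GPU".

```shell
# Extract the PROCESSOR field (e.g. "100% GPU" or "48%/52% CPU/GPU")
# from `ollama ps` output read on stdin.
processor_split() {
  grep -oE '[0-9]+%(/[0-9]+%)? (CPU/GPU|CPU|GPU)'
}
```

Usage: ollama ps | processor_split while a prompt is generating. Anything other than 100% GPU means part of the model sits in system RAM.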

Docker container exits immediately:
Check the logs: docker logs open-webui. The usual causes are a port conflict (something else on 3000 or 8080) or a volume permission issue.

Can’t pull models:
Ollama pulls from ollama.com. Behind a corporate proxy, set HTTPS_PROXY in the Ollama service environment the same way you set OLLAMA_MODELS above.
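If you prefer not to use the interactive editor, the same override can be written as a drop-in file. The sketch below only renders the file contents. render_env_dropin is a hypothetical helper, and the proxy URL in the usage note is a placeholder.

```shell
# Print a systemd drop-in that sets each VAR=VALUE argument for a service.
render_env_dropin() {
  echo "[Service]"
  for pair in "$@"; do
    printf 'Environment="%s"\n' "$pair"
  done
}
```

Usage: create /etc/systemd/system/ollama.service.d with sudo mkdir -p, pipe render_env_dropin "HTTPS_PROXY=http://proxy.example.com:3128" into sudo tee /etc/systemd/system/ollama.service.d/proxy.conf, then run the daemon-reload and restart commands from the storage section.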

The result

From a blank Linux box to a private, browser-based ChatGPT equivalent in about 15 minutes. Ollama manages the models; Open WebUI handles the interface. No data leaves your machine.

This setup handles the full everyday local-AI workflow for personal and small-team use. For deeper coverage of what Ollama can do beyond the basics — model quantization, custom modelfiles, the API — the Ollama review goes further. For a comparison of all local LLM runner options, see the Ollama vs LM Studio vs llama.cpp breakdown.
