Open WebUI Pipelines Guide 2026: Web Search, Rate Limiting, and Custom Logic for Your Local LLM
TL;DR: Open WebUI Pipelines is a Python middleware server that runs between Open WebUI and your LLM, adding custom logic without touching Open WebUI’s source code. It deploys in one docker run command. Most users running Open WebUI never touch it — which means most users are missing the feature that turns a personal chat UI into a multi-user platform with web access and programmable behavior.
After this guide you’ll have:
- A Pipelines server connected to Open WebUI v0.9.5, running alongside Ollama
- A working web search pipeline that pulls live results before your LLM responds
- A rate limit filter that caps requests per user for shared team deployments
What Pipelines Actually Is
Pipelines is not a plugin that loads inside Open WebUI. It’s a separate server — a transparent OpenAI API proxy running on port 9099. Open WebUI talks to it exactly like it talks to Ollama or any OpenAI-compatible backend. Pipelines does whatever you’ve programmed it to do, then either returns a result directly or forwards the (possibly modified) request to your actual LLM.
That architecture matters because it means:
- Pipelines logic runs server-side, not in the browser
- Users can’t bypass it by switching clients
- You can maintain state across requests (request counters, caches, memory)
There are two types of pipelines:
| Type | How it works | Use for |
|---|---|---|
| Filter | Wraps the request: inlet() → LLM → outlet() | Rate limiting, logging, system prompt injection, content filtering |
| Pipe | Replaces the LLM entirely; appears as a “model” in Open WebUI | Web search, custom RAG, wrapping non-OpenAI APIs |
A filter adds behavior around your existing model. A pipe is the model.
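To make the distinction concrete, here is a minimal sketch of the two shapes. The method names (inlet, outlet, pipe) and the self.type = "filter" marker follow the examples later in this guide. The Valves classes are left out for brevity, and run_request is a hypothetical stand-in for the flow the Pipelines server performs, not its actual code.

```python
import asyncio
from typing import List, Optional


class TagFilter:
    """Filter shape: wraps a request with inlet() and outlet()."""

    def __init__(self):
        self.type = "filter"
        self.name = "Tag Filter"

    async def inlet(self, body: dict, user: Optional[dict] = None) -> dict:
        # Runs before the model: annotate the outgoing request.
        body["seen_by_inlet"] = True
        return body

    async def outlet(self, body: dict, user: Optional[dict] = None) -> dict:
        # Runs after the model: annotate the finished exchange.
        body["seen_by_outlet"] = True
        return body


class EchoPipe:
    """Pipe shape: acts as the model and produces the reply itself."""

    def __init__(self):
        self.name = "Echo"

    def pipe(self, user_message: str, model_id: str,
             messages: List[dict], body: dict) -> str:
        return f"echo: {user_message}"


async def run_request(flt: TagFilter, model: EchoPipe, body: dict) -> dict:
    # Hypothetical stand-in for the server's flow: inlet -> model -> outlet.
    body = await flt.inlet(body)
    user_message = body["messages"][-1]["content"]
    reply = model.pipe(user_message, "echo", body["messages"], body)
    body["messages"].append({"role": "assistant", "content": reply})
    return await flt.outlet(body)


result = asyncio.run(run_request(
    TagFilter(),
    EchoPipe(),
    {"messages": [{"role": "user", "content": "hi"}]},
))
print(result["messages"][-1]["content"])  # echo: hi
```

The filter never produces a reply; it only edits the body on the way in and out. The pipe never sees a downstream model; whatever it returns is the answer.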
Prerequisites
- Open WebUI v0.9.5 running (see the Ollama + Open WebUI Linux setup guide)
- Docker installed on the same host
- Ollama running on port 11434 (default)
- Python 3.11 only if you want to develop pipelines locally — Docker handles it otherwise
Step 1: Deploy the Pipelines Server
docker run -d \
-p 9099:9099 \
--add-host=host.docker.internal:host-gateway \
-v pipelines:/app/pipelines \
--name pipelines \
--restart always \
ghcr.io/open-webui/pipelines:main
Flag breakdown:
- -p 9099:9099 exposes the Pipelines API on your host
- --add-host=host.docker.internal:host-gateway lets the container reach services on your host machine (Ollama, local APIs, SearXNG)
- -v pipelines:/app/pipelines persists your pipeline Python files to a named Docker volume; they survive container restarts and updates
Confirm it’s alive:
curl http://localhost:9099/
# Expected: a short JSON body. Any HTTP response, even {"detail":"Not Found"}, means the server is up
The default API key is 0p3n-w3bu!. It’s public knowledge, which is fine for localhost and not fine for anything network-accessible. Override it by adding -e PIPELINES_API_KEY=your-actual-secret to the docker run command, then use the same value when you connect Open WebUI in Step 2.
Step 2: Connect Pipelines to Open WebUI
- Open WebUI → Admin Panel → Settings → Connections
- Click + to add a new OpenAI-compatible connection
- API URL: http://localhost:9099
  - If Open WebUI itself runs in Docker, use http://host.docker.internal:9099 instead
- API key: 0p3n-w3bu! (or whatever you set)
- Save, refresh the page
Pipe-type pipelines now appear in Open WebUI’s model picker. Filter-type pipelines appear under Admin Panel → Pipelines where you assign them to specific models or all models.
Pipeline Example 1: Live Web Search
This pipe pipeline fetches search results and injects them as context before forwarding the query to your local Ollama model. The result: your LLM can answer questions about current events without any fine-tuning.
A word on DuckDuckGo: DDG’s unofficial scraping API is the obvious free choice, but in 2026 it rate-limits hard — you hit 202 Ratelimit errors within a few queries from the same IP. It works for light personal use with delays between requests, but it’s unreliable for a pipeline that runs on every message. The two practical alternatives are:
- Brave Search API — free tier, 2,000 queries/month, real JSON API
- SearXNG (self-hosted, zero cost, zero rate limits) — swap the API call and you’re done
Create the file where Docker maps your pipelines volume. On a default install, find the path with:
docker inspect pipelines | grep -A5 Mounts
# Look for the "Source" path, typically /var/lib/docker/volumes/pipelines/_data
Save this as web_search_pipeline.py in that directory:
from typing import List

import requests
from pydantic import BaseModel


class Pipeline:
    class Valves(BaseModel):
        pipelines: List[str] = ["*"]
        search_api_key: str = ""  # Brave API key
        searxng_url: str = ""  # e.g. http://host.docker.internal:8080
        num_results: int = 5
        ollama_model: str = "llama3.2:3b"

    def __init__(self):
        self.name = "Web Search"
        self.valves = self.Valves()

    def _search_brave(self, query: str) -> str:
        headers = {
            "Accept": "application/json",
            "X-Subscription-Token": self.valves.search_api_key,
        }
        r = requests.get(
            "https://api.search.brave.com/res/v1/web/search",
            params={"q": query, "count": self.valves.num_results},
            headers=headers,
            timeout=10,
        )
        r.raise_for_status()  # surface auth and quota errors instead of parsing an error body
        results = r.json().get("web", {}).get("results", [])
        return "\n\n".join(
            f"**{res.get('title', '')}**\n{res.get('description', '')}\n{res.get('url', '')}"
            for res in results
        )

    def _search_searxng(self, query: str) -> str:
        r = requests.get(
            f"{self.valves.searxng_url}/search",
            params={"q": query, "format": "json"},
            timeout=10,
        )
        r.raise_for_status()
        # Trim client-side rather than relying on a result-count query parameter.
        results = r.json().get("results", [])[: self.valves.num_results]
        return "\n\n".join(
            f"**{res.get('title', '')}**\n{res.get('content', '')}\n{res.get('url', '')}"
            for res in results
        )

    # Plain def rather than async def: the body makes blocking HTTP calls.
    def pipe(
        self,
        user_message: str,
        model_id: str,
        messages: List[dict],
        body: dict,
    ) -> str:
        # SearXNG takes precedence when both backends are configured.
        if self.valves.searxng_url:
            context = self._search_searxng(user_message)
        elif self.valves.search_api_key:
            context = self._search_brave(user_message)
        else:
            return "Configure either searxng_url or search_api_key in Valves."

        import openai

        # Ollama exposes an OpenAI-compatible API under /v1; the key is a placeholder.
        client = openai.OpenAI(
            base_url="http://host.docker.internal:11434/v1",
            api_key="ollama",
        )
        response = client.chat.completions.create(
            model=self.valves.ollama_model,
            messages=[
                {
                    "role": "system",
                    "content": f"Answer using these current search results:\n\n{context}",
                },
                *messages,
            ],
        )
        return response.choices[0].message.content
After saving, go to Admin Panel → Pipelines and click Refresh. “Web Search” appears. Configure the Valves (Brave key or SearXNG URL) through the UI — changes apply immediately without restarting anything.
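Because a pipe runs a search on every message, repeated questions can eat into a metered quota such as Brave's 2,000 queries/month. One option is a small in-memory cache in front of the search call. The sketch below is an illustration only: TTLCache and cached_search are hypothetical names, not part of Pipelines, and the fake search function stands in for a real backend.

```python
import time
from typing import Callable, Dict, List, Optional, Tuple


class TTLCache:
    """Keeps search results for ttl_seconds, then forgets them."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so the example can fake time
        self._store: Dict[str, Tuple[float, str]] = {}

    def get(self, key: str) -> Optional[str]:
        entry = self._store.get(key)
        if entry is None:
            return None
        stored_at, value = entry
        if self.clock() - stored_at >= self.ttl:
            del self._store[key]  # expired
            return None
        return value

    def put(self, key: str, value: str) -> None:
        self._store[key] = (self.clock(), value)


def cached_search(query: str, search: Callable[[str], str],
                  cache: TTLCache) -> str:
    # Normalize so trivially different queries share one cache entry.
    key = query.strip().lower()
    hit = cache.get(key)
    if hit is not None:
        return hit
    result = search(query)
    cache.put(key, result)
    return result


calls: List[str] = []


def fake_search(query: str) -> str:
    calls.append(query)
    return f"results for {query}"


fake_now = [0.0]
cache = TTLCache(ttl_seconds=300, clock=lambda: fake_now[0])

first = cached_search("Open WebUI ", fake_search, cache)
second = cached_search("open webui", fake_search, cache)  # served from cache
fake_now[0] = 301.0
third = cached_search("open webui", fake_search, cache)   # expired, searched again
print(len(calls))  # 2
```

In the pipeline, the cache would live on self (created in __init__) and wrap _search_brave or _search_searxng. Like any in-memory state here, it is lost when the container restarts.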
Pipeline Example 2: Per-User Rate Limiting
If more than one person uses your Open WebUI instance, you need rate limits. Without them, one heavy user can queue up requests that lock everyone else out. This filter tracks requests per user ID with a sliding window:
Save as rate_limit_filter.py:
from typing import List, Optional
from datetime import datetime, timedelta

from pydantic import BaseModel


class Pipeline:
    class Valves(BaseModel):
        pipelines: List[str] = ["*"]
        priority: int = 0
        requests_per_minute: Optional[int] = 10
        requests_per_hour: Optional[int] = 100

    def __init__(self):
        self.name = "Rate Limit Filter"
        self.type = "filter"
        self.valves = self.Valves()
        # Maps user ID to the timestamps of that user's accepted requests.
        self.user_requests: dict = {}

    def _prune(self, user_id: str):
        # Drop timestamps older than the longest window (one hour).
        now = datetime.now()
        self.user_requests[user_id] = [
            t for t in self.user_requests.get(user_id, [])
            if now - t < timedelta(hours=1)
        ]

    def _is_limited(self, user_id: str) -> bool:
        self._prune(user_id)
        requests = self.user_requests.get(user_id, [])
        now = datetime.now()
        if self.valves.requests_per_minute:
            recent = [t for t in requests if now - t < timedelta(minutes=1)]
            if len(recent) >= self.valves.requests_per_minute:
                return True
        if self.valves.requests_per_hour:
            if len(requests) >= self.valves.requests_per_hour:
                return True
        return False

    async def inlet(self, body: dict, user: Optional[dict] = None) -> dict:
        if user and user.get("role") == "user":  # admins bypass
            user_id = user.get("id", "unknown")
            if self._is_limited(user_id):
                raise Exception(
                    f"Rate limit exceeded: {self.valves.requests_per_minute}/min, "
                    f"{self.valves.requests_per_hour}/hr"
                )
            # Only accepted requests count against the limit.
            self.user_requests.setdefault(user_id, []).append(datetime.now())
        return body
Key behaviors:
- Admin-role users in Open WebUI bypass the filter entirely
- Request counters are held in memory and reset when the container restarts, which is sufficient for most shared setups
- The Valves UI lets you adjust limits without touching the file
Assign this filter to all models or specific ones via Admin Panel → Pipelines → Filters.
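The sliding-window idea is easier to verify in isolation. The following is a standalone sketch, not the filter itself: SlidingWindowLimiter is a hypothetical class that takes the current time as an argument so its behavior can be checked without waiting a real minute.

```python
from collections import deque
from typing import Deque, Dict


class SlidingWindowLimiter:
    """Allows at most `limit` requests per user in any `window_seconds` span."""

    def __init__(self, limit: int, window_seconds: float):
        self.limit = limit
        self.window = window_seconds
        self.hits: Dict[str, Deque[float]] = {}

    def allow(self, user_id: str, now: float) -> bool:
        q = self.hits.setdefault(user_id, deque())
        # Drop timestamps that have slid out of the window.
        while q and now - q[0] >= self.window:
            q.popleft()
        if len(q) >= self.limit:
            return False  # rejected requests are not recorded
        q.append(now)
        return True


limiter = SlidingWindowLimiter(limit=2, window_seconds=60)
decisions = [limiter.allow("alice", t) for t in (0, 10, 20, 61, 71)]
print(decisions)  # [True, True, False, True, True]
```

The third request is rejected because two requests already fall inside the last 60 seconds. By t=61 the request from t=0 has aged out, so capacity returns gradually instead of all at once at a fixed boundary.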
Pipeline Example 3: Per-User System Prompt Injection
Different users want different behavior from the same model. An engineer wants terse code-focused answers; someone using the instance for writing wants a different tone. This filter injects the right system prompt based on the authenticated user’s email, overriding whatever the UI sends:
Save as user_system_prompt_filter.py:
from typing import List, Optional

from pydantic import BaseModel

USER_PROMPTS = {
    "dev@example.com": (
        "You are a senior software engineer. Be direct and concise. "
        "Default to showing code over explanation."
    ),
    "writer@example.com": (
        "You are a creative writing assistant. Match the user's register and tone. "
        "Prioritize style over correctness."
    ),
}

DEFAULT_PROMPT = "You are a helpful, knowledgeable assistant."


class Pipeline:
    class Valves(BaseModel):
        pipelines: List[str] = ["*"]
        priority: int = 1  # run after rate limit filter (priority 0)

    def __init__(self):
        self.name = "Per-User System Prompt"
        self.type = "filter"
        self.valves = self.Valves()

    async def inlet(self, body: dict, user: Optional[dict] = None) -> dict:
        email = (user or {}).get("email", "")
        prompt = USER_PROMPTS.get(email, DEFAULT_PROMPT)
        messages = body.get("messages", [])
        # Strip any existing system message before injecting ours
        messages = [m for m in messages if m.get("role") != "system"]
        messages.insert(0, {"role": "system", "content": prompt})
        body["messages"] = messages
        return body
The priority field controls filter execution order when multiple filters apply to the same model. Lower priority numbers run first. Set rate limiting at priority 0 so it rejects over-limit requests before the system prompt injection even runs.
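The ordering rule can be pictured with a toy model. This is a sketch of the behavior described above, not the server's scheduling code; FakeFilter and run_inlets are hypothetical names.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FakeFilter:
    name: str
    priority: int

    def inlet(self, body: dict) -> dict:
        # Record the order in which filters touched the request.
        body.setdefault("trace", []).append(self.name)
        return body


def run_inlets(filters: List[FakeFilter], body: dict) -> dict:
    # Lower priority numbers run first.
    for f in sorted(filters, key=lambda f: f.priority):
        body = f.inlet(body)
    return body


filters = [FakeFilter("system-prompt", 1), FakeFilter("rate-limit", 0)]
trace = run_inlets(filters, {})["trace"]
print(trace)  # ['rate-limit', 'system-prompt']
```

If the rate limiter raised an exception in this model, the loop would stop before the system-prompt filter ran, which is the reason for giving it the lowest number.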
For larger teams, replace the hardcoded USER_PROMPTS dict with a database query or read from an environment variable at startup. Anything you can do in Python, you can do here.
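As one way to do the environment-variable variant, the sketch below reads a JSON object mapping emails to prompts. The variable name USER_PROMPTS_JSON and both helper functions are assumptions made for illustration; nothing in Pipelines defines them.

```python
import json
import os
from typing import Dict

DEFAULT_PROMPT = "You are a helpful, knowledgeable assistant."


def load_user_prompts(env_var: str = "USER_PROMPTS_JSON") -> Dict[str, str]:
    """Parse a JSON object of email -> prompt; fall back to {} on bad input."""
    raw = os.environ.get(env_var, "")
    if not raw:
        return {}
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return {}
    if not isinstance(data, dict):
        return {}
    # Lowercase the keys so lookups are case-insensitive.
    return {str(k).lower(): str(v) for k, v in data.items()}


def prompt_for(email: str, prompts: Dict[str, str]) -> str:
    return prompts.get(email.lower(), DEFAULT_PROMPT)


# Set here only for the example; in Docker this would come from -e or an env file.
os.environ["USER_PROMPTS_JSON"] = '{"dev@example.com": "Be terse."}'
prompts = load_user_prompts()
print(prompt_for("Dev@Example.com", prompts))  # Be terse.
```

Calling load_user_prompts() once in the pipeline's __init__ keeps the per-request work to a dictionary lookup. Malformed JSON degrades to the default prompt rather than breaking every request.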
When NOT to Use Pipelines
Pipelines is a separate service with real overhead. Skip it when a simpler built-in covers the case:
- Web search only: Open WebUI v0.9.5 has built-in web search under Settings → Web Search, supporting SearXNG, Brave, Bing, and 15 other providers. No Pipelines needed.
- Model routing: Open WebUI’s native model selector and model groups handle routing without custom code.
- Solo user, no shared instance: Rate limiting and per-user prompts don’t matter if only you’re using it.
- Simple one-off logic: Open WebUI’s built-in Functions system is lighter — no separate server, installed from the Admin Panel in seconds.
Pipelines earns its complexity when you need persistent server-side state that survives across requests (request counters, caches), behavior that users cannot toggle off from their client, or external API integrations that run on every message.
Troubleshooting Common Issues
Pipeline doesn’t appear after saving the file
Restart the container (docker restart pipelines) and check logs for syntax errors:
docker logs pipelines --tail 30
A Python syntax error in any file in the pipelines directory prevents all pipelines from loading.
“Connection refused” when Open WebUI can’t reach Pipelines
This almost always means Open WebUI is running in Docker and you used localhost:9099 instead of host.docker.internal:9099. Containers don’t share the host’s loopback interface.
Pipe pipeline returns empty responses
The ollama_model in your Valves must exactly match a model you’ve pulled in Ollama. Verify with ollama list.
Rate limit filter not triggering
Filter pipelines must be explicitly assigned to models in the Admin Panel. A filter sitting in the Pipelines list does nothing until you attach it.
The Broader Ecosystem
The community publishes ready-to-use pipelines at openwebui.com/functions. Notable ones beyond these three examples:
- Langfuse filter — logs every request for debugging and token usage monitoring
- LibreTranslate filter — real-time translation via your self-hosted LibreTranslate instance
- Mem0 memory filter — gives your LLM persistent memory across sessions (see the AnythingLLM RAG setup guide for a comparison to RAG-based memory approaches)
Install any of them by dropping the Python file into the pipelines volume. No package manager, no config files beyond the Valves UI.
For heavier GPU workloads — training, batch inference, anything that saturates local hardware — RunPod is worth comparing against your local setup. A dedicated GPU rental often makes sense for jobs you’d otherwise block your machine on for hours.
FAQ
Does Pipelines require Open WebUI specifically, or can any OpenAI-compatible client use it?
Any client that speaks the OpenAI API spec can point at port 9099. Filter pipelines require the client to pass user context headers that Open WebUI adds automatically — so filters work reliably with Open WebUI but may not trigger with raw API clients.
Can I run Pipelines without Docker?
Yes: git clone https://github.com/open-webui/pipelines && cd pipelines && pip install -r requirements.txt && uvicorn main:app --host 0.0.0.0 --port 9099. Python 3.11 specifically — 3.12 has reported dependency conflicts with some example pipelines as of mid-2026.
Is the default API key 0p3n-w3bu! a security risk?
Only if Pipelines is reachable from outside your machine. On a standard single-host setup with no public port exposure, it’s fine. If you’re behind a router with no port forwarding, you’re safe. If you expose port 9099 externally, change it immediately: -e PIPELINES_API_KEY=your-random-string.
How do I pass API keys to pipelines securely?
Use Valves. Each pipeline’s Valves class exposes its fields in the Admin Panel UI, so secrets are entered at runtime instead of being written into the pipeline’s source. Avoid hardcoding secrets directly in pipeline Python files, especially if you share them.
Can a filter pipeline modify the LLM’s response, not just the input?
Yes — implement an outlet(self, body: dict, user: Optional[dict] = None) -> dict method alongside inlet(). The outlet runs after the LLM responds, letting you clean up output, append disclaimers, log responses, or translate the reply.
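A minimal sketch of an outlet that appends a disclaimer follows. It assumes the body passed to outlet() carries a messages list whose last entry is the assistant's reply; check the body your version actually sends before relying on that shape. The class name and disclaimer text are illustrative, and Valves are omitted for brevity.

```python
import asyncio
from typing import Optional

DISCLAIMER = "\n\n(Generated by a local model; verify before relying on it.)"


class DisclaimerFilter:
    def __init__(self):
        self.type = "filter"
        self.name = "Disclaimer"

    async def inlet(self, body: dict, user: Optional[dict] = None) -> dict:
        return body  # request passes through unchanged

    async def outlet(self, body: dict, user: Optional[dict] = None) -> dict:
        # Assumption: body["messages"] ends with the assistant's reply.
        messages = body.get("messages", [])
        if messages and messages[-1].get("role") == "assistant":
            content = messages[-1].get("content", "")
            if not content.endswith(DISCLAIMER):  # avoid stacking duplicates
                messages[-1]["content"] = content + DISCLAIMER
        return body


body = {
    "messages": [
        {"role": "user", "content": "What is 2 + 2?"},
        {"role": "assistant", "content": "4"},
    ]
}
out = asyncio.run(DisclaimerFilter().outlet(body))
print(out["messages"][-1]["content"])
```

The endswith check makes the outlet idempotent, so running it twice on the same body does not append the disclaimer twice.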
Sources
- Open WebUI Pipelines GitHub repository
- Open WebUI Pipelines documentation
- Rate limit filter pipeline example — open-webui/pipelines
- Open WebUI v0.9.5 release notes
- DuckDuckGo rate limit issues — open-webui/open-webui Discussion #13292
- Brave Search API documentation
- SearXNG self-hosted search engine