Open WebUI Pipelines Guide 2026: Web Search, Rate Limiting, and Custom Logic for Your Local LLM


TL;DR: Open WebUI Pipelines is a Python middleware server that runs between Open WebUI and your LLM, adding custom logic without touching Open WebUI’s source code. It deploys in one docker run command. Most users running Open WebUI never touch it — which means most users are missing the feature that turns a personal chat UI into a multi-user platform with web access and programmable behavior.

After this guide you’ll have:

  • A Pipelines server connected to Open WebUI v0.9.5, running alongside Ollama
  • A working web search pipeline that pulls live results before your LLM responds
  • A rate limit filter that caps requests per user for shared team deployments

What Pipelines Actually Is

Pipelines is not a plugin that loads inside Open WebUI. It’s a separate server — a transparent OpenAI API proxy running on port 9099. Open WebUI talks to it exactly like it talks to Ollama or any OpenAI-compatible backend. Pipelines does whatever you’ve programmed it to do, then either returns a result directly or forwards the (possibly modified) request to your actual LLM.

That architecture matters because it means:

  • Pipelines logic runs server-side, not in the browser
  • Users can’t bypass it by switching clients
  • You can maintain state across requests (request counters, caches, memory)

There are two types of pipelines:

Type | How it works | Use for
Filter | Wraps the request: inlet() → LLM → outlet() | Rate limiting, logging, system prompt injection, content filtering
Pipe | Replaces the LLM entirely; appears as a “model” in Open WebUI | Web search, custom RAG, wrapping non-OpenAI APIs

A filter adds behavior around your existing model. A pipe is the model.
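To make the filter flow concrete, here is a minimal sketch that uses a stand-in function in place of a real model. The names fake_llm, TagFilter, and run_with_filter are hypothetical and only illustrate the call order; they are not part of the Pipelines API.

```python
import asyncio
from typing import Optional


def fake_llm(body: dict) -> dict:
    """Stand-in for a real model: echoes the last user message."""
    last = body["messages"][-1]["content"]
    reply = {"role": "assistant", "content": f"echo: {last}"}
    return {**body, "messages": body["messages"] + [reply]}


class TagFilter:
    """Toy filter: inlet() edits the request, outlet() edits the response."""

    async def inlet(self, body: dict, user: Optional[dict] = None) -> dict:
        body["messages"][-1]["content"] += " [seen by inlet]"
        return body

    async def outlet(self, body: dict, user: Optional[dict] = None) -> dict:
        body["messages"][-1]["content"] += " [seen by outlet]"
        return body


async def run_with_filter(flt: TagFilter, body: dict) -> dict:
    body = await flt.inlet(body)   # 1. request passes through the filter
    body = fake_llm(body)          # 2. the untouched model answers
    return await flt.outlet(body)  # 3. response passes back through the filter


result = asyncio.run(
    run_with_filter(TagFilter(), {"messages": [{"role": "user", "content": "hi"}]})
)
print(result["messages"][-1]["content"])
```

A pipe would replace step 2 itself: instead of wrapping fake_llm, it would be the function that produces the reply.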

Prerequisites

  • Open WebUI v0.9.5 running (see the Ollama + Open WebUI Linux setup guide)
  • Docker installed on the same host
  • Ollama running on port 11434 (default)
  • Python 3.11 only if you want to develop pipelines locally — Docker handles it otherwise

Step 1: Deploy the Pipelines Server

docker run -d \
  -p 9099:9099 \
  --add-host=host.docker.internal:host-gateway \
  -v pipelines:/app/pipelines \
  --name pipelines \
  --restart always \
  ghcr.io/open-webui/pipelines:main

Flag breakdown:

  • -p 9099:9099 — exposes the Pipelines API on your host
  • --add-host=host.docker.internal:host-gateway — lets the container reach services on your host machine (Ollama, local APIs, SearXNG)
  • -v pipelines:/app/pipelines — persists your pipeline Python files to a named Docker volume; they survive container restarts and updates

Confirm it’s alive:

curl http://localhost:9099/
# Expected: a short JSON reply such as {"status":true}. Any JSON response, even {"detail":"Not Found"}, means the server is responding

The default API key is 0p3n-w3bu!. It’s public knowledge: fine for localhost, not fine for anything network-accessible. Override it by adding -e PIPELINES_API_KEY=your-actual-secret to the docker run command.
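To check that the key is accepted, and not only that the port is open, you can query the model list. The sketch below assumes the server exposes the OpenAI-style /v1/models endpoint with Bearer authentication; the helper names build_models_request and list_models are illustrative.

```python
import json
import urllib.request


def build_models_request(base_url: str, api_key: str) -> urllib.request.Request:
    """Build a GET request for the OpenAI-style model list endpoint."""
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/v1/models",
        headers={"Authorization": f"Bearer {api_key}"},
    )


def list_models(base_url: str, api_key: str) -> list:
    """Return the model IDs the server advertises; raises on HTTP or network errors."""
    request = build_models_request(base_url, api_key)
    with urllib.request.urlopen(request, timeout=5) as resp:
        payload = json.load(resp)
    return [m["id"] for m in payload.get("data", [])]


# Usage (requires the container from Step 1 to be running):
# print(list_models("http://localhost:9099", "0p3n-w3bu!"))
```

An HTTP 401 or 403 here would point to a key mismatch, while a connection error would point to the container or the port mapping.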

Step 2: Connect Pipelines to Open WebUI

  1. Open WebUI → Admin Panel → Settings → Connections
  2. Click + to add a new OpenAI-compatible connection
  3. API URL: http://localhost:9099
    • If Open WebUI itself runs in Docker, use http://host.docker.internal:9099 instead
  4. API key: 0p3n-w3bu! (or whatever you set)
  5. Save, refresh the page

Pipe-type pipelines now appear in Open WebUI’s model picker. Filter-type pipelines appear under Admin Panel → Pipelines where you assign them to specific models or all models.

Pipeline Example 1: Web Search

This pipe pipeline fetches search results and injects them as context before forwarding the query to your local Ollama model. The result: your LLM can answer questions about current events without any fine-tuning.

A word on DuckDuckGo: DDG’s unofficial scraping API is the obvious free choice, but in 2026 it rate-limits hard — you hit 202 Ratelimit errors within a few queries from the same IP. It works for light personal use with delays between requests, but it’s unreliable for a pipeline that runs on every message. The two practical alternatives are:

  • Brave Search API — free tier, 2,000 queries/month, real JSON API
  • SearXNG (self-hosted, zero cost, zero rate limits) — swap the API call and you’re done

Create the file where Docker maps your pipelines volume. On a default install, find the path with:

docker inspect pipelines | grep -A5 Mounts
# Look for the "Source" path, typically /var/lib/docker/volumes/pipelines/_data

Save this as web_search_pipeline.py in that directory:

from typing import List, Optional
import requests
from pydantic import BaseModel

class Pipeline:
    class Valves(BaseModel):
        pipelines: List[str] = ["*"]
        search_api_key: str = ""       # Brave API key
        searxng_url: str = ""          # e.g. http://host.docker.internal:8080
        num_results: int = 5
        ollama_model: str = "llama3.2:3b"

    def __init__(self):
        self.name = "Web Search"
        self.valves = self.Valves()

    def _search_brave(self, query: str) -> str:
        headers = {
            "Accept": "application/json",
            "X-Subscription-Token": self.valves.search_api_key,
        }
        r = requests.get(
            "https://api.search.brave.com/res/v1/web/search",
            params={"q": query, "count": self.valves.num_results},
            headers=headers,
            timeout=10,
        )
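        r.raise_for_status()  # surface HTTP errors (bad key, exhausted quota) early
        results = r.json().get("web", {}).get("results", [])
        # .get() avoids a KeyError when a result lacks a field
        return "\n\n".join(
            f"**{res.get('title', '')}**\n{res.get('description', '')}\n{res.get('url', '')}"
            for res in results
        )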

    def _search_searxng(self, query: str) -> str:
        # SearXNG must have the "json" format enabled in its settings.
        # Its search API has no result-count parameter, so trim client-side.
        r = requests.get(
            f"{self.valves.searxng_url}/search",
            params={"q": query, "format": "json"},
            timeout=10,
        )
        r.raise_for_status()
        results = r.json().get("results", [])[: self.valves.num_results]
        return "\n\n".join(
            f"**{res.get('title','')}**\n{res.get('content','')}\n{res.get('url','')}"
            for res in results
        )

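    # Synchronous on purpose: the body only makes blocking calls (requests,
    # openai), and the Pipelines examples define pipe() as a plain method.
    def pipe(
        self,
        user_message: str,
        model_id: str,
        messages: List[dict],
        body: dict,
    ) -> str: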
        if self.valves.searxng_url:
            context = self._search_searxng(user_message)
        elif self.valves.search_api_key:
            context = self._search_brave(user_message)
        else:
            return "Configure either searxng_url or search_api_key in Valves."

        import openai
        client = openai.OpenAI(
            base_url="http://host.docker.internal:11434/v1",
            api_key="ollama",
        )
        response = client.chat.completions.create(
            model=self.valves.ollama_model,
            messages=[
                {
                    "role": "system",
                    "content": f"Answer using these current search results:\n\n{context}"
                },
                *messages,
            ],
        )
        return response.choices[0].message.content

After saving, go to Admin Panel → Pipelines and click Refresh. “Web Search” appears. Configure the Valves (Brave key or SearXNG URL) through the UI — changes apply immediately without restarting anything.
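Because this pipe runs a search on every message, repeated questions can eat into a quota such as Brave’s 2,000 queries/month. One option is a small in-memory cache with a time-to-live. The sketch below is an optional addition, not part of the pipeline above; TTLCache and cached_search are hypothetical names, and the search argument stands in for a method such as _search_brave.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class TTLCache:
    """Tiny in-memory cache that forgets entries after ttl_seconds."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so the expiry logic can be tested
        self._store: Dict[str, Tuple[float, str]] = {}

    def get(self, key: str) -> Optional[str]:
        entry = self._store.get(key)
        if entry is None:
            return None
        stored_at, value = entry
        if self.clock() - stored_at >= self.ttl:
            del self._store[key]  # expired: drop it and report a miss
            return None
        return value

    def put(self, key: str, value: str) -> None:
        self._store[key] = (self.clock(), value)


def cached_search(cache: TTLCache, query: str, search: Callable[[str], str]) -> str:
    """Call search(query) only when no fresh result is cached."""
    key = query.strip().lower()  # treat "Hello " and "hello" as the same query
    hit = cache.get(key)
    if hit is not None:
        return hit
    result = search(query)
    cache.put(key, result)
    return result
```

Inside the pipeline, the cache would be created once in __init__ so it survives across requests. Like the rate limiter’s counters, it resets when the container restarts.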

Pipeline Example 2: Per-User Rate Limiting

If more than one person uses your Open WebUI instance, you need rate limits. Without them, one heavy user can queue up requests that lock everyone else out. This filter tracks requests per user ID with a sliding window:

Save as rate_limit_filter.py:

from typing import List, Optional
from datetime import datetime, timedelta
from pydantic import BaseModel

class Pipeline:
    class Valves(BaseModel):
        pipelines: List[str] = ["*"]
        priority: int = 0
        requests_per_minute: Optional[int] = 10
        requests_per_hour: Optional[int] = 100

    def __init__(self):
        self.name = "Rate Limit Filter"
        self.type = "filter"
        self.valves = self.Valves()
        self.user_requests: dict = {}

    def _prune(self, user_id: str):
        now = datetime.now()
        self.user_requests[user_id] = [
            t for t in self.user_requests.get(user_id, [])
            if now - t < timedelta(hours=1)
        ]

    def _is_limited(self, user_id: str) -> bool:
        self._prune(user_id)
        requests = self.user_requests.get(user_id, [])
        now = datetime.now()
        if self.valves.requests_per_minute:
            recent = [t for t in requests if now - t < timedelta(minutes=1)]
            if len(recent) >= self.valves.requests_per_minute:
                return True
        if self.valves.requests_per_hour:
            if len(requests) >= self.valves.requests_per_hour:
                return True
        return False

    async def inlet(self, body: dict, user: Optional[dict] = None) -> dict:
        if user and user.get("role") == "user":  # admins bypass
            user_id = user.get("id", "unknown")
            if self._is_limited(user_id):
                raise Exception(
                    f"Rate limit exceeded: {self.valves.requests_per_minute}/min, "
                    f"{self.valves.requests_per_hour}/hr"
                )
            self.user_requests.setdefault(user_id, []).append(datetime.now())
        return body

Key behaviors:

  • Admin-role users in Open WebUI bypass the filter entirely
  • Limits persist in memory (reset on container restart) — sufficient for most shared setups
  • The Valves UI lets you adjust limits without touching the file
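The sliding-window idea is easier to verify in isolation. The following sketch is a simplified stand-in for the filter’s bookkeeping, with a single limit and an explicit now argument so the behavior is deterministic. SlidingWindowLimiter is a hypothetical name.

```python
from collections import deque
from typing import Deque, Dict


class SlidingWindowLimiter:
    """Allow at most `limit` events per `window` seconds for each key."""

    def __init__(self, limit: int, window: float):
        self.limit = limit
        self.window = window
        self._events: Dict[str, Deque[float]] = {}

    def allow(self, key: str, now: float) -> bool:
        events = self._events.setdefault(key, deque())
        # Drop timestamps that have slid out of the window.
        while events and now - events[0] >= self.window:
            events.popleft()
        if len(events) >= self.limit:
            return False
        events.append(now)
        return True


limiter = SlidingWindowLimiter(limit=2, window=60.0)
print(limiter.allow("alice", now=0.0))   # first request in the window
print(limiter.allow("alice", now=10.0))  # second request in the window
print(limiter.allow("alice", now=20.0))  # third request inside 60 s
print(limiter.allow("alice", now=61.0))  # the t=0 entry has expired by now
print(limiter.allow("bob", now=20.0))    # each user has a separate counter
```

Rejected requests are not recorded, which matches the filter above: a user who keeps retrying while limited does not extend their own lockout.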

Assign this filter to all models or specific ones via Admin Panel → Pipelines → Filters.

Pipeline Example 3: Per-User System Prompt Injection

Different users want different behavior from the same model. An engineer wants terse code-focused answers; someone using the instance for writing wants a different tone. This filter injects the right system prompt based on the authenticated user’s email, overriding whatever the UI sends:

Save as user_system_prompt_filter.py:

from typing import List, Optional
from pydantic import BaseModel

USER_PROMPTS = {
    "dev@example.com": (
        "You are a senior software engineer. Be direct and concise. "
        "Default to showing code over explanation."
    ),
    "writer@example.com": (
        "You are a creative writing assistant. Match the user's register and tone. "
        "Prioritize style over correctness."
    ),
}
DEFAULT_PROMPT = "You are a helpful, knowledgeable assistant."

class Pipeline:
    class Valves(BaseModel):
        pipelines: List[str] = ["*"]
        priority: int = 1  # run after rate limit filter (priority 0)

    def __init__(self):
        self.name = "Per-User System Prompt"
        self.type = "filter"
        self.valves = self.Valves()

    async def inlet(self, body: dict, user: Optional[dict] = None) -> dict:
        email = (user or {}).get("email", "")
        prompt = USER_PROMPTS.get(email, DEFAULT_PROMPT)

        messages = body.get("messages", [])
        # Strip any existing system message before injecting ours
        messages = [m for m in messages if m.get("role") != "system"]
        messages.insert(0, {"role": "system", "content": prompt})
        body["messages"] = messages
        return body

The priority field controls filter execution order when multiple filters apply to the same model. Lower priority numbers run first. Set rate limiting at priority 0 so it rejects over-limit requests before the system prompt injection even runs.
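The ordering rule can be sketched with plain functions. This is an illustration of the behavior described above, not the server’s actual dispatch code; run_inlets, rate_limit, inject_prompt, and the over_limit flag are all hypothetical.

```python
from typing import Callable, List, Tuple

Inlet = Callable[[dict], dict]


def run_inlets(filters: List[Tuple[int, Inlet]], body: dict) -> dict:
    """Apply inlet functions in ascending priority order.

    An exception raised by one filter propagates, so later filters never run.
    """
    for _priority, inlet in sorted(filters, key=lambda f: f[0]):
        body = inlet(body)
    return body


def rate_limit(body: dict) -> dict:
    if body.get("over_limit"):  # hypothetical flag standing in for a real counter
        raise RuntimeError("Rate limit exceeded")
    return body


def inject_prompt(body: dict) -> dict:
    body["messages"] = [{"role": "system", "content": "Be concise."}] + body["messages"]
    return body


filters = [(1, inject_prompt), (0, rate_limit)]  # registration order does not matter

ok = run_inlets(filters, {"messages": [{"role": "user", "content": "hi"}]})
print(ok["messages"][0]["role"])

try:
    run_inlets(filters, {"over_limit": True, "messages": []})
except RuntimeError as exc:
    print(f"rejected before prompt injection: {exc}")
```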

For larger teams, replace the hardcoded USER_PROMPTS dict with a database query or read from an environment variable at startup. Anything you can do in Python, you can do here.
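As one way to do the environment-variable variant, the sketch below reads an email-to-prompt mapping from a JSON string. The variable name USER_PROMPTS_JSON and the helper names are assumptions for illustration; you would pass the variable to the container with -e.

```python
import json
import os
from typing import Dict

DEFAULT_PROMPT = "You are a helpful, knowledgeable assistant."


def load_user_prompts(env_var: str = "USER_PROMPTS_JSON") -> Dict[str, str]:
    """Parse an email -> prompt mapping from a JSON environment variable.

    Returns an empty dict when the variable is unset or malformed, so
    every user falls back to DEFAULT_PROMPT.
    """
    raw = os.environ.get(env_var, "")
    if not raw:
        return {}
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return {}
    if not isinstance(data, dict):
        return {}
    # Keep only string prompts and normalise the email key.
    return {k.strip().lower(): v for k, v in data.items() if isinstance(v, str)}


def prompt_for(email: str, prompts: Dict[str, str]) -> str:
    """Look up a user's prompt, ignoring case and surrounding whitespace."""
    return prompts.get(email.strip().lower(), DEFAULT_PROMPT)
```

In the filter, load_user_prompts() would be called once in __init__ and prompt_for() would replace the USER_PROMPTS.get(...) lookup in inlet().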

When NOT to Use Pipelines

Pipelines is a separate service with real overhead. Skip it when a simpler built-in covers the case:

  • Web search only: Open WebUI v0.9.5 has built-in web search under Settings → Web Search, supporting SearXNG, Brave, Bing, and 15 other providers. No Pipelines needed.
  • Model routing: Open WebUI’s native model selector and model groups handle routing without custom code.
  • Solo user, no shared instance: Rate limiting and per-user prompts don’t matter if only you’re using it.
  • Simple one-off logic: Open WebUI’s built-in Functions system is lighter — no separate server, installed from the Admin Panel in seconds.

Pipelines earns its complexity when you need persistent server-side state that survives across requests (request counters, caches), behavior that users cannot toggle off from their client, or external API integrations that run on every message.

Troubleshooting Common Issues

Pipeline doesn’t appear after saving the file

Restart the container (docker restart pipelines) and check logs for syntax errors:

docker logs pipelines --tail 30

A Python syntax error in any file in the pipelines directory prevents all pipelines from loading.

“Connection refused” when Open WebUI can’t reach Pipelines

This almost always means Open WebUI is running in Docker and you used localhost:9099 instead of host.docker.internal:9099. Containers don’t share the host’s loopback interface.
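To confirm which address is reachable from where Open WebUI runs, a plain TCP check is enough. This is a small standard-library sketch; can_connect is a hypothetical helper, and running it inside the Open WebUI container assumes a Python interpreter is available there.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refused connections, timeouts, and DNS failures
        return False


# Usage: run from the same place Open WebUI runs and compare both addresses.
# print(can_connect("localhost", 9099))
# print(can_connect("host.docker.internal", 9099))
```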

Pipe pipeline returns empty responses

The ollama_model in your Valves must exactly match a model you’ve pulled in Ollama. Verify with ollama list.

Rate limit filter not triggering

Filter pipelines must be explicitly assigned to models in the Admin Panel. A filter sitting in the Pipelines list does nothing until you attach it.

The Broader Ecosystem

The community publishes ready-to-use pipelines at openwebui.com/functions. Notable ones beyond these three examples:

  • Langfuse filter — logs every request for debugging and token usage monitoring
  • LibreTranslate filter — real-time translation via your self-hosted LibreTranslate instance
  • Mem0 memory filter — gives your LLM persistent memory across sessions (see the AnythingLLM RAG setup guide for a comparison to RAG-based memory approaches)

Install any of them by dropping the Python file into the pipelines volume. No package manager, no config files beyond the Valves UI.

For heavier GPU workloads — training, batch inference, anything that saturates local hardware — RunPod is worth comparing against your local setup. A dedicated GPU rental often makes sense for jobs you’d otherwise block your machine on for hours.

FAQ

Does Pipelines require Open WebUI specifically, or can any OpenAI-compatible client use it?

Any client that speaks the OpenAI API spec can point at port 9099. Filter pipelines require the client to pass user context headers that Open WebUI adds automatically — so filters work reliably with Open WebUI but may not trigger with raw API clients.

Can I run Pipelines without Docker?

Yes: git clone https://github.com/open-webui/pipelines && cd pipelines && pip install -r requirements.txt && uvicorn main:app --host 0.0.0.0 --port 9099. Python 3.11 specifically — 3.12 has reported dependency conflicts with some example pipelines as of mid-2026.

Is the default API key 0p3n-w3bu! a security risk?

Only if Pipelines is reachable from outside your machine. On a standard single-host setup with no public port exposure, it’s fine. Behind a router with no port forwarding it is not reachable from the internet, although -p 9099:9099 still exposes it to other devices on your local network. If you expose port 9099 externally, change it immediately: -e PIPELINES_API_KEY=your-random-string.

How do I pass API keys to pipelines securely?

Use Valves. Each pipeline’s Valves class exposes fields in the Admin Panel UI, so secrets are entered at runtime rather than written into the pipeline source. Avoid hardcoding secrets directly in pipeline Python files, especially if you share them.

Can a filter pipeline modify the LLM’s response, not just the input?

Yes — implement an outlet(self, body: dict, user: Optional[dict] = None) -> dict method alongside inlet(). The outlet runs after the LLM responds, letting you clean up output, append disclaimers, log responses, or translate the reply.
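A minimal sketch of such an outlet follows. It assumes the model’s reply arrives as the last entry in body["messages"] with role "assistant"; the Valves class is omitted for brevity, and DISCLAIMER is an illustrative constant.

```python
import asyncio
from typing import Optional

DISCLAIMER = "\n\n(Generated by a local model; verify important facts.)"


class Pipeline:
    def __init__(self):
        self.name = "Disclaimer Filter"
        self.type = "filter"

    async def inlet(self, body: dict, user: Optional[dict] = None) -> dict:
        return body  # requests pass through untouched

    async def outlet(self, body: dict, user: Optional[dict] = None) -> dict:
        messages = body.get("messages", [])
        # Assumption: the model's reply is the last message in the body.
        if messages and messages[-1].get("role") == "assistant":
            content = messages[-1].get("content", "")
            # Skip non-string content and avoid appending the note twice.
            if isinstance(content, str) and not content.endswith(DISCLAIMER):
                messages[-1]["content"] = content + DISCLAIMER
        return body


demo = {"messages": [
    {"role": "user", "content": "What is 2 + 2?"},
    {"role": "assistant", "content": "4"},
]}
print(asyncio.run(Pipeline().outlet(demo))["messages"][-1]["content"])
```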
