Flowise Local Setup Guide: Build AI Workflows Without Python
Flowise gives you a drag-and-drop interface for building LLM pipelines — RAG chatbots, multi-step agents, document Q&A — without writing a single line of Python. It’s Node.js-based, runs locally on modest hardware, and connects to Ollama so your data never leaves your machine.
If you’ve looked at LangChain or LlamaIndex and thought “this is too much code for what I’m trying to do,” Flowise is the answer. If you’ve looked at n8n and wanted something more AI-native, same answer.
This gets you from zero to a working RAG chatbot on localhost in under 20 minutes.
What Flowise actually is
Flowise is an open-source, self-hostable UI for building LLM applications using a node-based visual editor. Each “node” represents an LLM component — a model, a retriever, a memory store, a tool — and you wire them together on a canvas. The result is a chatflow or agentflow you can embed via iframe, call via API, or just use through the built-in chat interface.
License: Apache 2.0. The code is at FlowiseAI/Flowise on GitHub.
It’s built on top of LangChain.js, which means you get LangChain’s integrations (dozens of LLM providers, vector stores, document loaders) without writing any JavaScript yourself.
What you can build:
- RAG chatbots that answer questions about your documents
- Multi-agent systems with tool use (web search, code execution, APIs)
- Chatbot embeds for internal tools or websites
- Structured data extraction pipelines
- API-chaining workflows that combine multiple LLM calls
Prerequisites
| Requirement | Minimum | Notes |
|---|---|---|
| Node.js | 18.x or 20.x | v22+ also works; check the repo for current support |
| RAM | 4 GB | 8 GB recommended if running models locally alongside |
| Disk | 2 GB free | More if storing vector embeddings locally |
| OS | Windows, macOS, Linux | All supported; Docker is the easiest cross-platform path |
Flowise itself is lightweight. The hardware pressure comes from whatever models you run through it. For Ollama with Llama 3.2 3B: 4 GB RAM is enough. For anything 7B+: 8 GB RAM minimum, GPU optional but helpful.
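Those figures line up with a rough rule of thumb. Quantized weights take about parameters × bits per weight ÷ 8 bytes, and you need headroom on top of that for the context (KV cache), the runtime, and the OS. The sketch below assumes roughly 4-bit weights, which is what most of Ollama's default model tags ship, and a 1.5× overhead multiplier chosen for illustration rather than measured. Treat the output as a ballpark, not a guarantee.

```typescript
// Ballpark RAM needed to run a quantized model locally.
// Assumptions (illustrative, not measured): ~4 bits per weight and a
// 1.5x multiplier covering KV cache, runtime buffers and context.
function weightsGB(paramsBillions: number, bitsPerWeight: number): number {
  // parameters * bits / 8 = bytes; billions of parameters in, ~GB out
  return (paramsBillions * bitsPerWeight) / 8;
}

function estimateRamGB(
  paramsBillions: number,
  bitsPerWeight = 4,
  overheadFactor = 1.5,
): number {
  return weightsGB(paramsBillions, bitsPerWeight) * overheadFactor;
}

for (const size of [3, 7, 13]) {
  console.log(`${size}B model at 4-bit: ~${estimateRamGB(size).toFixed(1)} GB`);
}
```

Under these assumptions a 3B model needs a little over 2 GB and a 7B model a little over 5 GB. That is consistent with the 4 GB and 8 GB system-RAM floors above once the OS and Flowise itself take their share.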
Installation: npm
The npm method is the fastest way to get started.
npm install -g flowise
npx flowise start
Open http://localhost:3000. That’s the entire install process.
No login required by default — Flowise assumes local single-user mode. To enable authentication:
npx flowise start --FLOWISE_USERNAME=admin --FLOWISE_PASSWORD=yourpassword
To update later: npm update -g flowise.
Node version note: If npx flowise start throws an error about unsupported Node versions, use nvm to switch to Node 20 LTS. This is the most common first-run issue.
Installation: Docker
For a persistent service — or a shared setup where multiple people need access — Docker is cleaner than npm:
docker run -d \
--name flowise \
-p 3000:3000 \
-v ~/.flowise:/root/.flowise \
flowiseai/flowise
The -v flag mounts a volume so your chatflows and vector data persist across container restarts. Without it, everything resets when the container stops.
The official repo includes Docker Compose examples under the docker/ directory. That’s the right starting point if you’re adding a database (PostgreSQL) or running Flowise alongside other services.
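Running Flowise in Docker with Ollama on the host: point the Ollama base URL to http://host.docker.internal:11434 on Mac or Windows. On Linux, either start the container with --network=host, in which case http://localhost:11434 works, or use the host’s actual network IP. Ollama listens only on 127.0.0.1 by default, so the host-IP route also needs Ollama started with OLLAMA_HOST=0.0.0.0.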
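The rules above boil down to a small lookup from "where is Flowise running" to "which URL reaches Ollama". Here is a sketch that encodes it. FlowiseRuntime and ollamaBaseUrl are hypothetical names for illustration, and 11434 is Ollama's default port.

```typescript
// Which Ollama base URL Flowise should use, depending on how Flowise runs.
// FlowiseRuntime and ollamaBaseUrl are hypothetical names for illustration.
type FlowiseRuntime =
  | { kind: "host" } // npm install on the same machine as Ollama
  | { kind: "docker-desktop" } // Docker on macOS or Windows
  | { kind: "docker-linux-hostnet" } // Linux container started with --network=host
  | { kind: "docker-linux-bridge"; hostIp: string }; // Linux, default bridge network

const OLLAMA_PORT = 11434; // Ollama's default port

function ollamaBaseUrl(runtime: FlowiseRuntime): string {
  switch (runtime.kind) {
    case "host":
      return `http://localhost:${OLLAMA_PORT}`;
    case "docker-desktop":
      // Docker Desktop resolves this name to the host machine.
      return `http://host.docker.internal:${OLLAMA_PORT}`;
    case "docker-linux-hostnet":
      // Host networking: the container shares the host's localhost.
      return `http://localhost:${OLLAMA_PORT}`;
    case "docker-linux-bridge":
      // Bridge networking: the container must reach the host by its IP.
      return `http://${runtime.hostIp}:${OLLAMA_PORT}`;
  }
}
```

Modelling each setup as its own variant keeps the switch exhaustive: if you add a new runtime kind, the compiler flags the missing case instead of silently returning the wrong URL.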
Connecting to Ollama
Once Flowise is running, connecting a local model takes about 30 seconds:
- Confirm Ollama is running: ollama serve (or check that the Ollama app is open)
- Pull a model if you haven’t: ollama pull llama3.2 or ollama pull mistral
- In Flowise, open a new Chatflow
- Drag the ChatOllama node onto the canvas
- Set Base URL to http://localhost:11434 and set the model name to the model you pulled (for example, llama3.2)
Wire the ChatOllama node to a Conversation Chain node, open the chat panel (bottom right), and test it. If you get a response — you’re connected. No API key, no rate limits, fully offline.
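If the test message fails instead, a quick way to narrow it down is to query Ollama directly. Ollama's HTTP API lists locally pulled models at GET /api/tags, with names that include a tag such as llama3.2:latest. The sketch below assumes Node 18+ (for the built-in fetch); checkOllama and normalizeModelName are hypothetical helpers. It confirms both that the server answers and that the model name you gave ChatOllama has actually been pulled.

```typescript
// Minimal fetch-compatible signature so the check can run against a stub;
// pass the global fetch (Node 18+) in real use.
type SimpleFetch = (
  url: string,
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

interface TagsResponse {
  models: { name: string }[];
}

// Ollama reports pulled models with an explicit tag, e.g. "llama3.2:latest".
function normalizeModelName(name: string): string {
  return name.includes(":") ? name : `${name}:latest`;
}

async function checkOllama(
  fetchFn: SimpleFetch,
  model: string,
  baseUrl = "http://localhost:11434",
): Promise<void> {
  // GET /api/tags lists the models pulled on this machine.
  // If this rejects (connection refused), Ollama is probably not running.
  const res = await fetchFn(`${baseUrl}/api/tags`);
  if (!res.ok) {
    throw new Error(`Ollama answered with HTTP ${res.status}`);
  }
  const body = (await res.json()) as TagsResponse;
  const wanted = normalizeModelName(model);
  const pulled = body.models.map((m) => m.name);
  if (!pulled.includes(wanted)) {
    throw new Error(`${wanted} is not pulled; run: ollama pull ${model}`);
  }
}
```

In a real script you would call checkOllama(fetch, "llama3.2"). A rejected fetch points at the server not running or the wrong base URL. The missing-model error means the name in ChatOllama doesn't match anything you've pulled.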
Building your first RAG chatbot
This is the workflow Flowise is best known for, and it’s genuinely fast to set up once you know which nodes to use.
The node chain:
- PDF File (or Text File) loader: upload your document
- Recursive Character Text Splitter: chunk size 1000, overlap 200 (sensible defaults)
- Ollama Embeddings: model nomic-embed-text (pull it first: ollama pull nomic-embed-text)
- In-Memory Vector Store: fine for testing
- ChatOllama: the LLM, configured as in the previous section
- Conversational Retrieval QA Chain: ties together your retriever and LLM, and handles conversation history automatically
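To see what chunk size 1000 and overlap 200 actually do, here is a deliberately simplified splitter. It cuts purely by character count. The Recursive Character Text Splitter node additionally tries to break on separators such as paragraph and line breaks first, so its chunks are often shorter than the limit. splitWithOverlap is a hypothetical helper, not a Flowise or LangChain API.

```typescript
// Simplified fixed-size splitter illustrating chunkSize / chunkOverlap.
// Unlike the Recursive Character Text Splitter, it ignores paragraph and
// sentence boundaries and cuts purely by character count.
function splitWithOverlap(text: string, chunkSize = 1000, chunkOverlap = 200): string[] {
  if (chunkOverlap >= chunkSize) {
    throw new Error("chunkOverlap must be smaller than chunkSize");
  }
  // Each chunk starts this many characters after the previous one.
  const step = chunkSize - chunkOverlap;
  const chunks: string[] = [];
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    // Stop once a chunk has reached the end of the text.
    if (start + chunkSize >= text.length) break;
  }
  return chunks;
}
```

With the defaults, a 2,500-character document becomes three chunks starting at characters 0, 800, and 1,600. Each chunk repeats the last 200 characters of the previous one, so text near a boundary can still be retrieved from either side.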
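Wire them together: the Text Splitter plugs into the PDF loader’s Text Splitter input; the loader’s output and the Ollama Embeddings node both feed into the Vector Store; the Vector Store’s retriever output feeds into the QA Chain; and ChatOllama feeds into the QA Chain as the LLM.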
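Conceptually, the retriever step in this chain embeds the question with the same embedding model and returns the stored chunks whose vectors are closest to it. Here is a brute-force sketch of that step using cosine similarity. StoredChunk, cosineSimilarity, and topK are illustrative names, and k = 4 is an arbitrary choice here. Real vector stores such as Chroma may use a different distance metric and rely on indexes instead of scoring every chunk.

```typescript
// Conceptual retriever: score every stored chunk against the query
// embedding by cosine similarity and return the k best matches.
interface StoredChunk {
  text: string;
  embedding: number[];
}

function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("embedding dimensions differ");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // Guard against zero vectors, which have no direction.
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

function topK(query: number[], store: StoredChunk[], k = 4): StoredChunk[] {
  return store
    .map((chunk) => ({ chunk, score: cosineSimilarity(query, chunk.embedding) }))
    .sort((x, y) => y.score - x.score) // highest similarity first
    .slice(0, k)
    .map((scored) => scored.chunk);
}
```

The chunks topK returns are what the QA Chain places into the prompt next to your question. This is also why the embedding model used at query time must match the one used at upsert time.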
Hit the Upsert button on the vector store node to index your documents. It’s the database icon — easy to miss the first time. After upsert completes, open the chat and ask a question about the document.
The first time it answers a question with content actually drawn from your file — not hallucinated — is a useful moment. You’ve got a working local RAG pipeline.
Persistent vector storage with Chroma
In-memory storage resets every time Flowise restarts. For anything beyond a one-off demo, add Chroma:
docker run -d \
--name chromadb \
-p 8000:8000 \
chromadb/chroma
In Flowise, replace the In-Memory Vector Store node with Chroma, point it to http://localhost:8000, and set a collection name. Your embeddings now persist between sessions — upsert once, query indefinitely.
If you’re running Chroma in Docker alongside Flowise in Docker, make sure both containers are on the same Docker network or use host.docker.internal.
Building an agent (tool use beyond RAG)
Flowise has a separate Agentflow canvas for multi-tool agents. The difference from chatflows: agents can decide which tools to call, not just retrieve and answer.
Useful built-in tools:
- Calculator — LLM math is unreliable; offload it
- Web Browser — Puppeteer-based live browsing
- Custom Tool — point it at any HTTP API you want the agent to call
An agentflow with ChatOllama + a Calculator tool + a web search tool gives you something close to a local ReAct agent. Mistral and Llama 3 handle tool use reasonably well; smaller models (under 7B) tend to struggle with multi-tool decisions.
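What makes this a ReAct agent is a loop. The model writes a thought and names a tool plus its input. The runtime then executes the tool and hands the result back as an observation, repeating until the model produces a final answer. The sketch below shows only the dispatch step and assumes the classic text format (Action: / Action Input: / Observation:). The tool registry and toy calculator are hypothetical. Flowise's agent nodes handle this loop for you and may use a model's native tool-calling format instead of parsing text.

```typescript
// Toy version of one ReAct step: parse the tool call the model wrote,
// run the tool, and return an "Observation" line to feed back to the model.
// The registry and calculator are hypothetical stand-ins, not Flowise code.
type Tool = (input: string) => string;

// Deliberately tiny calculator: accepts "a op b" with + - * /.
function calculator(input: string): string {
  const m = /^\s*(-?\d+(?:\.\d+)?)\s*([-+*\/])\s*(-?\d+(?:\.\d+)?)\s*$/.exec(input);
  if (!m) return "error: expected 'a op b'";
  const a = Number(m[1]);
  const b = Number(m[3]);
  switch (m[2]) {
    case "+": return String(a + b);
    case "-": return String(a - b);
    case "*": return String(a * b);
    default: return String(a / b);
  }
}

const tools = new Map<string, Tool>([["calculator", calculator]]);

function runStep(modelOutput: string): string {
  const action = /^Action:\s*(\S+)/m.exec(modelOutput);
  const actionInput = /^Action Input:\s*(.*)$/m.exec(modelOutput);
  // No tool call means the model has produced its final answer.
  if (!action || !actionInput) return modelOutput;
  const tool = tools.get(action[1]);
  const result = tool ? tool(actionInput[1]) : `error: unknown tool ${action[1]}`;
  return `Observation: ${result}`;
}
```

The parsing step is also where smaller models tend to fail. If a model drifts from the expected format or names a tool that doesn't exist, the loop stalls, which matches the multi-tool struggles described above.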
Flowise vs. the alternatives
| Tool | Interface | Language | Local model support | Best for |
|---|---|---|---|---|
| Flowise | Visual (nodes) | Node.js | Excellent (Ollama native) | RAG + agent prototypes |
| Langflow | Visual (nodes) | Python | Good | Python-first teams |
| n8n | Visual (workflow) | Node.js | Via HTTP nodes | General automation + some AI |
| Dify | Visual + hosted | Python | Good | Teams wanting a managed option |
| LangChain (code) | Code | Python/JS | Full control | Custom production pipelines |
Flowise wins when you want Ollama integration without glue code and you’re prototyping quickly. Langflow is the closest competitor — essentially the same concept but Python-native and with a slightly different node model. n8n is better at non-AI automation; its AI nodes feel bolted on compared to Flowise’s purpose-built design.
For a deeper comparison of all three in production workflow scenarios, see Flowise vs n8n vs LangGraph 2026.
When NOT to use Flowise
You need fine-grained retrieval control. The visual nodes abstract away the retrieval internals. If you need to tune hybrid search weights, implement custom re-ranking, or control exactly how the context window is constructed — write the code. Flowise’s abstraction becomes a ceiling at that point.
High-concurrency production systems. Flowise can run behind a load balancer, but it wasn’t designed to scale horizontally out of the box. If you’re handling thousands of simultaneous users, the architecture needs to be explicit about this from the start.
Your team is already Python-native. If everyone writes LangChain Python, Langflow will feel more natural and your existing tooling (pytest, standard Python libraries, Jupyter) carries over.
You need enterprise auth, audit logs, or SSO. The community edition doesn’t include these. There’s an enterprise offering from FlowiseAI, but that’s outside the scope of a self-hosted setup.
Your RAG pipeline isn’t actually complex. If you just want a chatbot over one document, AnythingLLM has a more polished UI for that specific use case and is easier to set up for non-technical users.
Scaling with cloud GPUs
If local hardware isn’t enough for the models you want to test — Mixtral 8x7B, 34B parameter models, or fine-tuned variants — RunPod is the fastest path. Spin up a pod, install Ollama, expose the port, and point Flowise at the public endpoint. You get cloud GPU access without managing infrastructure, and you pay by the hour.
This pattern also works well for batch indexing large document collections where embedding speed matters — run a 4xA100 pod for an hour to index a large corpus, then switch back to local Chroma for queries.
Practical tips
Name your nodes. A chatflow with 15 unnamed nodes is unreadable after a week. Double-click any node header to rename it.
Export chatflows before major changes. Top-right menu → Export. The JSON is human-readable and git-diffable. Treat it like a checkpoint.
Use Document Stores for shared embeddings. Newer versions of Flowise have a Document Store feature that lets you manage indexed documents independently of individual chatflows and reuse them across multiple flows. Much cleaner than embedding document loaders inside every chatflow.
Test with a small chunk of your data first. Before upserting a 500-page PDF, test with 10 pages. Verify the retrieval is returning relevant chunks before indexing the full corpus.
The Flowise API is REST. Every chatflow gets an API endpoint automatically. You can call it from any language with a POST request — useful when you want to integrate a Flowise chatbot into an existing app without iframe embeds.
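Here is a minimal TypeScript sketch of that POST request, assuming Node 18+ (built-in fetch) and Flowise's prediction route, /api/v1/prediction/<chatflow-id>. The route takes a JSON body with a question field and returns the answer in a text field. The Bearer header only matters if you've attached an API key to the chatflow. askChatflow and its options are hypothetical names. fetch is passed in as a parameter so the function can be exercised without a running server.

```typescript
// Minimal fetch-compatible signature; pass the global fetch in real use.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

interface PredictionOptions {
  baseUrl?: string; // where Flowise is listening
  apiKey?: string; // only needed if the chatflow is protected by an API key
}

async function askChatflow(
  fetchFn: FetchLike,
  chatflowId: string,
  question: string,
  opts: PredictionOptions = {},
): Promise<string> {
  const baseUrl = opts.baseUrl ?? "http://localhost:3000";
  const headers: Record<string, string> = { "Content-Type": "application/json" };
  if (opts.apiKey) headers["Authorization"] = `Bearer ${opts.apiKey}`;

  const res = await fetchFn(`${baseUrl}/api/v1/prediction/${encodeURIComponent(chatflowId)}`, {
    method: "POST",
    headers,
    body: JSON.stringify({ question }),
  });
  if (!res.ok) throw new Error(`Flowise returned HTTP ${res.status}`);
  // The chatflow's answer comes back in the "text" field.
  const data = (await res.json()) as { text?: string };
  return data.text ?? "";
}
```

In real code you would write const answer = await askChatflow(fetch, chatflowId, "Summarize section 3"), where chatflowId is the ID Flowise assigns to the chatflow. The same request works from any language that can send JSON over HTTP.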
The verdict
Flowise is one of the fastest paths from “I want to build a chatbot over my documents” to a working prototype without writing application code. The Ollama integration works out of the box, the node canvas makes the RAG data flow easy to reason about, and it runs comfortably on a laptop.
The tradeoff: the visual abstraction is also the ceiling. When your retrieval logic gets complex or you need production-grade reliability, you’ll eventually want code. But Flowise is genuinely useful before you hit that ceiling — and for internal tools, it may be enough permanently.
For the LLM layer that powers Flowise, see the Ollama 2026 review for a full breakdown of what you’re running under the hood.
Install it. Connect it to Ollama. Build a RAG chatflow against a document that actually matters to you. The prototype-to-useful-tool distance is shorter here than almost anywhere else in the local AI stack.