May 23, 2026

Flowise Local Setup Guide: Build AI Workflows Without Python

By AIFoss · 10 min read

Tags: flowise, ai, nocode, selfhosted, llm

Flowise gives you a drag-and-drop interface for building LLM pipelines — RAG chatbots, multi-step agents, document Q&A — without writing a single line of Python. It’s Node.js-based, runs locally on modest hardware, and connects to Ollama so your data never leaves your machine.

If you’ve looked at LangChain or LlamaIndex and thought “this is too much code for what I’m trying to do,” Flowise is the answer. If you’ve looked at n8n and wanted something more AI-native, same answer.

This guide gets you from zero to a working RAG chatbot on localhost in under 20 minutes.

What Flowise actually is

Flowise is an open-source, self-hostable UI for building LLM applications using a node-based visual editor. Each “node” represents an LLM component — a model, a retriever, a memory store, a tool — and you wire them together on a canvas. The result is a chatflow or agentflow you can embed via iframe, call via API, or just use through the built-in chat interface.

License: Apache 2.0. The code is at FlowiseAI/Flowise on GitHub.

It’s built on top of LangChain.js, which means you get LangChain’s integrations (dozens of LLM providers, vector stores, document loaders) without writing any JavaScript yourself.

What you can build:

RAG chatbots that answer questions about your documents
Multi-agent systems with tool use (web search, code execution, APIs)
Chatbot embeds for internal tools or websites
Structured data extraction pipelines
API-chaining workflows that combine multiple LLM calls

Prerequisites

Requirement	Minimum	Notes
Node.js	18.x or 20.x	v22+ also works; check the repo for current support
RAM	4 GB	8 GB recommended if running models locally alongside
Disk	2 GB free	More if storing vector embeddings locally
OS	Windows, macOS, Linux	All supported; Docker is the easiest cross-platform path

Flowise itself is lightweight. The hardware pressure comes from whatever models you run through it. For Ollama with Llama 3.2 3B: 4 GB RAM is enough. For anything 7B+: 8 GB RAM minimum, GPU optional but helpful.
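As a rough sanity check on those RAM figures, the sketch below estimates how much memory a quantized model's weights need. The 4-bit default and the flat 20% overhead factor are assumptions made for illustration, not numbers from Flowise or Ollama, so treat the output as a ballpark only.

```typescript
// Rough RAM estimate for holding a quantized model in memory.
// Assumptions (illustrative only): 4-bit weights by default and a flat
// 20% overhead for context cache and runtime buffers.
function estimateModelRamGb(
  paramsBillions: number,
  bitsPerWeight = 4,
  overheadFactor = 1.2,
): number {
  const bytesPerWeight = bitsPerWeight / 8;
  // One billion parameters at one byte each is roughly 1 GB.
  const weightsGb = paramsBillions * bytesPerWeight;
  return weightsGb * overheadFactor;
}

// estimateModelRamGb(3) is about 1.8, which leaves headroom on a 4 GB machine.
// estimateModelRamGb(7) is about 4.2, which is why 8 GB is the practical floor.
```

The operating system, Flowise itself, and longer context windows all add to these numbers.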

Installation: npm

The npm method is the fastest way to get started.

npm install -g flowise
npx flowise start

Open http://localhost:3000. That’s the entire install process.

No login required by default — Flowise assumes local single-user mode. To enable authentication:

npx flowise start --FLOWISE_USERNAME=admin --FLOWISE_PASSWORD=yourpassword

To update later: npm update -g flowise.

Node version note: If npx flowise start throws an error about unsupported Node versions, use nvm to switch to Node 20 LTS. This is the most common first-run issue.

Installation: Docker

For a persistent service — or a shared setup where multiple people need access — Docker is cleaner than npm:

docker run -d \
  --name flowise \
  -p 3000:3000 \
  -v ~/.flowise:/root/.flowise \
  flowiseai/flowise

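The -v flag bind-mounts ~/.flowise from the host into the container, so your chatflows and vector data survive when the container is removed or recreated. Without it, that data lives only inside the container and is lost as soon as the container is removed.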

The official repo includes Docker Compose examples under the docker/ directory. That’s the right starting point if you’re adding a database (PostgreSQL) or running Flowise alongside other services.

Running Flowise in Docker with Ollama on the host: Point the Ollama base URL to http://host.docker.internal:11434 on Mac or Windows. On Linux, use the host’s actual network IP, or run the container with --network=host, in which case http://localhost:11434 works as it does outside Docker.
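The base URL rule above can be written down as a small helper. This is a sketch only: the function name, the HostOs type, and the linuxHostIp parameter are hypothetical names introduced here, and it assumes Ollama listens on its default port 11434.

```typescript
type HostOs = "mac" | "windows" | "linux";

// Returns the Ollama base URL to enter in Flowise.
// `linuxHostIp` is a hypothetical parameter: the address of the Linux host
// as seen from inside the container.
function ollamaBaseUrl(
  flowiseInDocker: boolean,
  hostOs: HostOs,
  linuxHostIp?: string,
): string {
  const port = 11434;
  // Flowise started via npm talks to Ollama directly on localhost.
  if (!flowiseInDocker) return `http://localhost:${port}`;
  if (hostOs === "linux") {
    if (!linuxHostIp) {
      throw new Error("On Linux, pass the host IP or run the container with --network=host");
    }
    return `http://${linuxHostIp}:${port}`;
  }
  // Mac and Windows: Docker resolves this name to the host machine.
  return `http://host.docker.internal:${port}`;
}
```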

Connecting to Ollama

Once Flowise is running, connecting a local model takes about 30 seconds:

Make sure Ollama is running: start it with ollama serve (or check that the Ollama app is open)
Pull a model if you haven’t: ollama pull llama3.2 or ollama pull mistral
In Flowise, open a new Chatflow
Drag the ChatOllama node onto the canvas
Set Base URL to http://localhost:11434, select your model name from the dropdown

Wire the ChatOllama node to a Conversation Chain node, open the chat panel (bottom right), and test it. If you get a response — you’re connected. No API key, no rate limits, fully offline.
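If the chat panel stays silent, it can help to test Ollama on its own, outside Flowise. The sketch below builds a request for Ollama's /api/chat endpoint. The helper name and the OllamaRequest shape are made up for this example, and sending the request is left to whatever HTTP client you prefer.

```typescript
interface OllamaRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Builds (but does not send) a single-turn chat request for Ollama.
function buildOllamaChatRequest(
  baseUrl: string,
  model: string,
  prompt: string,
): OllamaRequest {
  return {
    // Strip trailing slashes so the path is not doubled up.
    url: `${baseUrl.replace(/\/+$/, "")}/api/chat`,
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model,
      messages: [{ role: "user", content: prompt }],
      stream: false, // ask for one JSON response instead of a token stream
    }),
  };
}

// Usage on Node 18+, which ships a global fetch:
//   const req = buildOllamaChatRequest("http://localhost:11434", "llama3.2", "Say hello");
//   const res = await fetch(req.url, { method: req.method, headers: req.headers, body: req.body });
//   const data = await res.json(); // the reply text is in data.message.content
```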

Building your first RAG chatbot

This is the workflow Flowise is best known for, and it’s genuinely fast to set up once you know which nodes to use.

The node chain:

PDF File (or Text File) loader — upload your document
Recursive Character Text Splitter — chunk size 1000, overlap 200 (sensible defaults)
Ollama Embeddings — model: nomic-embed-text (pull it first: ollama pull nomic-embed-text)
In-Memory Vector Store — fine for testing
Conversational Retrieval QA Chain — this ties together your retriever and LLM, and handles conversation history automatically
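To see what the chunk size and overlap settings in the list above actually do, here is a deliberately simplified sketch. It cuts text into fixed-size character windows. The real Recursive Character Text Splitter also prefers to break on separators such as paragraphs and sentences, which this version ignores. The function name is hypothetical.

```typescript
// Fixed-size character chunking with overlap (a simplification of
// recursive splitting: separator-aware breaking is not implemented).
function chunkWithOverlap(text: string, chunkSize = 1000, overlap = 200): string[] {
  if (chunkSize <= 0) throw new Error("chunkSize must be positive");
  if (overlap < 0 || overlap >= chunkSize) {
    throw new Error("overlap must be at least 0 and smaller than chunkSize");
  }
  const step = chunkSize - overlap; // each new chunk starts this far after the last
  const chunks: string[] = [];
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    if (start + chunkSize >= text.length) break; // this chunk reached the end
  }
  return chunks;
}
```

With the defaults, a 2,500-character document yields three chunks starting at offsets 0, 800, and 1600, so the last 200 characters of each chunk reappear at the start of the next. That overlap keeps a sentence that straddles a boundary retrievable from at least one chunk.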

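Connect left to right: the Text Splitter plugs into the PDF loader, and the loader’s document output and the Ollama Embeddings node both feed into the Vector Store. The Vector Store’s retriever output feeds into the QA Chain, and ChatOllama feeds into the QA Chain as the LLM.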

Hit the Upsert button on the vector store node to index your documents. It’s the database icon — easy to miss the first time. After upsert completes, open the chat and ask a question about the document.

The first time it answers a question with content actually drawn from your file — not hallucinated — is a useful moment. You’ve got a working local RAG pipeline.
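Under the hood, the retrieval step comes down to comparing the question's embedding against every stored chunk embedding and keeping the closest matches. The sketch below illustrates that idea with cosine similarity. It is a conceptual illustration, not Flowise's or LangChain.js's actual implementation, and all names are hypothetical.

```typescript
interface StoredChunk {
  text: string;
  embedding: number[];
}

// Cosine similarity: 1 means same direction, 0 means unrelated.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("vectors must have equal length");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // avoid dividing by zero
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Returns the k stored chunks most similar to the query embedding.
function topK(query: number[], store: StoredChunk[], k: number): StoredChunk[] {
  return store
    .map((chunk) => ({ chunk, score: cosineSimilarity(query, chunk.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((scored) => scored.chunk);
}
```

The chunks returned here are what gets pasted into the prompt as context, which is why poor chunking or a mismatched embedding model shows up as irrelevant answers.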

Persistent vector storage with Chroma

In-memory storage resets every time Flowise restarts. For anything beyond a one-off demo, add Chroma:

docker run -d \
  --name chromadb \
  -p 8000:8000 \
  chromadb/chroma

In Flowise, replace the In-Memory Vector Store node with Chroma, point it to http://localhost:8000, and set a collection name. Your embeddings now persist between sessions — upsert once, query indefinitely.

If you’re running Chroma in Docker alongside Flowise in Docker, make sure both containers are on the same Docker network or use host.docker.internal.

Building an agent (tool use beyond RAG)

Flowise has a separate Agentflow canvas for multi-tool agents. The difference from chatflows: agents can decide which tools to call, not just retrieve and answer.

Useful built-in tools:

Calculator — LLM math is unreliable; offload it
Web Browser — Puppeteer-based live browsing
Custom Tool — point it at any HTTP API you want the agent to call

An agentflow with ChatOllama + a Calculator tool + a web search tool gives you something close to a local ReAct agent. Mistral and Llama 3 handle tool use reasonably well; smaller models (under 7B) tend to struggle with multi-tool decisions.
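The core of tool use is simple to picture: the model emits a structured request naming a tool and an input, and the runtime executes it and feeds the result back. The sketch below shows that dispatch step with a toy calculator. Every name here is hypothetical, the calculator handles only a single binary operation, and real agent frameworks add parsing, retries, and multi-step loops on top.

```typescript
// A tool takes a string input and returns a string observation.
type Tool = (input: string) => string;

interface ToolCall {
  tool: string;
  input: string;
}

// Toy calculator: accepts only "<number> <op> <number>".
const calculator: Tool = (input) => {
  const match = input
    .trim()
    .match(/^(-?\d+(?:\.\d+)?)\s*([+\-*\/])\s*(-?\d+(?:\.\d+)?)$/);
  if (!match) return "error: expected '<number> <op> <number>'";
  const left = Number(match[1]);
  const right = Number(match[3]);
  switch (match[2]) {
    case "+": return String(left + right);
    case "-": return String(left - right);
    case "*": return String(left * right);
    case "/": return right === 0 ? "error: division by zero" : String(left / right);
    default: return "error: unknown operator";
  }
};

// Routes a model-issued tool call to the matching tool.
function dispatch(call: ToolCall, tools: Record<string, Tool>): string {
  const tool = tools[call.tool];
  return tool ? tool(call.input) : `error: unknown tool '${call.tool}'`;
}
```

Smaller models tend to fail at the step before this one: producing a well-formed call that names the right tool, which is consistent with the struggles noted above.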

Flowise vs. the alternatives

Tool	Interface	Language	Local model support	Best for
Flowise	Visual (nodes)	Node.js	Excellent (Ollama native)	RAG + agent prototypes
Langflow	Visual (nodes)	Python	Good	Python-first teams
n8n	Visual (workflow)	Node.js	Via HTTP nodes	General automation + some AI
Dify	Visual + hosted	Python	Good	Teams wanting a managed option
LangChain (code)	Code	Python/JS	Full control	Custom production pipelines

Flowise wins when you want Ollama integration without glue code and you’re prototyping quickly. Langflow is the closest competitor — essentially the same concept but Python-native and with a slightly different node model. n8n is better at non-AI automation; its AI nodes feel bolted on compared to Flowise’s purpose-built design.

For a deeper comparison of Flowise, n8n, and LangGraph in production workflow scenarios, see Flowise vs n8n vs LangGraph 2026.

When NOT to use Flowise

You need fine-grained retrieval control. The visual nodes abstract away the retrieval internals. If you need to tune hybrid search weights, implement custom re-ranking, or control exactly how the context window is constructed — write the code. Flowise’s abstraction becomes a ceiling at that point.

High-concurrency production systems. Flowise can run behind a load balancer, but it wasn’t designed to scale horizontally out of the box. If you’re handling thousands of simultaneous users, plan the scaling architecture explicitly from the start rather than retrofitting it onto a prototype.

Your team is already Python-native. If everyone writes LangChain Python, Langflow will feel more natural and your existing tooling (pytest, standard Python libraries, Jupyter) carries over.

You need enterprise auth, audit logs, or SSO. The community edition doesn’t include these. There’s an enterprise offering from FlowiseAI, but that’s outside the scope of a self-hosted setup.

Your RAG pipeline isn’t actually complex. If you just want a chatbot over one document, AnythingLLM has a more polished UI for that specific use case and is easier to set up for non-technical users.

Scaling with cloud GPUs

If local hardware isn’t enough for the models you want to test — Mixtral 8x7B, 34B parameter models, or fine-tuned variants — RunPod is the fastest path. Spin up a pod, install Ollama, expose the port, and point Flowise at the public endpoint. You get cloud GPU access without managing infrastructure, and you pay by the hour.

This pattern also works well for batch indexing large document collections where embedding speed matters — run a 4xA100 pod for an hour to index a large corpus, then switch back to local Chroma for queries.

Practical tips

Name your nodes. A chatflow with 15 unnamed nodes is unreadable after a week. Double-click any node header to rename it.

Export chatflows before major changes. Top-right menu → Export. The JSON is human-readable and git-diffable. Treat it like a checkpoint.
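Because the export is plain JSON, a quick structural diff between two checkpoints is easy to script. The sketch below assumes the exported file has a top-level nodes array whose entries carry an id field. Verify that against your own export before relying on it, since the exact shape may differ between Flowise versions. The function and type names are hypothetical.

```typescript
// Assumed export shape: { "nodes": [{ "id": "..." }, ...], ... }
interface ExportedFlow {
  nodes: { id: string }[];
}

// Reports which node ids were added or removed between two exports.
function diffNodeIds(
  beforeJson: string,
  afterJson: string,
): { added: string[]; removed: string[] } {
  const before = new Set(
    (JSON.parse(beforeJson) as ExportedFlow).nodes.map((n) => n.id),
  );
  const after = new Set(
    (JSON.parse(afterJson) as ExportedFlow).nodes.map((n) => n.id),
  );
  return {
    added: Array.from(after).filter((id) => !before.has(id)).sort(),
    removed: Array.from(before).filter((id) => !after.has(id)).sort(),
  };
}
```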

Use Document Stores for shared embeddings. Newer versions of Flowise have a Document Store feature that lets you manage indexed documents independently of individual chatflows and reuse them across multiple flows. Much cleaner than embedding document loaders inside every chatflow.

Test with a small chunk of your data first. Before upserting a 500-page PDF, test with 10 pages. Verify the retrieval is returning relevant chunks before indexing the full corpus.

The Flowise API is REST. Every chatflow gets an API endpoint automatically. You can call it from any language with a POST request — useful when you want to integrate a Flowise chatbot into an existing app without iframe embeds.
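As a minimal sketch of that POST request, the helper below builds a call to a chatflow's prediction endpoint. The helper name and the FlowiseRequest shape are hypothetical, the chatflow ID is a placeholder you copy from the Flowise UI, and the example assumes an instance without API key protection.

```typescript
interface FlowiseRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Builds (but does not send) a prediction request for one chatflow.
function buildPredictionRequest(
  baseUrl: string,
  chatflowId: string,
  question: string,
): FlowiseRequest {
  const root = baseUrl.replace(/\/+$/, ""); // strip trailing slashes
  return {
    url: `${root}/api/v1/prediction/${encodeURIComponent(chatflowId)}`,
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ question }),
  };
}

// Usage on Node 18+, which ships a global fetch:
//   const req = buildPredictionRequest("http://localhost:3000", "<your-chatflow-id>", "What does the doc say?");
//   const res = await fetch(req.url, { method: req.method, headers: req.headers, body: req.body });
//   const answer = await res.json();
```

Keeping the request construction separate from the network call makes it easy to unit test and to swap in a different HTTP client.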

The verdict

Flowise is one of the fastest paths from “I want to build a chatbot over my documents” to a working prototype without writing application code. The Ollama integration works out of the box, the node canvas makes the RAG data flow easy to reason about, and it runs comfortably on a laptop.

The tradeoff: the visual abstraction is also the ceiling. When your retrieval logic gets complex or you need production-grade reliability, you’ll eventually want code. But Flowise is genuinely useful before you hit that ceiling — and for internal tools, it may be enough permanently.

For the LLM layer that powers Flowise, see the Ollama 2026 review for a full breakdown of what you’re running under the hood.

Install it. Connect it to Ollama. Build a RAG chatflow against a document that actually matters to you. The prototype-to-useful-tool distance is shorter here than almost anywhere else in the local AI stack.
