May 22, 2026

Cline Setup Guide 2026: VS Code Agent with Local Models

By AIFoss · 14 min read

Tags: ai, coding, productivity, llm, opensource

Cline is a VS Code extension that executes code changes autonomously: it reads your files, proposes a plan, and, with your approval, writes the edits, runs commands, inspects errors, and iterates until the task is done. As of May 2026, the VS Code extension is at v3.84.0 and the CLI at v3.0.11. The project has 62,000+ GitHub stars, an Apache 2.0 license, and five million installs across the Marketplace and Open VSX.

Unlike Continue.dev, which gives you autocomplete and inline suggestions you apply one at a time, Cline takes a task description and does the work. You’re approving, not steering. That’s a meaningful shift in how you spend time with a codebase — and it works entirely on local models via Ollama, which matters if you’re dealing with a private codebase, working offline, or unwilling to pipe your code to external APIs.

How Cline Works

Two modes structure everything:

Plan mode — Cline reads the relevant files, asks clarifying questions if it needs to, and lays out exactly what it’s going to do before touching anything. You can push back, redirect, or approve. This is where you catch misunderstood requirements before they become multi-file edits to undo.

Act mode — Cline executes. Every file edit, terminal command, and tool call is shown before it runs. You can approve each step individually or configure auto-approval thresholds for low-risk operations like reading files or running test commands.

The approve-everything default feels slow for the first few tasks. After a few cycles, you realize it’s the mechanism that prevents you from shipping a half-executed refactor. Tune auto-approve thresholds in settings once you’ve calibrated what the model does with your codebase.

There’s also YOLO mode — Cline transitions Plan → Act automatically without waiting for your sign-off. Don’t start there. Get a feel for what the agent actually does before letting it run unsupervised on production code.

Cline runs inside VS Code, JetBrains, Cursor, Windsurf, and Zed, plus a preview CLI for macOS and Linux. The VS Code extension is by far the most mature.

Installation

Open VS Code and press Ctrl+Shift+X (macOS: Cmd+Shift+X). Search for Cline. Publisher is saoudrizwan. Install.

After installation, the Cline icon appears in the left sidebar. Click it and the chat panel opens. No Python environment, no Node version management on your end — the extension bundles what it needs.

Quick Start with a Cloud Model

If you want to validate Cline’s behavior before setting up local inference, the fastest path is through Anthropic’s API:

Open Cline settings (gear icon in the Cline panel)
Set API Provider to Anthropic
Paste your API key
Select a current Sonnet model from the dropdown

Give it a scoped task: “Add a --dry-run flag to the CLI entry point that prints what would happen without executing.” Watch Plan mode describe the approach, then Act mode carry it out across files.

Running this once with a frontier model gives you a baseline for what “a correct Cline task execution” looks like. That baseline is useful when you’re later comparing it against local models and trying to diagnose whether a bad result is model quality or task scope.

Local Models via Ollama

Cline talks to Ollama’s local HTTP API the same way it talks to any OpenAI-compatible endpoint. The configuration is minimal.

Step 1: Install Ollama

If Ollama isn’t already installed, the Ollama 2026 review covers the full setup. The short path:

# Linux
curl -fsSL https://ollama.com/install.sh | sh

# macOS (Homebrew)
brew install ollama

Start the server:

ollama serve

Ollama binds to http://localhost:11434 by default.
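
Before touching Cline's settings, it can help to confirm the server is reachable by querying Ollama's HTTP API directly. The sketch below assumes the default address above and that curl is installed. `OLLAMA_URL` and `check_ollama` are illustrative names for this snippet, not part of Ollama.

```shell
# Base URL defaults to Ollama's standard bind address.
# OLLAMA_URL is a hypothetical override variable used only by this snippet.
ollama_url="${OLLAMA_URL:-http://localhost:11434}"

# Returns 0 if the server answers /api/version within 3 seconds.
check_ollama() {
  curl -fsS --max-time 3 "$1/api/version" >/dev/null 2>&1
}

if check_ollama "$ollama_url"; then
  echo "Ollama is up at $ollama_url"
  # /api/tags lists the models that have been pulled locally.
  curl -fsS "$ollama_url/api/tags"
else
  echo "No response from $ollama_url; is 'ollama serve' running?"
fi
```

An empty model list at this point is expected; it fills in after the pull step.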

Step 2: Pull a Coding Model

# Fast and capable on modest hardware (~5GB disk, 8GB+ VRAM)
ollama pull qwen2.5-coder:7b

# Strong multi-file reasoning; the default pick if you have ~22GB of VRAM
ollama pull qwen2.5-coder:32b

# Reasoning-focused; strong on complex bugs but 2-3x slower
ollama pull deepseek-r1:14b

Step 3: Configure Cline

In the Cline settings panel:

API Provider → Ollama
Base URL → http://localhost:11434
Model → select the model you pulled (e.g., qwen2.5-coder:32b)

The model list auto-populates from whatever Ollama has downloaded. Save. That’s the full local setup.
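
If Cline's first task fails or hangs, one way to separate model problems from extension problems is to send a single prompt to Ollama yourself. This is a minimal sketch using Ollama's `/api/generate` endpoint. `smoke_test` is a hypothetical helper name, and the prompt text is arbitrary.

```shell
# Sends one non-streaming prompt to Ollama's /api/generate endpoint.
# $1 = model tag, $2 = base URL (defaults to the local server).
smoke_test() {
  curl -fsS --max-time 120 "${2:-http://localhost:11434}/api/generate" \
    -d "{\"model\": \"$1\", \"prompt\": \"Write a one-line Python hello world.\", \"stream\": false}"
}

# Usage (the reply is a JSON object with the generated text in its "response" field):
#   smoke_test qwen2.5-coder:7b
```

If this call returns a sensible answer but Cline still misbehaves, the issue is more likely task scope or tool-call reliability than the Ollama connection.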

Which Local Model to Run

The right answer depends on your hardware and task complexity. Here’s where each model fits:

Model	VRAM / RAM needed	Best for	Weakness
qwen2.5-coder:7b	8GB VRAM / 16GB RAM	Boilerplate, scaffolding, quick refactors	Weak multi-file reasoning
qwen2.5-coder:32b	22GB VRAM / 32GB RAM	Multi-file edits, reliable tool calls	Needs serious hardware
deepseek-r1:14b	10GB VRAM / 24GB RAM	Complex debugging, step-by-step reasoning	2–3x slower than Qwen2.5
Qwen3-Coder 30B	20GB VRAM / 32GB RAM	Agentic workflows, 256K context, best tool use	Large; requires high-end hardware
llama4:scout	~70GB VRAM + RAM combined (109B-parameter MoE, 17B active)	Balanced general coding, multimodal tasks	Too large for a single consumer GPU; less community-tested for agents

For developers with a 24GB GPU, qwen2.5-coder:32b is the default pick. On a 16GB card the model does not fit in VRAM, so Ollama offloads part of it to the CPU and generation slows down noticeably. The 32B model handles multi-file edits without hallucinating tool calls, which matters more for Cline than raw benchmark scores, because agentic use requires the model to reliably call read_file and write_to_file in the right sequence.

On Apple Silicon (M3 Pro / M3 Max or newer), qwen2.5-coder:32b at Q4_K_M runs at 20–30 tokens/sec thanks to unified memory. On NVIDIA, you need roughly 22GB VRAM for the 32B variant at Q4_K_M quantization.
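
The 22GB figure can be sanity-checked with back-of-envelope arithmetic. The sketch below assumes roughly 4.85 bits per weight for Q4_K_M and a flat 2GB allowance for KV cache and runtime buffers. Both numbers are rough assumptions rather than measured values, and `estimate_vram_gb` is an illustrative helper.

```shell
# Rough VRAM estimate: weights (params x bits / 8) plus a flat overhead allowance.
# $1 = parameters in billions, $2 = bits per weight (assumed), $3 = overhead in GB (assumed).
estimate_vram_gb() {
  awk -v p="$1" -v b="$2" -v o="$3" 'BEGIN { printf "%.1f\n", p * b / 8 + o }'
}

estimate_vram_gb 32 4.85 2   # 32B model at Q4_K_M: prints 21.4
estimate_vram_gb 7 4.85 2    # 7B model at Q4_K_M: prints 6.2
```

Longer context windows increase the KV cache, so treat the overhead term as a floor rather than a fixed cost.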

If your machine has an 8GB GPU, use qwen2.5-coder:7b for contained tasks and switch to a cloud provider (OpenRouter, Anthropic) for anything requiring coherent reasoning across a large file tree.

Qwen3-Coder 30B is worth trying if you have the hardware. It was tuned specifically for agentic workflows — it understands tool-use sequences rather than just text generation, which directly benefits how Cline chains file reads and writes.

Configuring Cline for Your Codebase

This is where Cline’s setup diverges from most coding tools. The .clinerules/ directory in your project root is the mechanism for giving Cline persistent, project-scoped instructions. Think of it as a version-controlled system prompt that the agent itself can edit.

Create .clinerules/project-rules.md:

# Project Rules

## Tech Stack
- TypeScript, Node.js 22, Vitest for tests
- All database access through the `db/` module — never raw SQL elsewhere
- Prefer functional patterns; avoid class hierarchies unless necessary

## Style
- camelCase variables, PascalCase types/interfaces, UPPER_SNAKE_CASE constants
- No `any` types without a comment explaining why

## Testing
- All new functions need a unit test in the co-located `__tests__/` directory
- Run `pnpm test` before declaring a task complete

## Branching
- Never commit directly to main
- Branch name format: `feat/short-description` or `fix/short-description`

Keep each rule file under 150 lines. For separate concerns — architecture, style, testing, deployment — split into separate files: 01-architecture.md, 02-style.md, and so on. Cline processes all .md and .txt files in .clinerules/ and merges them into a single context block.
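
The split-file layout and the 150-line guideline can be scaffolded and checked from the shell. This is a minimal sketch: the file names follow the numbering suggested above, and `check_rule_lengths` is a hypothetical helper, not a Cline command.

```shell
# Create the rules directory with one stub file per concern (names are illustrative).
mkdir -p .clinerules
for name in 01-architecture 02-style 03-testing; do
  # Only create a stub if the file does not exist yet.
  [ -f ".clinerules/$name.md" ] || printf '# %s\n' "$name" > ".clinerules/$name.md"
done

# Print any rule file longer than the limit; returns 1 if one is found.
# $1 = rules directory, $2 = maximum lines per file.
check_rule_lengths() {
  status=0
  for f in "$1"/*.md "$1"/*.txt; do
    [ -f "$f" ] || continue
    # tr strips the padding some wc implementations add.
    lines=$(wc -l < "$f" | tr -d ' ')
    if [ "$lines" -gt "$2" ]; then
      echo "$f: $lines lines (limit $2)"
      status=1
    fi
  done
  return "$status"
}

check_rule_lengths .clinerules 150 && echo "all rule files within limit"
```

A check like this could run in a pre-commit hook so oversized rule files are caught before they reach review.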

Workspace rules take precedence over global rules when they conflict. Global rules (in your VS Code settings directory) work well for personal preferences you want across every project — indentation style, tool call verbosity, things like that.

The genuinely useful part: Cline can edit these rule files itself. Tell it “refine the testing rule to require integration tests for any new API endpoint” and it will update the rule file. Your coding standards become living documents rather than a README section nobody reads after the first week.

Commit .clinerules/ to the repo. Every team member running Cline gets the same behavior, and rule changes are visible in pull requests like any other code change.

MCP Server Integration

Cline supports the Model Context Protocol, and its in-extension MCP Marketplace makes adding servers a one-click operation. Practical ones to enable early:

File system MCP — broader read access without manually adding @ file references to every prompt.

GitHub MCP — Cline can read issues, PRs, and repo metadata as context. Useful for tasks like “implement what issue #247 describes.”

Database MCP (Postgres or SQLite) — lets Cline reason about schema while editing data-layer code, rather than guessing table structure from filenames.

MCP config lives in cline_mcp_settings.json in your VS Code settings directory. If you’re also using Claude Code or Cursor, you can keep a single .mcp.json in your project root and symlink it — the config format is compatible across tools.
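
For orientation, an entry in that file follows the `mcpServers` layout shown below. This is a sketch rather than a copy of any real configuration: it writes a sample file to a scratch path, uses the reference filesystem server from the Model Context Protocol project, and the project path is a placeholder. Check Cline's MCP documentation for where the live settings file sits on your system.

```shell
# Write a sample MCP config to a scratch location (not the live settings file).
sample="${TMPDIR:-/tmp}/cline_mcp_settings.sample.json"
cat > "$sample" <<'EOF'
{
  "mcpServers": {
    "filesystem": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-filesystem", "/path/to/your/project"]
    }
  }
}
EOF
echo "wrote $sample"
```

Each key under `mcpServers` names one server, with the command and arguments used to launch it.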

When Cline Falls Short

Terminal-first developers. If you live in the terminal and your workflow revolves around Git commits, Aider is the stronger fit (see the Aider review for a full capability breakdown). Aider runs linters and tests automatically after every change, its Git integration is tighter than any other tool in this space, and it doesn’t require VS Code. Cline is a VS Code tool first.

Incremental suggestion workflows. Cline wants to complete tasks end-to-end. If you want a coding assistant that offers snippets and lets you decide whether to apply each line, Continue.dev is more appropriate. With a weak local model on a complex task, Cline can generate a lot of changes quickly — which is useful only if you trust the model enough to review diffs at a high level rather than audit every line.

Underpowered hardware with large codebases. A 7B model doing multi-file refactors in a 100k-line codebase will lose context and produce incomplete edits. Cline’s context window handling is solid, but it can’t compensate for a model that doesn’t understand the files it’s reading. Either use a 32B model, switch to a cloud provider for complex tasks, or keep tasks narrowly scoped.

Strict execution audit requirements. Cline runs terminal commands. In environments where you need a full audit trail of every shell command or can’t have an agent writing arbitrary files, Cline’s autonomy is the problem rather than the solution. Use a more controlled completion tool in those contexts.

For a head-to-head breakdown of how Cline compares against Continue.dev and Aider on local models, including benchmark data and workflow differences, see the Continue.dev vs Cline vs Aider 2026 comparison. If a self-hosted team server is also in the mix, Tabby vs Continue.dev vs Cline covers where each one fits.

Cline vs. Claude Code: Which Agentic Tool Should You Use?

The most direct competitor to Cline in 2026 is Claude Code — Anthropic’s own terminal-native coding agent. Both are agentic (they read files, write files, run commands, and iterate), and both can be connected to the same underlying models. The differences are in architecture, openness, and workflow fit.

	Cline	Claude Code
License	Apache 2.0 (free, open source)	Subscription required
Interface	VS Code sidebar (+ JetBrains, Zed, CLI)	Terminal (standalone CLI)
Model support	Any LLM: Anthropic, OpenAI, Ollama, OpenRouter, 30+ providers	Claude models only
Speed	Depends on model + approval overhead	~3× more edits/minute (optimized prompts + diff handling)
Human-in-the-loop	Every action shown and approved by default	Configurable; fewer prompts by default
Local model support	Yes — Ollama, LM Studio, any OpenAI-compatible endpoint	No
SWE-bench Verified (Opus 4.6)	Same model, so equivalent model quality	80.8%

The 3× speed advantage Claude Code shows in benchmarks comes from purpose-built prompts and diff-formatting optimizations, not from model capability: a Cline session running the same Claude Opus model reaches the same raw model quality. The tooling layer is what’s faster.

Choose Cline if: you want open-source flexibility, need local model support for a private codebase, work across multiple IDEs, or don’t want to pay a subscription on top of API costs. The Apache 2.0 license means you can self-host, fork, and inspect everything.

Choose Claude Code if: you value speed and a tighter agent loop, work exclusively in the terminal, and are already paying for an Anthropic API plan. Claude Code’s automatic context compaction and parallel sub-agent support (Agent Teams) have no direct Cline equivalent yet.

Many teams run both: Cline with a local model for internal/private code, and Claude Code for greenfield features where speed matters more than model confidentiality.

RunPod as a Backend Alternative

If your local hardware can’t run a 32B model but you’d rather not pay per-token to a commercial API, RunPod lets you spin up an Ollama-compatible endpoint on rented GPU hardware. Set Cline’s Ollama base URL to the RunPod instance and the configuration is identical to local Ollama — the model runs on their hardware, you get the same Ollama API. At high volumes, the per-hour GPU rental rate can work out cheaper than per-token API pricing. For guidance on local vs. cloud GPU tradeoffs, the team at runaihome.com maintains a GPU rental comparison guide that covers RunPod alongside alternatives.

FAQ

Q: Is Cline actually free, or do I pay somewhere?

A: The Cline extension itself is free and open source under Apache 2.0, and there is no Cline subscription. You pay the model provider directly: Anthropic for Claude, OpenAI for GPT, or nothing if you run a local model via Ollama. API costs are the variable. A Claude Sonnet reply of 500 output tokens costs roughly $0.0075 (500 tokens at $15 per million output tokens), and input tokens are billed on top of that, so heavy agentic sessions across large codebases can add up quickly. For unlimited local inference, Ollama with qwen2.5-coder:32b costs only the electricity your GPU draws. See the hardware requirements for running 32B models at runaihome.com.
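
The per-reply figure is simple arithmetic, and the same calculation shows why long agentic sessions add up. The sketch assumes $15 per million output tokens, which is the rate implied by the $0.0075 figure. Input tokens are billed separately and are left out. `output_cost_usd` is an illustrative helper.

```shell
# Output-token cost in USD. $1 = output tokens, $2 = USD per million output tokens (assumed rate).
output_cost_usd() {
  awk -v t="$1" -v r="$2" 'BEGIN { printf "%.4f\n", t * r / 1000000 }'
}

output_cost_usd 500 15      # one short reply: prints 0.0075
output_cost_usd 200000 15   # a long session with 200k output tokens: prints 3.0000
```

Swap in your provider's current rate as the second argument; the 200k-token session is a made-up example, not a measurement.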

Q: Can Cline work completely offline with no internet connection?

A: Yes, with a local Ollama setup. Pull a model with ollama pull qwen2.5-coder:7b (or 32b if your hardware supports it), point Cline’s API Provider to Ollama at http://localhost:11434, and Cline operates fully offline. No data leaves your machine — inference runs on your GPU. The one limitation is that .clinerules/ files referencing external documentation URLs won’t resolve offline, but the rule text itself works fine. This is the standard configuration for air-gapped environments, sensitive codebases, and privacy-first teams.

Q: What MCP servers give Cline the most practical value?

A: Three categories stand out from real-world use:

GitHub MCP: Lets Cline read issues and PR context inline — useful for prompts like “implement what issue #247 describes” without copying and pasting the issue body yourself. Config in cline_mcp_settings.json.
Database MCP (Postgres or SQLite): Cline can inspect your schema while editing data-layer code. Without this, it often guesses column names from filenames. With it, it writes correct queries against your actual tables.
Filesystem MCP: Extends read access beyond files you’ve manually referenced with @, which reduces how often you need to manually scope context for large projects.

MCP servers can be installed from Cline’s in-extension Marketplace (filter by stars or category) and configured without leaving VS Code. The config format is compatible with Claude Code and Cursor if you keep a shared .mcp.json in the project root.

Sources

Cline GitHub Repository — Apache 2.0 license, version history, star count
Cline Releases Page — CLI v3.0.11 and VS Code v3.84.0 (May 2026)
Cline Local Models / Ollama Documentation — Ollama integration setup steps
Cline Plan & Act Mode Documentation — Plan/Act/YOLO mode behavior
Cline Rules Documentation — .clinerules directory setup and best practices
Best Ollama Models for Coding Agents 2026 — model rankings, hardware requirements, Qwen3-Coder context size
Cline MCP Servers Setup Guide 2026 — MCP Marketplace and server configuration
