Jun 6, 2026

Open-Source Coding Agents 2026: Which One to Run

By AIFoss · 19 min read

aiderclineopenhandscoding-agentsopen-sourceaiselfhosted

TL;DR: Coding agents became a real product category in 2025–2026. Cline leads for VS Code users with cloud API access; Aider is the best terminal agent for local models at 32B+; OpenHands handles autonomous issue-fixing at 72% SWE-bench Verified. Using any of these is a cost and privacy play — commercial agents (Cursor, Claude Code) still outperform on pure capability.

	Aider	Cline	OpenHands
Best for	CLI pair-programming, git-first workflow	VS Code multi-file editing	Autonomous issue-fixing, CI pipelines
Local model support	Any OpenAI-compatible endpoint	Ollama, LM Studio, 30+ providers	OpenAI-compatible via config
The catch	Output quality floors at ≥32B local models	Token costs climb fast on large repos	Heavier setup; Docker required

Honest take: If you’re a VS Code developer with a cloud API key, start with Cline. If you need tight git history and local model flexibility, use Aider. If you want an agent that fixes GitHub issues and opens a PR without you watching, OpenHands with Claude Opus 4.6 is currently the open-source best.

Coding agent vs. code completion vs. LLM chat

These three categories blur in marketing but matter for real usage:

Code completion (GitHub Copilot, Tabby, Continue.dev): autocompletes as you type. Zero autonomy. The model never touches your filesystem without you copying its output.

LLM chat (Open WebUI, LibreChat): you paste code, it suggests. You decide whether to apply the suggestion. The model has no access to your repo.

Coding agent: reads your repo, creates and modifies files, runs terminal commands, interprets errors, and loops until the task is done or it gives up. It has real side effects. It can delete a file you meant to keep.

That last point matters for security, for choosing the right tool, and for setting expectations with teammates. An agent running bash commands in your home directory is not the same risk category as an inline autocomplete.

The six agents covered here

The brief for this article called for five (Aider, Cline, Open Interpreter, SWE-agent, Plandex). OpenHands earns a spot because it’s currently the best-performing open-source autonomous agent on the SWE-bench Verified leaderboard and raised a $18.8M Series A in June 2026.

Aider

License: Apache 2.0
GitHub stars: ~39K (June 2026)
Interface: terminal (CLI)
Editing approach: diff/patch — sends minimal diffs to the model, not the full file every round

Aider’s design is git-first: every AI edit becomes a commit. You run aider from your repo root, describe the task, and it generates a focused commit. If the output is wrong, git diff HEAD~1 shows exactly what changed and reverting is a single command.

It supports 75+ model providers via litellm. Routing to a local Ollama endpoint is a single flag:

pip install aider-install && aider-install

# local model
aider --model ollama/qwen2.5-coder:32b --no-auto-commits

# architect mode: separate planner + editor for harder tasks
aider --architect \
  --model claude-opus-4-20250514 \
  --editor-model claude-sonnet-4-20250514

The tradeoff: Aider is a pair programmer, not an autonomous agent. Its SWE-bench Verified score in architect mode is 31.4% — lower than Cline or OpenHands in full-autonomy mode. That’s an architecture choice, not a flaw. Aider assumes you’re watching and guiding. The diff-patch approach also sends significantly fewer tokens per session than tools that pass full file contents each turn, which cuts cloud API costs.

Cline

License: Apache 2.0
GitHub stars: ~61K (June 2026)
Version: v3.81
Interface: VS Code sidebar (also JetBrains, Cursor, Windsurf, Zed, CLI preview)
Editing approach: full-file rewrites with a diff view before applying

Cline reads your codebase, creates/edits files, runs terminal commands, drives a Puppeteer browser for web tasks, and pauses for approval at each consequential step. Provider support includes Ollama and LM Studio in the dropdown — no manual config files required.

SWE-bench Verified score: ~59.8% running Claude Sonnet 4.6 in autonomous mode. That’s comfortably ahead of Aider’s autonomous score and competitive with proprietary agents in the $20/month tier.

The token cost problem deserves a real callout: Cline passes the full file context to the model on most edits. A single complex task on a large repo can consume 500K+ input tokens. With Claude Sonnet 4.6 at $3/MTok input, that’s $1.50 per task. Do ten tasks a day and it adds up to $450/month — more than a Cursor subscription. Use a local model for exploration and a cloud model for final implementation, or set Cline’s context limits to cap per-session spending.

# Cline with a local Ollama model
# VS Code: Settings > Cline > API Provider > Ollama
# Model: qwen2.5-coder:32b
# Base URL: http://localhost:11434

See the Cline setup guide for full configuration including API keys, context settings, and VS Code workspace options.

OpenHands

License: MIT
GitHub stars: ~70K (June 2026)
Interface: web UI + CLI + REST API
Editing approach: CodeAct — generates executable Python to modify files, runs it, observes output, loops

OpenHands (formerly OpenDevin) is the most capable open-source autonomous agent currently available. It achieves 72% on SWE-bench Verified running Claude Opus 4.6 — second only to the commercial top-tier agents in published benchmark results. It raised a $18.8M Series A in June 2026 and has enterprise roadmap items including GitHub App integration and team workspaces.

Setup requires Docker:

git clone https://github.com/All-Hands-AI/OpenHands.git
cd OpenHands
cp config.template.toml config.toml
# edit config.toml: add your LLM API key and model name
docker compose up
# open http://localhost:3000

Local model support works via any OpenAI-compatible endpoint configured in config.toml. The quality drop is steep: at 7B parameters you’ll mostly see failures; at 32B you can handle well-scoped tasks. For autonomous issue-fixing that reliably produces PR-quality output, use a cloud model.

The friction is real: Docker Compose, a config file, and a running server before you get started. That’s more overhead than pip install aider. It’s worth it when the task is “fix this GitHub issue” — not when the task is “help me understand this function.”

SWE-agent and mini-SWE-agent

License: MIT
Interface: CLI / Python API
Benchmark: mini-SWE-agent scores >74% on SWE-bench Verified

SWE-agent is Princeton’s NeurIPS 2024 paper that started the benchmark arms race — a system that takes a GitHub issue URL and tries to fix it with an LM agent. The team’s current development focus has shifted to mini-SWE-agent: a ~100-line Python script that scores above 74% on SWE-bench Verified and is dramatically simpler to read and extend.

Be clear about what this is: research infrastructure, not a daily driver. There’s no IDE extension, no streaming output, no UX beyond a CLI. It’s valuable for benchmarking your own models, for understanding how agent scaffolding works, and for running automated issue-fixing pipelines. For actual development work, Aider or Cline is a better fit.

pip install mini-swe-agent

# fix a GitHub issue
mini-swe-agent run \
  --issue "https://github.com/your-org/repo/issues/42" \
  --model claude-opus-4-20250514

Plandex

License: AGPL-3.0
Version: CLI v2.2.1 (July 2025; check github.com/plandex-ai/plandex for current)
Interface: terminal
Editing approach: streamed changes applied to a sandbox, staged for review before touching your working directory

Plandex targets large multi-file projects. It handles up to 2M context tokens directly, can index 20M+ token codebases via tree-sitter project maps, and supports 400+ models via OpenRouter plus any local OpenAI-compatible endpoint. The commercial Plandex Cloud shut down in October 2025 — you self-host the server via Docker.

# install CLI
curl -sL https://plandex.ai/install.sh | bash

# start self-hosted server
docker run -p 8099:8099 plandexai/plandex-server:latest

export PLANDEX_API_URL=http://localhost:8099
plandex new

AGPL-3.0 license note: if you distribute software that incorporates Plandex as a component (versus using it as a standalone tool), the AGPL copyleft terms apply. Fine for internal use; worth checking with your legal team if you’re building a product on top of it.

Open Interpreter

License: AGPL-3.0
GitHub stars: ~57K (June 2026)
Interface: terminal + Python API
Editing approach: generates code and executes it directly in your environment

Open Interpreter is the outlier in this group. It’s not primarily a code editor — it’s a natural-language shell. Tell it “resize all images in this folder to 800px wide” and it writes and runs the Python to do it. Tell it “open a browser and extract all prices from this page into a CSV” and it does that too.

Local model support works via interpreter --local, which prompts you to choose Ollama, LM Studio, or Llamafile. The local experience is rougher than the other tools — small models frequently generate broken code that loops. Practical minimum is 14B; 32B is where it becomes reliable for multi-step OS tasks.

pip install open-interpreter
interpreter --local            # walks through local model selection
interpreter "convert all .wav files in /music to .mp3 using ffmpeg"

The right use case: OS-level automation that goes beyond editing source files. For pure coding tasks (refactor this function, fix this bug), Aider or Cline produces better output with less scaffolding.

Also see the Open Interpreter Review and the Open Interpreter vs Aider vs Claude Code comparison for deeper coverage.

What about OpenCode and Goose?

Two agents that weren’t in the original brief have grown too big to ignore since this article’s June publication.

OpenCode (MIT) is now the most-starred coding agent on GitHub — roughly 183K stars as of July 2026 — and it’s the closest open-source answer to Claude Code specifically. Maintained by Anomaly Co (the team formerly behind SST) as a client/server system rather than a single binary, one backend drives a terminal TUI, a desktop app, and VS Code/Cursor extensions. It connects to 75+ providers including local Ollama, and you can switch providers mid-session. If the terminal-agent workflow appeals but Aider’s pair-programming model feels too manual, OpenCode is the one to try first; our OpenCode + Ollama setup guide covers the local-model config, including the tool-calling-model requirement that trips up most first runs.

Goose (Apache 2.0) started at Block and was donated to the Linux Foundation’s new Agentic AI Foundation in April 2026 — the same body that now stewards Anthropic’s Model Context Protocol. It’s an MCP-native agent framework rather than a pure coding tool: extensions give it filesystem, shell, and API access, and it runs fully local against Ollama. The setup trap and the fix are in our Goose + Ollama self-hosted guide.

Neither publishes an official SWE-bench Verified score comparable to the table below, which is why they get this sidebar instead of a full entry — but for terminal-first developers, OpenCode in particular now belongs on the same shortlist as Aider.

Comparison matrix

	Aider	Cline	OpenHands	mini-SWE-agent	Plandex	Open Interpreter
SWE-bench Verified	31.4% (architect mode)	~59.8% (Claude S 4.6)	72% (Claude O 4.6)	>74%	not published	not applicable
Local model support	✅ any OAI-compatible	✅ Ollama, LM Studio	✅ OAI-compatible	✅ OAI-compatible	✅ OpenRouter + local	✅ Ollama, LM Studio
IDE integration	CLI only	VS Code, JetBrains, Cursor, Zed	Web UI + REST	CLI only	CLI only	CLI + Python API
Editing approach	Diff/patch	Full-file + diff review	CodeAct execution loop	Targeted file edits	Streamed, sandboxed	Direct code execution
Min useful local model	32B (Qwen2.5-Coder)	32B (Qwen2.5-Coder)	32B for simple tasks	32B	14B+	14B+
License	Apache 2.0	Apache 2.0	MIT	MIT	AGPL-3.0	AGPL-3.0
Est. cloud cost per session	$0.05–$1.50	$0.50–$5	$1–$10	$1–$8	$0.50–$5	$0.10–$2
GitHub stars (June 2026)	~39K	~61K	~70K	part of SWE-agent repo	~5K	~57K

Local model reality

All six tools advertise local model support. The honest picture by model size:

7B parameters: every tool connects without errors but generates code you’ll rewrite. Useful for explaining a function or generating obvious boilerplate. Not useful for debugging or multi-step tasks.

14B–20B parameters: Aider becomes genuinely useful for focused single-file changes if instructions are specific. Open Interpreter handles one-shot shell commands reliably. Cline can complete simple refactors with detailed prompting.

32B parameters (Qwen2.5-Coder 32B Q4, Devstral Small 2 24B, DeepSeek Coder V2): the gap to cloud models narrows enough for real daily work. Aider’s diff-patch architecture is especially efficient here — it sends 2K–5K tokens per round-trip versus Cline’s 15K–50K on the same task. That efficiency matters with the limited context windows of quantized 32B models.

70B+ parameters via GGUF on 48GB+ VRAM: output quality approaches cloud API levels for most coding tasks. An RTX 4090 with 24GB VRAM runs 32B Q4 well and 70B Q2 at reduced quality. For a model-by-model hardware breakdown, see runaihome.com’s guide to the best local coding LLMs and what they need to run. For burst capacity without the upfront GPU purchase, RunPod A100 instances run around $1.50–$2/hour.

The honest constraint: none of these tools produce reliable multi-file autonomous output with a local model under 32B. If you want autonomous PR-quality work, you’re either running a cloud API or a serious GPU setup.

SWE-bench: what the score tells you (and what it doesn’t)

SWE-bench Verified measures whether an agent can fix a real GitHub issue on a real codebase. 500 human-validated tasks, scored pass/fail. It’s the most credible coding benchmark published so far.

But OpenAI retired SWE-bench Verified as their frontier evaluation in 2026 because scores are climbing faster than real-world capability — the tasks are public, solutions exist on GitHub, and models trained on that data score higher than their actual programming ability. SWE-bench Pro (1,865 multi-language tasks with less public exposure) is the emerging replacement.

Practical interpretation of the current scores:

Under 40%: useful pair programmer, won’t ship autonomous PR-quality code.
40–65%: handles well-scoped bug fixes autonomously, still needs review.
Over 65%: handles moderate complexity tasks autonomously with reasonable reliability.

None of the open-source tools covered here cross 80% on SWE-bench Verified. Commercial top-tier does. The gap is real.

Also important: Aider’s 31.4% is not a fair comparison to OpenHands’ 72% for interactive use. Aider’s benchmark score reflects its autonomous mode, which is not how most developers use it. Aider in interactive mode — where a developer reviews each change before committing — produces output quality that users consistently rate much higher than that number implies.

Cost breakdown: cloud API vs. local model

Assuming 10 meaningful coding sessions per day, 22 working days per month:

Setup	Monthly cost estimate
Cline + Claude Sonnet 4.6 (heavy use, full-file context)	$150–$350
Cline + Claude Sonnet 4.6 (light use, small files)	$30–$80
Aider + Claude Sonnet 4.6 (diff mode, same workload)	$15–$50
Aider + local Qwen2.5-Coder 32B Q4 (own GPU)	~$0 + electricity
Aider + local model on RunPod A100 (8 hrs/day)	~$35/month
OpenHands + Claude Opus 4.6 (10 issues/day)	$80–$200
Cursor Pro	$20/month flat
Claude Code subscription	$100/month flat

Aider’s diff-patch architecture sends 5–10× fewer tokens per session than Cline’s full-file approach on the same codebase. If you’re price-sensitive and already using a cloud model, switching from Cline to Aider for large-repo work can cut your LLM spend significantly without changing model quality.

When NOT to use open-source coding agents

Large multi-file refactors without a 32B+ local model: agent output quality below that threshold means you’ll spend more time fixing than you would have spent writing.

Security-sensitive codebases without review gates: every tool here can run shell commands or execute generated code. Read what the agent is about to run before you approve it. AGPL tools (Plandex, Open Interpreter) need legal review before you embed them in a product.

When you need accurate test coverage: these agents write tests. Whether the tests correctly verify behavior is a different question. Human review of generated test logic is not optional.

Production CI without a container sandbox: running an autonomous agent that can modify production files in an uncontained CI environment is a meaningful risk. Use OpenHands with Docker isolation or a dedicated sandbox VM.

When your local model is under 14B: the ratio of useful output to debugging agent errors is negative. Use a cloud API or don’t use an agent.

The elephant in the room: cloud agents still lead

Commercial agents outperform all of these on benchmarks and on day-to-day output:

Cursor Background Agent: 65%+ SWE-bench Verified, tight VS Code UX, $20/month flat.
Claude Code: ~80.9% SWE-bench Verified, first-class git and GitHub integration, available as a CLI.
Devin: enterprise pricing, but consistently above 70% on uncontaminated benchmarks with less hand-holding required.

The honest summary: running an open-source coding agent is a cost and privacy trade-off, not a capability upgrade. If you’re spending $200/month on cloud agents and want to cut that, Aider with a local 32B model handles a well-scoped subset of tasks at near-zero marginal cost. If you want your codebase off third-party servers, OpenHands with a local model is the right architecture.

The gap is closing. Eighteen months ago, local 32B models weren’t good enough for agent use. As of mid-2026, they are — for a real and growing subset of coding tasks.

For a full comparison of cloud coding tools, see aicoderscope.com’s Codex vs OpenCode vs Claude Code vs Cursor multi-tool comparison, which covers the commercial side of this landscape in the same benchmark-driven style.

Which one should you actually run

VS Code developer with a cloud API key: Cline. Install the extension, add your API key, and start with a focused single-file task before trying autonomous multi-file work.

Terminal-first developer who wants tight git history: Aider. Run aider --model ollama/qwen2.5-coder:32b to start free; upgrade to a cloud model when quality matters.

Want to fix GitHub issues autonomously without watching: OpenHands. Budget $2–$5 per issue resolved with Claude Opus 4.6.

Privacy-first, no cloud API: Aider + Qwen2.5-Coder 32B via Ollama. Practical results on single-file tasks; multi-file autonomous accuracy drops.

Running automated benchmarks or CI issue-fixing pipelines: mini-SWE-agent.

Large codebases where a task touches 20+ files: Plandex — if AGPL is acceptable and you’re comfortable self-hosting the server.

OS-level automation beyond code editing: Open Interpreter. Not the right choice for pure source code editing tasks.

FAQ

Can Aider replace GitHub Copilot for daily coding?

Not for inline autocomplete — Aider is session-based, not keystroke-by-keystroke. For task-based work (implement this feature, fix this bug), it replaces Copilot’s chat mode. Many developers run both: Cline or Aider for agent tasks, Continue.dev for autocomplete. See the Continue.dev + Ollama setup guide for the autocomplete pairing.

What’s the minimum GPU for a useful local coding agent?

8GB VRAM runs Qwen2.5-Coder 7B — limited results. 16GB handles Qwen2.5-Coder 14B, which is the practical floor for Aider interactive mode. 24GB runs Qwen2.5-Coder 32B Q4 via GGUF — that’s where agent output becomes reliably useful. An RTX 4090 with 24GB VRAM is the current sweet spot for local coding agent use. Hardware build guides are at runaihome.com.

Is SWE-bench the right metric for picking a coding agent?

It’s a useful proxy for autonomous task completion but not the whole picture. SWE-bench measures fully autonomous issue-fixing; it doesn’t measure how well a tool augments an interactive developer. Aider scores lower than Cline on SWE-bench precisely because it’s designed for interactive pair programming, not autonomy. If you’re watching and guiding the agent, that lower score is misleading.

Do these tools work fully offline?

Yes. Configure any of them to use a local model endpoint (Ollama on http://localhost:11434/v1, LM Studio, or Llamafile) and no data leaves your machine. Open Interpreter’s --local flag automates this. Aider uses --model ollama/<model>. Cline has Ollama as a named provider in the settings UI.

OpenHands vs. SWE-agent — are they basically the same?

Different scope. mini-SWE-agent is 100 lines of Python, built for benchmarks and single-issue automated fixing. OpenHands has a full web UI, multi-modal task support (image inputs, browser use), persistent workspaces, role-based access, and an enterprise roadmap backed by $18.8M in VC funding. SWE-agent is the research baseline; OpenHands is the production path if autonomous issue resolution is what you need.

Is there an open-source alternative to Claude Code specifically?

OpenCode is the closest match: MIT-licensed, terminal-native, and the most-starred coding agent on GitHub at ~183K stars as of July 2026. It replicates the Claude Code workflow — agentic terminal sessions against your repo — but decouples the agent from the model, so you can run it against Anthropic, OpenAI, DeepSeek, or a local Ollama model and switch mid-session. Setup details are in our OpenCode + Ollama guide.

Which of these agents can I run over SSH on a headless server?

Anything terminal-native: Aider, mini-SWE-agent, Plandex, Open Interpreter, and OpenCode all run in a plain shell session. OpenHands needs Docker plus a browser to reach its web UI (port-forward localhost:3000 over SSH if the server is remote). Cline is the odd one out — it lives inside an editor, so a headless server means using VS Code Remote-SSH rather than the agent running server-side on its own.

Sources

Aider GitHub repository — Apache 2.0, active weekly releases
Cline GitHub repository — v3.81, ~61K stars
OpenHands GitHub repository — MIT, ~70K stars
mini-SWE-agent GitHub repository — >74% SWE-bench Verified
Plandex GitHub repository — AGPL-3.0, CLI v2.2.1
Open Interpreter GitHub repository — AGPL-3.0, ~57K stars
SWE-bench Official Leaderboard — benchmark scores and methodology
CodeSOTA code-generation benchmark tracker — 2026 coding leaderboard
MarkTechPost: Best AI Agents for Software Development Ranked, May 2026
SWE-bench Pro Leaderboard — why 46% beats 81%
OpenCode GitHub repository — MIT, most-starred coding agent as of mid-2026
Goose — Agentic AI Foundation — Apache 2.0, donated to the Linux Foundation April 2026

Last updated July 22, 2026 — added OpenCode and Goose coverage.

Recommended Gear

NVIDIA RTX 4090 — 24GB VRAM, runs Qwen2.5-Coder 32B Q4 for local coding agents

Was this article helpful?