Open-Source Coding Agents 2026: Which One to Run
TL;DR: Coding agents became a real product category in 2025–2026. Cline leads for VS Code users with cloud API access; Aider is the best terminal agent for local models at 32B+; OpenHands leads autonomous issue-fixing with 72% on SWE-bench Verified. Running any of these is a cost and privacy play: commercial agents (Cursor, Claude Code) still outperform on raw capability.
| | Aider | Cline | OpenHands |
|---|---|---|---|
| Best for | CLI pair-programming, git-first workflow | VS Code multi-file editing | Autonomous issue-fixing, CI pipelines |
| Local model support | Any OpenAI-compatible endpoint | Ollama, LM Studio, 30+ providers | OpenAI-compatible via config |
| The catch | Output quality floors at ≥32B local models | Token costs climb fast on large repos | Heavier setup; Docker required |
Honest take: If you’re a VS Code developer with a cloud API key, start with Cline. If you need tight git history and local model flexibility, use Aider. If you want an agent that fixes GitHub issues and opens a PR without you watching, OpenHands with Claude Opus 4.6 is currently the strongest open-source option.
Coding agent vs. code completion vs. LLM chat
These three categories blur in marketing but matter for real usage:
Code completion (GitHub Copilot, Tabby, Continue.dev): autocompletes as you type. Zero autonomy. The model never touches your filesystem without you copying its output.
LLM chat (Open WebUI, LibreChat): you paste code, it suggests. You decide whether to apply the suggestion. The model has no access to your repo.
Coding agent: reads your repo, creates and modifies files, runs terminal commands, interprets errors, and loops until the task is done or it gives up. It has real side effects. It can delete a file you meant to keep.
That last point matters for security, for choosing the right tool, and for setting expectations with teammates. An agent running bash commands in your home directory is not the same risk category as an inline autocomplete.
The six agents covered here
The brief for this article called for five agents (Aider, Cline, Open Interpreter, SWE-agent, Plandex). OpenHands earns a sixth spot because it’s currently the best-performing open-source autonomous agent on the SWE-bench Verified leaderboard, and it raised an $18.8M Series A in June 2026.
Aider
- License: Apache 2.0
- GitHub stars: ~39K (June 2026)
- Interface: terminal (CLI)
- Editing approach: diff/patch. The model replies with targeted search/replace edits instead of rewriting whole files, and Aider sends only the files you add to the chat plus a compact repo map
Aider’s design is git-first: by default, every AI edit becomes a commit. You run aider from your repo root, describe the task, and it generates a focused commit. If the output is wrong, git diff HEAD~1 HEAD shows exactly what changed, and reverting is a single command (or /undo inside the Aider session).
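To make that review-and-revert loop concrete, here is a throwaway shell demo. It simulates an agent commit with echo (Aider itself is not involved), inspects it with git show, and undoes it with git revert, which adds a reversing commit instead of rewriting history. The file name and commit messages are made up for illustration.

```bash
#!/usr/bin/env bash
# Throwaway demo: review and undo the last (simulated) agent commit.
# Assumes git is installed; the "agent edit" below is just an echo.
set -euo pipefail

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "demo"
git config user.email "demo@example.com"

echo 'def add(a, b): return a + b' > math_utils.py
git add math_utils.py
git commit -qm "initial version"

# Simulated agent edit, committed on its own the way Aider commits by default.
echo 'def add(a, b): return a - b' > math_utils.py
git commit -qam "agent: refactor add()"

# Review exactly what that commit changed (same diff as `git diff HEAD~1 HEAD`).
git show --stat --patch HEAD

# Undo it with a new commit that reverses the change; history is preserved.
git revert --no-edit HEAD
cat math_utils.py
```

Revert is the safe choice once a commit has been pushed; git reset --hard HEAD~1 only makes sense for local commits nobody else has pulled.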
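It supports 75+ model providers via litellm. Routing to a local Ollama endpoint takes one flag, plus the server address in the OLLAMA_API_BASE environment variable:
```bash
pip install aider-install && aider-install

# local model via Ollama; aider reads the server address from OLLAMA_API_BASE
export OLLAMA_API_BASE=http://127.0.0.1:11434
# --no-auto-commits leaves edits uncommitted so you can review them first
aider --model ollama/qwen2.5-coder:32b --no-auto-commits

# architect mode: separate planner + editor for harder tasks
# (Anthropic models read ANTHROPIC_API_KEY from the environment)
aider --architect \
  --model claude-opus-4-20250514 \
  --editor-model claude-sonnet-4-20250514
```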
The tradeoff: Aider is a pair programmer, not an autonomous agent. Its SWE-bench Verified score in architect mode is 31.4% — lower than Cline or OpenHands in full-autonomy mode. That’s an architecture choice, not a flaw. Aider assumes you’re watching and guiding. The diff-patch approach also sends significantly fewer tokens per session than tools that pass full file contents each turn, which cuts cloud API costs.
Cline
- License: Apache 2.0
- GitHub stars: ~61K (June 2026)
- Version: v3.81
- Interface: VS Code sidebar (also JetBrains, Cursor, Windsurf, Zed, CLI preview)
- Editing approach: full-file rewrites with a diff view before applying
Cline reads your codebase, creates/edits files, runs terminal commands, drives a Puppeteer browser for web tasks, and pauses for approval at each consequential step. Provider support includes Ollama and LM Studio in the dropdown — no manual config files required.
SWE-bench Verified score: ~59.8% running Claude Sonnet 4.6 in autonomous mode. That’s comfortably ahead of Aider’s autonomous score and competitive with proprietary agents in the $20/month tier.
The token cost problem deserves a real callout: Cline passes the full file context to the model on most edits. A single complex task on a large repo can consume 500K+ input tokens. With Claude Sonnet 4.6 at $3/MTok input, that’s $1.50 per task. Ten tasks a day comes to about $330 over 22 working days, or $450 if you code every day of a 30-day month; either way, far more than a Cursor subscription. Use a local model for exploration and a cloud model for final implementation, or set Cline’s context limits to cap per-session spending.
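The arithmetic above is easy to rerun with your own numbers. This sketch counts input tokens only at a flat price per million tokens (output tokens and prompt-caching discounts are ignored), so treat the results as a floor. The function names are hypothetical helpers, not part of Cline.

```bash
#!/usr/bin/env bash
# Rough input-token cost estimates. Assumptions: input tokens only, flat $/MTok,
# no prompt-caching discounts. Function names are illustrative.

# cost_per_task TOKENS PRICE_PER_MTOK -> dollars per task
cost_per_task() {
  awk -v t="$1" -v p="$2" 'BEGIN { printf "%.2f\n", t / 1000000 * p }'
}

# monthly_cost TOKENS PRICE_PER_MTOK TASKS_PER_DAY DAYS -> dollars per month
monthly_cost() {
  awk -v t="$1" -v p="$2" -v n="$3" -v d="$4" \
    'BEGIN { printf "%.2f\n", t / 1000000 * p * n * d }'
}

cost_per_task 500000 3        # 1.50
monthly_cost 500000 3 10 22   # 330.00 (22 working days)
monthly_cost 500000 3 10 30   # 450.00 (every day of a 30-day month)
```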
```text
# Cline with a local Ollama model
# VS Code: Settings > Cline > API Provider > Ollama
# Model: qwen2.5-coder:32b
# Base URL: http://localhost:11434
```
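Before pointing Cline at Ollama, it can help to confirm the server is up and the model is pulled. A minimal sketch, assuming Ollama's default port and its /api/tags endpoint, which lists local models as JSON. has_model is a hypothetical helper that does a plain string match rather than real JSON parsing, so reach for jq if you need anything sturdier.

```bash
#!/usr/bin/env bash
# Check that Ollama is reachable and a given model has been pulled.
# Assumptions: default port 11434; has_model is a crude string match, not a JSON parser.

OLLAMA_URL="${OLLAMA_URL:-http://localhost:11434}"
MODEL="${MODEL:-qwen2.5-coder:32b}"

# has_model NAME: reads /api/tags JSON on stdin, succeeds if NAME is listed.
has_model() {
  tr -d '[:space:]' | grep -qF "\"name\":\"$1\""
}

if tags=$(curl -sf "$OLLAMA_URL/api/tags"); then
  if printf '%s' "$tags" | has_model "$MODEL"; then
    echo "ready: $MODEL is available at $OLLAMA_URL"
  else
    echo "missing: run 'ollama pull $MODEL' first"
  fi
else
  echo "Ollama is not reachable at $OLLAMA_URL"
fi
```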
See the Cline setup guide for full configuration including API keys, context settings, and VS Code workspace options.
OpenHands
- License: MIT
- GitHub stars: ~70K (June 2026)
- Interface: web UI + CLI + REST API
- Editing approach: CodeAct — generates executable Python to modify files, runs it, observes output, loops
OpenHands (formerly OpenDevin) is the most capable open-source autonomous agent currently available. It achieves 72% on SWE-bench Verified running Claude Opus 4.6, second only to the top-tier commercial agents in published benchmark results. It raised an $18.8M Series A in June 2026, and its enterprise roadmap includes GitHub App integration and team workspaces.
Setup requires Docker:
```bash
git clone https://github.com/All-Hands-AI/OpenHands.git
cd OpenHands
cp config.template.toml config.toml
# edit config.toml: add your LLM API key and model name
docker compose up
# open http://localhost:3000
```
Local model support works via any OpenAI-compatible endpoint configured in config.toml. The quality drop is steep: at 7B parameters you’ll mostly see failures; at 32B you can handle well-scoped tasks. For autonomous issue-fixing that reliably produces PR-quality output, use a cloud model.
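As a sketch of what that config.toml edit can look like, the script below writes a minimal [llm] table from the shell. It assumes the model, base_url, and api_key keys used in OpenHands' config.template.toml; the model strings are examples and write_llm_config is a hypothetical helper. One detail that trips up local setups: OpenHands runs inside Docker, where localhost means the container itself, so a host-side Ollama server is usually reached through host.docker.internal (on Linux that name needs an --add-host host.docker.internal:host-gateway mapping).

```bash
#!/usr/bin/env bash
# Sketch: generate the [llm] table for OpenHands' config.toml.
# Assumptions: key names follow config.template.toml; model strings are examples;
# write_llm_config is a hypothetical helper, not part of OpenHands.
set -euo pipefail

# write_llm_config OUTFILE MODEL [BASE_URL] [API_KEY]
write_llm_config() {
  local out="$1" model="$2" base_url="${3:-}" api_key="${4:-}"
  {
    echo '[llm]'
    echo "model = \"$model\""
    if [ -n "$base_url" ]; then echo "base_url = \"$base_url\""; fi
    if [ -n "$api_key" ]; then echo "api_key = \"$api_key\""; fi
  } > "$out"
}

cd "$(mktemp -d)"

# Cloud model: replace the placeholder with your real key (and keep the file out of git).
write_llm_config cloud.toml "anthropic/claude-opus-4-20250514" "" "YOUR_KEY_HERE"

# Local Ollama model: reach the host's Ollama server from inside the container.
write_llm_config local.toml "ollama/qwen2.5-coder:32b" "http://host.docker.internal:11434"
cat local.toml
```

Use the output to replace the [llm] table in config.toml rather than appending it: TOML rejects a table that is defined twice.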
The friction is real: Docker Compose, a config file, and a running server before you get started. That’s more overhead than installing Aider with pip. It’s worth it when the task is “fix this GitHub issue,” not when the task is “help me understand this function.”
SWE-agent and mini-SWE-agent
- License: MIT
- Interface: CLI / Python API
- Benchmark: mini-SWE-agent scores >74% on SWE-bench Verified
SWE-agent came out of Princeton’s NeurIPS 2024 paper that started the benchmark arms race: a system that takes a GitHub issue URL and tries to fix it with an LM agent. The team’s development focus has since shifted to mini-SWE-agent, a ~100-line Python script that scores above 74% on SWE-bench Verified and is dramatically simpler to read and extend.
Be clear about what this is: research infrastructure, not a daily driver. There’s no IDE extension, no streaming output, no UX beyond a CLI. It’s valuable for benchmarking your own models, for understanding how agent scaffolding works, and for running automated issue-fixing pipelines. For actual development work, Aider or Cline is a better fit.
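```bash
pip install mini-swe-agent
# fix a GitHub issue
# (entry-point names and flags change between releases; confirm them with the
#  installed CLI's --help output before scripting this)
mini-swe-agent run \
  --issue "https://github.com/your-org/repo/issues/42" \
  --model claude-opus-4-20250514
```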
Plandex
- License: AGPL-3.0
- Version: CLI v2.2.1 (July 2025; check github.com/plandex-ai/plandex for current)
- Interface: terminal
- Editing approach: streamed changes applied to a sandbox, staged for review before touching your working directory
Plandex targets large multi-file projects. It handles up to 2M context tokens directly, can index 20M+ token codebases via tree-sitter project maps, and supports 400+ models via OpenRouter plus any local OpenAI-compatible endpoint. The commercial Plandex Cloud shut down in October 2025 — you self-host the server via Docker.
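```bash
# install CLI
curl -sL https://plandex.ai/install.sh | bash

# start self-hosted server
# (the server also needs PostgreSQL; the repo's self-hosting docs cover a
#  Docker Compose setup that starts both)
docker run -p 8099:8099 plandexai/plandex-server:latest
export PLANDEX_API_URL=http://localhost:8099
plandex new
```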
AGPL-3.0 license note: if you distribute software that incorporates Plandex as a component (versus using it as a standalone tool), or let users interact with a modified version over a network, the AGPL copyleft terms apply. Fine for internal use; worth checking with your legal team if you’re building a product on top of it.
Open Interpreter
- License: AGPL-3.0
- GitHub stars: ~57K (June 2026)
- Interface: terminal + Python API
- Editing approach: generates code and executes it directly in your environment
Open Interpreter is the outlier in this group. It’s not primarily a code editor — it’s a natural-language shell. Tell it “resize all images in this folder to 800px wide” and it writes and runs the Python to do it. Tell it “open a browser and extract all prices from this page into a CSV” and it does that too.
Local model support works via interpreter --local, which prompts you to choose Ollama, LM Studio, or Llamafile. The local experience is rougher than the other tools — small models frequently generate broken code that loops. Practical minimum is 14B; 32B is where it becomes reliable for multi-step OS tasks.
```bash
pip install open-interpreter
interpreter --local  # walks through local model selection
interpreter "convert all .wav files in /music to .mp3 using ffmpeg"
```
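Reviewing before approving is easier to appreciate with the kind of script that last request tends to produce. The sketch below is a hand-written stand-in, not actual Open Interpreter output: convert_wavs is a hypothetical function that only prints the ffmpeg commands it would run unless DRY_RUN=0, and ffmpeg's -n flag refuses to overwrite existing files.

```bash
#!/usr/bin/env bash
# Hand-written stand-in for a generated .wav -> .mp3 script, with a dry-run default.
# Assumptions: ffmpeg is on PATH when DRY_RUN=0; convert_wavs is illustrative.
set -euo pipefail

convert_wavs() {
  local dir="$1" f out
  for f in "$dir"/*.wav; do
    [ -e "$f" ] || continue                     # no .wav files: glob stays literal
    out="${f%.wav}.mp3"
    if [ "${DRY_RUN:-1}" = "1" ]; then
      printf 'ffmpeg -n -i %q %q\n' "$f" "$out" # print only, run nothing
    else
      ffmpeg -n -i "$f" "$out"                  # -n: never overwrite an existing file
    fi
  done
}

demo=$(mktemp -d)
touch "$demo/a.wav" "$demo/b.wav"
convert_wavs "$demo"                            # prints two ffmpeg commands
```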
The right use case: OS-level automation that goes beyond editing source files. For pure coding tasks (refactor this function, fix this bug), Aider or Cline produces better output with less scaffolding.
Also see the Open Interpreter Review and the Open Interpreter vs Aider vs Claude Code comparison for deeper coverage.
Comparison matrix
| | Aider | Cline | OpenHands | mini-SWE-agent | Plandex | Open Interpreter |
|---|---|---|---|---|---|---|
| SWE-bench Verified | 31.4% (architect mode) | ~59.8% (Claude Sonnet 4.6) | 72% (Claude Opus 4.6) | >74% | not published | not applicable |
| Local model support | ✅ any OAI-compatible | ✅ Ollama, LM Studio | ✅ OAI-compatible | ✅ OAI-compatible | ✅ OpenRouter + local | ✅ Ollama, LM Studio |
| IDE integration | CLI only | VS Code, JetBrains, Cursor, Zed | Web UI + REST | CLI only | CLI only | CLI + Python API |
| Editing approach | Diff/patch | Full-file + diff review | CodeAct execution loop | Targeted file edits | Streamed, sandboxed | Direct code execution |
| Min useful local model | 32B (Qwen2.5-Coder) | 32B (Qwen2.5-Coder) | 32B for simple tasks | 32B | 14B+ | 14B+ |
| License | Apache 2.0 | Apache 2.0 | MIT | MIT | AGPL-3.0 | AGPL-3.0 |
| Est. cloud cost per session | $0.05–$1.50 | $0.50–$5 | $1–$10 | $1–$8 | $0.50–$5 | $0.10–$2 |
| GitHub stars (June 2026) | ~39K | ~61K | ~70K | part of SWE-agent repo | ~5K | ~57K |
Local model reality
All six tools advertise local model support. The honest picture by model size:
7B parameters: every tool connects without errors but generates code you’ll rewrite. Useful for explaining a function or generating obvious boilerplate. Not useful for debugging or multi-step tasks.
14B–20B parameters: Aider becomes genuinely useful for focused single-file changes if instructions are specific. Open Interpreter handles one-shot shell commands reliably. Cline can complete simple refactors with detailed prompting.
32B parameters (Qwen2.5-Coder 32B Q4, Devstral Small 2 24B, DeepSeek Coder V2): the gap to cloud models narrows enough for real daily work. Aider’s diff-patch architecture is especially efficient here — it sends 2K–5K tokens per round-trip versus Cline’s 15K–50K on the same task. That efficiency matters with the limited context windows of quantized 32B models.
70B+ parameters via GGUF on 48GB+ VRAM: output quality approaches cloud API levels for most coding tasks. An RTX 4090 with 24GB VRAM runs 32B Q4 well and 70B Q2 at reduced quality. For hardware guidance, see runaihome.com. For burst capacity without the upfront GPU purchase, RunPod A100 instances run around $1.50–$2/hour.
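A rough rule behind those hardware numbers: quantized weights take about parameters × bits-per-weight ÷ 8 bytes, before the KV cache and runtime overhead are added. The sketch below encodes just that rule; the bits-per-weight values are assumed approximations for common GGUF quantizations (roughly 4.5 for Q4-class, 2.5 for aggressive Q2-class), not exact figures for any specific file.

```bash
#!/usr/bin/env bash
# Back-of-envelope VRAM for model weights only (no KV cache, no runtime overhead).
# Assumption: bits-per-weight values are approximate averages for GGUF quant types.

# weights_gb PARAMS_IN_BILLIONS BITS_PER_WEIGHT -> decimal gigabytes
weights_gb() {
  awk -v p="$1" -v b="$2" 'BEGIN { printf "%.1f\n", p * b / 8 }'
}

weights_gb 32 4.5   # 18.0 -> fits a 24GB card with some room for context
weights_gb 70 4.5   # 39.4 -> needs 48GB-class VRAM
weights_gb 70 2.5   # 21.9 -> squeezes onto 24GB, at reduced quality
```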
The honest constraint: none of these tools produce reliable multi-file autonomous output with a local model under 32B. If you want autonomous PR-quality work, you’re either running a cloud API or a serious GPU setup.
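On the context-window point from the 32B tier: with Ollama, the window a model actually gets is set by its num_ctx parameter, and the default in many releases is only a few thousand tokens, far below what Qwen2.5-Coder supports. Some tools set it per request and others don't, so check your tool's docs. One way to pin a larger window is a derived model built from a Modelfile. This sketch assumes qwen2.5-coder:32b is already pulled; 32768 and the qwen2.5-coder-32k name are example choices, and a larger window costs more VRAM for the KV cache.

```bash
#!/usr/bin/env bash
# Build an Ollama model variant with a larger context window.
# Assumptions: base model already pulled; 32768 and the new model name are examples.
set -euo pipefail

workdir=$(mktemp -d)
cat > "$workdir/Modelfile" <<'EOF'
FROM qwen2.5-coder:32b
PARAMETER num_ctx 32768
EOF

if command -v ollama >/dev/null 2>&1; then
  ollama create qwen2.5-coder-32k -f "$workdir/Modelfile" \
    || echo "ollama create failed; is qwen2.5-coder:32b pulled?"
else
  echo "ollama not installed; Modelfile left at $workdir/Modelfile"
fi
```

Afterwards, point the agent at qwen2.5-coder-32k instead of the base tag (for example, ollama/qwen2.5-coder-32k in Aider).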
SWE-bench: what the score tells you (and what it doesn’t)
SWE-bench Verified measures whether an agent can fix a real GitHub issue on a real codebase. 500 human-validated tasks, scored pass/fail. It’s the most credible coding benchmark published so far.
But OpenAI retired SWE-bench Verified as their frontier evaluation in 2026 because scores are climbing faster than real-world capability — the tasks are public, solutions exist on GitHub, and models trained on that data score higher than their actual programming ability. SWE-bench Pro (1,865 multi-language tasks with less public exposure) is the emerging replacement.
Practical interpretation of the current scores:
- Under 40%: useful pair programmer, won’t ship autonomous PR-quality code.
- 40–65%: handles well-scoped bug fixes autonomously, still needs review.
- Over 65%: handles moderate complexity tasks autonomously with reasonable reliability.
None of the open-source tools covered here cross 80% on SWE-bench Verified. Commercial top-tier does. The gap is real.
Also important: Aider’s 31.4% is not a fair comparison to OpenHands’ 72% for interactive use. The benchmark runs Aider without a human in the loop, which is not how most developers use it. In interactive mode, where a developer reviews each change before committing, Aider’s results are considerably better than that number implies, because mistakes get caught before they land.
Cost breakdown: cloud API vs. local model
Assuming 10 meaningful coding sessions per day, 22 working days per month:
| Setup | Monthly cost estimate |
|---|---|
| Cline + Claude Sonnet 4.6 (heavy use, full-file context) | $150–$350 |
| Cline + Claude Sonnet 4.6 (light use, small files) | $30–$80 |
| Aider + Claude Sonnet 4.6 (diff mode, same workload) | $15–$50 |
| Aider + local Qwen2.5-Coder 32B Q4 (own GPU) | ~$0 + electricity |
| Aider + local model on RunPod A100 (8 hrs/day at $1.50–$2/hour) | ~$264–$352/month |
| OpenHands + Claude Opus 4.6 (10 issues/day at $2–$5 each) | $440–$1,100 |
| Cursor Pro | $20/month flat |
| Claude Code subscription | $100/month flat |
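Hourly GPU rental is easy to underestimate, so here is the arithmetic for an A100 at the $1.50–$2/hour range mentioned earlier, over the same 22 working days. gpu_monthly is a hypothetical helper; actual RunPod rates vary by GPU type and availability.

```bash
#!/usr/bin/env bash
# Monthly cost of renting a GPU by the hour.
# Assumptions: billed only for hours used; rates are examples, not quotes.

# gpu_monthly RATE_PER_HOUR HOURS_PER_DAY DAYS -> dollars per month
gpu_monthly() {
  awk -v r="$1" -v h="$2" -v d="$3" 'BEGIN { printf "%.2f\n", r * h * d }'
}

gpu_monthly 1.50 8 22   # 264.00
gpu_monthly 2.00 8 22   # 352.00
```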
Aider’s diff-patch architecture sends 5–10× fewer tokens per session than Cline’s full-file approach on the same codebase. If you’re price-sensitive and already using a cloud model, switching from Cline to Aider for large-repo work can cut your LLM spend significantly without changing model quality.
When NOT to use open-source coding agents
Large multi-file refactors without a 32B+ local model: agent output quality below that threshold means you’ll spend more time fixing than you would have spent writing.
Security-sensitive codebases without review gates: every tool here can run shell commands or execute generated code. Read what the agent is about to run before you approve it. AGPL tools (Plandex, Open Interpreter) need legal review before you embed them in a product.
When you need accurate test coverage: these agents write tests. Whether the tests correctly verify behavior is a different question. Human review of generated test logic is not optional.
Production CI without a container sandbox: running an autonomous agent that can modify production files in an uncontained CI environment is a meaningful risk. Use OpenHands with Docker isolation or a dedicated sandbox VM.
When your local model is under 14B: you’ll spend more time debugging agent errors than you get back in useful output. Use a cloud API or don’t use an agent.
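The review gate from the security point above is also a few lines of shell for scripts of your own that replay agent-proposed commands. confirm_run is a hypothetical helper, not part of any tool here (Cline and Open Interpreter have their own approval prompts): it prints the command, runs it only on an explicit y, and reads the answer from stdin so it can be scripted and tested.

```bash
#!/usr/bin/env bash
# Hypothetical review gate: show a proposed command, run it only on an explicit "y".

# confirm_run CMD [ARGS...]: prompt goes to stderr, answer is read from stdin.
confirm_run() {
  local answer=""
  printf 'Agent wants to run: %s\nApprove? [y/N] ' "$*" >&2
  read -r answer || true          # EOF with no input leaves answer empty
  case "$answer" in
    y|Y) "$@" ;;
    *) echo "skipped: $*" >&2; return 1 ;;
  esac
}

# Default is "no": anything other than y/Y skips the command.
echo n | confirm_run echo "rm -rf build/" || echo "nothing was run"
echo y | confirm_run echo "approved command ran"
```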
The elephant in the room: cloud agents still lead
Commercial agents outperform all of these on benchmarks and on day-to-day output:
- Cursor Background Agent: 65%+ SWE-bench Verified, tight VS Code UX, $20/month flat.
- Claude Code: ~80.9% SWE-bench Verified, first-class git and GitHub integration, available as a CLI.
- Devin: enterprise pricing, but consistently above 70% on uncontaminated benchmarks with less hand-holding required.
The honest summary: running an open-source coding agent is a cost and privacy trade-off, not a capability upgrade. If you’re spending $200/month on cloud agents and want to cut that, Aider with a local 32B model handles a well-scoped subset of tasks at near-zero marginal cost. If you want your codebase off third-party servers, OpenHands with a local model is the right architecture.
The gap is closing. Eighteen months ago, local 32B models weren’t good enough for agent use. As of mid-2026, they are — for a real and growing subset of coding tasks.
For a full comparison of cloud coding tools including Cursor, Windsurf, and GitHub Copilot, see aicoderscope.com.
Which one should you actually run
VS Code developer with a cloud API key: Cline. Install the extension, add your API key, and start with a focused single-file task before trying autonomous multi-file work.
Terminal-first developer who wants tight git history: Aider. Run aider --model ollama/qwen2.5-coder:32b to start free; upgrade to a cloud model when quality matters.
Want to fix GitHub issues autonomously without watching: OpenHands. Budget $2–$5 per issue resolved with Claude Opus 4.6.
Privacy-first, no cloud API: Aider + Qwen2.5-Coder 32B via Ollama. Practical results on single-file tasks; multi-file autonomous accuracy drops.
Running automated benchmarks or CI issue-fixing pipelines: mini-SWE-agent.
Large codebases where a task touches 20+ files: Plandex — if AGPL is acceptable and you’re comfortable self-hosting the server.
OS-level automation beyond code editing: Open Interpreter. Not the right choice for pure source code editing tasks.
FAQ
Can Aider replace GitHub Copilot for daily coding?
Not for inline autocomplete — Aider is session-based, not keystroke-by-keystroke. For task-based work (implement this feature, fix this bug), it replaces Copilot’s chat mode. Many developers run both: Cline or Aider for agent tasks, Continue.dev for autocomplete. See the Continue.dev + Ollama setup guide for the autocomplete pairing.
What’s the minimum GPU for a useful local coding agent?
8GB VRAM runs Qwen2.5-Coder 7B — limited results. 16GB handles Qwen2.5-Coder 14B, which is the practical floor for Aider interactive mode. 24GB runs Qwen2.5-Coder 32B Q4 via GGUF — that’s where agent output becomes reliably useful. An RTX 4090 with 24GB VRAM is the current sweet spot for local coding agent use. Hardware build guides are at runaihome.com.
Is SWE-bench the right metric for picking a coding agent?
It’s a useful proxy for autonomous task completion but not the whole picture. SWE-bench measures fully autonomous issue-fixing; it doesn’t measure how well a tool augments an interactive developer. Aider scores lower than Cline on SWE-bench precisely because it’s designed for interactive pair programming, not autonomy. If you’re watching and guiding the agent, that lower score is misleading.
Do these tools work fully offline?
Yes. Configure any of them to use a local model endpoint (Ollama on http://localhost:11434/v1, LM Studio, or Llamafile) and no prompts or code are sent to a model provider; check each tool’s telemetry settings separately. Open Interpreter’s --local flag automates this. Aider uses --model ollama/<model> with OLLAMA_API_BASE pointing at your Ollama server. Cline has Ollama as a named provider in the settings UI.
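A quick guard against a config that quietly points at a cloud provider is to check the endpoint host before starting a session. is_local_endpoint is a hypothetical helper that only recognizes loopback addresses (localhost, 127.0.0.1, [::1]); anything else, including host.docker.internal from inside a container, is reported as non-local and deserves a manual look.

```bash
#!/usr/bin/env bash
# Hypothetical privacy guard: succeed only if an endpoint URL targets loopback.

is_local_endpoint() {
  local hostport="${1#*://}"   # drop the scheme
  hostport="${hostport%%/*}"   # drop any path such as /v1
  case "$hostport" in
    localhost|localhost:*|127.0.0.1|127.0.0.1:*|'[::1]'|'[::1]:'*) return 0 ;;
    *) return 1 ;;
  esac
}

for url in http://localhost:11434/v1 https://api.anthropic.com/v1; do
  if is_local_endpoint "$url"; then echo "local:     $url"; else echo "NOT local: $url"; fi
done
```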
OpenHands vs. SWE-agent — are they basically the same?
Different scope. mini-SWE-agent is about 100 lines of Python, built for benchmarks and single-issue automated fixing. OpenHands has a full web UI, multi-modal task support (image inputs, browser use), persistent workspaces, role-based access, and an enterprise roadmap backed by $18.8M in VC funding. SWE-agent is the research baseline; OpenHands is the production path if autonomous issue resolution is what you need.
Sources
- Aider GitHub repository — Apache 2.0, active weekly releases
- Cline GitHub repository — v3.81, ~61K stars
- OpenHands GitHub repository — MIT, ~70K stars
- mini-SWE-agent GitHub repository — >74% SWE-bench Verified
- Plandex GitHub repository — AGPL-3.0, CLI v2.2.1
- Open Interpreter GitHub repository — AGPL-3.0, ~57K stars
- SWE-bench Official Leaderboard — benchmark scores and methodology
- CodeSOTA code-generation benchmark tracker — 2026 coding leaderboard
- MarkTechPost: Best AI Agents for Software Development Ranked, May 2026
- SWE-bench Pro Leaderboard — why 46% beats 81%