Tabby Team Server Setup 2026: Self-Host Code Completion
TL;DR: Tabby v0.32.0 is an Apache 2.0-licensed code completion server: one GPU box on your network, and every developer connects to it. The full team deployment takes under an hour if you have Ubuntu and an NVIDIA GPU ready. The math favors self-hosting once your team hits 8–12 developers, depending on whether you count maintenance time.
What you’ll have running after this guide:
- Tabby v0.32.0 running in Docker, exposed over HTTPS via nginx + Let’s Encrypt TLS
- Per-developer API tokens managed through Tabby’s admin panel
- VS Code and JetBrains IDEs connected with inline completions and chat
Honest take: For a 5–15 developer team with a dedicated GPU server, Tabby is the best Copilot replacement available — purpose-built for team use, not retrofitted from a single-user tool. Under 4 developers, the ops overhead isn’t worth it; stick with Copilot.
Prerequisites
What you need before starting:
- Ubuntu 22.04 LTS server (physical or VM — cloud VMs with GPU passthrough work too)
- NVIDIA GPU with driver ≥ 535 installed
- Docker Engine and the NVIDIA Container Toolkit
- A domain name with an A record pointing to the server’s public IP
- Ports 80 and 443 open in your firewall/security group
GPU and model pairing by team size — the table below reflects what works in practice as of mid-2026:
| GPU | VRAM | Team size | Recommended model |
|---|---|---|---|
| RTX 3070 | 8 GB | 2–4 devs | TabbyML/StarCoder2-3B |
| RTX 3060 | 12 GB | 2–4 devs | Qwen/Qwen2.5-Coder-7B |
| RTX 3090 | 24 GB | 5–10 devs | Qwen/Qwen2.5-Coder-7B + chat model |
| RTX 4090 | 24 GB | 10–15 devs | Qwen/Qwen2.5-Coder-7B + Qwen2-7B-Instruct |
Don’t have a dedicated GPU server yet? RunPod rents dedicated NVIDIA instances on monthly contracts, which makes it a reasonable staging ground before committing to hardware. If you’re planning a permanent server, see the hardware build options at runaihome.com.
Step 1: Docker and NVIDIA Container Toolkit
If Docker isn’t installed:
curl -fsSL https://get.docker.com | sh
sudo usermod -aG docker $USER
newgrp docker
NVIDIA Container Toolkit (required for GPU passthrough to the container):
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | \
sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
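Verify GPU access (the 12.2.0 image tag matches the driver ≥ 535 prerequisite; newer CUDA image tags can refuse to start on 535-series drivers): docker run --rm --gpus all nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi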
If nvidia-smi output shows your GPU, you’re ready.
Step 2: Tabby via Docker Compose
Create the deployment directory and compose file:
sudo mkdir -p /opt/tabby
sudo tee /opt/tabby/docker-compose.yml > /dev/null << 'EOF'
services:
  tabby:
    image: tabbyml/tabby:0.32.0
    command: serve --model Qwen/Qwen2.5-Coder-7B --device cuda --port 8080
    volumes:
      - tabby_data:/data
    ports:
      - "127.0.0.1:8080:8080"
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
    restart: unless-stopped
volumes:
  tabby_data:
EOF
Pin the image to 0.32.0 rather than latest. Tabby’s model API has changed between minor versions, and a mid-sprint image update that breaks IDE plugins is annoying to debug.
Start the server:
cd /opt/tabby
docker compose up -d
docker compose logs -f tabby
The first run downloads the model — roughly 4–7 GB for Qwen2.5-Coder-7B. Subsequent starts use the cached volume and take under 30 seconds.
Once you see Listening on 0.0.0.0:8080 in the logs, test it:
curl http://localhost:8080/v1/health
# Expected: HTTP 200 with a JSON body that names the loaded model (Qwen/Qwen2.5-Coder-7B)
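Because the first start can spend a long time downloading the model, a one-off curl often lands before the server is listening. The sketch below polls the same /v1/health endpoint until it answers. It makes no assumption about the fields in the response body, and the function names, attempt count, and delay are illustrative choices rather than anything Tabby prescribes.

```python
import json
import time
import urllib.request


def fetch_json(url, timeout=5.0):
    """Return the decoded JSON body of a successful HTTP response."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def wait_until_healthy(url, attempts=60, delay=10.0, fetch=fetch_json, sleep=time.sleep):
    """Poll `url` until it answers with JSON, or give up after `attempts` tries.

    Returns the parsed payload, or None if the server never answered.
    `fetch` and `sleep` are injectable so the loop can be exercised offline.
    """
    for attempt in range(1, attempts + 1):
        try:
            return fetch(url)
        except (OSError, ValueError):
            # OSError covers refused connections, timeouts and HTTP errors;
            # ValueError covers a body that is not valid JSON yet.
            if attempt < attempts:
                sleep(delay)
    return None
```

Calling wait_until_healthy("http://localhost:8080/v1/health") from the server itself blocks for up to about ten minutes with these defaults and returns None if Tabby never came up.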
Step 3: nginx Reverse Proxy with TLS
Install nginx and Certbot:
sudo apt-get install -y nginx certbot python3-certbot-nginx
Create /etc/nginx/sites-available/tabby:
server {
    listen 80;
    server_name tabby.yourdomain.com;
    return 301 https://$host$request_uri;
}
server {
    listen 443 ssl;
    server_name tabby.yourdomain.com;
    ssl_certificate /etc/letsencrypt/live/tabby.yourdomain.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/tabby.yourdomain.com/privkey.pem;
    ssl_protocols TLSv1.2 TLSv1.3;
    ssl_ciphers HIGH:!aNULL:!MD5;
    location / {
        proxy_pass http://127.0.0.1:8080;
        proxy_http_version 1.1;
        # WebSocket required for Tabby's answer engine streaming
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
        proxy_read_timeout 300s;
    }
}
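Get the certificate first, then enable the site. The config above references certificate files under /etc/letsencrypt/live/, so nginx -t fails until they exist:
sudo certbot certonly --nginx -d tabby.yourdomain.com
sudo ln -s /etc/nginx/sites-available/tabby /etc/nginx/sites-enabled/
sudo nginx -t
sudo systemctl reload nginx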
Certbot installs a systemd timer for automatic renewal — confirm it with sudo certbot renew --dry-run.
Common issue: 502 Bad Gateway right after nginx starts almost always means Tabby is still downloading the model. Watch docker compose logs tabby until you see the health check pass.
WebSocket note: Skipping the Upgrade and Connection headers causes Tabby’s chat streaming to hang silently. The IDE plugins don’t depend on WebSocket for basic completions, but the answer engine does.
Step 4: User Accounts and Per-Developer API Tokens
Open https://tabby.yourdomain.com in a browser. The first visit prompts you to create an admin account — do this before sharing the URL. The first registrant gets admin rights; everyone after that registers as a standard user.
From the admin panel at /admin:
- Users → Invite: Generate invitation links for each team member. Each link is single-use and expires.
- Tokens: Each developer logs in and generates their own token under Settings → Tokens. They copy this once — Tabby doesn’t show full token values again.
- Admin visibility: You see all active tokens, which user owns them, and last-used timestamps. Revoke a token instantly when someone leaves the team.
There’s no built-in token rotation schedule as of v0.32.0. Worth noting in your team runbook: tokens don’t expire unless manually revoked.
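Since tokens never expire on their own, a periodic manual audit is one way to cover the gap. The sketch below assumes you have copied owner names and last-used timestamps out of the admin panel by hand into a list. The data shape, the names, and the 90-day threshold are all hypothetical, and nothing here calls a Tabby API.

```python
from datetime import datetime, timedelta, timezone


def stale_tokens(tokens, now, max_idle_days=90):
    """Return owners whose token was last used more than `max_idle_days` ago.

    `tokens` is a list of (owner, last_used) pairs, where last_used is a
    timezone-aware datetime, or None if the token has never been used.
    """
    cutoff = now - timedelta(days=max_idle_days)
    return sorted(
        owner for owner, last_used in tokens
        if last_used is None or last_used < cutoff
    )


# Hypothetical audit data, transcribed manually from the admin panel.
audit_time = datetime(2026, 6, 1, tzinfo=timezone.utc)
recorded = [
    ("alice", datetime(2026, 5, 20, tzinfo=timezone.utc)),
    ("bob", datetime(2026, 1, 5, tzinfo=timezone.utc)),
    ("carol", None),
]
print(stale_tokens(recorded, audit_time))  # ['bob', 'carol']
```

Anything the function returns is a candidate for revocation in the admin panel; the decision itself stays manual.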
Step 5: IDE Plugin Configuration
VS Code
- Install the Tabby extension from the VS Code Marketplace (publisher: TabbyML)
- Open Command Palette (Ctrl+Shift+P) → Tabby: Connect to Server
- Enter your server URL: https://tabby.yourdomain.com
- Paste the token when prompted
Or set it directly in settings.json:
{
  "tabby.api.endpoint": "https://tabby.yourdomain.com",
  "tabby.api.token": "your-token-here"
}
A green Tabby icon in the VS Code status bar confirms a live connection. Gray icon means connection failed — check the token and that the server URL is reachable.
JetBrains IDEs (IntelliJ, PyCharm, GoLand, WebStorm, Rider)
- File → Settings → Plugins → Marketplace: search Tabby, install, restart the IDE
- File → Settings → Tools → Tabby
- Set Server endpoint: https://tabby.yourdomain.com
- Set Authentication token: paste the developer’s token
Both plugins deliver completions as ghost text, triggered approximately 400ms after you stop typing. The delay is configurable in the extension settings — lower it if your server responds fast on a local network.
Model Selection Guide
Tabby’s models registry documents every supported model. For teams evaluating where to start, here’s what works in practice:
| VRAM | Model | Acceptance rate (typical) | Notes |
|---|---|---|---|
| 8 GB | TabbyML/StarCoder2-3B | 18–22% | Solid multi-language baseline |
| 8 GB | Qwen/Qwen2.5-Coder-1.5B | 16–20% | Faster latency, slightly lower quality |
| 12–16 GB | Qwen/Qwen2.5-Coder-7B | 28–35% | Best quality-per-VRAM trade-off in 2026 |
| 24 GB | Qwen/Qwen2.5-Coder-7B + chat | 28–35% | Adds chat panel without quality change |
Acceptance rate below 15% usually means the model is too weak for your codebase, not that Tabby is misconfigured. Upgrading from StarCoder2-3B to Qwen2.5-Coder-7B typically adds 8–12 percentage points of acceptance rate.
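To make the 15% rule of thumb concrete, here is a minimal sketch that turns raw counts into an acceptance rate and applies that floor. The function names are illustrative, and the counts would come from whatever you read off the reports page.

```python
def acceptance_rate(shown, accepted):
    """Percentage of shown completions that developers accepted."""
    if shown <= 0:
        raise ValueError("shown must be positive")
    return 100.0 * accepted / shown


def diagnose(rate_pct, floor_pct=15.0):
    """Apply the rule of thumb: below the floor, suspect the model first."""
    if rate_pct < floor_pct:
        return "model likely too weak for this codebase"
    return "within the typical range"


print(acceptance_rate(1000, 300))          # 30.0
print(diagnose(acceptance_rate(200, 24)))  # model likely too weak for this codebase
```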
To add a chat model alongside completions (requires 24 GB VRAM for both 7B models), update the compose command:
command: >
  serve
  --model Qwen/Qwen2.5-Coder-7B
  --chat-model Qwen/Qwen2.5-Coder-7B-Instruct
  --device cuda
  --port 8080
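If you template the compose file per server, the choice between the two commands can be derived from available VRAM. This is a sketch under the article's own assumption that both 7B models need 24 GB; the helper name is hypothetical and the flags are the ones used in the commands above.

```python
def build_serve_command(vram_gb, port=8080):
    """Build the Tabby serve command, adding the chat model only at 24 GB or more."""
    args = ["serve", "--model", "Qwen/Qwen2.5-Coder-7B"]
    if vram_gb >= 24:
        # Assumption from the text: completion plus chat at 7B needs 24 GB of VRAM.
        args += ["--chat-model", "Qwen/Qwen2.5-Coder-7B-Instruct"]
    args += ["--device", "cuda", "--port", str(port)]
    return " ".join(args)


print(build_serve_command(12))
# serve --model Qwen/Qwen2.5-Coder-7B --device cuda --port 8080
```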
Dashboard and Usage Monitoring
Tabby’s built-in analytics are at /admin/reports. No configuration required — data starts accumulating as developers use the plugin.
The reports page shows:
- Acceptance rate per user and team-wide, charted over time
- Total completions and total acceptances by day or week
- Active users count — useful for spotting developers who installed the plugin but haven’t tried it
A downward trend in acceptance rate over several weeks usually points to a shift in the work rather than in the model: the pinned model itself does not change, so the likelier cause is developers moving to languages or frameworks it covers poorly. Check which languages are in use and compare against the model’s documented training data.
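One way to define "trending downward over several weeks" is a strict week-over-week drop for a fixed number of consecutive weeks. The sketch below uses that definition with a three-week window; both the definition and the window are assumptions, and the weekly figures would be copied from the reports page.

```python
def is_declining(weekly_rates, weeks=3):
    """True if the rate fell week over week for the last `weeks` weeks.

    `weekly_rates` is ordered oldest to newest, in percent.
    """
    if len(weekly_rates) < weeks + 1:
        return False  # not enough history to judge
    recent = weekly_rates[-(weeks + 1):]
    return all(later < earlier for earlier, later in zip(recent, recent[1:]))


print(is_declining([31.0, 32.0, 30.0, 28.0, 27.0]))  # True
print(is_declining([30.0, 29.0, 31.0, 30.0]))        # False
```

A single noisy week resets the check, which keeps it from firing on ordinary fluctuation.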
There is no Prometheus metrics endpoint in the open-source version. For uptime monitoring, a simple HTTP check on /v1/health in Healthchecks.io or UptimeRobot is enough. If you need Grafana-level observability, Tabby Enterprise adds a proper metrics layer.
Cost Comparison: Tabby vs. Copilot Business
GitHub Copilot Business transitioned to usage-based billing in June 2026, with a base cost of $19/user/month including a credit allotment.
A self-hosted Tabby deployment on an RTX 4090 workstation:
Hardware (GPU ~$2,000 + server ~$1,500), amortized 36 months: $97/month
Electricity (450W avg × 720 hrs/month × $0.17/kWh): $55/month
Maintenance (est. 1 hr/month at $75/hr developer time): $75/month
─────────────────────────────────────────────────────────────
Total self-hosted: $227/month
| Team size | Copilot Business | Tabby self-hosted | Monthly savings |
|---|---|---|---|
| 5 devs | $95/mo | $227/mo | -$132 (Copilot wins) |
| 10 devs | $190/mo | $227/mo | -$37 (Copilot wins slightly) |
| 12 devs | $228/mo | $227/mo | ~Break even |
| 15 devs | $285/mo | $227/mo | +$58 (Tabby wins) |
| 20 devs | $380/mo | $227/mo | +$153 (Tabby wins) |
The crossover is around 12 developers when maintenance time is included. Without counting maintenance time (if ops is already part of someone’s job), break-even is closer to 8 developers.
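The break-even figures follow directly from the numbers above, and a short script makes it easy to plug in your own hardware price or electricity rate. The defaults below are the article's inputs; the function names are illustrative.

```python
def monthly_self_hosted_cost(
    hardware_usd=3500.0,    # GPU ~$2,000 + server ~$1,500
    amortize_months=36,
    avg_watts=450.0,
    hours_per_month=720.0,
    usd_per_kwh=0.17,
    maintenance_usd=75.0,   # 1 hr/month at $75/hr
):
    """Monthly cost of the self-hosted server in USD."""
    hardware = hardware_usd / amortize_months
    electricity = avg_watts / 1000.0 * hours_per_month * usd_per_kwh
    return hardware + electricity + maintenance_usd


def break_even_seats(monthly_cost, seat_price=19.0):
    """Number of Copilot Business seats whose base cost equals `monthly_cost`."""
    return monthly_cost / seat_price


with_maintenance = monthly_self_hosted_cost()
without_maintenance = monthly_self_hosted_cost(maintenance_usd=0.0)
print(f"{break_even_seats(with_maintenance):.1f}")     # 12.0
print(f"{break_even_seats(without_maintenance):.1f}")  # 8.0
```

The calculation uses only the $19 base price and ignores any usage-based charges beyond the included credit allotment.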
The non-financial argument is absolute: completions never leave your network. In healthcare, finance, or defense environments where code cannot touch external APIs, Copilot isn’t an option regardless of price. Tabby fills that gap directly. For more on air-gapped AI setups, see self-hosted AI privacy stack.
When NOT to Set This Up
Skip Tabby self-hosting if:
- Fewer than 4–5 developers on the team. The infrastructure overhead per-person is too high. Use Copilot or Continue.dev with local models per-machine.
- No dedicated GPU server. Running Tabby on a developer workstation defeats the shared-server model. Multi-user load on a personal machine degrades everyone’s completions and the machine’s usability.
- You need agentic code editing. Tabby does inline completions and chat — it won’t write multi-file features, run terminal commands, or reason over a large codebase. For that, see Aider or Continue.dev’s agent mode.
- Primary development language is unusual or niche. Qwen2.5-Coder’s training data covers the mainstream stack well (Python, TypeScript, Go, Java, Rust, C/C++) but drops off on unusual DSLs, legacy COBOL, or highly domain-specific languages. Low acceptance rates in those cases indicate a model coverage issue, not a Tabby issue.
FAQ
Can Tabby run on AMD GPUs?
Yes, using --device rocm and the tabbyml/tabby:0.32.0-rocm image. ROCm support is functional but less tested than CUDA. Expect some rough edges on driver version upgrades.
What happens to completions if the server goes down?
IDE plugins degrade gracefully — no completions, but the IDE stays fully functional. No error dialogs by default. Developers usually notice only if they’re actively waiting for a suggestion.
Can I run CPU-only for evaluation?
Replace --device cuda with --device cpu. Completion latency jumps to 3–8 seconds at 7B, which makes ghost-text completions unusable in practice. Fine for testing the setup; not viable for daily development.
Does Tabby support codebase context (like Copilot workspace)?
Yes. Under admin Settings → Repository, point Tabby at your Git host. It indexes your repos and injects relevant snippets as context during inference. This significantly improves suggestions for internal libraries and project-specific patterns.
Is there a Vim/Neovim plugin?
Yes — TabbyML/vim-tabby on GitHub. Configuration is identical to VS Code and JetBrains: endpoint URL and developer token.
Sources
- Tabby GitHub repository — TabbyML/tabby
- Tabby Docker installation docs
- Tabby Docker Compose docs
- Tabby reverse proxy docs
- Tabby models registry
- Tabby VS Code plugin docs
- Tabby IntelliJ plugin — JetBrains Marketplace
- Tabby usage analytics docs
- GitHub Copilot plans and pricing
- US electricity rates by state — Choose Energy, June 2026
Recommended Gear
- RTX 3060 — 12 GB VRAM, entry-level team server
- RTX 3070 — 8 GB VRAM (use StarCoder2-3B model)
- RTX 3090 — 24 GB VRAM, best mid-range team server
- RTX 4090 — 24 GB VRAM, handles 15+ developers comfortably