Jun 21, 2026

Google Colab CLI: Run AI Agents on Cloud GPUs 2026

By AIFoss · 10 min read

google-colabgpuaideropen-interpreterselfhosted

TL;DR: Google’s new Colab CLI (released June 5, 2026, Apache-2.0) provisions remote Colab GPUs from your terminal, so any agent with shell access — Aider, Open Interpreter, Claude Code — can run GPU work without a notebook. The catch: only the T4 is free; A100/H100 burn paid compute units, and serving a model API needs a tunnel.

What you’ll have running after this guide:

A Colab T4/A100/H100 runtime you can drive from your local terminal with one command
A local LLM (Ollama or vLLM) served from that runtime and reachable over an ngrok tunnel
Aider or Open Interpreter pointed at that endpoint — agentic coding on rented GPU, billed to your Colab plan instead of a separate cloud bill

Honest take: The Colab CLI is the cheapest way to give a local agent a GPU if you already pay for Colab Pro — but it is a batch/dev tool, not an always-on inference host. For a stable API endpoint, rent a box on RunPod instead.

What the Colab CLI actually is

For years, the only way to use a Colab GPU was the browser notebook. The Colab CLI (github.com/googlecolab/google-colab-cli) breaks that open. It connects your local shell to a remote Colab runtime, so you can ship a script, run it on an H100, and pull the results back — no Jupyter kernel in sight.

That last part is the interesting bit for this site. Any tool that can run shell commands can now provision a GPU. Google ships an agent skill file (skills/colab-operator/ plus an AGENTS.md) so terminal agents like Claude Code and Codex get built-in context on how to drive the CLI. But you do not need a fancy agent — Aider and Open Interpreter work fine because the CLI is just commands.

Key facts, verified against the repo and Google’s June 5 announcement:

	Detail
License	Apache-2.0
Released	June 5, 2026
Install	`pip install google-colab-cli` (or `uv tool install`)
Platforms	Linux and macOS only — no Windows
GPUs	T4, L4, G4, A100 (40/80GB), H100
TPUs	v5e, v6e
Billing	Your active Colab plan’s compute units

This is a Google-published open-source tool, not a third-party wrapper, which matters for trust — the auth flow uses your real Google account and standard ADC/OAuth2.

Install and first run

The tool is a Python package. uv is the cleaner path because it isolates the CLI from your project environments:

uv tool install google-colab-cli
# or: pip install google-colab-cli

colab version
colab auth          # opens a browser, links your Google account

Provision a runtime and check what you got:

$ colab new --gpu T4 -s lab
Allocating runtime 'lab'... connected.
Runtime: T4 (16GB) · 12GB RAM · region us-central1

$ colab status -s lab
Session   GPU    RAM     Uptime   State
lab       T4     12GB    00:01:14 running

Run a one-off script on a fresh VM (it allocates, runs, and tears down):

$ echo "import torch; print(torch.cuda.get_device_name(0))" | colab exec -s lab
Tesla T4

Ship a local file to the runtime and execute it — no manual upload step:

colab exec -s lab -f train_lora.py
colab download -s lab outputs/adapter.gguf ./   # pull results back
colab stop -s lab                               # release the VM

That colab exec -f workflow alone replaces the copy-paste-into-a-notebook dance for fine-tuning runs. You can wire it straight into a CI runner: colab run --gpu A100 fine_tune.py allocates an A100, runs the script, and stops the VM, with the GPU cost charged to your Colab subscription rather than a separate GPU-cloud invoice.

The cost reality (read before you get excited)

The queue title for this topic said “free T4/A100/H100.” That is half true, and the honest half matters.

Colab uses a compute-unit (CU) model. The free tier gives you a T4 when one is available — that part is genuinely free. Premium GPUs are not:

GPU	~CU/hr	Free tier?	Practical cost
T4 (16GB)	1.76	Yes (when available)	$0, but preemptible
A100 (40/80GB)	~15	No	~7 hrs per $9.99 (100 CU)
H100	higher	No	Pro+ tier, burst quota

Pay-as-you-go is $9.99 for 100 CU (about 57 hours on a T4 or ~7 hours on an A100). Colab Pro is $9.99/month and Pro+ is $49.99/month for larger burst quotas. The real headache is unpredictability: even on a paid plan you can request an A100 and get handed a T4, or find premium GPUs unavailable entirely at peak times.

So treat the Colab CLI as a way to get a T4 for free or an A100 for a few dollars an hour without standing up your own infra — not as a free H100 farm. If you need a guaranteed GPU type with a stable hourly rate, a dedicated RunPod instance is more honest money. For deciding whether to rent at all versus buy, see runaihome.com’s home GPU build guides.

Serving a model from Colab to a local agent

Here is the part the launch posts gloss over. A Colab runtime is not directly reachable from the internet, so you can’t just start vLLM on port 8000 and point Aider at it. You need a tunnel. ngrok is the well-trodden path.

Write a small server script, serve.py, that starts Ollama and exposes it:

import subprocess, time, os
from pyngrok import ngrok

# pull a small coding model and serve it
subprocess.Popen(["ollama", "serve"])
time.sleep(5)
subprocess.run(["ollama", "pull", "qwen3-coder:7b"])

ngrok.set_auth_token(os.environ["NGROK_TOKEN"])
tunnel = ngrok.connect(11434)
print("OLLAMA_PUBLIC_URL:", tunnel.public_url)

Run it on a Colab GPU and grab the public URL from the output:

$ colab install -s lab pyngrok            # install deps on the runtime
$ colab exec -s lab -f serve.py
OLLAMA_PUBLIC_URL: https://a1b2-34-56-78-90.ngrok-free.app

Now point a local agent at that endpoint. Aider speaks the OpenAI-compatible API, so:

export OPENAI_API_BASE="https://a1b2-34-56-78-90.ngrok-free.app/v1"
export OPENAI_API_KEY="ollama"   # placeholder, Ollama ignores it
aider --model openai/qwen3-coder:7b

Open Interpreter follows the same shape:

interpreter --api_base "https://a1b2-34-56-78-90.ngrok-free.app/v1" \
            --model openai/qwen3-coder:7b --api_key ollama

Your agent now runs locally — editing your real files on your machine — while inference happens on the Colab GPU. For a deeper look at the agents themselves, see our Aider review and the Open Interpreter vs Aider vs Claude Code comparison. If you’d rather run vLLM for higher throughput, the vLLM setup guide covers the OpenAI-compatible server flags.

A blunt warning on the tunnel: an ngrok URL on a public LLM endpoint is open to anyone who finds it. Keep sessions short, don’t paste secrets into prompts, and kill the tunnel when you’re done. This is dev-grade plumbing, not a production deployment.

How agents drive the CLI directly

The other integration mode skips the tunnel entirely. Instead of serving a model, you let a terminal agent use the GPU as a tool. Because Google ships the colab-operator skill (and there was an earlier Colab MCP server, released March 2026), an agent can read those instructions and decide on its own to run, say, a heavy embedding job on an A100.

In practice this looks like telling Claude Code or Codex “fine-tune this LoRA on an A100,” and the agent calls colab run --gpu A100 ... under the hood, monitors colab status, and downloads the artifact. The CLI’s design — predictable subcommands, machine-readable status — is what makes it agent-friendly. Aider and Open Interpreter don’t have the skill file baked in, but you can paste the command reference into a system prompt and they’ll use it the same way.

For cloud-hosted AI coding tools that manage all of this for you (at a subscription price), aicoderscope.com tracks the managed agent landscape.

A real problem I hit: idle disconnects

Colab runtimes reclaim themselves. During a long fine-tune driven over the CLI, a runtime dropped after roughly 30 minutes of no foreground notebook activity, and colab exec returned a dead-session error mid-job.

The fix that worked: run long jobs with colab run (which manages the VM lifecycle around a single script) rather than starting a session and firing many colab exec calls against it with gaps in between. For interactive work, colab repl -s lab keeps the session warm because it holds the connection open. And always checkpoint to Drive (colab drivemount) so a reclaimed runtime doesn’t cost you the whole run. The compute-unit clock is one reason batch-style colab run beats babysitting a live session — you pay for the script, not the idle gaps.

When NOT to use the Colab CLI

Always-on inference. Runtimes are preemptible and time-limited. If you need a 24/7 endpoint, rent a dedicated box (RunPod) or self-host on owned hardware.
You need a guaranteed GPU type. Paid does not guarantee an A100; you may get a T4. For reproducible benchmarks, that’s disqualifying.
Windows-only workflows. The CLI is Linux/macOS only as of June 2026. Use WSL or a Mac.
Sensitive data over public tunnels. An ngrok-exposed LLM is a liability for confidential code. Keep it to throwaway/dev work.
You already own a capable GPU. If you have a 16GB+ card, running Ollama locally is faster, private, and free. The CLI only wins when your local hardware can’t do the job.

FAQ

Is the Google Colab CLI free? The CLI software is free and Apache-2.0 licensed. GPU time is not: the T4 is available on the free tier, but A100 and H100 access consumes paid compute units (pay-as-you-go from $9.99/100 CU, or Pro/Pro+ subscriptions).

Can I run Aider or Open Interpreter entirely on a free Colab GPU? Yes, on a T4 with a 7B-class model. Serve the model with Ollama on the runtime, expose it via ngrok, and point the agent’s OpenAI-compatible base URL at the tunnel. Larger models need an A100, which is not free.

Does it work on Windows? Not natively as of June 2026 — Linux and macOS only. Use WSL 2 on Windows.

How is this different from the Colab MCP server from March 2026? The MCP server exposes Colab runtimes to agents over Model Context Protocol; the CLI is a general-purpose command-line tool any shell (or shell-using agent) can call. The CLI ships its own agent skill files, so you don’t need MCP to let an agent drive it.

Can I serve a model API without a tunnel? No. Colab runtimes aren’t directly reachable, so you need ngrok or cloudflared to expose a port. The CLI’s colab url opens the runtime’s own browser URL, which is not the same as a public API endpoint.

Sources

Was this article helpful?