Providers
The five built-in providers — local, RunPod, Vast.ai, GCP, and Colab — and how to connect each.
A provider is a backend that gpu-train can rent and run jobs on. All cloud providers
share the same lifecycle (provision → wait for SSH → push code → install deps →
launch → stream → terminate); only acquisition differs. Pick one per run via
provider="...".
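The shared lifecycle can be pictured as an ordered list of steps with termination as a guaranteed final step. The sketch below is illustrative only: `CLOUD_STEPS`, `run_lifecycle`, and the `do_step` callback are hypothetical names, not part of gpu-train's API, and running terminate in a `finally` block once provisioning has succeeded is an assumption about how cleanup could be made robust.

```python
from typing import Callable, List

# Step names taken from the lifecycle described above; the identifiers are hypothetical.
CLOUD_STEPS = [
    "provision",
    "wait_for_ssh",
    "push_code",
    "install_deps",
    "launch",
    "stream",
]


def run_lifecycle(do_step: Callable[[str], None]) -> List[str]:
    """Run each lifecycle step in order and return the steps that were executed.

    Assumption: once provisioning succeeds, terminate always runs,
    even if a later step raises.
    """
    completed: List[str] = []
    do_step("provision")
    completed.append("provision")
    try:
        for name in CLOUD_STEPS[1:]:
            do_step(name)
            completed.append(name)
    finally:
        # Release the rented machine whether or not the run succeeded.
        do_step("terminate")
        completed.append("terminate")
    return completed
```

In this picture only the `provision` step would differ between providers; the remaining steps all operate on an SSH endpoint.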
| Provider | Extra | Credential | Notes |
|---|---|---|---|
| local | core | none | Zero-cost subprocess / CPU fallback. |
| runpod | [runpod] | RUNPOD_API_KEY | On-demand pods over GraphQL + SSH. |
| vastai | [vastai] | VAST_API_KEY | Cheapest rentable offer under price_cap. |
| gcp | [gcp] | service-account JSON + project + zone | Deep-Learning GPU VM. |
| colab | [colab] | tunnel host:port | Best-effort SSH over a notebook tunnel. |
Specify hardware with gpus="<type>:<count>" (e.g. "A100:4", "H100:8",
"RTX4090:1") or "cpu". Friendly GPU names are mapped to each provider's own
identifiers; unknown names pass through unchanged.
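As a rough illustration of the `gpus` format, the sketch below parses a spec string and applies a name mapping with pass-through for unknown names. `parse_gpus` and `EXAMPLE_ALIASES` are hypothetical, the alias value is a placeholder rather than a real provider identifier, and returning a count of 0 for `"cpu"` is an assumption made for this example.

```python
from typing import Dict, Tuple

# Placeholder alias table; real per-provider identifiers are not shown here.
EXAMPLE_ALIASES: Dict[str, str] = {"A100": "example-provider-a100-id"}


def parse_gpus(spec: str, aliases: Dict[str, str]) -> Tuple[str, int]:
    """Parse "<type>:<count>" or "cpu" into (provider_gpu_id, count)."""
    if spec == "cpu":
        return ("cpu", 0)
    name, sep, count = spec.partition(":")
    if not sep or not name or not count.isdecimal() or int(count) < 1:
        raise ValueError(f"expected '<type>:<count>' or 'cpu', got {spec!r}")
    # Known friendly names are mapped; unknown names pass through unchanged.
    return (aliases.get(name, name), int(count))
```

For example, `parse_gpus("RTX4090:1", EXAMPLE_ALIASES)` leaves the name untouched because the placeholder table has no entry for it.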
Local
Zero cost, no credential. Runs your entrypoint as a subprocess on your machine —
ideal for developing train.py and for CI.
configure(registry={}, use=[])
run(task={"entrypoint": "train.py"}, provider="local", gpus="cpu")
RunPod
{"cred_id": "runpod-1", "provider": "runpod", "secret_ref": "env://RUNPOD_API_KEY"}
Deploys an on-demand pod via RunPod's GraphQL API, waits for the SSH endpoint,
and enforces price_cap against the pod's hourly cost.
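The RunPod credential points at its API key through an `env://` reference. A minimal sketch of how such a reference might be resolved is shown below, assuming that `env://NAME` simply means "read the environment variable `NAME`"; `resolve_env_ref` is a hypothetical helper, not a gpu-train function, and other schemes such as `store://` are out of scope here.

```python
import os
from typing import Mapping


def resolve_env_ref(secret_ref: str, environ: Mapping[str, str] = os.environ) -> str:
    """Resolve an "env://NAME" reference to the value of environment variable NAME."""
    prefix = "env://"
    if not secret_ref.startswith(prefix):
        raise ValueError(f"not an env:// reference: {secret_ref!r}")
    name = secret_ref[len(prefix):]
    if not name:
        raise ValueError("env:// reference is missing a variable name")
    if name not in environ:
        # Fail loudly rather than continuing with an empty credential.
        raise KeyError(f"environment variable {name} is not set")
    return environ[name]
```

Keeping only the reference in the registry means the key itself never has to be written into a config file.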
Vast.ai
{"cred_id": "vast-1", "provider": "vastai", "secret_ref": "env://VAST_API_KEY",
"ssh_key_ref": "~/.ssh/id_ed25519"}
Searches the marketplace for the cheapest rentable offer matching your
gpus request under price_cap, rents it, and polls for its SSH endpoint.
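The selection rule described here (cheapest rentable offer that matches the request and stays under `price_cap`) can be sketched as a filter followed by a minimum. The `Offer` dataclass, its field names, and `cheapest_offer` are hypothetical stand-ins for illustration; they do not mirror the Vast.ai API response or gpu-train's internals.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Offer:
    # Hypothetical, simplified view of a marketplace offer.
    offer_id: int
    gpu_name: str
    num_gpus: int
    price_per_hour: float
    rentable: bool


def cheapest_offer(
    offers: Iterable[Offer], gpu_name: str, num_gpus: int, price_cap: float
) -> Optional[Offer]:
    """Return the lowest-priced rentable matching offer at or under price_cap, or None."""
    candidates = [
        o
        for o in offers
        if o.rentable
        and o.gpu_name == gpu_name
        and o.num_gpus == num_gpus
        and o.price_per_hour <= price_cap
    ]
    return min(candidates, key=lambda o: o.price_per_hour, default=None)
```

Returning `None` when nothing qualifies lets the caller decide whether to raise the cap or fail the run.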
Vast injects your account SSH keys into rented instances. The public key
matching the ssh_key_ref private key must already be registered on your
Vast.ai account, or the control plane won't be able to connect.
GCP (Google Compute Engine)
{
"cred_id": "gcp-1", "provider": "gcp",
"secret_ref": "store://providers/gcp/service_account_json", # or env:// a path
"ssh_key_ref": "~/.ssh/id_ed25519",
"extra": {"project": "my-project", "zone": "us-central1-a",
"machine_type": "a2-highgpu-1g", "ssh_user": "gpu_train"},
}
Creates a Deep-Learning GPU VM, injects your SSH public key via instance
metadata, waits for RUNNING + an external IP, then drives it over SSH. The
service account may be a path to a JSON key file or the JSON content
itself (so it works with both GOOGLE_APPLICATION_CREDENTIALS and a key pasted
into the dashboard).
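One way to accept both forms of the service account is to treat a value that starts with `{` as inline JSON and anything else as a file path. The sketch below shows that idea; `load_service_account` is a hypothetical helper and the leading-brace check is an assumed heuristic, not necessarily how gpu-train distinguishes the two.

```python
import json
from pathlib import Path
from typing import Any, Dict


def load_service_account(value: str) -> Dict[str, Any]:
    """Accept either a path to a JSON key file or the JSON content itself."""
    text = value.strip()
    if not text.startswith("{"):
        # Assumed heuristic: anything that is not a JSON object is a file path.
        text = Path(text).expanduser().read_text(encoding="utf-8")
    info = json.loads(text)
    if not isinstance(info, dict):
        raise ValueError("service account JSON must be an object")
    return info
```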
This client does not query GCE pricing, so price_cap is not enforced and
cost_usd stays 0 for GCP runs. Rely on auto_terminate and the idle-timeout
watchdog for cost safety — see Cost safety.
Google Colab
Colab has no inbound SSH, so a running Colab runtime is treated as a pre-provisioned SSH target reached through a tunnel the notebook opens.
{
"cred_id": "colab-1", "provider": "colab",
"ssh_key_ref": "~/.ssh/id_ed25519",
"extra": {"host": "0.tcp.ngrok.io", "port": 40022, "ssh_user": "root"},
}
The workflow:
- In the dashboard, open Providers → Google Colab → bootstrap notebook cell (or fetch it from GET /v1/colab/bootstrap).
- Paste it into a Colab cell with your SSH public key and an ngrok auth token, and run it. It installs sshd, authorizes your key, and opens an ngrok TCP tunnel to port 22.
- Copy the printed host/port into the credential (or the dashboard) and connect with the matching private key.
create() adopts the stored endpoint (price 0), wait_ready() probes SSH, and
terminate() is a no-op — gpu-train does not own the Colab VM, so close the
notebook to release it.
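The adopt, probe, and no-op behaviour can be sketched as follows. `ColabSketch` and `Endpoint` are hypothetical illustrations rather than gpu-train's classes, the method signatures are assumed, and the readiness probe here only checks that a TCP connection to the tunnel succeeds; a real probe would likely go further and attempt an SSH handshake.

```python
import socket
from dataclasses import dataclass
from typing import Any, Dict


@dataclass(frozen=True)
class Endpoint:
    host: str
    port: int
    ssh_user: str
    price_per_hour: float = 0.0  # adopted runtimes are recorded at price 0


class ColabSketch:
    """Hypothetical connector that adopts a pre-provisioned SSH target."""

    def __init__(self, extra: Dict[str, Any]) -> None:
        self._extra = extra

    def create(self) -> Endpoint:
        # Nothing is rented: reuse the endpoint stored in the credential.
        return Endpoint(
            host=str(self._extra["host"]),
            port=int(self._extra["port"]),
            ssh_user=str(self._extra.get("ssh_user", "root")),
        )

    def wait_ready(self, endpoint: Endpoint, timeout: float = 5.0) -> bool:
        # Simplified probe: succeed if the tunnel accepts a TCP connection.
        try:
            with socket.create_connection((endpoint.host, endpoint.port), timeout=timeout):
                return True
        except OSError:
            return False

    def terminate(self, endpoint: Endpoint) -> None:
        # No-op: the notebook owner controls the runtime's lifetime.
        return None
```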
The Colab connector is best-effort and may bump into Colab's Terms of Service. Use it only on runtimes you are permitted to automate.
Most keys can also be entered from the dashboard instead of the registry — see Credentials & secrets and the Dashboard.