
Providers

The five built-in providers — local, RunPod, Vast.ai, GCP, and Colab — and how to connect each.

A provider is a backend that gpu-train can rent compute from and run a job on. All cloud providers share the same lifecycle (provision → wait for SSH → push code → install deps → launch → stream → terminate); only the way an instance is acquired differs. Pick one per run via provider="...".
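The shared lifecycle can be sketched as a short driver. This is a minimal sketch, not gpu-train's implementation: create(), wait_ready(), and terminate() are named on this page, while CloudProvider, run_lifecycle, push_code(), install_deps(), launch(), and stream() are hypothetical names, and the try/finally cleanup is an assumption about how termination is guaranteed.

```python
from typing import Protocol


class CloudProvider(Protocol):
    """Hypothetical interface; gpu-train's real provider classes may differ."""

    def create(self) -> None: ...
    def wait_ready(self) -> None: ...
    def push_code(self) -> None: ...
    def install_deps(self) -> None: ...
    def launch(self) -> None: ...
    def stream(self) -> None: ...
    def terminate(self) -> None: ...


def run_lifecycle(provider: CloudProvider) -> None:
    """Drive one run through the shared stages, always releasing the instance."""
    provider.create()  # provision: the only provider-specific stage
    try:
        provider.wait_ready()    # wait for SSH
        provider.push_code()
        provider.install_deps()
        provider.launch()
        provider.stream()
    finally:
        # Assumption: terminate even when an earlier stage raised,
        # so a failed run does not keep billing.
        provider.terminate()
```

Putting terminate() in a finally block is one way to make sure a failed install or launch still releases the rented machine.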

| Provider | Extra | Credential | Notes |
| --- | --- | --- | --- |
| local | core | none | Zero-cost subprocess / CPU fallback. |
| runpod | [runpod] | RUNPOD_API_KEY | On-demand pods over GraphQL + SSH. |
| vastai | [vastai] | VAST_API_KEY | Cheapest rentable offer under price_cap. |
| gcp | [gcp] | service-account JSON + project + zone | Deep-Learning GPU VM. |
| colab | [colab] | tunnel host:port | Best-effort SSH over a notebook tunnel. |

Specify hardware with gpus="<type>:<count>" (e.g. "A100:4", "H100:8", "RTX4090:1") or "cpu". Friendly GPU names are mapped to each provider's own identifiers; unknown names pass through unchanged.
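To make the gpus format concrete, here is a minimal sketch of how such a string could be parsed and mapped. GpuRequest, parse_gpus, provider_gpu_id, and the FRIENDLY_NAMES table are hypothetical names for illustration; the real mapping tables live inside gpu-train and are not shown on this page.

```python
from typing import NamedTuple, Optional


class GpuRequest(NamedTuple):
    gpu_type: Optional[str]  # None means a CPU-only run
    count: int


def parse_gpus(spec: str) -> GpuRequest:
    """Parse "<type>:<count>" or "cpu" into a GpuRequest."""
    if spec.strip().lower() == "cpu":
        return GpuRequest(None, 0)
    gpu_type, sep, count = spec.partition(":")
    gpu_type, count = gpu_type.strip(), count.strip()
    if not sep or not gpu_type or not count.isdecimal() or int(count) < 1:
        raise ValueError(f"expected '<type>:<count>' or 'cpu', got {spec!r}")
    return GpuRequest(gpu_type, int(count))


# Hypothetical friendly-name table for one provider; real identifiers differ.
FRIENDLY_NAMES = {"A100": "example-provider-a100-id"}


def provider_gpu_id(gpu_type: str) -> str:
    """Map a friendly name to a provider identifier; unknown names pass through."""
    return FRIENDLY_NAMES.get(gpu_type, gpu_type)
```

With this sketch, parse_gpus("H100:8") yields a request for eight GPUs of type "H100", and provider_gpu_id("RTX4090") returns "RTX4090" unchanged because the example table has no entry for it.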

Local

Zero cost, no credential. Runs your entrypoint as a subprocess on your machine — ideal for developing train.py and for CI.

configure(registry={}, use=[])
run(task={"entrypoint": "train.py"}, provider="local", gpus="cpu")

RunPod

{"cred_id": "runpod-1", "provider": "runpod", "secret_ref": "env://RUNPOD_API_KEY"}

Deploys an on-demand pod via RunPod's GraphQL API, waits for the SSH endpoint, and enforces price_cap against the pod's hourly cost.
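The credential entry above points at its key with an env:// reference instead of embedding the secret. As a rough illustration of what resolving such a reference involves, the sketch below reads the named environment variable. resolve_env_ref is a hypothetical helper, not part of gpu-train's documented API, and it handles only the env:// scheme.

```python
import os


def resolve_env_ref(secret_ref: str) -> str:
    """Resolve an "env://NAME" reference to the value of environment variable NAME."""
    scheme, sep, name = secret_ref.partition("://")
    if sep != "://" or scheme != "env" or not name:
        raise ValueError(f"not an env:// reference: {secret_ref!r}")
    try:
        return os.environ[name]
    except KeyError:
        # Fail loudly rather than sending an empty API key to the provider.
        raise KeyError(f"environment variable {name} is not set") from None
```

Keeping only the reference in the registry means the entry can be committed or shared without exposing the key itself.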

Vast.ai

{"cred_id": "vast-1", "provider": "vastai", "secret_ref": "env://VAST_API_KEY",
 "ssh_key_ref": "~/.ssh/id_ed25519"}

Searches the marketplace for the cheapest rentable offer matching your gpus request under price_cap, rents it, and polls for its SSH endpoint.
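The selection step can be sketched as a filter followed by a minimum. This is an illustration only: the offer fields (rentable, gpu_name, num_gpus, price_per_hour) are hypothetical and do not mirror Vast.ai's actual response schema, and treating an offer priced exactly at price_cap as acceptable is an assumption.

```python
from typing import Optional


def cheapest_offer(offers: list[dict], gpu_name: str, num_gpus: int,
                   price_cap: float) -> Optional[dict]:
    """Return the cheapest rentable offer matching the request, or None."""
    eligible = [
        o for o in offers
        if o["rentable"]
        and o["gpu_name"] == gpu_name
        and o["num_gpus"] == num_gpus
        and o["price_per_hour"] <= price_cap  # assumption: cap is inclusive
    ]
    # default=None covers the case where nothing fits under the cap.
    return min(eligible, key=lambda o: o["price_per_hour"], default=None)
```

Returning None when nothing qualifies lets the caller fail the run before renting anything, rather than silently exceeding the cap.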

Vast.ai SSH keys

Vast injects your account SSH keys into rented instances. The public key matching the ssh_key_ref private key must already be registered on your Vast.ai account, or the control plane won't be able to connect.

GCP (Google Compute Engine)

{
  "cred_id": "gcp-1", "provider": "gcp",
  "secret_ref": "store://providers/gcp/service_account_json",   # or an env:// reference to a key-file path
  "ssh_key_ref": "~/.ssh/id_ed25519",
  "extra": {"project": "my-project", "zone": "us-central1-a",
            "machine_type": "a2-highgpu-1g", "ssh_user": "gpu_train"},
}

Creates a Deep-Learning GPU VM, injects your SSH public key via instance metadata, waits for RUNNING + an external IP, then drives it over SSH. The service account may be a path to a JSON key file or the JSON content itself (so it works with both GOOGLE_APPLICATION_CREDENTIALS and a key pasted into the dashboard).
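Accepting either a path or raw JSON can be handled with a small check on the value's first character. The sketch below is one way to do it, not necessarily how gpu-train does; load_service_account is a hypothetical helper name.

```python
import json
from pathlib import Path


def load_service_account(value: str) -> dict:
    """Accept either JSON content or a path to a JSON key file."""
    text = value.strip()
    if not text.startswith("{"):
        # Not inline JSON, so treat the value as a file path.
        text = Path(text).expanduser().read_text(encoding="utf-8")
    info = json.loads(text)
    if not isinstance(info, dict):
        raise ValueError("service-account JSON must be an object")
    return info
```

A JSON object always starts with "{", so the check distinguishes the two forms without touching the filesystem for inline keys.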

GCP pricing & price_cap

This client does not query GCE pricing, so price_cap is not enforced and cost_usd stays 0 for GCP runs. Rely on auto_terminate and the idle-timeout watchdog for cost safety — see Cost safety.

Google Colab

Colab has no inbound SSH, so a running Colab runtime is treated as a pre-provisioned SSH target reached through a tunnel the notebook opens.

{
  "cred_id": "colab-1", "provider": "colab",
  "ssh_key_ref": "~/.ssh/id_ed25519",
  "extra": {"host": "0.tcp.ngrok.io", "port": 40022, "ssh_user": "root"},
}

The workflow:

  1. In the dashboard, open Providers → Google Colab → bootstrap notebook cell (or fetch it from GET /v1/colab/bootstrap).
  2. Paste it into a Colab cell with your SSH public key and an ngrok auth token, and run it. It installs sshd, authorizes your key, and opens an ngrok TCP tunnel to port 22.
  3. Copy the printed host / port into the credential (or the dashboard) and connect with the matching private key.

create() adopts the stored endpoint (price 0), wait_ready() probes SSH, and terminate() is a no-op — gpu-train does not own the Colab VM, so close the notebook to release it.
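Before connecting, it can help to check that the tunnel endpoint accepts TCP connections at all. The sketch below is a minimal reachability probe using only the standard library; probe_tcp and wait_for_port are hypothetical helpers, and this is not the actual wait_ready() implementation. A successful TCP connect shows only that something is listening at host:port, not that sshd will accept your key.

```python
import socket
import time


def probe_tcp(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def wait_for_port(host: str, port: int, attempts: int = 10, delay: float = 3.0) -> bool:
    """Retry probe_tcp up to `attempts` times, sleeping `delay` seconds between tries."""
    for attempt in range(attempts):
        if probe_tcp(host, port):
            return True
        if attempt < attempts - 1:
            time.sleep(delay)
    return False
```

For the credential shown above, the call would be wait_for_port("0.tcp.ngrok.io", 40022), using whatever host and port the notebook printed.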

Best-effort & ToS

The Colab connector is best-effort and may conflict with Colab's Terms of Service. Use it only on runtimes you are permitted to automate.

Most keys can also be entered from the dashboard instead of the registry — see Credentials & secrets and the Dashboard.