Research Commons

CLI

The gpu-train command — list jobs, tail logs, kill boxes, reconcile, and serve the dashboard.

Installing gpu-train puts a gpu-train command on your PATH. It is a thin wrapper over the Python API: it builds a manager from credentials found in the environment and in the local credential store, so the jobs, logs, kill, and serve commands work standalone.

gpu-train --version
gpu-train <command> [options]

Commands

jobs

List recent jobs.

gpu-train jobs --status running --limit 50

Option    Default  Meaning
--status  all      Filter by status (queued, running, succeeded, …).
--limit   50       Max rows.
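
When the jobs command is called from a script, the argument list can be assembled in Python instead of by string concatenation. The following is a minimal sketch, not part of gpu-train: `build_jobs_argv` and `list_jobs` are hypothetical helper names, only the two options documented above are used, and the output format of the command is not assumed.

```python
import shlex
import subprocess


def build_jobs_argv(status=None, limit=None):
    """Return the argv list for `gpu-train jobs` (hypothetical helper).

    Options left as None are omitted, so the CLI applies its own defaults.
    """
    argv = ["gpu-train", "jobs"]
    if status is not None:
        argv += ["--status", str(status)]
    if limit is not None:
        if isinstance(limit, bool) or not isinstance(limit, int) or limit < 1:
            raise ValueError("limit must be a positive integer")
        argv += ["--limit", str(limit)]
    return argv


def list_jobs(status=None, limit=None):
    """Run the command and return its raw stdout as text."""
    argv = build_jobs_argv(status, limit)
    result = subprocess.run(argv, check=True, capture_output=True, text=True)
    return result.stdout


if __name__ == "__main__":
    # Print the command line rather than running it, so the sketch
    # works on a machine where gpu-train is not installed.
    print(shlex.join(build_jobs_argv(status="running", limit=50)))
```

Passing a list to `subprocess.run` avoids shell quoting issues, and `check=True` turns a non-zero exit status into an exception.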

logs

Print or follow a job's logs.

gpu-train logs <job-id>          # print stored logs
gpu-train logs -f <job-id>       # stream until the job ends

Option        Default  Meaning
--limit       1000     Max lines (non-follow).
-f, --follow  off      Stream until the job reaches a terminal state.
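
The two modes of the logs command can be wrapped the same way. This is a sketch under stated assumptions: `build_logs_argv` and `stream_logs` are hypothetical names, options are placed before the job id as in the examples above, and `--limit` is passed only in non-follow mode because that is where the table says it applies.

```python
import subprocess


def build_logs_argv(job_id, follow=False, limit=None):
    """Return the argv list for `gpu-train logs` (hypothetical helper)."""
    if not job_id:
        raise ValueError("job_id is required")
    argv = ["gpu-train", "logs"]
    if follow:
        argv.append("--follow")
    elif limit is not None:
        # --limit is documented for non-follow mode, so it is skipped when following.
        argv += ["--limit", str(int(limit))]
    argv.append(str(job_id))
    return argv


def stream_logs(job_id):
    """Yield log lines as they arrive, until the CLI process exits."""
    argv = build_logs_argv(job_id, follow=True)
    with subprocess.Popen(argv, stdout=subprocess.PIPE, text=True) as proc:
        for line in proc.stdout:
            yield line.rstrip("\n")
```

A caller would iterate with `for line in stream_logs(job_id): ...`; the loop ends when the command exits, which the documentation says happens once the job reaches a terminal state.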

kill

Terminate a single job (and its box), or everything at once with --all.

gpu-train kill <job-id>
gpu-train kill --all       # the panic button
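
Because `--all` terminates everything, automation may want a guard so it is never sent by accident. The sketch below is one possible wrapper, not a gpu-train feature: `build_kill_argv`, `confirmed`, and the confirmation phrase are all hypothetical.

```python
def build_kill_argv(job_id=None, kill_all=False):
    """Return the argv list for `gpu-train kill` (hypothetical helper).

    Exactly one of job_id or kill_all=True must be given.
    """
    if kill_all and job_id:
        raise ValueError("pass either job_id or kill_all=True, not both")
    if not kill_all and not job_id:
        raise ValueError("pass a job_id, or kill_all=True to terminate everything")
    if kill_all:
        return ["gpu-train", "kill", "--all"]
    return ["gpu-train", "kill", str(job_id)]


def confirmed(answer, phrase="kill all"):
    """Return True only if the operator typed the exact confirmation phrase."""
    return answer.strip() == phrase


if __name__ == "__main__":
    # Ask before building the --all command; nothing is executed here.
    if confirmed(input('Type "kill all" to terminate every job: ')):
        print(build_kill_argv(kill_all=True))
    else:
        print("aborted")
```

Requiring an explicit `kill_all=True` keeps a missing or empty job id from silently turning into a request to terminate everything.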

reconcile

Sweep and terminate instances orphaned by a crashed control plane.

gpu-train reconcile

serve

Serve the branded dashboard (requires the [server] extra).

gpu-train serve --host 127.0.0.1 --port 8780
gpu-train serve --no-browser

See Dashboard (UI).

Environment

The CLI reads credentials from the environment: RUNPOD_API_KEY, VAST_API_KEY, GOOGLE_APPLICATION_CREDENTIALS (plus GOOGLE_CLOUD_PROJECT / CLOUDSDK_COMPUTE_ZONE), and WANDB_API_KEY. These are merged with any keys saved from the dashboard. Set GPU_TRAIN_LOG_LEVEL=DEBUG for verbose logs, and set GPU_TRAIN_HOME to relocate the state directory.
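
Before invoking the CLI from a script, it can help to see which of these variables the current process actually has. The following is a small preflight sketch: `credential_status` is a hypothetical helper, the variable names are the ones listed above, and an unset variable is not necessarily an error because keys saved from the dashboard are merged in as well.

```python
import os

# Variable names taken from the list above.
CREDENTIAL_VARS = [
    "RUNPOD_API_KEY",
    "VAST_API_KEY",
    "GOOGLE_APPLICATION_CREDENTIALS",
    "GOOGLE_CLOUD_PROJECT",
    "CLOUDSDK_COMPUTE_ZONE",
    "WANDB_API_KEY",
]


def credential_status(environ=None):
    """Map each variable name to True if it is set to a non-empty value."""
    env = os.environ if environ is None else environ
    return {name: bool(env.get(name)) for name in CREDENTIAL_VARS}


if __name__ == "__main__":
    # Report presence only; the values themselves are never printed.
    for name, present in credential_status().items():
        print(f"{name}: {'set' if present else 'not set'}")
```

Accepting an optional mapping instead of always reading `os.environ` makes the check easy to exercise in tests without touching the real environment.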