Monitoring (W&B)
Automatic Weights & Biases tracking — the control plane mints a run and injects WANDB_* into every job.
gpu-train integrates with Weights & Biases so every run is
tracked automatically — you don't add any wandb glue to your training script
beyond the usual wandb.init() / logging.
Enable it
Add a tracking.wandb block to your registry:
configure(
    registry={
        "credentials": [...],
        "tracking": {
            "wandb": {
                "secret_ref": "env://WANDB_API_KEY",
                "project": "my-proj",
                "entity": "my-team",  # optional
                "enabled": True,
            }
        },
    },
    use=[...],
)

Or set WANDB_API_KEY (and optional WANDB_PROJECT / WANDB_ENTITY) in the
environment, or connect W&B from the dashboard.
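If you prefer to keep the registry explicit while still sourcing values from the environment, the tracking block can be assembled in plain Python. The sketch below is illustrative only: wandb_tracking_from_env is a hypothetical helper, not part of gpu-train, and it assumes the block accepts exactly the keys shown in the example above.

```python
import os


def wandb_tracking_from_env(env=None):
    """Build a `tracking` dict from WANDB_* variables (hypothetical helper).

    Returns an empty dict when no API key is set, so tracking stays off.
    """
    env = os.environ if env is None else env
    if not env.get("WANDB_API_KEY"):
        return {}

    # The key itself is never copied; only a reference to the variable is stored.
    block = {"secret_ref": "env://WANDB_API_KEY", "enabled": True}
    if env.get("WANDB_PROJECT"):
        block["project"] = env["WANDB_PROJECT"]
    if env.get("WANDB_ENTITY"):
        block["entity"] = env["WANDB_ENTITY"]  # optional
    return {"wandb": block}
```

The result could be passed as the "tracking" value of the registry, assuming an empty dict is treated the same as a missing tracking block.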
What happens
When tracking is enabled, for each job the control plane:
- Mints a stable W&B run id / group for the job.
- Injects WANDB_API_KEY, WANDB_PROJECT, WANDB_ENTITY, and the run id into the remote job's environment (secrets are written to owner-only files on the box).
- Records the resulting wandb_run_id and wandb_url on the JobRecord, so the dashboard's run detail can deep-link straight into the W&B run.
Your train.py just calls wandb.init() as usual — it picks up the injected
environment automatically and lands in the right project/run.
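For illustration, a minimal train.py might look like the sketch below. The training loop and the describe_wandb_env helper are hypothetical; the point is that wandb.init() is called with no project, entity, or id arguments and relies on the environment. It assumes the run id arrives as WANDB_RUN_ID, the standard W&B variable.

```python
import os


def describe_wandb_env(env=None):
    """Return the non-secret W&B settings present in the environment."""
    env = os.environ if env is None else env
    names = ("WANDB_PROJECT", "WANDB_ENTITY", "WANDB_RUN_ID")
    return {name: env[name] for name in names if name in env}


def main():
    import wandb  # imported here so the helper above works without wandb installed

    print("W&B settings from environment:", describe_wandb_env())

    run = wandb.init()  # project, entity, and run id come from the environment
    for step in range(3):
        # Placeholder metric standing in for a real training loss.
        run.log({"loss": 1.0 / (step + 1)})
    run.finish()


if __name__ == "__main__":
    main()
```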
In the dashboard
The Overview shows W&B status (connected / needs key), and each run's detail
view links to its W&B run. You can set or update the W&B key directly from the
Providers page — see the Dashboard.
W&B is entirely optional. With no tracking block (and no WANDB_API_KEY),
runs simply aren't tracked — everything else works the same.
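If the same train.py has to run both with and without tracking, one option is to choose the W&B mode explicitly, since wandb.init() in a non-interactive job without a key would otherwise fail. This is a sketch, not documented gpu-train behavior: pick_wandb_mode is a hypothetical helper, and it assumes the key is visible to the job as WANDB_API_KEY. The mode argument of wandb.init() is part of the W&B client.

```python
import os


def pick_wandb_mode(env=None):
    """Return "online" when an API key is available, otherwise "disabled"."""
    env = os.environ if env is None else env
    return "online" if env.get("WANDB_API_KEY") else "disabled"


def init_tracking():
    import wandb  # imported lazily so the helper above has no dependencies

    # In "disabled" mode, wandb.init() returns a no-op run and log calls do nothing.
    return wandb.init(mode=pick_wandb_mode())
```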