Monitoring (W&B)

Automatic Weights & Biases tracking — the control plane mints a run and injects WANDB_* into every job.

gpu-train integrates with Weights & Biases (W&B) so that every run is tracked automatically. Your training script needs no gpu-train-specific glue, only the usual wandb.init() and logging calls.

Enable it

Add a tracking.wandb block to your registry:

configure(
    registry={
        "credentials": [...],
        "tracking": {
            "wandb": {
                "secret_ref": "env://WANDB_API_KEY",
                "project": "my-proj",
                "entity": "my-team",   # optional
                "enabled": True,
            }
        },
    },
    use=[...],
)

Alternatively, set WANDB_API_KEY (and, optionally, WANDB_PROJECT / WANDB_ENTITY) in the environment, or connect W&B from the dashboard.
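To illustrate how the environment variables line up with the registry block, here is a minimal sketch. The helper name wandb_block_from_env is hypothetical and not part of gpu-train; it only shows the mapping between the variables and the keys of tracking.wandb, assuming a missing key means tracking stays off.

```python
from typing import Mapping, Optional


def wandb_block_from_env(env: Mapping[str, str]) -> Optional[dict]:
    """Build a tracking.wandb-style dict from environment variables.

    Hypothetical helper for illustration only; gpu-train's own
    resolution logic may differ.
    """
    if not env.get("WANDB_API_KEY"):
        # No key available: assume tracking is simply not enabled.
        return None
    block = {"secret_ref": "env://WANDB_API_KEY", "enabled": True}
    if env.get("WANDB_PROJECT"):
        block["project"] = env["WANDB_PROJECT"]
    if env.get("WANDB_ENTITY"):
        block["entity"] = env["WANDB_ENTITY"]  # optional, as in the registry block
    return block
```

Passing os.environ to this helper would yield a dict shaped like the "wandb" entry in the registry example, or None when no key is set.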

What happens

When tracking is enabled, for each job the control plane:

  1. Mints a stable W&B run id / group for the job.
  2. Injects WANDB_API_KEY, WANDB_PROJECT, WANDB_ENTITY, and the run id into the remote job's environment (secrets written to owner-only files on the box).
  3. Records the resulting wandb_run_id and wandb_url on the JobRecord, so the dashboard's run detail can deep-link straight into the W&B run.
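
The three steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not gpu-train's actual implementation: the JobRecord class here is a stand-in with only the two fields named above, the hash-based id scheme and all function names are hypothetical, and the sketch assumes the run id travels in WANDB_RUN_ID (the variable the wandb client reads). The URL follows the usual wandb.ai/<entity>/<project>/runs/<id> pattern.

```python
import hashlib
import os
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class JobRecord:
    """Stand-in for gpu-train's JobRecord (hypothetical, trimmed down)."""
    job_id: str
    wandb_run_id: Optional[str] = None
    wandb_url: Optional[str] = None


def mint_run_id(job_id: str) -> str:
    # Assumed scheme: hashing the job id makes the run id stable,
    # so the same job always maps to the same W&B run.
    return hashlib.sha256(job_id.encode("utf-8")).hexdigest()[:12]


def build_wandb_env(api_key: str, project: str,
                    entity: Optional[str], run_id: str) -> Dict[str, str]:
    env = {
        "WANDB_API_KEY": api_key,
        "WANDB_PROJECT": project,
        "WANDB_RUN_ID": run_id,  # assumption: run id is passed this way
    }
    if entity:
        env["WANDB_ENTITY"] = entity
    return env


def write_secret_file(path: str, value: str) -> None:
    # Create the file readable and writable by the owner only (mode 0600).
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as handle:
        handle.write(value)
    os.chmod(path, 0o600)  # also tighten a file that already existed


def track_job(record: JobRecord, api_key: str, project: str,
              entity: Optional[str] = None) -> Dict[str, str]:
    run_id = mint_run_id(record.job_id)                        # step 1
    env = build_wandb_env(api_key, project, entity, run_id)    # step 2
    record.wandb_run_id = run_id                               # step 3
    if entity:
        # Without an entity the sketch leaves the URL unset.
        record.wandb_url = f"https://wandb.ai/{entity}/{project}/runs/{run_id}"
    return env
```

Deriving the id from the job id, rather than generating a random one, is one way to keep it stable across retries; whether gpu-train does this is not stated here.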

Your train.py just calls wandb.init() as usual — it picks up the injected environment automatically and lands in the right project/run.
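
A minimal train.py along these lines might look like the sketch below. The function names and the toy loss are placeholders, not part of gpu-train. The mode="disabled" fallback is an addition for the case where no key was injected; it uses wandb's standard mode argument so the same script also runs untracked.

```python
import os
from typing import Mapping


def wandb_mode(env: Mapping[str, str]) -> str:
    # Track online when a key was injected; otherwise turn wandb into a no-op.
    return "online" if env.get("WANDB_API_KEY") else "disabled"


def train(run, steps: int = 3) -> float:
    """Toy training loop; `run` is anything with a wandb-style log()."""
    loss = 1.0
    for step in range(steps):
        loss = 1.0 / (step + 1)  # placeholder for a real training step
        run.log({"loss": loss}, step=step)
    return loss


def main() -> None:
    import wandb  # imported here so the helpers above work without wandb installed

    # No project, entity, or run id arguments: wandb.init() reads the
    # injected WANDB_* variables from the environment.
    run = wandb.init(mode=wandb_mode(os.environ))
    try:
        train(run)
    finally:
        run.finish()
```

Calling main() under the usual `if __name__ == "__main__":` guard at the bottom of the file completes the script.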

In the dashboard

The Overview shows W&B status (connected / needs key), and each run's detail view links to its W&B run. You can set or update the W&B key directly from the Providers page — see the Dashboard.

Optional

W&B is entirely optional. With no tracking block (and no WANDB_API_KEY), runs simply aren't tracked — everything else works the same.