Monitoring dashboard
The optional read-only control server and Next.js dashboard for live key-health, usage, and cost monitoring.
llm-rotate ships an optional monitoring stack: a read-only control server
that exposes your live key health, usage, and cost as JSON, and a Next.js
dashboard that renders it. Both are opt-in — the SDK itself stays headless, and
nothing here runs unless you start it.
You never need the dashboard to use llm-rotate. The control server lives behind
the server extra, and the in-memory event buffer that feeds it is disabled by
default (zero overhead) until the server turns it on.
Install
pip install "llm-rotate[server]"
This adds FastAPI + Uvicorn. The dashboard front-end lives in the ui/ directory of
the repo (a separate Next.js app; see Run the dashboard UI).
Run the control server
The simplest path is the bundled CLI, pointed at a JSON/YAML config file:
llm-rotate-server --config llm-rotate.yaml --port 8200

| Flag | Default | Purpose |
|---|---|---|
| -c, --config | $LLM_ROTATE_CONFIG | Path to a JSON/YAML config file (required). |
| --host | 127.0.0.1 | Bind address. |
| --port | 8200 | Bind port. |
| --event-capacity | 1000 | How many recent call records to retain for analytics. |
You can also set the config path by environment variable:
export LLM_ROTATE_CONFIG=llm-rotate.yaml
llm-rotate-server
Embed in an existing app
If you already run a web service, mount the control server alongside it so it
shares the same in-process state and event buffer as your live LMRotate:
import uvicorn
from llm_rotate import LMRotate, configure_from_dict
from llm_rotate.server import create_app
rot = LMRotate(configure_from_dict(registry={"keys": [...]}, use_keys=[...]))
# Pass the live instance so the dashboard reflects real traffic.
app = create_app(rot, event_capacity=1000)
uvicorn.run(app, host="127.0.0.1", port=8200)
create_app takes either a live LMRotate (embedded mode) or a config= argument
(standalone mode, handy with the Redis backend
so the dashboard reflects state shared across workers). Set cors_origins=[...]
to restrict the browser origins allowed to call it.
Endpoints
All endpoints are GET-only and emit secret-masked data.
| Endpoint | Returns |
|---|---|
| /api/health | Per-provider key counts. |
| /api/keys | Per-key health, cooldown/quarantine timers, success/failure counts. |
| /api/usage | Aggregated calls, tokens, and cost, grouped by provider. |
| /api/usage/models | Per-(provider, model) calls, tokens, and cost. |
| /api/events | Recent CallRecords (newest last). |
| /api/usage/timeseries | Bucketed calls / errors / tokens / cost. |
| /api/config | Sanitized config (strategy, providers, fallback chains, keys). |
Time frames
/api/usage, /api/usage/models, /api/events, and /api/usage/timeseries
accept an optional time frame:
- window: one of 1h, 24h, 7d, 30d
- or explicit since / until as ISO 8601 timestamps
Omit them for all-time data (the time-series endpoint defaults to 24h).
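
The time-frame parameters can be assembled programmatically. The sketch below is a minimal, standard-library-only helper; `build_url` and `ALLOWED_WINDOWS` are hypothetical names, not part of llm-rotate, and treating `window` and `since`/`until` as mutually exclusive is an assumption drawn from the wording above rather than confirmed server behavior.

```python
from datetime import datetime, timezone
from urllib.parse import urlencode

# Window values listed in this section.
ALLOWED_WINDOWS = {"1h", "24h", "7d", "30d"}


def build_url(base, path, window=None, since=None, until=None):
    """Build a control-server URL with an optional time frame.

    Hypothetical helper: `since` and `until` are datetime objects and are
    sent as ISO 8601 strings.
    """
    # Assumption: the two styles of time frame are not meant to be combined.
    if window is not None and (since is not None or until is not None):
        raise ValueError("use either window or since/until, not both")

    params = {}
    if window is not None:
        if window not in ALLOWED_WINDOWS:
            raise ValueError(f"unsupported window: {window!r}")
        params["window"] = window
    if since is not None:
        params["since"] = since.isoformat()
    if until is not None:
        params["until"] = until.isoformat()

    query = urlencode(params)  # percent-encodes ':' and '+' in timestamps
    url = f"{base.rstrip('/')}{path}"
    return f"{url}?{query}" if query else url


if __name__ == "__main__":
    base = "http://127.0.0.1:8200"
    print(build_url(base, "/api/usage/models", window="24h"))
    start = datetime(2026, 6, 17, tzinfo=timezone.utc)
    print(build_url(base, "/api/usage", since=start))
```

Omitting every time-frame argument yields the bare endpoint URL, which matches the all-time behavior described above.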
Example: usage by model
curl "http://127.0.0.1:8200/api/usage/models?window=24h"
{
  "models": [
    {
      "provider": "openai",
      "model": "gpt-4o-mini",
      "calls": 307,
      "successes": 285,
      "errors": 22,
      "total_tokens": 568045,
      "estimated_cost_usd": 0.165381
    },
    {
      "provider": "anthropic",
      "model": "claude-haiku-4-5",
      "calls": 175,
      "successes": 163,
      "errors": 12,
      "total_tokens": 327842,
      "estimated_cost_usd": 0.576511
    }
  ]
}
Example: per-key health
curl "http://127.0.0.1:8200/api/keys"
{
  "keys": [
    {
      "key_id": "openai-1",
      "provider": "openai",
      "health": "active",
      "available": true,
      "cooldown_until": null,
      "quarantine_until": null,
      "consecutive_failures": 0,
      "total_successes": 285,
      "total_failures": 22,
      "last_used_at": "2026-06-18T07:21:04+00:00",
      "last_error_type": null
    }
  ]
}
Run the dashboard UI
The dashboard is a Next.js app in the repo's ui/ directory. Point it at a
running control server via NEXT_PUBLIC_API_BASE:
git clone https://github.com/Research-Commons/llm-rotate
cd llm-rotate/ui
npm install
NEXT_PUBLIC_API_BASE=http://127.0.0.1:8200 npm run dev
# open http://localhost:3000
It renders provider health, a usage summary, a calls-over-time chart, the per-model breakdown, the key table, and a recent-calls feed, all with a time-frame selector and light/dark theming. It polls the control server every few seconds, so no websocket is needed.
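
The same polling approach works from a script when you want alerts without the UI. The sketch below is an illustration only: `fetch_keys`, `unavailable_keys`, and `poll` are hypothetical helpers, the 5-second interval is an arbitrary choice, and the field names (`keys`, `key_id`, `available`) are taken from the /api/keys example response on this page.

```python
import json
import time
import urllib.request


def fetch_keys(base="http://127.0.0.1:8200"):
    """GET /api/keys from a running control server and decode the JSON."""
    with urllib.request.urlopen(f"{base}/api/keys", timeout=5) as resp:
        return json.load(resp)


def unavailable_keys(payload):
    """Return the key_id of every key whose `available` flag is not true."""
    return [
        key["key_id"]
        for key in payload.get("keys", [])
        if not key.get("available", False)
    ]


def poll(base="http://127.0.0.1:8200", interval_s=5.0, rounds=3):
    """Poll the server a fixed number of times and report unavailable keys."""
    for _ in range(rounds):
        down = unavailable_keys(fetch_keys(base))
        print("unavailable keys:", ", ".join(down) if down else "none")
        time.sleep(interval_s)


if __name__ == "__main__":
    # Requires a control server listening on the default address.
    poll()
```

Keeping `unavailable_keys` separate from the network call makes it easy to check against a saved response before pointing the script at a live server.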
For a zero-credentials local demo, the repo ships
examples/dashboard_demo.py,
which starts the control server with literal:// placeholder keys.
Security
The control server is read-only and masks every secret, but it does not
authenticate requests. Treat it like an internal admin surface: bind it to
localhost, keep it behind your own auth/ingress, and restrict cors_origins
rather than exposing it to the public internet.
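
If you start the server from your own launcher script, one way to enforce the localhost guidance is to check the bind address before handing it to Uvicorn. This is a minimal sketch using only the standard library; `is_loopback_host` is a hypothetical helper, not something llm-rotate provides, and it deliberately treats any hostname other than the literal `localhost` as non-loopback instead of resolving it.

```python
import ipaddress


def is_loopback_host(host):
    """Return True if `host` is 'localhost' or a loopback IP literal."""
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Not an IP literal (for example a DNS name); treat it as not loopback.
        return False


if __name__ == "__main__":
    for candidate in ("127.0.0.1", "::1", "localhost", "0.0.0.0"):
        status = "ok" if is_loopback_host(candidate) else "exposed beyond localhost"
        print(f"{candidate}: {status}")
```

A launcher could refuse to start, or require an explicit override, whenever the check returns False.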