Monitoring dashboard

The optional read-only control server and Next.js dashboard for live key-health, usage, and cost monitoring.

llm-rotate ships an optional monitoring stack: a read-only control server that exposes your live key health, usage, and cost as JSON, and a Next.js dashboard that renders it. Both are opt-in — the SDK itself stays headless, and nothing here runs unless you start it.

Entirely optional

You never need the dashboard to use llm-rotate. The control server is installed only with the server extra (pip install "llm-rotate[server]"), and the in-memory event buffer that feeds it stays disabled, with zero overhead, until the server turns it on.

Install

pip install "llm-rotate[server]"

This adds FastAPI + Uvicorn. The dashboard front-end lives in the ui/ directory of the repo (a separate Next.js app — see Run the dashboard UI).

Run the control server

The simplest path is the bundled CLI, pointed at a JSON/YAML config file:

llm-rotate-server --config llm-rotate.yaml --port 8200

Flag	Default	Purpose
`-c`, `--config`	`$LLM_ROTATE_CONFIG`	Path to a JSON/YAML config file (required).
`--host`	`127.0.0.1`	Bind address.
`--port`	`8200`	Bind port.
`--event-capacity`	`1000`	How many recent call records to retain for analytics.

You can also set the config path by environment variable:

export LLM_ROTATE_CONFIG=llm-rotate.yaml
llm-rotate-server
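
The flag table above implies that an explicit --config wins over the environment variable, which only supplies the default. The sketch below illustrates that precedence with argparse. It is not the source of llm-rotate-server; the helper names build_parser and parse are hypothetical, and only the flags and defaults come from the table.

```python
import argparse
import os
from typing import Mapping, Optional, Sequence


def build_parser(env: Optional[Mapping[str, str]] = None) -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented flags and defaults."""
    env = os.environ if env is None else env
    parser = argparse.ArgumentParser(prog="llm-rotate-server")
    # The environment variable only provides the default; a flag overrides it.
    parser.add_argument("-c", "--config", default=env.get("LLM_ROTATE_CONFIG"))
    parser.add_argument("--host", default="127.0.0.1")
    parser.add_argument("--port", type=int, default=8200)
    parser.add_argument("--event-capacity", type=int, default=1000)
    return parser


def parse(argv: Sequence[str], env: Optional[Mapping[str, str]] = None) -> argparse.Namespace:
    """Parse argv and fail when no config path is available from either source."""
    parser = build_parser(env)
    args = parser.parse_args(argv)
    if not args.config:
        parser.error("a config path is required (--config or LLM_ROTATE_CONFIG)")
    return args
```

Passing env explicitly keeps the sketch easy to exercise without touching the real process environment.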

Embed in an existing app

If you already run a web service, mount the control server alongside it so it shares the same in-process state and event buffer as your live LMRotate:

import uvicorn
from llm_rotate import LMRotate, configure_from_dict
from llm_rotate.server import create_app

rot = LMRotate(configure_from_dict(registry={"keys": [...]}, use_keys=[...]))

# Pass the live instance so the dashboard reflects real traffic.
app = create_app(rot, event_capacity=1000)
uvicorn.run(app, host="127.0.0.1", port=8200)

create_app accepts either a live LMRotate instance (embedded mode) or a config= argument (standalone mode, which is handy with the Redis backend because the dashboard then reflects state shared across workers). Set cors_origins=[...] to restrict which browser origins may call it.

Endpoints

All endpoints are GET-only and emit secret-masked data.

Endpoint	Returns
`/api/health`	Per-provider key counts.
`/api/keys`	Per-key health, cooldown/quarantine timers, success/failure counts.
`/api/usage`	Aggregated calls, tokens, and cost, grouped by provider.
`/api/usage/models`	Per-`(provider, model)` calls, tokens, and cost.
`/api/events`	Recent `CallRecord`s (newest last).
`/api/usage/timeseries`	Bucketed calls / errors / tokens / cost.
`/api/config`	Sanitized config (strategy, providers, fallback chains, keys).

Time frames

/api/usage, /api/usage/models, /api/events, and /api/usage/timeseries accept an optional time frame:

window: one of 1h, 24h, 7d, or 30d
since / until: an explicit range given as ISO 8601 timestamps, as an alternative to window

Omit the time frame to get all-time data. The time-series endpoint is the exception: it defaults to 24h.
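
As a minimal sketch of how a client might build these query strings, the helper below uses only the standard library. The function name usage_url is hypothetical, and rejecting window together with since/until is an assumption made here for clarity; the server's actual behavior in that case is not documented above.

```python
from datetime import datetime
from typing import Optional
from urllib.parse import urlencode

# The window values listed in the docs above.
ALLOWED_WINDOWS = {"1h", "24h", "7d", "30d"}


def usage_url(
    base: str,
    path: str,
    window: Optional[str] = None,
    since: Optional[datetime] = None,
    until: Optional[datetime] = None,
) -> str:
    """Build a control-server URL with an optional time frame."""
    # Assumption: treat window and since/until as mutually exclusive.
    if window is not None and (since is not None or until is not None):
        raise ValueError("pass either window or since/until, not both")
    params = {}
    if window is not None:
        if window not in ALLOWED_WINDOWS:
            raise ValueError(f"unsupported window: {window!r}")
        params["window"] = window
    if since is not None:
        params["since"] = since.isoformat()  # ISO 8601
    if until is not None:
        params["until"] = until.isoformat()
    query = urlencode(params)  # percent-encodes ":" and "+" in timestamps
    return f"{base.rstrip('/')}{path}" + (f"?{query}" if query else "")
```

With no time-frame arguments the helper returns the bare endpoint URL, which corresponds to the all-time default described above.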

Example: usage by model

curl "http://127.0.0.1:8200/api/usage/models?window=24h"

{
  "models": [
    {
      "provider": "openai",
      "model": "gpt-4o-mini",
      "calls": 307,
      "successes": 285,
      "errors": 22,
      "total_tokens": 568045,
      "estimated_cost_usd": 0.165381
    },
    {
      "provider": "anthropic",
      "model": "claude-haiku-4-5",
      "calls": 175,
      "successes": 163,
      "errors": 12,
      "total_tokens": 327842,
      "estimated_cost_usd": 0.576511
    }
  ]
}
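
The per-model payload is easy to post-process. The sketch below derives an error rate and a cost per 1,000 tokens from the fields in the sample response above; the function name summarize_models and both derived metrics are illustrative choices, not something the control server returns.

```python
import json

# Sample body copied from the response shown above.
SAMPLE = """
{"models": [
  {"provider": "openai", "model": "gpt-4o-mini", "calls": 307, "successes": 285,
   "errors": 22, "total_tokens": 568045, "estimated_cost_usd": 0.165381},
  {"provider": "anthropic", "model": "claude-haiku-4-5", "calls": 175, "successes": 163,
   "errors": 12, "total_tokens": 327842, "estimated_cost_usd": 0.576511}
]}
"""


def summarize_models(payload: dict) -> list:
    """Return one row per model, sorted by error rate (highest first)."""
    rows = []
    for m in payload["models"]:
        calls = m["calls"]
        tokens = m["total_tokens"]
        rows.append({
            "model": f'{m["provider"]}/{m["model"]}',
            # Guard against division by zero for models with no traffic.
            "error_rate": m["errors"] / calls if calls else 0.0,
            "usd_per_1k_tokens": 1000 * m["estimated_cost_usd"] / tokens if tokens else 0.0,
        })
    return sorted(rows, key=lambda r: r["error_rate"], reverse=True)


if __name__ == "__main__":
    for row in summarize_models(json.loads(SAMPLE)):
        print(f'{row["model"]}: {row["error_rate"]:.1%} errors, '
              f'${row["usd_per_1k_tokens"]:.6f} per 1k tokens')
```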

Example: per-key health

curl "http://127.0.0.1:8200/api/keys"

{
  "keys": [
    {
      "key_id": "openai-1",
      "provider": "openai",
      "health": "active",
      "available": true,
      "cooldown_until": null,
      "quarantine_until": null,
      "consecutive_failures": 0,
      "total_successes": 285,
      "total_failures": 22,
      "last_used_at": "2026-06-18T07:21:04+00:00",
      "last_error_type": null
    }
  ]
}
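
One way to use this payload is a small check that flags keys worth a look. The sketch below relies only on fields visible in the sample response; the function name keys_needing_attention and the thresholds max_failure_ratio and min_calls are hypothetical and should be tuned to your traffic.

```python
from typing import Iterable, List


def keys_needing_attention(
    keys: Iterable[dict],
    max_failure_ratio: float = 0.2,  # hypothetical threshold
    min_calls: int = 20,             # ignore ratios based on too few calls
) -> List[str]:
    """Return key_ids that are unavailable or failing more often than allowed."""
    flagged = []
    for key in keys:
        total = key["total_successes"] + key["total_failures"]
        ratio = key["total_failures"] / total if total else 0.0
        too_many_failures = total >= min_calls and ratio > max_failure_ratio
        if not key["available"] or too_many_failures:
            flagged.append(key["key_id"])
    return flagged
```

Applied to the sample above, openai-1 is available and fails on 22 of 307 calls, so it stays under the default threshold and is not flagged.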

Run the dashboard UI

The dashboard is a Next.js app in the repo's ui/ directory. Point it at a running control server via NEXT_PUBLIC_API_BASE:

git clone https://github.com/Research-Commons/llm-rotate
cd llm-rotate/ui
npm install
NEXT_PUBLIC_API_BASE=http://127.0.0.1:8200 npm run dev
# open http://localhost:3000

It renders provider health, a usage summary, a calls-over-time chart, the per-model breakdown, the key table, and a recent-calls feed — all with a time-frame selector and light/dark theming. It polls the control server every few seconds, so no websocket is needed.
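
The same polling approach works from a script. The sketch below is a rough Python equivalent, not the dashboard's own code: the names fetch_json and poll and the 5-second default interval are assumptions, since the exact polling interval is not stated above.

```python
import json
import time
from typing import Callable, Iterator, Optional
from urllib.request import urlopen


def fetch_json(url: str, timeout: float = 5.0) -> dict:
    """GET a control-server endpoint and decode the JSON body."""
    with urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def poll(
    url: str,
    interval_s: float = 5.0,          # assumed interval
    max_polls: Optional[int] = None,  # None means poll forever
    fetch: Callable[[str], dict] = fetch_json,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[dict]:
    """Yield a fresh snapshot of the endpoint every interval_s seconds."""
    count = 0
    while max_polls is None or count < max_polls:
        yield fetch(url)
        count += 1
        # Skip the final sleep when a poll limit has been reached.
        if max_polls is None or count < max_polls:
            sleep(interval_s)
```

For example, iterating over poll("http://127.0.0.1:8200/api/keys", max_polls=3) yields three snapshots with a pause between them. The fetch and sleep parameters exist so the loop can be tested without a running server.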

For a zero-credentials local demo, the repo ships examples/dashboard_demo.py, which starts the control server with literal:// placeholder keys.

Security

The control server is read-only and masks every secret, but it does not authenticate requests. Treat it like an internal admin surface: bind it to localhost, keep it behind your own auth/ingress, and restrict cors_origins rather than exposing it to the public internet.
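
If you start the server from your own code, a small guard can catch an accidental public bind before it happens. The helper below is a hypothetical sketch using only the standard library and is not part of llm-rotate; it conservatively treats any hostname other than localhost as non-loopback.

```python
import ipaddress


def is_loopback_bind(host: str) -> bool:
    """Return True when host refers to a loopback address or "localhost"."""
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Not an IP literal: treat unknown hostnames as non-loopback.
        return False
```

A caller could refuse to start, or log a warning, when is_loopback_bind(host) is False and no authenticating proxy sits in front of the server.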
