Observability
Per-call usage logging, health snapshots, and usage summaries.
llm-rotate is built to run in production, so it exposes what's happening:
structured per-call logs, a health snapshot of every key, and a rolling usage
summary.
Usage logging
Every call emits a structured CallRecord (as JSON) to the standard-library
logger named llm_rotate.observability.usage. Each record includes the
provider, model, masked key id, token counts, latency, an estimated cost, and
the outcome.
import logging
logging.basicConfig(level=logging.INFO)
logging.getLogger("llm_rotate.observability.usage").setLevel(logging.INFO)

Pipe that logger to JSON handlers, your log aggregator, or any other sink. Keys are always masked in records and errors, so the raw secret never appears in logs.
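As one way to pipe the usage logger somewhere useful, the sketch below attaches a custom standard-library handler that parses each record back into a dict. It assumes the log message itself is the JSON document; `ParsedUsageHandler` is a hypothetical name, and any field names you read from the parsed dict should be checked against real output.

```python
import json
import logging


class ParsedUsageHandler(logging.Handler):
    """Hypothetical handler: collects usage records as dicts.

    Swap the in-memory list for a queue, file, or exporter in real use.
    """

    def __init__(self) -> None:
        super().__init__(level=logging.INFO)
        self.records: list = []

    def emit(self, record: logging.LogRecord) -> None:
        message = record.getMessage()
        try:
            payload = json.loads(message)
        except json.JSONDecodeError:
            payload = None
        if not isinstance(payload, dict):
            # Keep lines that are not JSON objects instead of dropping them.
            payload = {"raw": message}
        self.records.append(payload)


usage_logger = logging.getLogger("llm_rotate.observability.usage")
usage_logger.setLevel(logging.INFO)
handler = ParsedUsageHandler()
usage_logger.addHandler(handler)
```

Because the handler is attached to the named logger rather than the root logger, it only sees usage records, not the rest of your application's logs.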
OpenTelemetry tracing & metrics
With the otel extra installed (pip install "llm-rotate[otel]"), llm-rotate
emits an llm_rotate.attempt span per call attempt plus counters for attempts,
errors, and key rotations under the llm_rotate instrumentation scope. They
flow to whatever exporter your app's OpenTelemetry SDK is configured with.
pip install "llm-rotate[otel]"

Without the extra installed, every tracing call is a cheap no-op: the orchestrator calls the same hooks unconditionally, so there's nothing to wire up and effectively no overhead when you don't use OpenTelemetry.
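The "same hooks, no-op without the extra" behaviour is a common optional-dependency pattern. The sketch below shows the general idea only and is not llm-rotate's actual implementation: `attempt_span` and `traced_call` are hypothetical helpers, and the attribute names are assumptions.

```python
from contextlib import contextmanager

try:
    # Available only when the OpenTelemetry API package is installed.
    from opentelemetry import trace

    _tracer = trace.get_tracer("llm_rotate")
except ImportError:
    _tracer = None


@contextmanager
def attempt_span(provider: str, model: str):
    """Hypothetical hook: open a span if OpenTelemetry is present, else do nothing."""
    if _tracer is None:
        yield None  # no-op path: callers use the same `with` block either way
        return
    with _tracer.start_as_current_span("llm_rotate.attempt") as span:
        span.set_attribute("provider", provider)  # attribute names are assumptions
        span.set_attribute("model", model)
        yield span


def traced_call(provider: str, model: str) -> str:
    # The call site is identical whether or not tracing is available.
    with attempt_span(provider, model):
        return f"{provider}/{model}"
```

The caller never checks whether tracing is enabled; the decision is made once at import time, which is what keeps the disabled path cheap.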
Health snapshot
health() returns a HealthReport describing the current state of every active
key: its health state, recent failure counts, and when a cooled-down or
quarantined key is expected back.
report = await lm.health()
# Inspect per-key / per-provider availability, cooldown timers, etc.

This is the natural backing for a /health endpoint if you wrap llm-rotate in
a service.
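A /health handler might reduce the report to a status code and a small JSON body, roughly as sketched below. The HealthReport schema is not shown here, so `KeyHealth` is a stand-in dataclass with assumed field names, and `health_response` is a hypothetical helper; adapt both to the real report returned by `lm.health()`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class KeyHealth:
    """Stand-in for one key's entry in a health report (field names are assumptions)."""

    key_id: str                                 # masked key id
    state: str                                  # e.g. "healthy", "cooldown", "quarantined"
    recent_failures: int
    seconds_until_back: Optional[float] = None  # None when no return time is known


def health_response(keys: list) -> tuple:
    """Return (http_status, body): 200 if any key is usable, otherwise 503."""
    available = [k for k in keys if k.state == "healthy"]
    waits = [
        k.seconds_until_back
        for k in keys
        if k.state != "healthy" and k.seconds_until_back is not None
    ]
    body = {
        "status": "ok" if available else "unavailable",
        "available_keys": len(available),
        "total_keys": len(keys),
        "next_key_back_in_s": min(waits) if waits else None,
    }
    return (200 if available else 503), body
```

Returning 503 when no key is usable lets a load balancer or orchestrator take the instance out of rotation until a key comes back.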
Usage summary
usage_summary() returns aggregated counters — successes, failures, and token
usage — collected since the process started:
summary = await lm.usage_summary()

By default, usage counters live in the in-memory store and reset on restart. For multi-worker deployments, the Redis state store shares key health across processes.
Control server & dashboard
For a live view of key health, usage, and cost, run the optional read-only
control server (pip install "llm-rotate[server]") and the bundled Next.js
dashboard. The event buffer that powers them is disabled by default, so the SDK
stays zero-overhead until you opt in.
pip install "llm-rotate[server]"
llm-rotate-server --config llm-rotate.yaml --port 8200

The server exposes secret-masked, GET-only endpoints for health, per-key
state, usage (overall and per-model), recent events, time-series, and sanitized
config — all with optional time-frame filters. See the
Monitoring dashboard guide for the full endpoint
reference, how to embed the server in an existing app, and how to run the
dashboard UI.
Mapping to HTTP
If you expose llm-rotate behind an API, these are the conventional mappings
used by the reference service:
| Condition | HTTP status |
|---|---|
| Success | 200 |
| All keys / providers exhausted (NoAvailableKeyError) | 503 |
| Other runtime failure (LMRotateError) | 502 |
| Bad configuration (ConfigurationError) | 500 (startup) |
See Errors for the full exception hierarchy.
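The mapping in the table can be written as a small lookup function, sketched below. The exception classes are local stand-ins so the example runs on its own; in a real service you would import them from llm-rotate. The sketch assumes NoAvailableKeyError and ConfigurationError both derive from LMRotateError, which is why the more specific checks come first, and `status_for` is a hypothetical helper name.

```python
from typing import Optional


# Local stand-ins for the library's exceptions; the hierarchy is an assumption.
class LMRotateError(Exception):
    pass


class NoAvailableKeyError(LMRotateError):
    pass


class ConfigurationError(LMRotateError):
    pass


def status_for(exc: Optional[BaseException]) -> int:
    """Map the outcome of a call to an HTTP status, following the table above."""
    if exc is None:
        return 200
    # Check subclasses before the base class so they are not swallowed by it.
    if isinstance(exc, NoAvailableKeyError):
        return 503
    if isinstance(exc, ConfigurationError):
        return 500
    if isinstance(exc, LMRotateError):
        return 502
    # Not covered by the table; treating it as a generic server error is an assumption.
    return 500
```

In practice a ConfigurationError surfaces at startup rather than per request, so the 500 branch mostly matters for services that build the client lazily.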