Observability

llm-rotate is built to run in production, so it exposes what's happening: structured per-call logs, a health snapshot of every key, and a rolling usage summary.

Usage logging

Every call emits a structured CallRecord (as JSON) to the standard-library logger named llm_rotate.observability.usage. Each record includes the provider, model, masked key id, token counts, latency, an estimated cost, and the outcome.

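import logging

# Configure the root logger so records have somewhere to go.
logging.basicConfig(level=logging.INFO)
# Let INFO and above through on the usage logger itself.
logging.getLogger("llm_rotate.observability.usage").setLevel(logging.INFO)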

Attach whatever handler you need to that logger: a JSON file handler, your log aggregator's handler, and so on. Keys are always masked in records and errors, so the raw secret never appears in logs.
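As a rough illustration of routing these records somewhere useful, the sketch below attaches a custom handler to the usage logger and parses each message back into a dict. It assumes the log message body is the JSON document itself. `UsageCollector` is a hypothetical helper written for this example, not part of llm-rotate.

```python
import json
import logging

USAGE_LOGGER = "llm_rotate.observability.usage"


class UsageCollector(logging.Handler):
    """Hypothetical handler: keeps parsed usage records in memory.

    Replace the list append with a call to your own aggregator.
    """

    def __init__(self):
        super().__init__(level=logging.INFO)
        self.records = []

    def emit(self, record):
        message = record.getMessage()
        try:
            # Assumption: the message body is the JSON-encoded CallRecord.
            payload = json.loads(message)
        except ValueError:
            # Keep anything that is not valid JSON instead of dropping it.
            payload = {"raw": message}
        self.records.append(payload)


collector = UsageCollector()
usage_logger = logging.getLogger(USAGE_LOGGER)
usage_logger.setLevel(logging.INFO)
usage_logger.addHandler(collector)
```

After a few calls, `collector.records` holds one entry per log record, which is handy in tests or for forwarding batches to another system.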

OpenTelemetry tracing & metrics

With the otel extra installed (pip install "llm-rotate[otel]"), llm-rotate emits an llm_rotate.attempt span per call attempt plus counters for attempts, errors, and key rotations under the llm_rotate instrumentation scope. They flow to whatever exporter your app's OpenTelemetry SDK is configured with.

pip install "llm-rotate[otel]"

Zero-cost when absent

Without the extra installed, every tracing call is a cheap no-op. The orchestrator calls the same hooks unconditionally, so there's nothing to wire up and negligible overhead when you don't use OpenTelemetry.
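The general pattern behind an optional dependency that degrades to a no-op can be sketched as follows. This is an illustration of the technique, not llm-rotate's actual implementation. `attempt_span` and `call_with_span` are hypothetical names, and only the span name `llm_rotate.attempt` and the scope name `llm_rotate` come from the text above.

```python
from contextlib import nullcontext

try:
    # Present only when the OpenTelemetry API package is installed.
    from opentelemetry import trace

    _tracer = trace.get_tracer("llm_rotate")
except ImportError:
    _tracer = None


def attempt_span(name="llm_rotate.attempt"):
    """Return a real span context manager, or a do-nothing one."""
    if _tracer is None:
        return nullcontext()
    return _tracer.start_as_current_span(name)


def call_with_span(fn):
    """Hypothetical caller: the same code path runs with or without OTel."""
    with attempt_span():
        return fn()
```

Callers never branch on whether tracing is available. The only cost in the absent case is entering and leaving an empty context manager.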

Health snapshot

health() returns a HealthReport describing the current state of every active key: its health state, recent failure counts, and when a cooled-down or quarantined key is expected back.

report = await lm.health()
# Inspect per-key / per-provider availability, cooldown timers, etc.

This is the natural backing for a /health endpoint if you wrap llm-rotate in a service.
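A minimal, framework-agnostic sketch of such an endpoint body is shown below. It relies only on `await lm.health()` from the example above. The fields of `HealthReport` are not documented here, so the code serializes whatever attributes the object exposes, and `health_body` is a hypothetical helper. If `HealthReport` provides its own serialization method, prefer that.

```python
import asyncio
import json


def _to_jsonable(obj):
    # Fallback for objects json cannot encode directly: use their
    # attribute dict when they have one, else their string form.
    return getattr(obj, "__dict__", str(obj))


async def health_body(lm):
    """Return the health report as a JSON string for an HTTP response."""
    report = await lm.health()
    return json.dumps(report, default=_to_jsonable)
```

A web framework handler would then await `health_body(lm)` and return it with an `application/json` content type.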

Usage summary

usage_summary() returns aggregated counters — successes, failures, and token usage — collected since the process started:

summary = await lm.usage_summary()

By default, usage counters live in the in-memory store and reset on restart. For multi-worker deployments, the Redis state store shares key health across processes.

Control server & dashboard

For a live view of key health, usage, and cost, run the optional read-only control server (pip install "llm-rotate[server]") and the bundled Next.js dashboard. The event buffer that powers them is disabled by default, so the SDK stays zero-overhead until you opt in.

pip install "llm-rotate[server]"
llm-rotate-server --config llm-rotate.yaml --port 8200

The server exposes secret-masked, GET-only endpoints for health, per-key state, usage (overall and per-model), recent events, time-series, and sanitized config — all with optional time-frame filters. See the Monitoring dashboard guide for the full endpoint reference, how to embed the server in an existing app, and how to run the dashboard UI.

Mapping to HTTP

If you expose llm-rotate behind an API, these are the conventional mappings used by the reference service:

Condition	HTTP status
Success	`200`
All keys / providers exhausted (`NoAvailableKeyError`)	`503`
Other runtime failure (`LMRotateError`)	`502`
Bad configuration (`ConfigurationError`)	`500` (startup)

See Errors for the full exception hierarchy.
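The mapping in the table can be written as a small function. To keep the sketch self-contained, the exception classes below are local stand-ins with the same names as in the table. In a real service you would import them from llm-rotate instead. The inheritance shown and the fallback status for unlisted exceptions are assumptions.

```python
class LMRotateError(Exception):
    """Stand-in for the library's runtime error type."""


class NoAvailableKeyError(LMRotateError):
    """Stand-in: all keys / providers exhausted (assumed subclass)."""


class ConfigurationError(Exception):
    """Stand-in: bad configuration, normally raised at startup."""


def status_for(exc=None):
    """Map the outcome of a call to an HTTP status code."""
    if exc is None:
        return 200
    # Check the specific types first so the order stays correct
    # whatever the real class hierarchy looks like.
    if isinstance(exc, NoAvailableKeyError):
        return 503
    if isinstance(exc, ConfigurationError):
        return 500
    if isinstance(exc, LMRotateError):
        return 502
    # Not covered by the table; treating it as 500 is an assumption.
    return 500
```

In a request handler, wrap the llm-rotate call in `try`/`except Exception as exc` and return `status_for(exc)`, or `status_for()` on success.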
