llm-rotate
A Python library for resilient LLM calls across providers — unified chat, automatic key rotation, health-aware selection, and cross-provider fallback.
llm-rotate makes LLM calls resilient by default. Point it at a pool of API
keys across one or more providers and it handles the messy parts — rate limits,
auth failures, transient 5xxs — by rotating keys, quarantining bad ones, and
falling back to other providers, all behind a single chat() call.
Why it exists
Calling an LLM provider directly is fine until it isn't: a key hits its rate
limit, an account runs out of quota, or a region has a bad few minutes. In
production you end up writing the same retry-and-rotate glue over and over.
llm-rotate is that glue, hardened and reusable.
- One API, seven providers. OpenAI, Anthropic, Google AI Studio, Google Vertex AI, OpenRouter, Azure OpenAI, and AWS Bedrock, all behind the same chat() method.
- Automatic key rotation. On rate-limit, auth, or transient failures it moves to the next healthy key for the provider.
- Health-aware selection. Keys carry a health state machine (active → cooldown → quarantine) so failing keys are skipped until they recover.
- Cross-provider fallback. When every key for a provider is exhausted, an optional fallback chain re-tries the request against another provider.
- Streaming, sync, and async. Plus a Google-specific generate_content for multimodal (PDF, images, files).
- Structured usage logging. Every call emits a record with tokens, latency, and a masked key id, with optional OpenTelemetry spans and a monitoring dashboard.
- Scales out cleanly. An optional Redis state store shares key health across workers; everything beyond the core is opt-in via install extras, so the everyday API stays small.
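To make the rotation, health, and fallback bullets concrete, here is a minimal sketch of how such a loop could work. It is not llm-rotate's actual implementation: the names KeyHealth, RetryableError, AllKeysExhausted, and call_with_rotation, the failure threshold, and the cooldown and quarantine durations are all assumptions chosen for illustration.

```python
import time
from dataclasses import dataclass
from enum import Enum


class KeyState(Enum):
    ACTIVE = "active"
    COOLDOWN = "cooldown"
    QUARANTINE = "quarantine"


class RetryableError(Exception):
    """Hypothetical stand-in for rate-limit, auth, or transient 5xx failures."""


class AllKeysExhausted(Exception):
    """Raised when no key on any provider in the chain could serve the request."""


@dataclass
class KeyHealth:
    key_id: str
    provider: str
    state: KeyState = KeyState.ACTIVE
    failures: int = 0
    retry_at: float = 0.0  # time after which a non-active key may be tried again

    def is_usable(self, now: float) -> bool:
        # Active keys are always usable; others only once their timer has expired.
        return self.state is KeyState.ACTIVE or now >= self.retry_at

    def record_failure(self, now: float, cooldown_s: float = 30.0,
                       quarantine_s: float = 600.0, max_failures: int = 3) -> None:
        # Assumed policy: short cooldown first, quarantine after repeated failures.
        self.failures += 1
        if self.failures >= max_failures:
            self.state, self.retry_at = KeyState.QUARANTINE, now + quarantine_s
        else:
            self.state, self.retry_at = KeyState.COOLDOWN, now + cooldown_s

    def record_success(self) -> None:
        self.state, self.failures, self.retry_at = KeyState.ACTIVE, 0, 0.0


def call_with_rotation(keys, provider_chain, send, now=time.monotonic):
    """Try each usable key of each provider in chain order.

    `send(key)` is a caller-supplied function that performs the request
    and raises RetryableError when the key should be rotated away from.
    """
    for provider in provider_chain:
        for key in keys:
            if key.provider != provider or not key.is_usable(now()):
                continue
            try:
                result = send(key)
            except RetryableError:
                key.record_failure(now())
                continue  # rotate to the next key for this provider
            key.record_success()
            return result
        # Every key for this provider failed or was skipped: fall back to the next one.
    raise AllKeysExhausted(f"no usable key for providers {list(provider_chain)}")
```

In this sketch the fallback chain is just an ordered list of provider names, so passing a single-element list disables cross-provider fallback. A real implementation would also need to map the requested model onto an equivalent model for each fallback provider.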
At a glance
import asyncio

from llm_rotate import configure, lm

# Register one key per provider; each secret_ref points at an environment variable.
configure(
    registry={
        "keys": [
            {
                "key_id": "openai-1",
                "provider": "openai",
                "secret_ref": "env://OPENAI_API_KEY",
                "models": ["gpt-4o-mini"],
            },
            {
                "key_id": "gemini-1",
                "provider": "google_ai_studio",
                "secret_ref": "env://GOOGLE_API_KEY",
                "models": ["gemini-2.0-flash"],
            },
        ]
    },
    use_keys=["openai-1", "gemini-1"],
)


async def main():
    # In a script, await is only valid inside a coroutine.
    response = await lm.chat("gpt-4o-mini", [{"role": "user", "content": "Hello"}])
    print(response.content)


asyncio.run(main())

llm-rotate is currently alpha (0.3.x). The public API described in these docs is stable for v1. New capabilities (distributed state, tracing, the control server, and extra providers) ship as opt-in install extras so the core stays lean. See the roadmap for what's next.
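The registry entries in the example above point at secrets with references such as env://OPENAI_API_KEY rather than embedding the key itself. As a rough illustration of that idea, the following sketch resolves such a reference against the environment. The function name resolve_secret_ref and its error handling are hypothetical, and llm-rotate's own resolver may behave differently or support other schemes.

```python
import os
from typing import Mapping


def resolve_secret_ref(ref: str, environ: Mapping[str, str] = os.environ) -> str:
    """Resolve an 'env://NAME' reference to the value of environment variable NAME.

    Hypothetical helper for illustration only; not part of llm-rotate's API.
    """
    scheme, sep, name = ref.partition("://")
    if not sep or scheme != "env" or not name:
        raise ValueError(f"unsupported secret reference: {ref!r}")
    if name not in environ:
        raise KeyError(f"environment variable {name} is not set")
    return environ[name]
```

Keeping only the reference in configuration means the registry can be committed or logged without exposing the key, while the secret is looked up at the point of use.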
Where to next
- New here? Start with Installation and the Quickstart.
- Wiring it into a real app? Read Configuration and Providers.
- Want the details? See Key selection & rotation and the API reference.