
llm-rotate

A Python library for resilient LLM calls across providers — unified chat, automatic key rotation, health-aware selection, and cross-provider fallback.

llm-rotate makes LLM calls resilient by default. Point it at a pool of API keys across one or more providers and it handles the messy parts — rate limits, auth failures, transient 5xxs — by rotating keys, quarantining bad ones, and falling back to other providers, all behind a single chat() call.
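The rotate-then-fall-back behavior described above can be pictured as a nested loop. It tries each key for a provider and moves to the next key on a retryable failure. Once that provider's pool is exhausted, it moves on to the next provider. The sketch below is a minimal, synchronous illustration under stated assumptions: `Key`, `ProviderError`, `AllKeysExhausted`, `call_with_rotation`, and the error-kind strings are hypothetical names, not llm-rotate's API, and the real library additionally tracks key health.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

# Hypothetical failure kinds that should trigger a move to the next key.
ROTATE_ON = {"rate_limit", "auth", "transient"}


@dataclass
class Key:
    key_id: str
    provider: str


class ProviderError(Exception):
    """Hypothetical provider error carrying a coarse failure kind."""

    def __init__(self, kind: str, message: str = "") -> None:
        super().__init__(message or kind)
        self.kind = kind


class AllKeysExhausted(Exception):
    """Raised when every key of every provider in the chain has failed."""


def call_with_rotation(
    providers: Sequence[str],
    keys: Sequence[Key],
    send: Callable[[Key], str],
) -> str:
    """Try each key of each provider in order, falling back across providers."""
    errors: list[str] = []
    for provider in providers:  # fallback chain, in priority order
        for key in (k for k in keys if k.provider == provider):
            try:
                return send(key)
            except ProviderError as exc:
                if exc.kind not in ROTATE_ON:
                    raise  # e.g. a malformed request: another key would fail the same way
                errors.append(f"{key.key_id}: {exc.kind}")
    raise AllKeysExhausted("; ".join(errors))


# Usage: the OpenAI key is rate-limited, so the call falls back to Gemini.
def fake_send(key: Key) -> str:
    if key.key_id == "openai-1":
        raise ProviderError("rate_limit")
    return f"answered by {key.key_id}"


pool = [Key("openai-1", "openai"), Key("gemini-1", "google_ai_studio")]
print(call_with_rotation(["openai", "google_ai_studio"], pool, fake_send))
# answered by gemini-1
```

Re-raising non-retryable errors is the important design choice here: a malformed request would fail identically on every key, so rotating would only burn through the pool.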

Why it exists

Calling an LLM provider directly is fine until it isn't: a key hits its rate limit, an account runs out of quota, or a region has a bad few minutes. In production you end up writing the same retry-and-rotate glue over and over. llm-rotate is that glue, hardened and reusable.

  • One API, seven providers. OpenAI, Anthropic, Google AI Studio, Google Vertex AI, OpenRouter, Azure OpenAI, and AWS Bedrock — behind the same chat() method.
  • Automatic key rotation. On rate-limit, auth, or transient failures it moves to the next healthy key for the provider.
  • Health-aware selection. Keys carry a health state machine (active → cooldown → quarantine) so failing keys are skipped until they recover.
  • Cross-provider fallback. When every key for a provider is exhausted, an optional fallback chain re-tries the request against another provider.
  • Streaming, sync, and async calls, plus a Google-specific generate_content method for multimodal input (PDFs, images, files).
  • Structured usage logging. Every call emits a record with tokens, latency, and a masked key id — with optional OpenTelemetry spans and a monitoring dashboard.
  • Scales out cleanly. An optional Redis state store shares key health across workers; everything beyond the core is opt-in via install extras, so the everyday API stays small.
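The active → cooldown → quarantine cycle can be modelled as a small per-key state machine with timed exits. This is one plausible shape, not llm-rotate's implementation. The following are all illustrative assumptions: `HealthTracker`, the 30-second cooldown, the 10-minute quarantine, the three-consecutive-failures threshold, and the rule that auth failures go straight to quarantine.

```python
import time
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional


class Health(Enum):
    ACTIVE = "active"
    COOLDOWN = "cooldown"
    QUARANTINE = "quarantine"


@dataclass
class KeyHealth:
    key_id: str
    state: Health = Health.ACTIVE
    until: float = 0.0  # clock time at which a cooldown/quarantine ends
    failures: int = 0  # consecutive failures; only a success resets this


class HealthTracker:
    """Hypothetical per-key health tracker; thresholds are illustrative."""

    def __init__(
        self,
        key_ids: list[str],
        cooldown_s: float = 30.0,
        quarantine_s: float = 600.0,
        quarantine_after: int = 3,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.keys = {k: KeyHealth(k) for k in key_ids}
        self.cooldown_s = cooldown_s
        self.quarantine_s = quarantine_s
        self.quarantine_after = quarantine_after
        self.clock = clock

    def _refresh(self, h: KeyHealth) -> None:
        # A key whose penalty window has elapsed becomes eligible again.
        if h.state is not Health.ACTIVE and self.clock() >= h.until:
            h.state = Health.ACTIVE

    def pick(self) -> Optional[str]:
        """Return the first healthy key id, or None if every key is benched."""
        for h in self.keys.values():
            self._refresh(h)
            if h.state is Health.ACTIVE:
                return h.key_id
        return None

    def record_success(self, key_id: str) -> None:
        h = self.keys[key_id]
        h.state, h.failures = Health.ACTIVE, 0

    def record_failure(self, key_id: str, kind: str) -> None:
        h = self.keys[key_id]
        h.failures += 1
        now = self.clock()
        if kind == "auth" or h.failures >= self.quarantine_after:
            # Likely-persistent problems get the long penalty.
            h.state, h.until = Health.QUARANTINE, now + self.quarantine_s
        else:
            # Rate limits and transient errors get a short cooldown.
            h.state, h.until = Health.COOLDOWN, now + self.cooldown_s


# Usage with a fake clock so the demo does not have to sleep.
now = [0.0]
tracker = HealthTracker(["openai-1", "openai-2"], clock=lambda: now[0])
tracker.record_failure("openai-1", "rate_limit")
print(tracker.pick())  # openai-2 (openai-1 is cooling down)
now[0] += 31.0
print(tracker.pick())  # openai-1 (cooldown elapsed)
```

Injecting the clock keeps the transitions testable without real waits. Here the per-key fields live in process memory; in a multi-worker deployment they are the kind of state the optional Redis store would need to share.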

At a glance

import asyncio

from llm_rotate import configure, lm

configure(
    registry={
        "keys": [
            {
                "key_id": "openai-1",
                "provider": "openai",
                "secret_ref": "env://OPENAI_API_KEY",
                "models": ["gpt-4o-mini"],
            },
            {
                "key_id": "gemini-1",
                "provider": "google_ai_studio",
                "secret_ref": "env://GOOGLE_API_KEY",
                "models": ["gemini-2.0-flash"],
            },
        ]
    },
    use_keys=["openai-1", "gemini-1"],
)


async def main() -> None:
    # lm.chat is awaited, so in a plain script it must run inside an event loop.
    response = await lm.chat("gpt-4o-mini", [{"role": "user", "content": "Hello"}])
    print(response.content)


asyncio.run(main())
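Each `secret_ref` in the example points at an environment variable through an `env://` prefix, which keeps the key material itself out of the configuration. A minimal resolver for that one scheme could look like the following. `resolve_secret_ref` is a hypothetical helper for illustration; llm-rotate may support other schemes or resolve references differently.

```python
import os

ENV_PREFIX = "env://"  # the only scheme handled in this sketch


def resolve_secret_ref(ref: str) -> str:
    """Resolve an env://NAME reference to the value of environment variable NAME."""
    if not ref.startswith(ENV_PREFIX):
        raise ValueError(f"unsupported secret_ref: {ref!r}")
    name = ref[len(ENV_PREFIX):]
    if not name:
        raise ValueError("secret_ref does not name an environment variable")
    try:
        return os.environ[name]
    except KeyError:
        # Name the missing variable, but never echo any secret value.
        raise KeyError(f"environment variable {name} is not set") from None


# Usage with a placeholder variable (setdefault leaves an existing value alone).
os.environ.setdefault("DEMO_API_KEY", "demo-secret")
print(len(resolve_secret_ref("env://DEMO_API_KEY")) > 0)
```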

Status

llm-rotate is currently in alpha (0.3.x). The public API described in these docs is stable for v1. New capabilities (distributed state, tracing, the control server, and extra providers) ship as opt-in install extras so the core stays lean. See the roadmap for what's next.

Where to next