Research Commons

Key selection & rotation

How llm-rotate picks a key, tracks its health, and rotates away from failing credentials.

The core job of llm-rotate is choosing a healthy key for every call and reacting when one fails. This page covers the selection strategies and the key health state machine.

Selection strategies

Set the strategy globally via configure(strategy=...) or in defaults.selection_strategy. The default is health_aware.

Strategy                Behavior
health_aware (default)  Scores candidates by priority, weight, recent failures, and route health. Avoids keys that have been failing.
round_robin             Rotates through a provider's keys in order.
priority                Always picks the highest-priority available key.
weighted                Random selection proportional to each key's weight.

configure(
    registry={"keys": [...]},
    use_keys=[...],
    strategy="health_aware",
    priorities={"openai-primary": 10, "openai-backup": 1},
)
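
To make the difference between the three simpler strategies concrete, here is a minimal standalone sketch. It does not use llm-rotate's internals: the Key dataclass and the pick_priority, pick_weighted, and round_robin helpers are hypothetical names invented for illustration, and it assumes that a larger priority number means a more preferred key, as the priorities example above suggests.

```python
import itertools
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Key:
    # Hypothetical stand-in for a registered key; not llm-rotate's own type.
    name: str
    priority: int = 0
    weight: float = 1.0


def pick_priority(keys):
    """priority: always the highest-priority key (assumes larger = preferred)."""
    return max(keys, key=lambda k: k.priority)


def pick_weighted(keys, rng=random):
    """weighted: random choice proportional to each key's weight."""
    return rng.choices(keys, weights=[k.weight for k in keys], k=1)[0]


def round_robin(keys):
    """round_robin: an endless iterator over the keys in order."""
    return itertools.cycle(keys)


keys = [
    Key("openai-primary", priority=10, weight=3.0),
    Key("openai-backup", priority=1, weight=1.0),
]

print(pick_priority(keys).name)  # openai-primary
rr = round_robin(keys)
print([next(rr).name for _ in range(3)])
print(pick_weighted(keys).name)  # primary about 3 times out of 4
```

health_aware is left out on purpose: its scoring formula is not described on this page, so any sketch of it would be guesswork.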

The health state machine

Every key carries a runtime health state. Transitions are driven by call outcomes and the cooldown/quarantine timers from configuration.

State       Meaning
ACTIVE      Healthy and eligible for selection.
COOLDOWN    Temporarily rested after a rate-limit; skipped until the cooldown elapses.
QUARANTINE  Sidelined after repeated or auth failures; skipped until quarantine elapses.
PROBATION   Recovering: allowed back but watched closely.
DISABLED    Taken out of rotation entirely.

A typical lifecycle:

ACTIVE ──rate limit──▶ COOLDOWN ──timer──▶ PROBATION ──success──▶ ACTIVE
ACTIVE ──auth fail ×N──▶ QUARANTINE ──timer──▶ PROBATION
PROBATION ──failure──▶ QUARANTINE

  • A rate-limit moves the key to COOLDOWN for cooldown_seconds.
  • Repeated auth/permission failures move it to QUARANTINE for quarantine_seconds.
  • After max_consecutive_failures consecutive failures, the key is taken out of rotation.
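
The lifecycle above can be written down as a small transition table. This is a sketch of the diagram only, not llm-rotate's implementation: the State enum mirrors the documented state names, while the event labels and the next_state helper are hypothetical. Timers and the max_consecutive_failures counter are deliberately left out.

```python
from enum import Enum


class State(Enum):
    ACTIVE = "ACTIVE"
    COOLDOWN = "COOLDOWN"
    QUARANTINE = "QUARANTINE"
    PROBATION = "PROBATION"
    DISABLED = "DISABLED"


# (current state, event) -> next state, transcribed from the lifecycle diagram.
# The event names are hypothetical labels chosen for this sketch.
TRANSITIONS = {
    (State.ACTIVE, "rate_limit"): State.COOLDOWN,
    (State.ACTIVE, "repeated_auth_failure"): State.QUARANTINE,
    (State.COOLDOWN, "timer_elapsed"): State.PROBATION,
    (State.QUARANTINE, "timer_elapsed"): State.PROBATION,
    (State.PROBATION, "success"): State.ACTIVE,
    (State.PROBATION, "failure"): State.QUARANTINE,
}


def next_state(state, event):
    """Return the next state; pairs not in the table leave the state unchanged."""
    return TRANSITIONS.get((state, event), state)


# Walk the first line of the diagram: rate limit, timer, then a successful call.
state = State.ACTIVE
for event in ("rate_limit", "timer_elapsed", "success"):
    state = next_state(state, event)
print(state.name)  # ACTIVE
```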

What triggers rotation

When a call fails, llm-rotate classifies the error and decides whether to retry on another key. Retryable categories include:

  • rate_limit → cooldown, try next key
  • quota_exhausted → try next key
  • invalid_auth / permission_denied → quarantine, try next key
  • timeout, transient_server_error, connection_error → retry with backoff
  • model_unavailable, broker_route_unavailable → try next key / route

Non-retryable request errors (a malformed request, say) are surfaced immediately rather than burning through your key pool. See Errors for the full taxonomy.
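
One way to picture the classification step is as a lookup from error category to a pair of actions. The category names below are the ones listed above; the RETRY_POLICY table, the action labels, and the decide helper are hypothetical and only illustrate the idea, and "bad_request" is a made-up example of a non-retryable category.

```python
# Category -> (health action, retry action), transcribed from the list above.
# Action labels are illustrative strings, not llm-rotate identifiers.
RETRY_POLICY = {
    "rate_limit": ("cooldown", "next_key"),
    "quota_exhausted": (None, "next_key"),
    "invalid_auth": ("quarantine", "next_key"),
    "permission_denied": ("quarantine", "next_key"),
    "timeout": (None, "backoff"),
    "transient_server_error": (None, "backoff"),
    "connection_error": (None, "backoff"),
    "model_unavailable": (None, "next_key_or_route"),
    "broker_route_unavailable": (None, "next_key_or_route"),
}


def decide(category):
    """Return (health_action, retry_action); unlisted categories are raised."""
    return RETRY_POLICY.get(category, (None, "raise"))


print(decide("rate_limit"))   # ('cooldown', 'next_key')
print(decide("bad_request"))  # (None, 'raise')
```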

Retries

Each call retries up to max_retries times (default 3) across keys, with exponential backoff and jitter. Override per call:

response = await lm.chat(
    "gpt-4o-mini",
    [{"role": "user", "content": "Hello"}],
    max_retries=5,
)
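
For readers unfamiliar with the term, the sketch below shows one common form of exponential backoff with jitter (often called "full jitter"). The exact formula llm-rotate uses is not stated on this page, so the base and cap values and the backoff_delays helper are assumptions made for illustration.

```python
import random


def backoff_delays(max_retries=3, base=0.5, cap=30.0, rng=random):
    """Yield one delay per retry, drawn uniformly from [0, min(cap, base * 2**attempt)].

    base and cap are assumed values, not llm-rotate defaults.
    """
    for attempt in range(max_retries):
        ceiling = min(cap, base * 2 ** attempt)
        yield rng.uniform(0.0, ceiling)


# With base=0.5 the ceilings grow 0.5, 1.0, 2.0, ... seconds until they hit cap.
for attempt, delay in enumerate(backoff_delays(max_retries=5)):
    print(f"retry {attempt + 1}: sleep {delay:.2f}s")
```

The random draw spreads retries from many concurrent callers over time, so they do not all hit the provider again at the same instant.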

Proactive rate limiting

Beyond reacting to 429s, llm-rotate avoids them in two ways. Both are opt-in and zero-cost when unused:

  • Client-side budgets. Set rate_limit_rpm and/or rate_limit_tpm on a key, and a sliding-window limiter skips that key whenever another call would push it over its budget for the trailing 60 seconds; selection simply moves on to another key. Keys without budgets configured are never throttled.
  • Header-driven cooldowns. When a provider response carries rate-limit headers (retry-after, remaining-requests/tokens), the key is moved into COOLDOWN before it starts returning errors.
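
The client-side budget can be pictured as a sliding window over recent request timestamps. The sketch below covers a requests-per-minute budget only (a token budget would sum token counts instead of counting entries). SlidingWindowBudget and its methods are hypothetical names, not part of llm-rotate's API.

```python
import time
from collections import deque


class SlidingWindowBudget:
    """Counts requests in a trailing window; illustrative only."""

    def __init__(self, rpm, window=60.0, clock=time.monotonic):
        self.rpm = rpm
        self.window = window
        self.clock = clock  # injectable so the window can be tested without sleeping
        self._stamps = deque()

    def _evict(self, now):
        # Drop timestamps that have fallen out of the trailing window.
        while self._stamps and now - self._stamps[0] >= self.window:
            self._stamps.popleft()

    def would_exceed(self):
        """True if one more request would go over the budget right now."""
        self._evict(self.clock())
        return len(self._stamps) >= self.rpm

    def record(self):
        """Register a request that was actually sent."""
        self._stamps.append(self.clock())


budget = SlidingWindowBudget(rpm=2)
for _ in range(3):
    if budget.would_exceed():
        print("skip this key, try another")
    else:
        budget.record()
        print("send on this key")
```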

Advisory leases

Under concurrency, llm-rotate takes a short advisory lease on the key it selects so simultaneous requests spread across the pool instead of stampeding a single near-limit key. Leases are advisory only: if every key is leased, selection falls back to a leased key rather than failing, so a lease never blocks a request. With the Redis backend, leases coordinate across workers.
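
The "advisory" part is the important design choice: a lease influences which key is picked but never makes selection fail. A deliberately simplified sketch follows, using in-process lease counts rather than the short time-limited leases described above. The select_with_lease and release helpers are hypothetical names.

```python
def select_with_lease(keys, leases):
    """Pick the key with the fewest active leases and take a lease on it.

    keys: key names in preference order.
    leases: dict mapping key name -> number of active leases (mutated in place).
    A fully leased pool still returns a key, which is what makes leases advisory.
    """
    # min() returns the first minimum, so ties resolve in preference order.
    chosen = min(keys, key=lambda k: leases.get(k, 0))
    leases[chosen] = leases.get(chosen, 0) + 1
    return chosen


def release(leases, key):
    """Give a lease back once the call has finished."""
    leases[key] = max(0, leases.get(key, 0) - 1)


leases = {}
picks = [select_with_lease(["key-a", "key-b"], leases) for _ in range(4)]
print(picks)  # ['key-a', 'key-b', 'key-a', 'key-b']
```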

When everything is exhausted

If every eligible key (and every fallback provider) is unavailable, the call raises NoAvailableKeyError, which includes a health report and the earliest time a key is expected to become available again.

Sharing state across workers

By default, health state lives in an in-memory store, so it resets on restart and isn't shared across processes. For multi-worker deployments, switch on the Redis state store so cooldowns, rotation, and leases coordinate across every worker.