Streaming & sync

chat() is async and returns a complete response. llm-rotate also provides a streaming variant, chat_stream(), and a synchronous wrapper, chat_sync(), both with the same rotation and fallback behaviour.

Streaming

chat_stream() is an async generator yielding StreamChunks. Each chunk carries an incremental delta; the final chunk carries finish_reason and usage.

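import asyncio

from llm_rotate import lm


async def main() -> str:
    chunks = []
    async for chunk in lm.chat_stream(
        "gpt-4o-mini",
        [{"role": "user", "content": "Stream me a limerick."}],
    ):
        # Print each delta as it arrives and keep the chunk for later.
        print(chunk.delta, end="", flush=True)
        chunks.append(chunk)

    # Join the deltas to recover the full text.
    return "".join(c.delta for c in chunks)


text = asyncio.run(main())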
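
The final chunk's metadata can be gathered in the same pass as the text. The sketch below is one way to do that, under stated assumptions: the StreamChunk dataclass and the fake_stream generator are stand-ins written for this example, not llm-rotate's own types. Only the attribute names delta, finish_reason and usage come from the description above, and treating usage as a dict is a guess.

```python
import asyncio
from dataclasses import dataclass
from typing import AsyncIterator, Optional


@dataclass
class StreamChunk:
    """Stand-in for the library's chunk type (illustrative only)."""
    delta: str
    finish_reason: Optional[str] = None
    usage: Optional[dict] = None  # assumed shape; the real type may differ


async def fake_stream() -> AsyncIterator[StreamChunk]:
    """Hypothetical stream used in place of lm.chat_stream(...)."""
    yield StreamChunk("There once ")
    yield StreamChunk("was a key")
    yield StreamChunk("", finish_reason="stop", usage={"total_tokens": 7})


async def collect(stream: AsyncIterator[StreamChunk]):
    """Join the deltas and keep any finish_reason and usage that arrive."""
    parts = []
    finish_reason = None
    usage = None
    async for chunk in stream:
        parts.append(chunk.delta)
        if chunk.finish_reason is not None:
            finish_reason = chunk.finish_reason
        if chunk.usage is not None:
            usage = chunk.usage
    return "".join(parts), finish_reason, usage


text, finish_reason, usage = asyncio.run(collect(fake_stream()))
```

Because collect() only relies on those three attributes, the same helper should work on a real stream if its chunks expose them under these names.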

Key rotation still applies: if the chosen key fails before the stream starts, llm-rotate rotates and retries. Once bytes are flowing, llm-rotate surfaces an interruption as a stream_interrupted error rather than silently restarting the stream.
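
Since an interrupted stream is not restarted, the caller decides what to do with the text received so far. The following is a minimal sketch of that pattern only. StreamInterrupted is a hypothetical exception class standing in for however the library reports stream_interrupted, and flaky_stream is a fake generator; neither is part of llm-rotate.

```python
import asyncio


class StreamInterrupted(Exception):
    """Hypothetical stand-in for the library's stream_interrupted error."""


async def flaky_stream():
    """Fake stream that fails after two deltas."""
    yield "Hello, "
    yield "wor"
    raise StreamInterrupted("connection dropped mid-stream")


async def read_with_partial(stream):
    """Return (text_so_far, completed) instead of discarding partial output."""
    parts = []
    try:
        async for delta in stream:
            parts.append(delta)
    except StreamInterrupted:
        # Keep what arrived before the failure; the caller chooses what next.
        return "".join(parts), False
    return "".join(parts), True


partial, completed = asyncio.run(read_with_partial(flaky_stream()))
```

Returning the partial text with a flag leaves the policy to the caller: show what arrived, or discard it and issue a fresh request.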

Synchronous calls

For code that isn't async, chat_sync() runs the async path to completion and returns a ChatResponse:

response = lm.chat_sync(
    "gpt-4o-mini",
    [{"role": "user", "content": "Hello"}],
)
print(response.content)

Don't call chat_sync inside a running loop

chat_sync() is for synchronous contexts. Calling it from inside an already running event loop raises an error; use await chat() there instead.
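
Code that may be called from both kinds of context can check which one it is in before choosing a path. The sketch below uses only the standard library: asyncio.get_running_loop() raises RuntimeError when no loop is running. The helper name in_running_loop is made up for this example and is not part of llm-rotate.

```python
import asyncio


def in_running_loop() -> bool:
    """True when called from code running inside an asyncio event loop."""
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        return False
    return True


async def check_from_async() -> bool:
    return in_running_loop()


outside = in_running_loop()               # plain synchronous context
inside = asyncio.run(check_from_async())  # called from a coroutine
```

When the check returns True, the caller is already async and should await chat(); when it returns False, chat_sync() is the appropriate entry point.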

Multipart messages

A message's content can be a string or a list of parts, which lets you mix text with images on providers that support it:

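# image_bytes holds the raw bytes of a PNG, for example read from a file.
messages = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "What's in this image?"},
            {"type": "image_bytes", "data": image_bytes, "mime_type": "image/png"},
        ],
    }
]
# await is only valid inside an async function.
response = await lm.chat("gpt-4o-mini", messages)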
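
If several call sites build the same kind of message, a small helper keeps the part layout in one place. This is a sketch rather than anything llm-rotate ships: the function name image_message and its default MIME type are assumptions, while the part dictionaries follow the shapes shown above.

```python
from typing import Any, Dict, List


def image_message(
    prompt: str, data: bytes, mime_type: str = "image/png"
) -> List[Dict[str, Any]]:
    """Build a one-message list with a text part followed by an image part."""
    if not isinstance(data, (bytes, bytearray)):
        raise TypeError("data must be raw image bytes")
    return [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_bytes", "data": data, "mime_type": mime_type},
            ],
        }
    ]


# The eight-byte PNG file signature, used here as placeholder image data.
png_signature = b"\x89PNG\r\n\x1a\n"
messages = image_message("What's in this image?", png_signature)
```

The resulting list has the same structure as the hand-written messages value, so it can be passed to chat() in its place.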

For Google's richer multimodal API (PDFs, files), see Google multimodal.
