Streaming & sync
Stream tokens as they arrive, or call synchronously from non-async code.
chat() is async and returns a complete response. llm-rotate also provides a
streaming variant and a synchronous wrapper, both with the same rotation and
fallback behaviour.
Streaming
chat_stream() is an async generator that yields StreamChunk objects. Each chunk
carries an incremental delta; the final chunk carries finish_reason and usage.

    from llm_rotate import lm

    # Run this inside an async function; `async for` is not valid at module level.
    chunks = []
    async for chunk in lm.chat_stream(
        "gpt-4o-mini",
        [{"role": "user", "content": "Stream me a limerick."}],
    ):
        print(chunk.delta, end="", flush=True)  # show each piece as it arrives
        chunks.append(chunk)

    # Reassemble the full text from the collected deltas.
    text = "".join(c.delta for c in chunks)

Key rotation still applies: if the chosen key fails before the stream
starts, llm-rotate rotates and retries. Once bytes are flowing, an
interruption surfaces as a stream_interrupted error rather than silently
restarting.
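
Because an interrupted stream is not restarted, callers may want to keep whatever text arrived before the failure. The sketch below illustrates that pattern without touching the network. Everything in it is a stand-in: the StreamInterrupted exception class, the local StreamChunk dataclass, and fake_stream() are hypothetical names invented for this example, and the real exception type that carries stream_interrupted may look different.

```python
import asyncio
from dataclasses import dataclass
from typing import AsyncIterator, Optional, Tuple


class StreamInterrupted(Exception):
    """Hypothetical stand-in for the library's stream_interrupted error."""


@dataclass
class StreamChunk:
    """Local stand-in with the fields described above (not the library class)."""
    delta: str
    finish_reason: Optional[str] = None
    usage: Optional[dict] = None


async def fake_stream(fail_after: Optional[int] = None) -> AsyncIterator[StreamChunk]:
    """Simulated stream; raises before chunk number `fail_after` if it is set."""
    pieces = ["There ", "once ", "was ", "a ", "key"]
    for i, piece in enumerate(pieces):
        if fail_after is not None and i == fail_after:
            raise StreamInterrupted("connection dropped")
        last = i == len(pieces) - 1
        yield StreamChunk(
            delta=piece,
            finish_reason="stop" if last else None,
            usage={"completion_tokens": len(pieces)} if last else None,
        )
        await asyncio.sleep(0)  # hand control back to the event loop


async def collect(stream: AsyncIterator[StreamChunk]) -> Tuple[str, Optional[str], bool]:
    """Return (text received so far, finish_reason, completed flag)."""
    parts = []
    finish_reason = None
    try:
        async for chunk in stream:
            parts.append(chunk.delta)
            if chunk.finish_reason is not None:
                finish_reason = chunk.finish_reason
    except StreamInterrupted:
        # Keep the partial text; the caller decides whether to retry.
        return "".join(parts), None, False
    return "".join(parts), finish_reason, True


complete = asyncio.run(collect(fake_stream()))
partial = asyncio.run(collect(fake_stream(fail_after=2)))
```

Returning a completed flag alongside the text lets the caller tell a finished response apart from a truncated one and decide whether to issue a fresh request.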
Synchronous calls
For code that isn't async, chat_sync() runs the async path to completion and
returns a ChatResponse:

    response = lm.chat_sync(
        "gpt-4o-mini",
        [{"role": "user", "content": "Hello"}],
    )
    print(response.content)

chat_sync() is for synchronous contexts only. Calling it from inside an already
running event loop raises an error; use await chat() there instead.
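
To see why a synchronous wrapper cannot run inside a live event loop, here is a minimal sketch of how such a wrapper is commonly built on asyncio.run(). It is an illustration under that assumption, not llm-rotate's actual implementation: the chat() and chat_sync() functions below are local stand-ins, and the error type and message are made up for the example.

```python
import asyncio


async def chat(prompt: str) -> str:
    """Stand-in for an async call; simply echoes the prompt."""
    await asyncio.sleep(0)
    return f"echo: {prompt}"


def chat_sync(prompt: str) -> str:
    """Run chat() to completion from synchronous code (illustrative only)."""
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        # No loop is running in this thread, so it is safe to start one.
        return asyncio.run(chat(prompt))
    # A loop is already running: blocking here would stall it.
    raise RuntimeError("chat_sync() called from a running event loop; use await chat()")


async def inside_loop() -> str:
    """Call the sync wrapper from async code to show the failure mode."""
    try:
        chat_sync("hi")
    except RuntimeError as exc:
        return f"error: {exc}"
    return "no error"


sync_result = chat_sync("Hello")            # works: plain synchronous context
loop_result = asyncio.run(inside_loop())    # fails: called inside a running loop
```

Checking for a running loop before calling asyncio.run() gives a clear error message and avoids creating a coroutine that is never awaited.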
Multipart messages
A message's content can be a string or a list of parts, which lets you mix
text with images on providers that support it:

    # image_bytes holds the raw image data, for example read from a PNG file.
    messages = [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What's in this image?"},
                {"type": "image_bytes", "data": image_bytes, "mime_type": "image/png"},
            ],
        }
    ]
    # Run inside an async function, since chat() must be awaited.
    response = await lm.chat("gpt-4o-mini", messages)

For Google's richer multimodal API (PDFs, files), see Google multimodal.
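
When several requests mix text and images, a small helper can assemble the parts list shown in the multipart example. The sketch below reuses the two part shapes from that example; the image_message() function is a hypothetical convenience written for this page, not part of llm-rotate, and it assumes the MIME type can be guessed from the file name.

```python
import mimetypes


def image_message(prompt: str, data: bytes, filename: str) -> dict:
    """Build a user message with one text part and one image part."""
    mime_type, _ = mimetypes.guess_type(filename)
    if mime_type is None or not mime_type.startswith("image/"):
        raise ValueError(f"not a recognised image file name: {filename}")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_bytes", "data": data, "mime_type": mime_type},
        ],
    }


# Placeholder bytes; real code would read them from the image file.
message = image_message("What's in this image?", b"\x89PNG", "cat.png")
```

Rejecting names that do not map to an image type up front keeps a mislabelled attachment from reaching the provider.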