Google multimodal

Use generate_content for PDF, image, and file inputs on Google AI Studio and Vertex AI.

In addition to the unified chat() API, llm-rotate exposes generate_content() — a thin, rotation-aware wrapper over Google's generate_content for the google_ai_studio and google_vertex providers. Use it when you need Gemini's multimodal features (PDFs, images, files) or Google-specific generation controls.

Google providers only

generate_content() is specific to the two Google providers. For provider-agnostic calls — including simple image inputs on OpenAI/Anthropic — use chat().

Basic usage

import asyncio

from llm_rotate import lm


async def main():
    # generate_content() is a coroutine, so it must be awaited inside an async function.
    resp = await lm.generate_content(
        "gemini-2.0-flash",
        ["Extract the key findings from this paper as JSON."],
        system_instruction="You are a meticulous research assistant.",
        response_mime_type="application/json",
        provider="google_vertex",
    )
    print(resp.content)


asyncio.run(main())

It returns the same ChatResponse type as chat().
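
Because the call above requests application/json, the text in resp.content can usually be decoded directly. The sketch below assumes resp.content is a str. parse_json_content is a hypothetical helper, not part of llm-rotate, and the fence-stripping step is a defensive assumption for replies that arrive wrapped in a Markdown code fence.

```python
import json

FENCE = "`" * 3  # a Markdown code fence marker


def parse_json_content(content: str):
    """Decode a JSON reply, tolerating an optional Markdown code fence.

    Hypothetical helper: not part of llm-rotate.
    """
    text = content.strip()
    if text.startswith(FENCE):
        lines = text.splitlines()
        # Drop the opening fence line (which may carry a language tag)
        # and the closing fence line, if one is present.
        if lines[-1].strip() == FENCE:
            lines = lines[1:-1]
        else:
            lines = lines[1:]
        text = "\n".join(lines)
    return json.loads(text)
```

With a response in hand, the assumed usage is findings = parse_json_content(resp.content). A reply that is not valid JSON raises json.JSONDecodeError, which the caller can catch and retry.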

Content parts

contents accepts a list of items. Plain strings are treated as text; for binary inputs or previously uploaded files, pass a ContentPart with the fields below:

| type | Fields | Use |
| --- | --- | --- |
| text | text | Plain text. |
| pdf_bytes | data, mime_type | Inline PDF document. |
| image_bytes | data, mime_type | Inline image. |
| file | file_uri, mime_type | A previously uploaded file URI. |

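from pathlib import Path

# Example path only; substitute your own PDF.
pdf_bytes = Path("paper.pdf").read_bytes()

# Call from within an async function.
resp = await lm.generate_content(
    "gemini-2.0-flash",
    [
        "Summarise this document.",
        {"type": "pdf_bytes", "data": pdf_bytes, "mime_type": "application/pdf"},
    ],
    provider="google_vertex",
)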
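
When inputs come from local files, the part dict can be built from the path instead of being written by hand. This is a minimal sketch, assuming the dict form used in this section is accepted as-is. part_from_path is a hypothetical helper, and the MIME type is guessed from the file extension with Python's standard mimetypes module rather than by inspecting the bytes.

```python
import mimetypes
from pathlib import Path


def part_from_path(path):
    """Build an inline content part dict from a local PDF or image file.

    Hypothetical helper; the keys are the content-part fields listed in
    this section (type, data, mime_type).
    """
    path = Path(path)
    # Guess from the extension only; the file contents are not inspected.
    mime_type, _ = mimetypes.guess_type(path.name)
    if mime_type == "application/pdf":
        part_type = "pdf_bytes"
    elif mime_type is not None and mime_type.startswith("image/"):
        part_type = "image_bytes"
    else:
        raise ValueError(f"unsupported or unknown file type: {path.name}")
    return {"type": part_type, "data": path.read_bytes(), "mime_type": mime_type}
```

The result can then be placed in the contents list next to a text prompt, for example ["Summarise this document.", part_from_path("paper.pdf")].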

Generation controls

generate_content() surfaces the common Gemini knobs:

| Parameter | Purpose |
| --- | --- |
| system_instruction | System prompt. |
| response_mime_type | e.g. application/json for structured output. |
| max_output_tokens | Output cap. |
| temperature, top_p, seed | Sampling controls. |
| thinking_budget | Reasoning-token budget (where supported). |
| disable_automatic_function_calling | Defaults to True. |
| max_retries | Per-call retry override. |
| provider | Pin google_ai_studio or google_vertex. |
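
One way to keep these settings consistent across calls is to collect them in a dict and unpack it into generate_content(). The sketch below shows a possible usage pattern, not a library feature. generation_kwargs is a hypothetical helper, the parameter names are the ones listed above, and the values chosen (temperature 0, a fixed seed) are only illustrative.

```python
GOOGLE_PROVIDERS = {"google_ai_studio", "google_vertex"}


def generation_kwargs(provider, **controls):
    """Collect generate_content() keyword arguments, skipping unset ones.

    Hypothetical helper; parameter names are taken from the list above.
    """
    if provider not in GOOGLE_PROVIDERS:
        raise ValueError(f"expected a Google provider, got {provider!r}")
    kwargs = {"provider": provider}
    # Leave out anything set to None so it is simply not passed to the call.
    kwargs.update({k: v for k, v in controls.items() if v is not None})
    return kwargs


json_extraction = generation_kwargs(
    "google_vertex",
    system_instruction="You are a meticulous research assistant.",
    response_mime_type="application/json",
    temperature=0.0,
    seed=7,
    max_output_tokens=None,  # unset, so it is omitted from the result
)
# Assumed usage: await lm.generate_content(model, contents, **json_extraction)
```

Rejecting non-Google providers up front mirrors the restriction stated earlier: generate_content() applies only to google_ai_studio and google_vertex.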

Streaming

generate_content_stream() mirrors chat_stream() and yields StreamChunks:

async for chunk in lm.generate_content_stream(
    "gemini-2.0-flash",
    ["Write a long explanation of attention."],
    provider="google_vertex",
):
    print(chunk.delta, end="", flush=True)
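
To keep the complete reply as well as printing it incrementally, the deltas can be accumulated as they arrive. The sketch below runs without llm-rotate: FakeChunk and fake_stream are stand-ins for StreamChunk and lm.generate_content_stream(...), and collect_text is a hypothetical helper. It assumes delta is a string that may be empty or None on some chunks.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class FakeChunk:
    """Stand-in for StreamChunk; only the delta attribute is modelled."""
    delta: str


async def fake_stream():
    """Stand-in for lm.generate_content_stream(...), for illustration only."""
    for piece in ["Attention ", "weighs ", "tokens."]:
        yield FakeChunk(piece)


async def collect_text(stream) -> str:
    """Join the delta of every chunk into the complete reply."""
    pieces = []
    async for chunk in stream:
        # Assumption: delta may be empty or None on some chunks; skip those.
        if chunk.delta:
            pieces.append(chunk.delta)
    return "".join(pieces)


full_text = asyncio.run(collect_text(fake_stream()))
```

In real use, the stream returned by lm.generate_content_stream(...) would be passed to collect_text in place of fake_stream().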