Google multimodal

Use generate_content for PDF, image, and file inputs on Google AI Studio and Vertex AI.

In addition to the unified chat() API, llm-rotate exposes generate_content() — a thin, rotation-aware wrapper over Google's generate_content for the google_ai_studio and google_vertex providers. Use it when you need Gemini's multimodal features (PDFs, images, files) or Google-specific generation controls.

Google providers only

generate_content() is specific to the two Google providers. For provider-agnostic calls — including simple image inputs on OpenAI/Anthropic — use chat().

Basic usage

import asyncio

from llm_rotate import lm


async def main() -> None:
    # Pin the Vertex AI provider and ask Gemini for JSON output.
    resp = await lm.generate_content(
        "gemini-2.0-flash",
        ["Extract the key findings from this paper as JSON."],
        system_instruction="You are a meticulous research assistant.",
        response_mime_type="application/json",
        provider="google_vertex",
    )
    print(resp.content)


asyncio.run(main())

generate_content() returns the same ChatResponse type as chat(), so the generated text is available as resp.content.
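
When response_mime_type is set to application/json, the response text can usually be decoded straight into Python objects. The helper below is a minimal sketch, assuming resp.content is a string; parse_json_content is a hypothetical name rather than part of llm-rotate, and the fence-stripping step is a defensive assumption, not documented behaviour.

```python
import json
from typing import Any

FENCE = "`" * 3  # Markdown code-fence marker


def parse_json_content(content: str) -> Any:
    """Decode JSON text such as a response's content (hypothetical helper)."""
    text = content.strip()
    # Defensive assumption: tolerate a Markdown code fence around the JSON.
    if text.startswith(FENCE):
        lines = text.splitlines()[1:]  # drop the opening fence line
        if lines and lines[-1].strip() == FENCE:
            lines = lines[:-1]  # drop the closing fence line
        text = "\n".join(lines)
    return json.loads(text)


print(parse_json_content('{"findings": ["a", "b"]}'))
```

In the example above this would be called as parse_json_content(resp.content); a json.JSONDecodeError signals that the model did not return valid JSON.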

Content parts

contents accepts a list of items. Strings are treated as text; for PDFs, images, and previously uploaded files, use a ContentPart, whose type determines the fields it requires:

| `type` | Fields | Use |
| --- | --- | --- |
| `text` | `text` | Plain text. |
| `pdf_bytes` | `data`, `mime_type` | Inline PDF document. |
| `image_bytes` | `data`, `mime_type` | Inline image. |
| `file` | `file_uri`, `mime_type` | A previously uploaded file URI. |

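from pathlib import Path

from llm_rotate import lm

# Read the PDF as raw bytes; "paper.pdf" is a placeholder path.
pdf_bytes = Path("paper.pdf").read_bytes()

# `await` must run inside an async function (or an async-aware REPL/notebook).
resp = await lm.generate_content(
    "gemini-2.0-flash",
    [
        "Summarise this document.",
        {"type": "pdf_bytes", "data": pdf_bytes, "mime_type": "application/pdf"},
    ],
    provider="google_vertex",
)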
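
The part shapes in the table above can be built with a few small helpers. This is only a sketch: inline_part, inline_part_from_path, and file_part are hypothetical names, not llm-rotate functions, and the gs:// URI is a placeholder whose exact format is an assumption. The dict keys are the ones listed in the table.

```python
import mimetypes
from pathlib import Path
from typing import Any, Dict


def inline_part(data: bytes, mime_type: str) -> Dict[str, Any]:
    """Build a pdf_bytes or image_bytes part from raw bytes (hypothetical helper)."""
    if mime_type == "application/pdf":
        part_type = "pdf_bytes"
    elif mime_type.startswith("image/"):
        part_type = "image_bytes"
    else:
        raise ValueError(f"no inline part type for {mime_type!r}")
    return {"type": part_type, "data": data, "mime_type": mime_type}


def inline_part_from_path(path: str) -> Dict[str, Any]:
    """Read a local file and guess its MIME type from the file extension."""
    mime_type, _ = mimetypes.guess_type(path)
    if mime_type is None:
        raise ValueError(f"cannot guess MIME type for {path!r}")
    return inline_part(Path(path).read_bytes(), mime_type)


def file_part(file_uri: str, mime_type: str) -> Dict[str, Any]:
    """Reference a previously uploaded file by URI."""
    return {"type": "file", "file_uri": file_uri, "mime_type": mime_type}


# Plain strings and parts can be mixed in one contents list.
contents = [
    "Compare the chart with the report.",
    inline_part(b"%PDF-1.4 ...", "application/pdf"),
    file_part("gs://example-bucket/chart.png", "image/png"),  # placeholder URI
]
print([item if isinstance(item, str) else item["type"] for item in contents])
```

The resulting contents list is what would be passed as the second argument to generate_content().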

Generation controls

generate_content() surfaces the common Gemini knobs:

| Parameter | Purpose |
| --- | --- |
| `system_instruction` | System prompt. |
| `response_mime_type` | Response format, e.g. `application/json` for structured output. |
| `max_output_tokens` | Cap on the number of output tokens. |
| `temperature`, `top_p`, `seed` | Sampling controls. |
| `thinking_budget` | Reasoning-token budget (where supported). |
| `disable_automatic_function_calling` | Disables automatic function calling. Defaults to `True`. |
| `max_retries` | Per-call retry override. |
| `provider` | Pin `google_ai_studio` or `google_vertex`. |
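
One way to keep these settings reusable is to collect them in a dict and unpack it into the call. The sketch below uses parameter names from the table; generation_kwargs is a hypothetical helper, and the idea that omitted parameters fall back to library defaults is an assumption.

```python
from typing import Any, Dict, Optional


def generation_kwargs(
    *,
    json_output: bool = False,
    max_output_tokens: Optional[int] = None,
    temperature: Optional[float] = None,
    top_p: Optional[float] = None,
    seed: Optional[int] = None,
) -> Dict[str, Any]:
    """Collect only the controls that were explicitly set (hypothetical helper)."""
    options = {
        "response_mime_type": "application/json" if json_output else None,
        "max_output_tokens": max_output_tokens,
        "temperature": temperature,
        "top_p": top_p,
        "seed": seed,
    }
    # Omit unset controls (assumed to fall back to the library's defaults).
    return {name: value for name, value in options.items() if value is not None}


# Low-variance, JSON-only settings for an extraction task.
controls = generation_kwargs(
    json_output=True, temperature=0.0, seed=42, max_output_tokens=1024
)
print(controls)
```

The dict would then be unpacked into the call, for example lm.generate_content(model, contents, provider="google_vertex", **controls).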

Streaming

generate_content_stream() mirrors chat_stream() and yields StreamChunks:

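from llm_rotate import lm

# Run inside an async function (or an async-aware REPL/notebook).
async for chunk in lm.generate_content_stream(
    "gemini-2.0-flash",
    ["Write a long explanation of attention."],
    provider="google_vertex",
):
    # Print each text delta as soon as it arrives.
    print(chunk.delta, end="", flush=True)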
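
If the full text is needed after streaming finishes, the deltas can be accumulated while they are printed. The following is a sketch under assumptions: FakeChunk and fake_stream are stand-ins so the block runs without credentials, collect_text is a hypothetical helper, and the only property assumed of a real StreamChunk is its delta attribute.

```python
import asyncio
from dataclasses import dataclass
from typing import AsyncIterator, List


@dataclass
class FakeChunk:
    """Stand-in for a StreamChunk; only the delta attribute is assumed."""
    delta: str


async def fake_stream() -> AsyncIterator[FakeChunk]:
    # Simulates a model emitting text in three pieces.
    for piece in ["Attention ", "weighs ", "every token."]:
        yield FakeChunk(piece)


async def collect_text(stream: AsyncIterator[FakeChunk]) -> str:
    """Print deltas as they arrive and return the concatenated text."""
    parts: List[str] = []
    async for chunk in stream:
        if chunk.delta:  # skip empty deltas
            print(chunk.delta, end="", flush=True)
            parts.append(chunk.delta)
    print()  # finish the line once the stream ends
    return "".join(parts)


full_text = asyncio.run(collect_text(fake_stream()))
```

With the real client, the same helper would presumably accept the iterator returned by lm.generate_content_stream(...) in place of fake_stream().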
