Google multimodal
Use generate_content for PDF, image, and file inputs on Google AI Studio and Vertex AI.
In addition to the unified chat() API, llm-rotate exposes
generate_content() — a thin, rotation-aware wrapper over Google's
generate_content for the google_ai_studio and google_vertex providers. Use
it when you need Gemini's multimodal features (PDFs, images, files) or
Google-specific generation controls.
generate_content() is specific to the two Google providers. For
provider-agnostic calls — including simple image inputs on OpenAI/Anthropic —
use chat().
Basic usage
from llm_rotate import lm

resp = await lm.generate_content(
    "gemini-2.0-flash",
    ["Extract the key findings from this paper as JSON."],
    system_instruction="You are a meticulous research assistant.",
    response_mime_type="application/json",
    provider="google_vertex",
)
print(resp.content)

It returns the same ChatResponse type as chat().
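Because the call above requests application/json, the returned content is presumably a JSON string that still needs parsing. The following is a minimal sketch under that assumption; parse_json_content is a hypothetical helper, not part of llm-rotate, and the sample string stands in for resp.content.

```python
import json
from typing import Any, Optional


def parse_json_content(content: str) -> Optional[Any]:
    """Parse a JSON response body; return None if it is not valid JSON."""
    try:
        return json.loads(content)
    except (json.JSONDecodeError, TypeError):
        return None


# Placeholder standing in for resp.content (assumed here to be a str).
sample = '{"findings": ["first finding", "second finding"]}'
parsed = parse_json_content(sample)
```

Returning None instead of raising keeps the caller in control of what to do when the model produces malformed output, for example retrying the request.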
Content parts
contents accepts a list of items. Strings are treated as text; for binary
inputs use a ContentPart:
| type | Fields | Use |
|---|---|---|
| text | text | Plain text. |
| pdf_bytes | data, mime_type | Inline PDF document. |
| image_bytes | data, mime_type | Inline image. |
| file | file_uri, mime_type | A previously uploaded file URI. |
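As a sketch of how the part types in the table might be assembled, the helpers below build plain dicts with the listed fields. The helper names are hypothetical and not part of llm-rotate, and the dict shape for the file type is inferred from the table's field names, so treat it as an assumption.

```python
import mimetypes
from pathlib import Path
from typing import Dict, Union

Part = Dict[str, Union[str, bytes]]


def part_from_bytes(data: bytes, mime_type: str) -> Part:
    """Wrap raw bytes as a pdf_bytes or image_bytes part, chosen by MIME type."""
    if mime_type == "application/pdf":
        kind = "pdf_bytes"
    elif mime_type.startswith("image/"):
        kind = "image_bytes"
    else:
        raise ValueError(f"unsupported MIME type for inline data: {mime_type}")
    return {"type": kind, "data": data, "mime_type": mime_type}


def part_from_path(path: Union[str, Path]) -> Part:
    """Read a local file and build an inline part, guessing the MIME type from its name."""
    mime_type, _ = mimetypes.guess_type(str(path))
    if mime_type is None:
        raise ValueError(f"cannot guess MIME type for {path}")
    return part_from_bytes(Path(path).read_bytes(), mime_type)


def part_from_uri(file_uri: str, mime_type: str) -> Part:
    """Reference a previously uploaded file (dict shape assumed from the table)."""
    return {"type": "file", "file_uri": file_uri, "mime_type": mime_type}
```

Rejecting unknown MIME types up front surfaces a mislabeled file before any request is sent.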
resp = await lm.generate_content(
    "gemini-2.0-flash",
    [
        "Summarise this document.",
        # pdf_bytes holds the raw bytes of the PDF (for example, read from disk).
        {"type": "pdf_bytes", "data": pdf_bytes, "mime_type": "application/pdf"},
    ],
    provider="google_vertex",
)

Generation controls
generate_content() surfaces the common Gemini knobs:
| Parameter | Purpose |
|---|---|
| system_instruction | System prompt. |
| response_mime_type | e.g. application/json for structured output. |
| max_output_tokens | Output cap. |
| temperature, top_p, seed | Sampling controls. |
| thinking_budget | Reasoning-token budget (where supported). |
| disable_automatic_function_calling | Defaults to True. |
| max_retries | Per-call retry override. |
| provider | Pin google_ai_studio or google_vertex. |
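When the same controls are reused across calls, it can help to gather them once and drop anything left unset. The sketch below is one way to do that; build_generation_kwargs is a hypothetical helper rather than part of llm-rotate, and it only uses parameter and provider names from the table above.

```python
from typing import Any, Dict, Optional

# The two providers that generate_content() supports, per this page.
GOOGLE_PROVIDERS = {"google_ai_studio", "google_vertex"}


def build_generation_kwargs(
    *,
    provider: Optional[str] = None,
    system_instruction: Optional[str] = None,
    response_mime_type: Optional[str] = None,
    max_output_tokens: Optional[int] = None,
    temperature: Optional[float] = None,
    top_p: Optional[float] = None,
    seed: Optional[int] = None,
    thinking_budget: Optional[int] = None,
) -> Dict[str, Any]:
    """Collect the controls that were actually set into a kwargs dict."""
    if provider is not None and provider not in GOOGLE_PROVIDERS:
        raise ValueError(f"provider must be one of {sorted(GOOGLE_PROVIDERS)}")
    candidates = {
        "provider": provider,
        "system_instruction": system_instruction,
        "response_mime_type": response_mime_type,
        "max_output_tokens": max_output_tokens,
        "temperature": temperature,
        "top_p": top_p,
        "seed": seed,
        "thinking_budget": thinking_budget,
    }
    # Omit unset values so the library's own defaults apply.
    return {name: value for name, value in candidates.items() if value is not None}


kwargs = build_generation_kwargs(provider="google_vertex", temperature=0.2, seed=7)
```

The resulting dict could then be unpacked into a call, as in lm.generate_content(model, contents, **kwargs).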
Streaming
generate_content_stream() mirrors chat_stream() and yields StreamChunks:
async for chunk in lm.generate_content_stream(
    "gemini-2.0-flash",
    ["Write a long explanation of attention."],
    provider="google_vertex",
):
    print(chunk.delta, end="", flush=True)
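
If the full text is needed after streaming finishes, the deltas can be accumulated as they arrive. The sketch below shows the pattern with a stand-in stream so it runs without credentials; FakeChunk and fake_stream are hypothetical substitutes for StreamChunk and generate_content_stream(), and only the delta attribute used above is modeled.

```python
import asyncio
from dataclasses import dataclass
from typing import AsyncIterator, List


@dataclass
class FakeChunk:
    """Hypothetical stand-in for StreamChunk; only the delta field is modeled."""
    delta: str


async def fake_stream(pieces: List[str]) -> AsyncIterator[FakeChunk]:
    """Yield one chunk per piece, mimicking an async streaming response."""
    for piece in pieces:
        await asyncio.sleep(0)  # hand control back to the event loop, as real I/O would
        yield FakeChunk(delta=piece)


async def collect_text(stream: AsyncIterator[FakeChunk]) -> str:
    """Join the delta of every chunk into the complete response text."""
    parts: List[str] = []
    async for chunk in stream:
        if chunk.delta:  # skip empty deltas
            parts.append(chunk.delta)
    return "".join(parts)


full_text = asyncio.run(collect_text(fake_stream(["Attention ", "weighs ", "tokens."])))
```

Collecting into a list and joining once avoids repeated string concatenation on long responses.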