/generateContent

Use LiteLLM to call Google AI's generateContent endpoints for text generation, multimodal interactions, and streaming responses.

Overview

| Feature | Supported | Notes |
|---|---|---|
| Cost Tracking | ✅ | |
| Logging | ✅ | works across all integrations |
| End-user Tracking | ✅ | |
| Streaming | ✅ | |
| Fallbacks | ✅ | between supported models |
| Loadbalancing | ✅ | between supported models |
| Metadata Tracking | ✅ | passes trace ID, metadata to observability callbacks (e.g. S3, Langfuse) |

Usage

LiteLLM Python SDK

Non-streaming example

Basic Text Generation
from litellm.google_genai import agenerate_content
from google.genai.types import ContentDict, PartDict
import asyncio
import os

# Set API key
os.environ["GEMINI_API_KEY"] = "your-gemini-api-key"

# A single user turn containing one text part
contents = ContentDict(
    parts=[
        PartDict(text="Hello, can you tell me a short joke?")
    ],
    role="user",
)


async def main():
    # agenerate_content is a coroutine, so it is awaited inside an async function
    response = await agenerate_content(
        contents=contents,
        model="gemini/gemini-2.0-flash",
        max_tokens=100,
    )
    print(response)


asyncio.run(main())
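
The example above builds `contents` with `ContentDict` and `PartDict`. To my understanding these are typed dictionaries in the google-genai package, so an equivalent plain dict should work as well, which is also the form the proxy example on this page passes to the Google GenAI SDK. The following is a minimal sketch of that shape; `user_content` is a hypothetical helper introduced here for illustration and is not part of LiteLLM or google-genai.

```python
from typing import Any, Dict


def user_content(text: str) -> Dict[str, Any]:
    """Build one user turn as a plain dict (hypothetical helper, not a library API)."""
    return {
        "parts": [{"text": text}],  # one text part
        "role": "user",
    }


contents = user_content("Hello, can you tell me a short joke?")
print(contents)
```

The resulting dict has the same `parts` and `role` keys as the `ContentDict` value above, so it can be passed wherever that value is used, assuming the typed-dict behaviour described here holds for your installed version.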

Streaming example

Streaming Text Generation
from litellm.google_genai import agenerate_content_stream
from google.genai.types import ContentDict, PartDict
import asyncio
import os

# Set API key
os.environ["GEMINI_API_KEY"] = "your-gemini-api-key"

# A single user turn containing one text part
contents = ContentDict(
    parts=[
        PartDict(text="Write a long story about space exploration")
    ],
    role="user",
)


async def main():
    # Awaiting the call returns an async iterator of response chunks
    response = await agenerate_content_stream(
        contents=contents,
        model="gemini/gemini-2.0-flash",
        max_tokens=500,
    )

    # Print each chunk as it arrives
    async for chunk in response:
        print(chunk)


asyncio.run(main())
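
Printing each chunk is enough for a demo, but callers often want the full text once the stream ends. The sketch below shows one way to accumulate it. It rests on a loud assumption: each chunk is a decoded dict shaped like Google's GenerateContentResponse JSON (`candidates[0].content.parts[].text`). What `agenerate_content_stream` actually yields may differ (for example raw bytes or typed objects), so adapt `chunk_text` to the real chunk type. `chunk_text`, `collect_text`, and `fake_stream` are hypothetical names, and `fake_stream` stands in for a real stream so the example runs offline.

```python
import asyncio
from typing import Any, AsyncIterator, Dict, List


def chunk_text(chunk: Dict[str, Any]) -> str:
    """Join the text parts of the first candidate in one response-shaped dict."""
    candidates = chunk.get("candidates") or []
    if not candidates:
        return ""
    parts = candidates[0].get("content", {}).get("parts", [])
    return "".join(part.get("text", "") for part in parts)


async def collect_text(stream: AsyncIterator[Dict[str, Any]]) -> str:
    """Consume an async stream of chunks and return the concatenated text."""
    pieces: List[str] = []
    async for chunk in stream:
        pieces.append(chunk_text(chunk))
    return "".join(pieces)


async def fake_stream() -> AsyncIterator[Dict[str, Any]]:
    """Offline stand-in for a real stream (assumed chunk shape, see lead-in)."""
    for piece in ("Once ", "upon ", "a time"):
        yield {"candidates": [{"content": {"parts": [{"text": piece}], "role": "model"}}]}


story = asyncio.run(collect_text(fake_stream()))
print(story)
```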

LiteLLM Proxy Server

  1. Set up config.yaml
model_list:
  - model_name: gemini-flash
    litellm_params:
      model: gemini/gemini-2.0-flash
      api_key: os.environ/GEMINI_API_KEY
  2. Start proxy
litellm --config /path/to/config.yaml
  3. Test it!
Google GenAI SDK with LiteLLM Proxy
from google.genai import Client
import os

# Configure Google GenAI SDK to use LiteLLM proxy
os.environ["GOOGLE_GEMINI_BASE_URL"] = "http://localhost:4000"
os.environ["GEMINI_API_KEY"] = "sk-1234"

client = Client()

response = client.models.generate_content(
    model="gemini-flash",
    contents=[
        {
            "parts": [{"text": "Write a short story about AI"}],
            "role": "user"
        }
    ],
    config={"max_output_tokens": 100}
)

Native request fields

The generateContent endpoint is a drop-in replacement for Google's Generative Language REST API, so the top-level fields that Google's GenerateContentRequest carries as siblings of generationConfig are forwarded to Google verbatim. This covers safetySettings, toolConfig, cachedContent, and labels. Send them at the top level of the request body exactly as you would when calling Google directly; there is no need to wrap them in extra_body. If you also pass extra_body and the same key appears in both places, the value in extra_body takes precedence.
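
The precedence rule for extra_body can be pictured as a dictionary merge in which extra_body is applied last. The sketch below only illustrates that rule; it is not LiteLLM's implementation. `merge_request_fields` is a hypothetical helper, and the shallow (top-level key) merge is an assumption, since the text does not say whether nested objects are merged or replaced.

```python
import json
from typing import Any, Dict, Optional


def merge_request_fields(
    top_level: Dict[str, Any],
    extra_body: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Shallow-merge native fields; keys in extra_body override top-level keys.

    Hypothetical illustration of the documented precedence, not LiteLLM code.
    """
    merged = dict(top_level)          # copy so the caller's dict is untouched
    merged.update(extra_body or {})   # extra_body is applied last, so it wins
    return merged


top_level = {
    "safetySettings": [
        {"category": "HARM_CATEGORY_HATE_SPEECH", "threshold": "BLOCK_NONE"}
    ],
    "toolConfig": {"functionCallingConfig": {"mode": "AUTO"}},
}
extra_body = {"toolConfig": {"functionCallingConfig": {"mode": "NONE"}}}

merged = merge_request_fields(top_level, extra_body)
print(json.dumps(merged, indent=2))
```

Here `toolConfig` comes from extra_body while `safetySettings`, which has no conflicting key, is kept from the top level.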

Native top-level fields via LiteLLM Proxy
curl -L -X POST 'http://localhost:4000/v1beta/models/gemini-flash:generateContent' \
  -H 'content-type: application/json' \
  -H 'authorization: Bearer sk-1234' \
  -d '{
    "contents": [
      {
        "parts": [{"text": "Say hi"}],
        "role": "user"
      }
    ],
    "generationConfig": {
      "maxOutputTokens": 100
    },
    "safetySettings": [
      {
        "category": "HARM_CATEGORY_HATE_SPEECH",
        "threshold": "BLOCK_NONE"
      }
    ],
    "toolConfig": {
      "functionCallingConfig": {"mode": "AUTO"}
    }
  }'
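
For callers who prefer Python to curl, the same request can be assembled with the standard library. This is a rough sketch that reuses the URL, model alias, and key from the curl example; `build_generate_content_request` is a hypothetical helper, not a LiteLLM function. It only builds the request object, so it runs without a proxy, and the commented lines at the end show how it could be sent once the proxy is up.

```python
import json
import urllib.request
from typing import Any, Dict


def build_generate_content_request(
    base_url: str, model: str, api_key: str, body: Dict[str, Any]
) -> urllib.request.Request:
    """Build (but do not send) a POST to the proxy's generateContent route."""
    url = f"{base_url.rstrip('/')}/v1beta/models/{model}:generateContent"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


body = {
    "contents": [{"parts": [{"text": "Say hi"}], "role": "user"}],
    "generationConfig": {"maxOutputTokens": 100},
    # Native fields sit at the top level, next to generationConfig
    "safetySettings": [
        {"category": "HARM_CATEGORY_HATE_SPEECH", "threshold": "BLOCK_NONE"}
    ],
    "toolConfig": {"functionCallingConfig": {"mode": "AUTO"}},
}

request = build_generate_content_request(
    "http://localhost:4000", "gemini-flash", "sk-1234", body
)
print(request.get_method(), request.full_url)

# To send it (requires the proxy from the steps above to be running):
# with urllib.request.urlopen(request) as resp:
#     print(json.load(resp))
```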