/generateContent
Use LiteLLM to call Google AI's generateContent endpoints for text generation, multimodal interactions, and streaming responses.
Overview
| Feature | Supported | Notes |
|---|---|---|
| Cost Tracking | ✅ | |
| Logging | ✅ | works across all integrations |
| End-user Tracking | ✅ | |
| Streaming | ✅ | |
| Fallbacks | ✅ | between supported models |
| Load Balancing | ✅ | between supported models |
| Metadata Tracking | ✅ | passes trace ID, metadata to observability callbacks (e.g. S3, Langfuse) |
Usage
LiteLLM Python SDK
Non-streaming example
Basic Text Generation
from litellm.google_genai import agenerate_content
from google.genai.types import ContentDict, PartDict
import asyncio
import os

# Set API key
os.environ["GEMINI_API_KEY"] = "your-gemini-api-key"

contents = ContentDict(
    parts=[
        PartDict(text="Hello, can you tell me a short joke?")
    ],
    role="user",
)

async def main():
    # await is only valid inside a coroutine, so the call is wrapped in main()
    response = await agenerate_content(
        contents=contents,
        model="gemini/gemini-2.0-flash",
        max_tokens=100,
    )
    print(response)

asyncio.run(main())
Streaming example
Streaming Text Generation
from litellm.google_genai import agenerate_content_stream
from google.genai.types import ContentDict, PartDict
import asyncio
import os

# Set API key
os.environ["GEMINI_API_KEY"] = "your-gemini-api-key"

contents = ContentDict(
    parts=[
        PartDict(text="Write a long story about space exploration")
    ],
    role="user",
)

async def main():
    # await is only valid inside a coroutine, so the call is wrapped in main()
    response = await agenerate_content_stream(
        contents=contents,
        model="gemini/gemini-2.0-flash",
        max_tokens=500,
    )
    # Print each chunk as it arrives
    async for chunk in response:
        print(chunk)

asyncio.run(main())
Sync non-streaming example
Sync Text Generation
from litellm.google_genai import generate_content
from google.genai.types import ContentDict, PartDict
import os

# Set API key
os.environ["GEMINI_API_KEY"] = "your-gemini-api-key"

contents = ContentDict(
    parts=[
        PartDict(text="Hello, can you tell me a short joke?")
    ],
    role="user",
)

response = generate_content(
    contents=contents,
    model="gemini/gemini-2.0-flash",
    max_tokens=100,
)
print(response)
Sync streaming example
Sync Streaming Text Generation
from litellm.google_genai import generate_content_stream
from google.genai.types import ContentDict, PartDict
import os

# Set API key
os.environ["GEMINI_API_KEY"] = "your-gemini-api-key"

contents = ContentDict(
    parts=[
        PartDict(text="Write a long story about space exploration")
    ],
    role="user",
)

response = generate_content_stream(
    contents=contents,
    model="gemini/gemini-2.0-flash",
    max_tokens=500,
)
# Print each chunk as it arrives
for chunk in response:
    print(chunk)
LiteLLM Proxy Server
- Setup config.yaml
model_list:
  - model_name: gemini-flash
    litellm_params:
      model: gemini/gemini-2.0-flash
      api_key: os.environ/GEMINI_API_KEY
- Start proxy
litellm --config /path/to/config.yaml
- Test it!
Google GenAI SDK with LiteLLM Proxy
from google.genai import Client
import os

# Configure Google GenAI SDK to use LiteLLM proxy
os.environ["GOOGLE_GEMINI_BASE_URL"] = "http://localhost:4000"
os.environ["GEMINI_API_KEY"] = "sk-1234"  # LiteLLM proxy key, not a Google key

client = Client()

# "gemini-flash" is the model_name defined in config.yaml
response = client.models.generate_content(
    model="gemini-flash",
    contents=[
        {
            "parts": [{"text": "Write a short story about AI"}],
            "role": "user"
        }
    ],
    config={"max_output_tokens": 100}
)
print(response.text)
Generate Content
generateContent via LiteLLM Proxy
curl -L -X POST 'http://localhost:4000/v1beta/models/gemini-flash:generateContent' \
  -H 'content-type: application/json' \
  -H 'authorization: Bearer sk-1234' \
  -d '{
    "contents": [
      {
        "parts": [
          {
            "text": "Write a short story about AI"
          }
        ],
        "role": "user"
      }
    ],
    "generationConfig": {
      "maxOutputTokens": 100
    }
  }'
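The same request can also be assembled from Python. The following is a minimal sketch using only the standard library; `build_generate_content_request` is a hypothetical helper name, and the base URL, model alias, and key are the placeholder values from the curl example above. The request is only constructed here. To send it to a running proxy, pass the returned object to `urllib.request.urlopen`.

```python
import json
import urllib.request


def build_generate_content_request(
    prompt: str,
    model: str = "gemini-flash",             # model_name from config.yaml
    base_url: str = "http://localhost:4000",  # placeholder proxy address
    api_key: str = "sk-1234",                 # placeholder proxy key
    max_output_tokens: int = 100,
) -> urllib.request.Request:
    """Build (but do not send) a generateContent POST for the LiteLLM proxy."""
    body = {
        "contents": [{"parts": [{"text": prompt}], "role": "user"}],
        "generationConfig": {"maxOutputTokens": max_output_tokens},
    }
    return urllib.request.Request(
        url=f"{base_url}/v1beta/models/{model}:generateContent",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


request = build_generate_content_request("Write a short story about AI")
print(request.full_url)
```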
Stream Generate Content
streamGenerateContent via LiteLLM Proxy
curl -L -X POST 'http://localhost:4000/v1beta/models/gemini-flash:streamGenerateContent' \
  -H 'content-type: application/json' \
  -H 'authorization: Bearer sk-1234' \
  -d '{
    "contents": [
      {
        "parts": [
          {
            "text": "Write a long story about space exploration"
          }
        ],
        "role": "user"
      }
    ],
    "generationConfig": {
      "maxOutputTokens": 500
    }
  }'
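Once a streamed response has been received, the generated text usually has to be stitched back together from the individual chunks. The sketch below assumes that each chunk has already been parsed from JSON into a dict and follows the shape of Google's GenerateContentResponse (`candidates[0].content.parts[].text`). How chunks are framed on the wire is not covered here. `extract_text` and `join_stream` are hypothetical helper names, and the sample chunks are hand-written for illustration.

```python
from typing import Any, Dict, Iterable


def extract_text(chunk: Dict[str, Any]) -> str:
    """Concatenate the text parts of the first candidate in one chunk."""
    candidates = chunk.get("candidates") or []
    if not candidates:
        return ""  # e.g. a chunk that only carries usage metadata
    parts = candidates[0].get("content", {}).get("parts", [])
    return "".join(part.get("text", "") for part in parts)


def join_stream(chunks: Iterable[Dict[str, Any]]) -> str:
    """Join the text from a sequence of already-parsed chunks."""
    return "".join(extract_text(chunk) for chunk in chunks)


# Hand-written sample chunks, not real model output.
sample_chunks = [
    {"candidates": [{"content": {"parts": [{"text": "Once upon "}], "role": "model"}}]},
    {"candidates": [{"content": {"parts": [{"text": "a time"}], "role": "model"}}]},
    {"usageMetadata": {"totalTokenCount": 12}},
]
print(join_stream(sample_chunks))
```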
Native request fields
The generateContent endpoint is a drop-in replacement for Google's Generative Language REST API, so the top-level fields that sit alongside generationConfig in Google's GenerateContentRequest are forwarded to Google verbatim. This applies to safetySettings, toolConfig, cachedContent, and labels. Send them at the top level of the request body exactly as you would when calling Google directly; you do not need to wrap them in extra_body. If you also pass extra_body, a value set explicitly there takes precedence when the same field appears in both places.
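The precedence rule above can be pictured as a simple dictionary merge. This is only an illustrative sketch of the documented behavior, not LiteLLM's internal code: `merge_native_fields` is a hypothetical helper, and the shallow (per-field) merge is an assumption.

```python
from typing import Any, Dict, Optional


def merge_native_fields(
    top_level: Dict[str, Any],
    extra_body: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Combine top-level native fields with extra_body; extra_body wins on conflict."""
    merged = dict(top_level)          # copy so the caller's dict is untouched
    merged.update(extra_body or {})   # later update overrides duplicate keys
    return merged


top_level = {
    "safetySettings": [
        {"category": "HARM_CATEGORY_HATE_SPEECH", "threshold": "BLOCK_NONE"}
    ],
    "toolConfig": {"functionCallingConfig": {"mode": "AUTO"}},
}
extra_body = {
    "safetySettings": [
        {"category": "HARM_CATEGORY_HATE_SPEECH", "threshold": "BLOCK_ONLY_HIGH"}
    ],
}

merged = merge_native_fields(top_level, extra_body)
print(merged["safetySettings"][0]["threshold"])
print(merged["toolConfig"])
```

Here safetySettings appears in both places, so the extra_body value is the one kept, while toolConfig passes through unchanged.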
Native top-level fields via LiteLLM Proxy
curl -L -X POST 'http://localhost:4000/v1beta/models/gemini-flash:generateContent' \
  -H 'content-type: application/json' \
  -H 'authorization: Bearer sk-1234' \
  -d '{
    "contents": [
      {
        "parts": [{"text": "Say hi"}],
        "role": "user"
      }
    ],
    "generationConfig": {
      "maxOutputTokens": 100
    },
    "safetySettings": [
      {
        "category": "HARM_CATEGORY_HATE_SPEECH",
        "threshold": "BLOCK_NONE"
      }
    ],
    "toolConfig": {
      "functionCallingConfig": {"mode": "AUTO"}
    }
  }'
Native top-level fields via LiteLLM Python SDK
from litellm.google_genai import generate_content
import os

# Set API key
os.environ["GEMINI_API_KEY"] = "your-gemini-api-key"

# safetySettings and toolConfig are passed as top-level keyword arguments
response = generate_content(
    model="gemini/gemini-2.0-flash",
    contents=[{"role": "user", "parts": [{"text": "Say hi"}]}],
    safetySettings=[
        {"category": "HARM_CATEGORY_HATE_SPEECH", "threshold": "BLOCK_NONE"}
    ],
    toolConfig={"functionCallingConfig": {"mode": "AUTO"}},
)
print(response)