/generateContent

Use LiteLLM to call Google AI's generateContent endpoints for text generation, multimodal interactions, and streaming responses.

Overview

Feature	Supported	Notes
Cost Tracking	✅
Logging	✅	works across all integrations
End-user Tracking	✅
Streaming	✅
Fallbacks	✅	between supported models
Load Balancing	✅	between supported models
Metadata Tracking	✅	passes trace IDs and metadata to observability callbacks (e.g. S3, Langfuse)

Usage

LiteLLM Python SDK


Non-streaming example

Basic Text Generation
import asyncio
import os

from google.genai.types import ContentDict, PartDict
from litellm.google_genai import agenerate_content

# Set API key
os.environ["GEMINI_API_KEY"] = "your-gemini-api-key"

contents = ContentDict(
    parts=[
        PartDict(text="Hello, can you tell me a short joke?")
    ],
    role="user",
)

async def main():
    # agenerate_content is a coroutine, so await it inside an async function
    response = await agenerate_content(
        contents=contents,
        model="gemini/gemini-2.0-flash",
        max_tokens=100,
    )
    print(response)

asyncio.run(main())
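
`contents` can also be a list of turns, which is how conversation history is passed. The sketch below is a minimal illustration, assuming LiteLLM accepts a list of content dicts here the same way the Google GenAI SDK does. It uses plain dicts because `ContentDict` and `PartDict` are `TypedDict`s, which are ordinary dicts at runtime. The helper `build_conversation` is hypothetical; in Google's schema, earlier assistant replies carry the role `"model"`.

```python
from typing import Dict, List, Tuple


def build_conversation(history: List[Tuple[str, str]], new_message: str) -> List[Dict]:
    """Hypothetical helper: turn (role, text) pairs plus a new user message into a contents list.

    Roles must be "user" or "model", as in Google's Content schema.
    """
    contents = []
    for role, text in history:
        if role not in ("user", "model"):
            raise ValueError(f"unsupported role: {role!r}")
        contents.append({"role": role, "parts": [{"text": text}]})
    # The newest turn always comes from the user.
    contents.append({"role": "user", "parts": [{"text": new_message}]})
    return contents


contents = build_conversation(
    [
        ("user", "Tell me a short joke."),
        ("model", "Why don't scientists trust atoms? They make up everything."),
    ],
    "Explain why that is funny.",
)
# Pass it to LiteLLM like the single-turn example, e.g.:
# response = generate_content(contents=contents, model="gemini/gemini-2.0-flash", max_tokens=100)
```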

Streaming example

Streaming Text Generation
import asyncio
import os

from google.genai.types import ContentDict, PartDict
from litellm.google_genai import agenerate_content_stream

# Set API key
os.environ["GEMINI_API_KEY"] = "your-gemini-api-key"

contents = ContentDict(
    parts=[
        PartDict(text="Write a long story about space exploration")
    ],
    role="user",
)

async def main():
    response = await agenerate_content_stream(
        contents=contents,
        model="gemini/gemini-2.0-flash",
        max_tokens=500,
    )
    # Print each chunk as it arrives
    async for chunk in response:
        print(chunk)

asyncio.run(main())

Sync non-streaming example

Sync Text Generation
from litellm.google_genai import generate_content
from google.genai.types import ContentDict, PartDict
import os

# Set API key
os.environ["GEMINI_API_KEY"] = "your-gemini-api-key"

contents = ContentDict(
    parts=[
        PartDict(text="Hello, can you tell me a short joke?")
    ],
    role="user",
)

response = generate_content(
    contents=contents,
    model="gemini/gemini-2.0-flash",
    max_tokens=100,
)
print(response)

Sync streaming example

Sync Streaming Text Generation
from litellm.google_genai import generate_content_stream
from google.genai.types import ContentDict, PartDict
import os

# Set API key
os.environ["GEMINI_API_KEY"] = "your-gemini-api-key"

contents = ContentDict(
    parts=[
        PartDict(text="Write a long story about space exploration")
    ],
    role="user",
)

response = generate_content_stream(
    contents=contents,
    model="gemini/gemini-2.0-flash",
    max_tokens=500,
)

for chunk in response:
    print(chunk)

LiteLLM Proxy Server

Set up config.yaml

model_list:
    - model_name: gemini-flash
      litellm_params:
        model: gemini/gemini-2.0-flash
        api_key: os.environ/GEMINI_API_KEY

Start proxy

litellm --config /path/to/config.yaml

Test it!


Google GenAI SDK with LiteLLM Proxy
from google.genai import Client
import os

# Configure Google GenAI SDK to use LiteLLM proxy
os.environ["GOOGLE_GEMINI_BASE_URL"] = "http://localhost:4000"
os.environ["GEMINI_API_KEY"] = "sk-1234"

client = Client()

response = client.models.generate_content(
    model="gemini-flash",
    contents=[
        {
            "parts": [{"text": "Write a short story about AI"}],
            "role": "user"
        }
    ],
    config={"max_output_tokens": 100}
)
print(response.text)

Generate Content

generateContent via LiteLLM Proxy
curl -L -X POST 'http://localhost:4000/v1beta/models/gemini-flash:generateContent' \
-H 'content-type: application/json' \
-H 'authorization: Bearer sk-1234' \
-d '{
  "contents": [
    {
      "parts": [
        {
          "text": "Write a short story about AI"
        }
      ],
      "role": "user"
    }
  ],
  "generationConfig": {
    "maxOutputTokens": 100
  }
}'
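
The same request can be sent from Python with only the standard library. The sketch below builds, but does not send, a POST to the proxy's `generateContent` route and extracts the reply text from Google's response shape (`candidates[0].content.parts[*].text`). It optionally attaches an image as a base64 `inlineData` part, which is how Google's REST schema carries binary input; that the proxy forwards such parts unchanged is an assumption based on it being a drop-in for Google's API. `build_generate_content_request`, `extract_text`, and the constants are hypothetical names; the URL and key are the placeholders from the examples above.

```python
import base64
import json
import urllib.request
from typing import Optional

PROXY_BASE_URL = "http://localhost:4000"  # proxy address used in the examples above
PROXY_KEY = "sk-1234"  # placeholder key from the examples above


def build_generate_content_request(
    model: str,
    prompt: str,
    max_output_tokens: int = 100,
    image_bytes: Optional[bytes] = None,
    image_mime_type: str = "image/png",
) -> urllib.request.Request:
    """Build (but do not send) a POST to the proxy's generateContent route."""
    parts = [{"text": prompt}]
    if image_bytes is not None:
        # Google's REST schema carries binary data as base64 text inside inlineData.
        parts.append({
            "inlineData": {
                "mimeType": image_mime_type,
                "data": base64.b64encode(image_bytes).decode("ascii"),
            }
        })
    body = {
        "contents": [{"role": "user", "parts": parts}],
        "generationConfig": {"maxOutputTokens": max_output_tokens},
    }
    return urllib.request.Request(
        f"{PROXY_BASE_URL}/v1beta/models/{model}:generateContent",
        data=json.dumps(body).encode("utf-8"),
        headers={"content-type": "application/json", "authorization": f"Bearer {PROXY_KEY}"},
        method="POST",
    )


def extract_text(response_json: dict) -> str:
    """Concatenate the text parts of the first candidate in a GenerateContentResponse."""
    candidates = response_json.get("candidates") or []
    if not candidates:
        return ""
    parts = candidates[0].get("content", {}).get("parts", [])
    return "".join(part.get("text", "") for part in parts)


# To send it (requires the proxy to be running):
# req = build_generate_content_request("gemini-flash", "Write a short story about AI")
# with urllib.request.urlopen(req) as resp:
#     print(extract_text(json.load(resp)))
```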

Stream Generate Content

streamGenerateContent via LiteLLM Proxy
curl -L -X POST 'http://localhost:4000/v1beta/models/gemini-flash:streamGenerateContent' \
-H 'content-type: application/json' \
-H 'authorization: Bearer sk-1234' \
-d '{
  "contents": [
    {
      "parts": [
        {
          "text": "Write a long story about space exploration"
        }
      ],
      "role": "user"
    }
  ],
  "generationConfig": {
    "maxOutputTokens": 500
  }
}'
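
Google's own API returns `streamGenerateContent` output as a JSON array by default and switches to server-sent events when `?alt=sse` is appended to the URL, which is the mode the Google GenAI SDKs use. Assuming the proxy mirrors that behavior (not stated on this page), each event is a `data: {...}` line holding one partial GenerateContentResponse. The minimal parsing sketch below uses a hypothetical helper, `iter_sse_text`, that takes an iterable of decoded lines so it can be tested without a network.

```python
import json
from typing import Iterable, Iterator


def iter_sse_text(lines: Iterable[str]) -> Iterator[str]:
    """Hypothetical helper: yield text from a streamGenerateContent?alt=sse response, event by event."""
    for raw in lines:
        line = raw.strip()
        # SSE payload lines start with "data:"; blank lines and ":" comments are skipped.
        if not line.startswith("data:"):
            continue
        payload = line[len("data:"):].strip()
        if not payload:
            continue
        chunk = json.loads(payload)
        # Only the first candidate is read, matching the single-candidate requests above.
        for candidate in chunk.get("candidates", [])[:1]:
            for part in candidate.get("content", {}).get("parts", []):
                if "text" in part:
                    yield part["text"]


# Usage against a running proxy, with `body` being the JSON from the curl example above:
# url = "http://localhost:4000/v1beta/models/gemini-flash:streamGenerateContent?alt=sse"
# req = urllib.request.Request(url, data=json.dumps(body).encode("utf-8"), method="POST",
#                              headers={"content-type": "application/json", "authorization": "Bearer sk-1234"})
# with urllib.request.urlopen(req) as resp:
#     for text in iter_sse_text(line.decode("utf-8") for line in resp):
#         print(text, end="", flush=True)
```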

Native request fields

The generateContent endpoint is a drop-in replacement for Google's Generative Language REST API, so the top-level fields that Google's GenerateContentRequest defines alongside generationConfig (safetySettings, toolConfig, cachedContent, and labels) are forwarded to Google verbatim. Send them at the top level of the request body, exactly as you would when calling Google directly; there is no need to wrap them in extra_body. If the same field is also set in extra_body, the extra_body value takes precedence.
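
To make the precedence rule concrete, here is a small model of it in plain Python. It is not LiteLLM's implementation: `merge_native_fields` and `NATIVE_FIELDS` are hypothetical names, and the sketch assumes a per-key override (a conflicting extra_body key replaces the whole top-level value rather than being deep-merged), which the text above does not specify.

```python
import copy
from typing import Any, Dict, Optional

# Hypothetical: the top-level GenerateContentRequest fields listed above as forwarded verbatim.
NATIVE_FIELDS = ("safetySettings", "toolConfig", "cachedContent", "labels")


def merge_native_fields(
    body: Dict[str, Any], extra_body: Optional[Dict[str, Any]] = None
) -> Dict[str, Any]:
    """Model of the precedence rule, not LiteLLM's code.

    Native fields are copied from the request body as-is; any key also present
    in extra_body is replaced wholesale by the extra_body value (no deep merge).
    """
    outgoing = {key: copy.deepcopy(body[key]) for key in NATIVE_FIELDS if key in body}
    if extra_body:
        outgoing.update(copy.deepcopy(extra_body))
    return outgoing


request_body = {
    "contents": [{"role": "user", "parts": [{"text": "Say hi"}]}],
    "safetySettings": [{"category": "HARM_CATEGORY_HATE_SPEECH", "threshold": "BLOCK_NONE"}],
    "toolConfig": {"functionCallingConfig": {"mode": "AUTO"}},
}
merged = merge_native_fields(
    request_body,
    extra_body={"toolConfig": {"functionCallingConfig": {"mode": "NONE"}}},
)
# merged["toolConfig"] now holds mode "NONE"; safetySettings passes through unchanged.
```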


Native top-level fields via LiteLLM Proxy
curl -L -X POST 'http://localhost:4000/v1beta/models/gemini-flash:generateContent' \
-H 'content-type: application/json' \
-H 'authorization: Bearer sk-1234' \
-d '{
  "contents": [
    {
      "parts": [{"text": "Say hi"}],
      "role": "user"
    }
  ],
  "generationConfig": {
    "maxOutputTokens": 100
  },
  "safetySettings": [
    {
      "category": "HARM_CATEGORY_HATE_SPEECH",
      "threshold": "BLOCK_NONE"
    }
  ],
  "toolConfig": {
    "functionCallingConfig": {"mode": "AUTO"}
  }
}'

Native top-level fields via LiteLLM Python SDK
from litellm.google_genai import generate_content
import os

# Set API key
os.environ["GEMINI_API_KEY"] = "your-gemini-api-key"

response = generate_content(
    model="gemini/gemini-2.0-flash",
    contents=[{"role": "user", "parts": [{"text": "Say hi"}]}],
    safetySettings=[
        {"category": "HARM_CATEGORY_HATE_SPEECH", "threshold": "BLOCK_NONE"}
    ],
    toolConfig={"functionCallingConfig": {"mode": "AUTO"}},
)
print(response)

Related

Use LiteLLM with gemini-cli
