Using Gemini 3 on LiteLLM

Sameer Kankute
SWE @ LiteLLM (LLM Translation)
Krrish Dholakia
CEO, LiteLLM
Ishaan Jaff
CTO, LiteLLM

This guide covers common questions and best practices for using gemini-3-pro-preview with LiteLLM Proxy and SDK.

Quick Start

from litellm import completion
import os

os.environ["GEMINI_API_KEY"] = "your-api-key"

response = completion(
    model="gemini/gemini-3-pro-preview",
    messages=[{"role": "user", "content": "Hello!"}],
    reasoning_effort="low"
)

print(response.choices[0].message.content)

Supported Endpoints

LiteLLM provides full end-to-end support for Gemini 3 Pro Preview on:

  • /v1/chat/completions - OpenAI-compatible chat completions endpoint
  • /v1/messages - Anthropic-compatible messages endpoint

Both endpoints support:

  • Streaming and non-streaming responses
  • Function calling with thought signatures
  • Multi-turn conversations
  • All Gemini 3-specific features
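As a quick illustration, here is a minimal sketch of calling the OpenAI-compatible endpoint through the proxy. It assumes the proxy is running on localhost:4000 with gemini-3-pro-preview in its model_list (as configured later in this guide) and uses sk-1234 as a placeholder key; the Anthropic-compatible /v1/messages endpoint works the same way with an Anthropic client pointed at the proxy.

from openai import OpenAI

# Point the standard OpenAI client at the LiteLLM proxy
client = OpenAI(api_key="sk-1234", base_url="http://localhost:4000")

response = client.chat.completions.create(
    model="gemini-3-pro-preview",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)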

Thought Signatures

What are Thought Signatures?

Thought signatures are encrypted representations of the model's internal reasoning process. They're essential for maintaining context across multi-turn conversations, especially with function calling.

How Thought Signatures Work

  1. Automatic Extraction: When Gemini 3 returns a function call, LiteLLM automatically extracts the thought_signature from the response
  2. Storage: Thought signatures are stored in provider_specific_fields.thought_signature of tool calls
  3. Automatic Preservation: When you include the assistant's message in conversation history, LiteLLM automatically preserves and returns thought signatures to Gemini

Example: Multi-Turn Function Calling
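A minimal non-streaming sketch of the extract-append-return flow described above (the calculate tool is illustrative; the next section shows the same flow with streaming):

import os
from litellm import completion

os.environ["GEMINI_API_KEY"] = "your-api-key"

tools = [{
    "type": "function",
    "function": {
        "name": "calculate",
        "description": "Calculate a mathematical expression",
        "parameters": {
            "type": "object",
            "properties": {"expression": {"type": "string"}},
            "required": ["expression"],
        },
    },
}]

messages = [{"role": "user", "content": "What is 2+2?"}]

# Step 1: the model returns a tool call; LiteLLM extracts the
# thought_signature into provider_specific_fields automatically
response = completion(
    model="gemini/gemini-3-pro-preview",
    messages=messages,
    tools=tools,
    reasoning_effort="low",
)
assistant_msg = response.choices[0].message

# Step 2: append the full assistant message (signature included),
# then the tool result, and send the history back to the model
messages.append(assistant_msg)
messages.append({
    "role": "tool",
    "content": "4",
    "tool_call_id": assistant_msg.tool_calls[0].id,
})

final = completion(
    model="gemini/gemini-3-pro-preview",
    messages=messages,
    tools=tools,
    reasoning_effort="low",
)
print(final.choices[0].message.content)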

Streaming with Thought Signatures

When using streaming mode with stream_chunk_builder(), thought signatures are now automatically preserved:

import os
import litellm
from litellm import completion

os.environ["GEMINI_API_KEY"] = "your-api-key"

MODEL = "gemini/gemini-3-pro-preview"

messages = [
    {"role": "system", "content": "You are a helpful assistant. Use the calculate tool."},
    {"role": "user", "content": "What is 2+2?"},
]

tools = [{
    "type": "function",
    "function": {
        "name": "calculate",
        "description": "Calculate a mathematical expression",
        "parameters": {
            "type": "object",
            "properties": {"expression": {"type": "string"}},
            "required": ["expression"],
        },
    },
}]

print("Step 1: Sending request with stream=True...")
response = completion(
    model=MODEL,
    messages=messages,
    stream=True,
    tools=tools,
    reasoning_effort="low"
)

# Collect all chunks
chunks = []
for part in response:
    chunks.append(part)

# Reconstruct message using stream_chunk_builder
# Thought signatures are now preserved automatically!
full_response = litellm.stream_chunk_builder(chunks, messages=messages)
print(f"Full response: {full_response}")

assistant_msg = full_response.choices[0].message

# ✅ Thought signature is now preserved in provider_specific_fields
if assistant_msg.tool_calls and assistant_msg.tool_calls[0].provider_specific_fields:
    thought_sig = assistant_msg.tool_calls[0].provider_specific_fields.get("thought_signature")
    print(f"Thought signature preserved: {thought_sig is not None}")

# Append assistant message (includes thought signatures automatically)
messages.append(assistant_msg)

# Mock tool execution
messages.append({
    "role": "tool",
    "content": "4",
    "tool_call_id": assistant_msg.tool_calls[0].id
})

print("\nStep 2: Sending tool result back to model...")
response_2 = completion(
    model=MODEL,
    messages=messages,
    stream=True,
    tools=tools,
    reasoning_effort="low"
)

for part in response_2:
    if part.choices[0].delta.content:
        print(part.choices[0].delta.content, end="")
print()  # New line

Key Points:

  • ✅ stream_chunk_builder() preserves provider_specific_fields, including thought signatures
  • ✅ Thought signatures are automatically included when appending assistant_msg to conversation history
  • ✅ Multi-turn conversations work seamlessly with streaming

Important Notes on Thought Signatures

  1. Automatic Handling: LiteLLM automatically extracts and preserves thought signatures. You don't need to manually manage them.

  2. Parallel Function Calls: When the model makes parallel function calls, only the first function call has a thought signature.

  3. Sequential Function Calls: In multi-step function calling, each step's first function call has its own thought signature that must be preserved.

  4. Required for Context: Thought signatures are essential for maintaining reasoning context. Without them, the model may lose context of its previous reasoning.
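To confirm this behavior (e.g. which parallel calls carry a signature, per note 2), a small sketch that inspects a response; it assumes `response` is a litellm completion response whose message contains tool calls:

# Print which tool calls carry a thought signature
msg = response.choices[0].message
for i, tc in enumerate(msg.tool_calls or []):
    fields = tc.provider_specific_fields or {}
    print(f"tool_call[{i}] {tc.function.name}: "
          f"has_signature={fields.get('thought_signature') is not None}")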

Conversation History: Switching from Non-Gemini-3 Models

Common Question: Will switching from a non-Gemini-3 model to Gemini-3 break conversation history?

Answer: No! LiteLLM automatically handles this by adding dummy thought signatures when needed.

How It Works

When you switch from a model that doesn't use thought signatures (e.g., gemini-2.5-flash) to Gemini 3, LiteLLM:

  1. Detects missing signatures: Identifies assistant messages with tool calls that lack thought signatures
  2. Adds dummy signature: Automatically injects a dummy thought signature (skip_thought_signature_validator) for compatibility
  3. Maintains conversation flow: Your conversation history continues to work seamlessly

Example: Switching Models Mid-Conversation

from openai import OpenAI

client = OpenAI(api_key="sk-1234", base_url="http://localhost:4000")

# Step 1: Start with gemini-2.5-flash (no thought signatures)
messages = [{"role": "user", "content": "What's the weather?"}]

response1 = client.chat.completions.create(
    model="gemini-2.5-flash",
    messages=messages,
    tools=[...],
    reasoning_effort="low"
)

# Append assistant message (no tool call thought signature from gemini-2.5-flash)
messages.append(response1.choices[0].message)

# Step 2: Switch to gemini-3-pro-preview
# LiteLLM automatically adds dummy thought signature to the previous assistant message
response2 = client.chat.completions.create(
    model="gemini-3-pro-preview",  # 👈 Switched model
    messages=messages,  # 👈 Same conversation history
    tools=[...],
    reasoning_effort="low"
)

# ✅ Works seamlessly! No errors, no breaking changes
print(response2.choices[0].message.content)

Dummy Signature Details

The dummy signature used is: base64("skip_thought_signature_validator")

This is Google's recommended approach for handling conversation history from models that don't support thought signatures. It allows Gemini 3 to:

  • Accept the conversation history without validation errors
  • Continue the conversation seamlessly
  • Maintain context across model switches
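To see the literal injected value, assuming standard base64 encoding of the marker string:

import base64

# The dummy value LiteLLM injects for tool calls that lack a signature
dummy = base64.b64encode(b"skip_thought_signature_validator").decode()
print(dummy)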

Thinking Level Parameter

How reasoning_effort Maps to thinking_level

For Gemini 3 Pro Preview, LiteLLM automatically maps reasoning_effort to the new thinking_level parameter:

reasoning_effort | thinking_level | Notes
--- | --- | ---
"minimal" | "low" | Maps to low thinking level
"low" | "low" | Default for most use cases
"medium" | "high" | Medium not available yet; maps to high
"high" | "high" | Maximum reasoning depth
"disable" | "low" | Gemini 3 cannot fully disable thinking
"none" | "low" | Gemini 3 cannot fully disable thinking

Default Behavior

If you don't specify reasoning_effort, LiteLLM automatically sets thinking_level="low" for Gemini 3 models to avoid unnecessarily high costs.

Example Usage

from litellm import completion

# Low thinking level (faster, lower cost)
response = completion(
    model="gemini/gemini-3-pro-preview",
    messages=[{"role": "user", "content": "What's the weather?"}],
    reasoning_effort="low"  # Maps to thinking_level="low"
)

# High thinking level (deeper reasoning, higher cost)
response = completion(
    model="gemini/gemini-3-pro-preview",
    messages=[{"role": "user", "content": "Solve this complex math problem step by step."}],
    reasoning_effort="high"  # Maps to thinking_level="high"
)

Important Notes

  1. Gemini 3 Cannot Disable Thinking: Unlike Gemini 2.5 models, Gemini 3 cannot fully disable thinking. Even when you set reasoning_effort="none" or "disable", it maps to thinking_level="low".

  2. Temperature Recommendation: For Gemini 3 models, LiteLLM defaults temperature to 1.0 and strongly recommends keeping it at this default. Setting temperature < 1.0 can cause:

    • Infinite loops
    • Degraded reasoning performance
    • Failure on complex tasks
  3. Automatic Defaults: If you don't specify reasoning_effort, LiteLLM automatically sets thinking_level="low" for optimal performance.

Using with Claude Code CLI

You can use gemini-3-pro-preview with Claude Code, Anthropic's command-line interface. This lets you drive Gemini 3 Pro Preview through Claude Code's native syntax and workflows.

Setup

1. Add Gemini 3 Pro Preview to your config.yaml:

model_list:
  - model_name: gemini-3-pro-preview
    litellm_params:
      model: gemini/gemini-3-pro-preview
      api_key: os.environ/GEMINI_API_KEY

general_settings:
  master_key: os.environ/LITELLM_MASTER_KEY

2. Set environment variables:

export GEMINI_API_KEY="your-gemini-api-key"
export LITELLM_MASTER_KEY="sk-1234567890" # Generate a secure key

3. Start LiteLLM Proxy:

litellm --config /path/to/config.yaml

# RUNNING on http://0.0.0.0:4000

4. Configure Claude Code to use LiteLLM Proxy:

export ANTHROPIC_BASE_URL="http://0.0.0.0:4000"
export ANTHROPIC_AUTH_TOKEN="$LITELLM_MASTER_KEY"

5. Use Gemini 3 Pro Preview with Claude Code:

# Claude Code will use gemini-3-pro-preview from your LiteLLM proxy
claude --model gemini-3-pro-preview

Example Usage

Once configured, you can interact with Gemini 3 Pro Preview using Claude Code's native interface:

$ claude --model gemini-3-pro-preview
> Explain how thought signatures work in multi-turn conversations.

# Gemini 3 Pro Preview responds through Claude Code interface

Benefits

  • Native Claude Code Experience: Use Gemini 3 Pro Preview with Claude Code's familiar CLI interface
  • Unified Authentication: Single API key for all models through LiteLLM proxy
  • Cost Tracking: All usage tracked through LiteLLM's centralized logging
  • Seamless Model Switching: Easily switch between Claude and Gemini models
  • Full Feature Support: All Gemini 3 features (thought signatures, function calling, etc.) work through Claude Code

Troubleshooting

Claude Code not finding the model:

  • Ensure the model name in Claude Code matches exactly: gemini-3-pro-preview
  • Verify your proxy is running: curl http://0.0.0.0:4000/health
  • Check that ANTHROPIC_BASE_URL points to your LiteLLM proxy

Authentication errors:

  • Verify ANTHROPIC_AUTH_TOKEN matches your LiteLLM master key
  • Ensure GEMINI_API_KEY is set correctly
  • Check LiteLLM proxy logs for detailed error messages
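A quick way to test the proxy and key together is a direct request to the OpenAI-compatible endpoint (a sketch; adjust host, port, and key to your setup):

curl http://0.0.0.0:4000/v1/chat/completions \
  -H "Authorization: Bearer $LITELLM_MASTER_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "gemini-3-pro-preview", "messages": [{"role": "user", "content": "ping"}]}'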

Best Practices

1. Always Include Thought Signatures in Conversation History

When building multi-turn conversations with function calling:

Do:

# Append the full assistant message (includes thought signatures)
messages.append(response.choices[0].message)

Don't:

# Don't manually construct assistant messages without thought signatures
messages.append({
    "role": "assistant",
    "tool_calls": [...]  # Missing thought signatures!
})

2. Use Appropriate Thinking Levels

  • reasoning_effort="low": For simple queries, quick responses, cost optimization
  • reasoning_effort="high": For complex problems requiring deep reasoning

3. Keep Temperature at Default

For Gemini 3 models, always use temperature=1.0 (default). Lower temperatures can cause issues.
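If you do pass temperature explicitly, keep it at the default, as in this minimal sketch (the prompt is illustrative):

from litellm import completion

response = completion(
    model="gemini/gemini-3-pro-preview",
    messages=[{"role": "user", "content": "Summarize the trade-offs."}],
    temperature=1.0,  # values below 1.0 risk loops and degraded reasoning
)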

4. Handle Model Switches Gracefully

When switching from non-Gemini-3 to Gemini-3:

  • ✅ LiteLLM automatically handles missing thought signatures
  • ✅ No manual intervention needed
  • ✅ Conversation history continues seamlessly

Troubleshooting

Issue: Missing Thought Signatures

Symptom: Error when including assistant messages in conversation history

Solution: Ensure you're appending the full assistant message from the response:

messages.append(response.choices[0].message)  # ✅ Includes thought signatures

Issue: Conversation Breaks When Switching Models

Symptom: Errors when switching from gemini-2.5-flash to gemini-3-pro-preview

Solution: This should work automatically! LiteLLM adds dummy signatures. If you see errors, ensure you're using the latest LiteLLM version.
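To upgrade with pip (use your environment's package manager if different):

pip install --upgrade litellm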

Issue: Infinite Loops or Poor Performance

Symptom: Model gets stuck or produces poor results

Solution:

  • Ensure temperature=1.0 (default for Gemini 3)
  • Check that reasoning_effort is set appropriately
  • Verify you're using the correct model name: gemini/gemini-3-pro-preview

Additional Resources