Day 0 Support: Claude Opus 4.6
LiteLLM now supports Claude Opus 4.6 on Day 0. Use it across Anthropic, Azure, Vertex AI, and Bedrock through the LiteLLM AI Gateway.
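For reference, the same model is callable from the LiteLLM Python SDK (a minimal sketch; assumes litellm is installed and ANTHROPIC_API_KEY is set in your environment):

import litellm

# Call Anthropic directly through the LiteLLM SDK. Swap the model
# prefix (azure_ai/, vertex_ai/, bedrock/) to target the other
# providers covered below.
response = litellm.completion(
    model="anthropic/claude-opus-4-6",
    messages=[{"role": "user", "content": "what llm are you"}],
)
print(response.choices[0].message.content)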
Docker Image
docker pull ghcr.io/berriai/litellm:litellm_stable_release_branch-v1.80.0-stable.opus-4-6
Usage - Anthropic
1. Setup config.yaml
model_list:
  - model_name: claude-opus-4-6
    litellm_params:
      model: anthropic/claude-opus-4-6
      api_key: os.environ/ANTHROPIC_API_KEY
2. Start the proxy
docker run -d \
-p 4000:4000 \
-e ANTHROPIC_API_KEY=$ANTHROPIC_API_KEY \
-v $(pwd)/config.yaml:/app/config.yaml \
ghcr.io/berriai/litellm:litellm_stable_release_branch-v1.80.0-stable.opus-4-6 \
--config /app/config.yaml
3. Test it!
curl --location 'http://0.0.0.0:4000/chat/completions' \
--header 'Content-Type: application/json' \
--header "Authorization: Bearer $LITELLM_KEY" \
--data '{
    "model": "claude-opus-4-6",
    "messages": [
        {
            "role": "user",
            "content": "what llm are you"
        }
    ]
}'
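Because the proxy exposes an OpenAI-compatible endpoint, you can also call it from the OpenAI Python SDK (a sketch; sk-1234 stands in for your LiteLLM key):

from openai import OpenAI

# Point the OpenAI client at the LiteLLM proxy started above
client = OpenAI(api_key="sk-1234", base_url="http://0.0.0.0:4000")

response = client.chat.completions.create(
    model="claude-opus-4-6",
    messages=[{"role": "user", "content": "what llm are you"}],
)
print(response.choices[0].message.content)

The same client code works unchanged against the Azure, Vertex AI, and Bedrock configs below, since they all map to the claude-opus-4-6 model name.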
Usage - Azure
1. Setup config.yaml
model_list:
  - model_name: claude-opus-4-6
    litellm_params:
      model: azure_ai/claude-opus-4-6
      api_key: os.environ/AZURE_AI_API_KEY
      api_base: os.environ/AZURE_AI_API_BASE # https://<resource>.services.ai.azure.com
2. Start the proxy
docker run -d \
-p 4000:4000 \
-e AZURE_AI_API_KEY=$AZURE_AI_API_KEY \
-e AZURE_AI_API_BASE=$AZURE_AI_API_BASE \
-v $(pwd)/config.yaml:/app/config.yaml \
ghcr.io/berriai/litellm:litellm_stable_release_branch-v1.80.0-stable.opus-4-6 \
--config /app/config.yaml
3. Test it!
curl --location 'http://0.0.0.0:4000/chat/completions' \
--header 'Content-Type: application/json' \
--header "Authorization: Bearer $LITELLM_KEY" \
--data '{
    "model": "claude-opus-4-6",
    "messages": [
        {
            "role": "user",
            "content": "what llm are you"
        }
    ]
}'
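If you'd rather skip the proxy, the same Azure deployment can be called from the LiteLLM Python SDK (a sketch reusing the env vars from the config above):

import os
import litellm

response = litellm.completion(
    model="azure_ai/claude-opus-4-6",
    api_key=os.environ["AZURE_AI_API_KEY"],
    api_base=os.environ["AZURE_AI_API_BASE"],  # https://<resource>.services.ai.azure.com
    messages=[{"role": "user", "content": "what llm are you"}],
)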
Usage - Vertex AI
1. Setup config.yaml
model_list:
  - model_name: claude-opus-4-6
    litellm_params:
      model: vertex_ai/claude-opus-4-6
      vertex_project: os.environ/VERTEX_PROJECT
      vertex_location: us-east5
2. Start the proxy
docker run -d \
-p 4000:4000 \
-e VERTEX_PROJECT=$VERTEX_PROJECT \
-e GOOGLE_APPLICATION_CREDENTIALS=/app/credentials.json \
-v $(pwd)/config.yaml:/app/config.yaml \
-v $(pwd)/credentials.json:/app/credentials.json \
ghcr.io/berriai/litellm:litellm_stable_release_branch-v1.80.0-stable.opus-4-6 \
--config /app/config.yaml
3. Test it!
curl --location 'http://0.0.0.0:4000/chat/completions' \
--header 'Content-Type: application/json' \
--header "Authorization: Bearer $LITELLM_KEY" \
--data '{
    "model": "claude-opus-4-6",
    "messages": [
        {
            "role": "user",
            "content": "what llm are you"
        }
    ]
}'
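The SDK equivalent for Vertex AI (a sketch; assumes GOOGLE_APPLICATION_CREDENTIALS points at a valid service-account key):

import os
import litellm

response = litellm.completion(
    model="vertex_ai/claude-opus-4-6",
    vertex_project=os.environ["VERTEX_PROJECT"],
    vertex_location="us-east5",
    messages=[{"role": "user", "content": "what llm are you"}],
)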
Usage - Bedrock
1. Setup config.yaml
model_list:
  - model_name: claude-opus-4-6
    litellm_params:
      model: bedrock/anthropic.claude-opus-4-6-v1:0
      aws_access_key_id: os.environ/AWS_ACCESS_KEY_ID
      aws_secret_access_key: os.environ/AWS_SECRET_ACCESS_KEY
      aws_region_name: us-east-1
2. Start the proxy
docker run -d \
-p 4000:4000 \
-e AWS_ACCESS_KEY_ID=$AWS_ACCESS_KEY_ID \
-e AWS_SECRET_ACCESS_KEY=$AWS_SECRET_ACCESS_KEY \
-v $(pwd)/config.yaml:/app/config.yaml \
ghcr.io/berriai/litellm:litellm_stable_release_branch-v1.80.0-stable.opus-4-6 \
--config /app/config.yaml
3. Test it!
curl --location 'http://0.0.0.0:4000/chat/completions' \
--header 'Content-Type: application/json' \
--header "Authorization: Bearer $LITELLM_KEY" \
--data '{
    "model": "claude-opus-4-6",
    "messages": [
        {
            "role": "user",
            "content": "what llm are you"
        }
    ]
}'
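And the SDK equivalent for Bedrock (a sketch; AWS credentials are picked up from the standard environment variables or credential chain):

import litellm

response = litellm.completion(
    model="bedrock/anthropic.claude-opus-4-6-v1:0",
    aws_region_name="us-east-1",
    messages=[{"role": "user", "content": "what llm are you"}],
)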
Compaction
LiteLLM supports enabling compaction for the new claude-opus-4-6.
Enabling Compaction
To enable compaction, add the context_management parameter with the compact_20260112 edit type:
curl --location 'http://0.0.0.0:4000/chat/completions' \
--header 'Content-Type: application/json' \
--header "Authorization: Bearer $LITELLM_KEY" \
--data '{
    "model": "claude-opus-4-6",
    "messages": [
        {
            "role": "user",
            "content": "What is the weather in San Francisco?"
        }
    ],
    "context_management": {
        "edits": [
            {
                "type": "compact_20260112"
            }
        ]
    },
    "max_tokens": 100
}'
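From the OpenAI Python SDK, the same parameter can be sent via extra_body, since context_management isn't part of the OpenAI schema (a sketch against the proxy above):

from openai import OpenAI

client = OpenAI(api_key="sk-1234", base_url="http://0.0.0.0:4000")

response = client.chat.completions.create(
    model="claude-opus-4-6",
    messages=[{"role": "user", "content": "What is the weather in San Francisco?"}],
    max_tokens=100,
    # extra_body merges these keys into the request payload
    extra_body={"context_management": {"edits": [{"type": "compact_20260112"}]}},
)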
All context_management parameters supported by Anthropic can be passed through directly. LiteLLM automatically adds the compact-2026-01-12 beta header to the request.
Response with Compaction Block
The response will include the compaction summary in provider_specific_fields.compaction_blocks:
{
    "id": "chatcmpl-a6c105a3-4b25-419e-9551-c800633b6cb2",
    "created": 1770357619,
    "model": "claude-opus-4-6",
    "object": "chat.completion",
    "choices": [
        {
            "finish_reason": "length",
            "index": 0,
            "message": {
                "content": "I don't have access to real-time data, so I can't provide the current weather in San Francisco. To get up-to-date weather information, I'd recommend checking:\n\n- **Weather websites** like weather.com, accuweather.com, or wunderground.com\n- **Search engines** – just Google \"San Francisco weather\"\n- **Weather apps** on your phone (e.g., Apple Weather, Google Weather)\n- **National",
                "role": "assistant",
                "provider_specific_fields": {
                    "compaction_blocks": [
                        {
                            "type": "compaction",
                            "content": "Summary of the conversation: The user requested help building a web scraper..."
                        }
                    ]
                }
            }
        }
    ],
    "usage": {
        "completion_tokens": 100,
        "prompt_tokens": 86,
        "total_tokens": 186
    }
}
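To pull the summary out of the response in Python (a sketch; provider_specific_fields is a LiteLLM extension, so it arrives as an extra field on the OpenAI message object):

message = response.choices[0].message
# Fall back to model_dump() in case the installed SDK version doesn't
# expose unknown fields as attributes
fields = getattr(message, "provider_specific_fields", None) \
    or message.model_dump().get("provider_specific_fields") or {}
for block in fields.get("compaction_blocks", []):
    print(block["content"])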
Using Compaction Blocks in Follow-up Requests
To continue the conversation with compaction, include the compaction block in the assistant message's provider_specific_fields:
curl --location 'http://0.0.0.0:4000/chat/completions' \
--header 'Content-Type: application/json' \
--header "Authorization: Bearer $LITELLM_KEY" \
--data '{
    "model": "claude-opus-4-6",
    "messages": [
        {
            "role": "user",
            "content": "How can I build a web scraper?"
        },
        {
            "role": "assistant",
            "content": [
                {
                    "type": "text",
                    "text": "Certainly! To build a basic web scraper, you'll typically use a programming language like Python along with libraries such as `requests` (for fetching web pages) and `BeautifulSoup` (for parsing HTML). Here's a basic example:\n\n```python\nimport requests\nfrom bs4 import BeautifulSoup\n\nurl = 'https://example.com'\nresponse = requests.get(url)\nsoup = BeautifulSoup(response.text, 'html.parser')\n\n# Extract and print all text\ntext = soup.get_text()\nprint(text)\n```\n\nLet me know what you're interested in scraping or if you need help with a specific website!"
                }
            ],
            "provider_specific_fields": {
                "compaction_blocks": [
                    {
                        "type": "compaction",
                        "content": "Summary of the conversation: The user asked how to build a web scraper, and the assistant gave an overview using Python with requests and BeautifulSoup."
                    }
                ]
            }
        },
        {
            "role": "user",
            "content": "How do I use it to scrape product prices?"
        }
    ],
    "context_management": {
        "edits": [
            {
                "type": "compact_20260112"
            }
        ]
    },
    "max_tokens": 100
}'
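The same round trip in Python (a sketch using the LiteLLM SDK, assuming it forwards the extra message key and the provider-specific context_management kwarg through to Anthropic, as it does for other provider params):

import litellm

# compaction_blocks as extracted from the previous response
compaction_blocks = [
    {"type": "compaction", "content": "Summary of the conversation: ..."}
]

response = litellm.completion(
    model="anthropic/claude-opus-4-6",
    messages=[
        {"role": "user", "content": "How can I build a web scraper?"},
        {
            "role": "assistant",
            "content": [{"type": "text", "text": "...previous assistant reply..."}],
            "provider_specific_fields": {"compaction_blocks": compaction_blocks},
        },
        {"role": "user", "content": "How do I use it to scrape product prices?"},
    ],
    context_management={"edits": [{"type": "compact_20260112"}]},
    max_tokens=100,
)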
Streaming Support
Compaction blocks are also supported in streaming mode. You'll receive:
- a compaction_start event when a compaction block begins
- compaction_delta events with the compaction content
- the accumulated compaction_blocks in provider_specific_fields
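One way to consume these in a streaming loop (a sketch; it assumes the proxy surfaces the accumulated blocks as provider_specific_fields on each delta, per the list above):

from openai import OpenAI

client = OpenAI(api_key="sk-1234", base_url="http://0.0.0.0:4000")

stream = client.chat.completions.create(
    model="claude-opus-4-6",
    messages=[{"role": "user", "content": "What is the weather in San Francisco?"}],
    stream=True,
    extra_body={"context_management": {"edits": [{"type": "compact_20260112"}]}},
)
for chunk in stream:
    if not chunk.choices:
        continue
    delta = chunk.choices[0].delta
    # Compaction content arrives alongside normal text deltas
    fields = getattr(delta, "provider_specific_fields", None)
    if fields and fields.get("compaction_blocks"):
        print("[compaction]", fields["compaction_blocks"])
    if delta.content:
        print(delta.content, end="")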
Effort Levels
Four effort levels are available: low, medium, high (default), and max. Pass the effort parameter directly:
curl --location 'http://0.0.0.0:4000/chat/completions' \
--header 'Content-Type: application/json' \
--header "Authorization: Bearer $LITELLM_KEY" \
--data '{
    "model": "claude-opus-4-6",
    "messages": [
        {
            "role": "user",
            "content": "Explain quantum computing"
        }
    ],
    "effort": "max"
}'
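Through the OpenAI Python SDK, send it via extra_body (a sketch):

from openai import OpenAI

client = OpenAI(api_key="sk-1234", base_url="http://0.0.0.0:4000")

response = client.chat.completions.create(
    model="claude-opus-4-6",
    messages=[{"role": "user", "content": "Explain quantum computing"}],
    # effort is not an OpenAI-schema field, so pass it via extra_body
    extra_body={"effort": "max"},
)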
1M Token Context (Beta)
Opus 4.6 supports a 1M token context window. Premium pricing applies to requests whose prompts exceed 200k tokens ($10/$37.50 per million input/output tokens). LiteLLM supports cost calculation for 1M token contexts.
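A worked example of the premium rates (assuming, as with earlier long-context models, that the premium rate applies to the whole request once the prompt crosses 200k tokens):

# Hypothetical request: 300k prompt tokens, 2k completion tokens
input_tokens, output_tokens = 300_000, 2_000
cost_usd = (input_tokens * 10.00 + output_tokens * 37.50) / 1_000_000
print(cost_usd)  # 3.075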
US-Only Inference
US-only inference is available at 1.1× standard token pricing. LiteLLM supports this pricing model.