Fallbacks

If a call fails after num_retries, fall back to another model group.

Fallbacks are typically done from one model_name to another model_name.

Quick Start

1. Set up fallbacks

Key change:

fallbacks=[{"gpt-3.5-turbo": ["gpt-4"]}]

from litellm import Router

router = Router(
    model_list=[
        {
            "model_name": "gpt-3.5-turbo",
            "litellm_params": {
                "model": "azure/<your-deployment-name>",
                "api_base": "<your-azure-endpoint>",
                "api_key": "<your-azure-api-key>",
                "rpm": 6
            }
        },
        {
            "model_name": "gpt-4",
            "litellm_params": {
                "model": "azure/gpt-4-ca",
                "api_base": "https://my-endpoint-canada-berri992.openai.azure.com/",
                "api_key": "<your-azure-api-key>",
                "rpm": 6
            }
        }
    ],
    fallbacks=[{"gpt-3.5-turbo": ["gpt-4"]}] # 👈 KEY CHANGE
)

2. Start Proxy

litellm --config /path/to/config.yaml

3. Test Fallbacks

Pass mock_testing_fallbacks=true in the request body (mock_testing_fallbacks=True in the SDK call) to trigger fallbacks.


from litellm import Router

model_list = [{..}, {..}] # defined in Step 1.

router = Router(model_list=model_list, fallbacks=[{"bad-model": ["my-good-model"]}])

response = router.completion(
    model="bad-model",
    messages=[{"role": "user", "content": "Hey, how's it going?"}],
    mock_testing_fallbacks=True,
)

Explanation

Fallbacks are done in order - ["gpt-3.5-turbo", "gpt-4", "gpt-4-32k"] will try 'gpt-3.5-turbo' first, then 'gpt-4', etc.

You can also set default_fallbacks, in case a specific model group is misconfigured / bad.

There are 3 types of fallbacks (a combined sketch follows this list):

  • content_policy_fallbacks: For litellm.ContentPolicyViolationError - LiteLLM maps content policy violation errors across providers. See Code
  • context_window_fallbacks: For litellm.ContextWindowExceededError - LiteLLM maps context window error messages across providers. See Code
  • fallbacks: For all remaining errors - e.g. litellm.RateLimitError
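
All three can be combined on one Router. A minimal sketch (the model names here are illustrative placeholders, not taken from the steps above):

from litellm import Router

router = Router(
    model_list=[..], # your deployments, as in Step 1
    # tried on litellm.ContentPolicyViolationError
    content_policy_fallbacks=[{"claude-2": ["my-fallback-model"]}],
    # tried on litellm.ContextWindowExceededError
    context_window_fallbacks=[{"gpt-3.5-turbo": ["gpt-4-32k"]}],
    # tried on all remaining errors, e.g. litellm.RateLimitError
    fallbacks=[{"gpt-3.5-turbo": ["gpt-4"]}],
)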

Client Side Fallbacks

Set fallbacks in the .completion() call for the SDK, or client-side in the request body for the proxy.

In this request the following will occur:

  1. The request to model="zephyr-beta" will fail
  2. litellm proxy will loop through all the model_groups specified in fallbacks=["gpt-3.5-turbo"]
  3. The request to model="gpt-3.5-turbo" will succeed and the client making the request will get a response from gpt-3.5-turbo

👉 Key Change: "fallbacks": ["gpt-3.5-turbo"]
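
For example, through the proxy with the OpenAI Python client, the request described above might look like this (a sketch; base_url/api_key are the placeholder proxy values used elsewhere in this doc, and fallbacks is passed via extra_body since it is a LiteLLM-specific field):

import openai

client = openai.OpenAI(api_key="anything", base_url="http://0.0.0.0:4000")

response = client.chat.completions.create(
    model="zephyr-beta", # this request will fail
    messages=[{"role": "user", "content": "Hey, how's it going?"}],
    extra_body={"fallbacks": ["gpt-3.5-turbo"]}, # 👈 KEY CHANGE
)

print(response)

Or with the Router SDK directly: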

from litellm import Router

router = Router(model_list=[..]) # defined in Step 1.

resp = router.completion(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Hey, how's it going?"}],
    mock_testing_fallbacks=True, # 👈 trigger fallbacks
    fallbacks=[
        {
            "model": "claude-3-haiku",
            "messages": [{"role": "user", "content": "What is LiteLLM?"}],
        }
    ],
)

print(resp)

Control Fallback Prompts

Pass messages/temperature/etc. per fallback model (this works for embeddings, image generation, etc. as well).

Key Change:

fallbacks = [
    {
        "model": <model_name>,
        "messages": <model-specific-messages>,
        ... # any other model-specific parameters
    }
]

from litellm import Router

router = Router(model_list=[..]) # defined in Step 1.

resp = router.completion(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Hey, how's it going?"}],
    mock_testing_fallbacks=True, # 👈 trigger fallbacks
    fallbacks=[
        {
            "model": "claude-3-haiku",
            "messages": [{"role": "user", "content": "What is LiteLLM?"}],
        }
    ],
)

print(resp)

Content Policy Violation Fallback

Key change:

content_policy_fallbacks=[{"claude-2": ["my-fallback-model"]}]

from litellm import Router

router = Router(
    model_list=[
        {
            "model_name": "claude-2",
            "litellm_params": {
                "model": "claude-2",
                "api_key": "",
                "mock_response": Exception("content filtering policy"),
            },
        },
        {
            "model_name": "my-fallback-model",
            "litellm_params": {
                "model": "claude-2",
                "api_key": "",
                "mock_response": "This works!",
            },
        },
    ],
    content_policy_fallbacks=[{"claude-2": ["my-fallback-model"]}], # 👈 KEY CHANGE
    # fallbacks=[..], # [OPTIONAL]
    # context_window_fallbacks=[..], # [OPTIONAL]
)

response = router.completion(
    model="claude-2",
    messages=[{"role": "user", "content": "Hey, how's it going?"}],
)

Context Window Exceeded Fallback

Key change:

context_window_fallbacks=[{"claude-2": ["my-fallback-model"]}]

from litellm import Router

router = Router(
    model_list=[
        {
            "model_name": "claude-2",
            "litellm_params": {
                "model": "claude-2",
                "api_key": "",
                "mock_response": Exception("prompt is too long"),
            },
        },
        {
            "model_name": "my-fallback-model",
            "litellm_params": {
                "model": "claude-2",
                "api_key": "",
                "mock_response": "This works!",
            },
        },
    ],
    context_window_fallbacks=[{"claude-2": ["my-fallback-model"]}], # 👈 KEY CHANGE
    # fallbacks=[..], # [OPTIONAL]
    # content_policy_fallbacks=[..], # [OPTIONAL]
)

response = router.completion(
    model="claude-2",
    messages=[{"role": "user", "content": "Hey, how's it going?"}],
)

Advanced

Fallbacks + Retries + Timeouts + Cooldowns

To set fallbacks, just do:

litellm_settings:
  fallbacks: [{"zephyr-beta": ["gpt-3.5-turbo"]}]

This covers all errors (429, 500, etc.).

Set via config:

model_list:
  - model_name: zephyr-beta
    litellm_params:
      model: huggingface/HuggingFaceH4/zephyr-7b-beta
      api_base: http://0.0.0.0:8001
  - model_name: zephyr-beta
    litellm_params:
      model: huggingface/HuggingFaceH4/zephyr-7b-beta
      api_base: http://0.0.0.0:8002
  - model_name: zephyr-beta
    litellm_params:
      model: huggingface/HuggingFaceH4/zephyr-7b-beta
      api_base: http://0.0.0.0:8003
  - model_name: gpt-3.5-turbo
    litellm_params:
      model: gpt-3.5-turbo
      api_key: <my-openai-key>
  - model_name: gpt-3.5-turbo-16k
    litellm_params:
      model: gpt-3.5-turbo-16k
      api_key: <my-openai-key>

litellm_settings:
  num_retries: 3 # retry call 3 times on each model_name (e.g. zephyr-beta)
  request_timeout: 10 # raise Timeout error if call takes longer than 10s. Sets litellm.request_timeout
  fallbacks: [{"zephyr-beta": ["gpt-3.5-turbo"]}] # fallback to gpt-3.5-turbo if call fails num_retries
  allowed_fails: 3 # cooldown model if it fails > 3 calls in a minute.
  cooldown_time: 30 # how long (in seconds) to cooldown model if fails/min > allowed_fails
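
In the SDK, the same knobs can be set on the Router directly. A minimal sketch (assuming the Router parameters num_retries, timeout, allowed_fails, and cooldown_time mirror the config keys above; timeout stands in for request_timeout):

from litellm import Router

router = Router(
    model_list=[..], # same deployments as the config above
    num_retries=3,   # retry each call 3 times
    timeout=10,      # raise Timeout error after 10s
    fallbacks=[{"zephyr-beta": ["gpt-3.5-turbo"]}], # after retries are exhausted
    allowed_fails=3, # cooldown a deployment if it fails > 3 calls in a minute
    cooldown_time=30 # seconds to keep the deployment in cooldown
)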

Fallback to Specific Model ID

If all models in a group are in cooldown (e.g. rate limited), LiteLLM will fall back to the model with the specific model ID.

This skips any cooldown check for the fallback model.

  1. Specify the model ID in model_info

model_list:
  - model_name: gpt-4
    litellm_params:
      model: openai/gpt-4
    model_info:
      id: my-specific-model-id # 👈 KEY CHANGE
  - model_name: gpt-4
    litellm_params:
      model: azure/chatgpt-v-2
      api_base: os.environ/AZURE_API_BASE
      api_key: os.environ/AZURE_API_KEY
  - model_name: anthropic-claude
    litellm_params:
      model: anthropic/claude-3-opus-20240229
      api_key: os.environ/ANTHROPIC_API_KEY

Note: This will only fall back to the model with the specific model ID. If you want to fall back to another model group, you can set fallbacks=[{"gpt-4": ["anthropic-claude"]}]

  2. Set fallbacks in config

litellm_settings:
  fallbacks: [{"gpt-4": ["my-specific-model-id"]}]

  3. Test it!
curl -X POST 'http://0.0.0.0:4000/chat/completions' \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer sk-1234' \
-d '{
  "model": "gpt-4",
  "messages": [
    {
      "role": "user",
      "content": "ping"
    }
  ],
  "mock_testing_fallbacks": true
}'

Validate it works by checking the response header x-litellm-model-id:

x-litellm-model-id: my-specific-model-id
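
From Python, you can read the same header through the OpenAI client's raw-response interface. A minimal sketch (api_key and base_url are the placeholder proxy values used elsewhere in this doc):

import openai

client = openai.OpenAI(api_key="anything", base_url="http://0.0.0.0:4000")

raw = client.chat.completions.with_raw_response.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "ping"}],
    extra_body={"mock_testing_fallbacks": True}, # trigger the fallback
)

print(raw.headers.get("x-litellm-model-id")) # e.g. my-specific-model-id
response = raw.parse() # the usual ChatCompletion object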

Test Fallbacks!

Check if your fallbacks are working as expected.

Regular Fallbacks

curl -X POST 'http://0.0.0.0:4000/chat/completions' \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer sk-1234' \
-d '{
  "model": "my-bad-model",
  "messages": [
    {
      "role": "user",
      "content": "ping"
    }
  ],
  "mock_testing_fallbacks": true # 👈 KEY CHANGE
}'

Content Policy Fallbacks

curl -X POST 'http://0.0.0.0:4000/chat/completions' \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer sk-1234' \
-d '{
  "model": "my-bad-model",
  "messages": [
    {
      "role": "user",
      "content": "ping"
    }
  ],
  "mock_testing_content_policy_fallbacks": true # 👈 KEY CHANGE
}'

Context Window Fallbacks

curl -X POST 'http://0.0.0.0:4000/chat/completions' \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer sk-1234' \
-d '{
  "model": "my-bad-model",
  "messages": [
    {
      "role": "user",
      "content": "ping"
    }
  ],
  "mock_testing_context_window_fallbacks": true # 👈 KEY CHANGE
}'

Context Window Fallbacks (Pre-Call Checks + Fallbacks)

Before the call is made, check that the request fits within the model's context window by setting enable_pre_call_checks: true.

See Code

1. Setup config

For Azure deployments, set the base model. Pick the base model from this list; all the Azure models start with azure/.

Filter out older instances of a model (e.g. gpt-3.5-turbo) with smaller context windows:

router_settings:
  enable_pre_call_checks: true # 1. Enable pre-call checks

model_list:
  - model_name: gpt-3.5-turbo
    litellm_params:
      model: azure/chatgpt-v-2
      api_base: os.environ/AZURE_API_BASE
      api_key: os.environ/AZURE_API_KEY
      api_version: "2023-07-01-preview"
    model_info:
      base_model: azure/gpt-4-1106-preview # 2. 👈 (azure-only) SET BASE MODEL

  - model_name: gpt-3.5-turbo
    litellm_params:
      model: gpt-3.5-turbo-1106
      api_key: os.environ/OPENAI_API_KEY

2. Start proxy

litellm --config /path/to/config.yaml

# RUNNING on http://0.0.0.0:4000

3. Test it!

import openai

client = openai.OpenAI(
    api_key="anything",
    base_url="http://0.0.0.0:4000"
)

text = "What is the meaning of 42?" * 5000

# request sent to model set on litellm proxy, `litellm --model`
response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": text},
        {"role": "user", "content": "Who was Alexander?"},
    ],
)

print(response)

Content Policy Fallbacks

Fallback across providers (e.g. from Azure OpenAI to Anthropic) if you hit content policy violation errors.

model_list:
  - model_name: gpt-3.5-turbo-small
    litellm_params:
      model: azure/chatgpt-v-2
      api_base: os.environ/AZURE_API_BASE
      api_key: os.environ/AZURE_API_KEY
      api_version: "2023-07-01-preview"

  - model_name: claude-opus
    litellm_params:
      model: claude-3-opus-20240229
      api_key: os.environ/ANTHROPIC_API_KEY

litellm_settings:
  content_policy_fallbacks: [{"gpt-3.5-turbo-small": ["claude-opus"]}]

Default Fallbacks

You can also set default_fallbacks, in case a specific model group is misconfigured / bad.

model_list:
  - model_name: gpt-3.5-turbo-small
    litellm_params:
      model: azure/chatgpt-v-2
      api_base: os.environ/AZURE_API_BASE
      api_key: os.environ/AZURE_API_KEY
      api_version: "2023-07-01-preview"

  - model_name: claude-opus
    litellm_params:
      model: claude-3-opus-20240229
      api_key: os.environ/ANTHROPIC_API_KEY

litellm_settings:
  default_fallbacks: ["claude-opus"]

This will default to claude-opus in case any model fails.

Model-specific fallbacks (e.g. {"gpt-3.5-turbo-small": ["claude-opus"]}) override the default fallbacks.
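
A minimal SDK sketch of the same precedence (reusing the model names above; default_fallbacks is the Router parameter mentioned in the Explanation section):

from litellm import Router

router = Router(
    model_list=[..], # gpt-3.5-turbo-small + claude-opus, as above
    default_fallbacks=["claude-opus"], # used when a group has no specific fallback
    fallbacks=[{"gpt-3.5-turbo-small": ["claude-opus"]}], # 👈 overrides the default for this group
)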

EU-Region Filtering (Pre-Call Checks)

With enable_pre_call_checks: true, LiteLLM filters out deployments outside the request's region before the call is made.

Set 'region_name' of deployment.

Note: LiteLLM can automatically infer region_name for Vertex AI, Bedrock, and IBM WatsonxAI based on your litellm params. For Azure, set litellm.enable_preview_features = True.
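
The same filtering works in the SDK. A minimal sketch (api_base/api_key values are elided placeholders; region_name and enable_pre_call_checks mirror the config keys in step 1 below):

from litellm import Router

router = Router(
    model_list=[
        {
            "model_name": "gpt-3.5-turbo",
            "litellm_params": {
                "model": "azure/chatgpt-v-2",
                "api_base": "<your-azure-endpoint>",
                "api_key": "<your-azure-api-key>",
                "region_name": "eu", # 👈 only this deployment serves EU traffic
            },
        },
        {
            "model_name": "gpt-3.5-turbo",
            "litellm_params": {
                "model": "gpt-3.5-turbo-1106",
                "api_key": "<your-openai-api-key>", # no region_name -> filtered out by the EU check
            },
        },
    ],
    enable_pre_call_checks=True, # 👈 required for region filtering
)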

1. Set Config

router_settings:
  enable_pre_call_checks: true # 1. Enable pre-call checks

model_list:
  - model_name: gpt-3.5-turbo
    litellm_params:
      model: azure/chatgpt-v-2
      api_base: os.environ/AZURE_API_BASE
      api_key: os.environ/AZURE_API_KEY
      api_version: "2023-07-01-preview"
      region_name: "eu" # 👈 SET EU-REGION

  - model_name: gpt-3.5-turbo
    litellm_params:
      model: gpt-3.5-turbo-1106
      api_key: os.environ/OPENAI_API_KEY

  - model_name: gemini-pro
    litellm_params:
      model: vertex_ai/gemini-pro-1.5
      vertex_project: adroit-crow-1234
      vertex_location: us-east1 # 👈 AUTOMATICALLY INFERS 'region_name'

2. Start proxy

litellm --config /path/to/config.yaml

# RUNNING on http://0.0.0.0:4000

3. Test it!

import openai

client = openai.OpenAI(
    api_key="anything",
    base_url="http://0.0.0.0:4000"
)

# request sent to model set on litellm proxy, `litellm --model`
response = client.chat.completions.with_raw_response.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Who was Alexander?"}]
)

print(response)

print(response.headers.get('x-litellm-model-api-base'))

Setting Fallbacks for Wildcard Models

You can set fallbacks for wildcard models (e.g. azure/*) in your config file.

  1. Setup config

model_list:
  - model_name: "gpt-4o"
    litellm_params:
      model: "openai/gpt-4o"
      api_key: os.environ/OPENAI_API_KEY
  - model_name: "azure/*"
    litellm_params:
      model: "azure/*"
      api_key: os.environ/AZURE_API_KEY
      api_base: os.environ/AZURE_API_BASE

litellm_settings:
  fallbacks: [{"gpt-4o": ["azure/gpt-4o"]}]

  2. Start Proxy

litellm --config /path/to/config.yaml

  3. Test it!
curl -L -X POST 'http://0.0.0.0:4000/v1/chat/completions' \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer sk-1234' \
-d '{
  "model": "gpt-4o",
  "messages": [
    {
      "role": "user",
      "content": [
        {
          "type": "text",
          "text": "what color is red"
        }
      ]
    }
  ],
  "max_tokens": 300,
  "mock_testing_fallbacks": true
}'

Disable Fallbacks per key

You can disable fallbacks per key by setting disable_fallbacks: true in your key metadata.

curl -L -X POST 'http://0.0.0.0:4000/key/generate' \
-H 'Authorization: Bearer sk-1234' \
-H 'Content-Type: application/json' \
-d '{
  "metadata": {
    "disable_fallbacks": true
  }
}'
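
For reference, the same key can be generated from Python with the requests library. A minimal sketch (the proxy URL and master key sk-1234 are the placeholder values used above):

import requests

resp = requests.post(
    "http://0.0.0.0:4000/key/generate",
    headers={"Authorization": "Bearer sk-1234"}, # proxy master key
    json={"metadata": {"disable_fallbacks": True}}, # 👈 key-level switch
)

print(resp.json()["key"]) # new virtual key, with fallbacks disabled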