How Model Access Works
Concept
Each model onboarded to LiteLLM is a "model deployment".
Each model deployment is assigned to a "model group" via the "model_name" field in config.yaml.
Example
model_list:
  - model_name: my-custom-model
    litellm_params:
      model: openai/gpt-4o
      api_key: os.environ/OPENAI_API_KEY
Here, we onboard a model deployment for gpt-4o and assign it to the model group my-custom-model.
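With this config saved as config.yaml, start the proxy against it:

litellm --config config.yaml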
Client-side request
Here's what a client-side request looks like:
curl --location 'http://localhost:4000/chat/completions' \
-H 'Authorization: Bearer <your-api-key>' \
-H 'Content-Type: application/json' \
-d '{"model": "my-custom-model", "messages": [{"role": "user", "content": "Hello, how are you?"}]}'
Access Control
When you give a key/user/team access to a model, you are giving them access to a "model group", not an individual model deployment.
Example:
curl --location 'http://localhost:4000/key/generate' \
--header 'Authorization: Bearer <your-master-key>' \
--header 'Content-Type: application/json' \
--data-raw '{"models": ["my-custom-model"]}'
Load Balancing
You can add multiple model deployments to a single "model group". LiteLLM will automatically load balance requests across the model deployments in the group.
Example:
model_list:
  - model_name: my-custom-model
    litellm_params:
      model: openai/gpt-4o
      api_key: os.environ/OPENAI_API_KEY
  - model_name: my-custom-model
    litellm_params:
      model: azure/gpt-4o
      api_key: os.environ/AZURE_API_KEY
      api_base: os.environ/AZURE_API_BASE
      api_version: os.environ/AZURE_API_VERSION
This way, you can pool rate limits across multiple model deployments.
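By default, LiteLLM shuffles requests across the deployments in a group. You can optionally weight the shuffle by per-deployment rate limits, or select a different routing strategy. A sketch, assuming the rpm and router_settings.routing_strategy options (the rpm values are illustrative; check the routing docs for supported strategies):

model_list:
  - model_name: my-custom-model
    litellm_params:
      model: openai/gpt-4o
      api_key: os.environ/OPENAI_API_KEY
      rpm: 60    # this deployment's requests-per-minute budget
  - model_name: my-custom-model
    litellm_params:
      model: azure/gpt-4o
      api_key: os.environ/AZURE_API_KEY
      api_base: os.environ/AZURE_API_BASE
      api_version: os.environ/AZURE_API_VERSION
      rpm: 600   # weighted higher, so it receives more of the traffic

router_settings:
  routing_strategy: simple-shuffle  # default strategy; weights by rpm/tpm when set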
Fallbacks
You can fall back across model groups. This is useful if all "model deployments" in a "model group" are down (e.g. all returning 429 errors).
Example:
model_list:
  - model_name: my-custom-model
    litellm_params:
      model: openai/gpt-4o-mini
      api_key: os.environ/OPENAI_API_KEY
  - model_name: my-other-model
    litellm_params:
      model: openai/gpt-4o
      api_key: os.environ/OPENAI_API_KEY

litellm_settings:
  fallbacks: [{"my-custom-model": ["my-other-model"]}]
Fallbacks are attempted in order: the requested model group is tried first; if it fails, the model groups in its fallback list are tried sequentially.
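To exercise the fallback chain without taking a deployment down, you can pass the mock_testing_fallbacks flag on a request, which simulates a failure on the primary group (a sketch; confirm the flag in the reliability docs):

curl --location 'http://localhost:4000/chat/completions' \
--header 'Authorization: Bearer <your-api-key>' \
--header 'Content-Type: application/json' \
--data-raw '{"model": "my-custom-model", "messages": [{"role": "user", "content": "Hi"}], "mock_testing_fallbacks": true}'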