✨ Enterprise Features
To get a license, get in touch with us here
Features:
- Security
- ✅ SSO for Admin UI
- ✅ Audit Logs with retention policy
- ✅ JWT-Auth
- ✅ Control available public, private routes (Restrict certain endpoints on proxy)
- ✅ [BETA] AWS Key Manager v2 - Key Decryption
- ✅ IP address-based access control lists
- ✅ Track Request IP Address
- ✅ Use LiteLLM keys/authentication on Pass Through Endpoints
- ✅ Set Max Request Size / File Size on Requests
- ✅ Enforce Required Params for LLM Requests (ex. Reject requests missing ["metadata"]["generation_name"])
- Customize Logging, Guardrails, Caching per project
- ✅ Team Based Logging - Allow each team to use their own Langfuse Project / custom callbacks
- ✅ Disable Logging for a Team - Switch off all logging for a team/project (GDPR Compliance)
- Spend Tracking & Data Exports
- Prometheus Metrics
- Control Guardrails per API Key
- Custom Branding
Audit Logs
Store audit logs for Create, Update, and Delete operations done on Teams and Virtual Keys.
Step 1 Switch on audit logs
litellm_settings:
store_audit_logs: true
Start the litellm proxy with this config
Step 2 Test it - Create a Team
curl --location 'http://0.0.0.0:4000/team/new' \
--header 'Authorization: Bearer sk-1234' \
--header 'Content-Type: application/json' \
--data '{
"max_budget": 2
}'
Step 3 Expected Log
{
"id": "e1760e10-4264-4499-82cd-c08c86c8d05b",
"updated_at": "2024-06-06T02:10:40.836420+00:00",
"changed_by": "109010464461339474872",
"action": "created",
"table_name": "LiteLLM_TeamTable",
"object_id": "82e725b5-053f-459d-9a52-867191635446",
"before_value": null,
"updated_values": {
"team_id": "82e725b5-053f-459d-9a52-867191635446",
"admins": [],
"members": [],
"members_with_roles": [
{
"role": "admin",
"user_id": "109010464461339474872"
}
],
"max_budget": 2.0,
"models": [],
"blocked": false
}
}
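For reference, a minimal Python sketch of Step 2 (assuming the proxy is running locally with master key sk-1234); creating the team is the write operation that produces the audit log entry shown above:
import requests

# Create a team via the proxy; with store_audit_logs: true this write
# is recorded in the audit log (action="created", table_name="LiteLLM_TeamTable")
resp = requests.post(
    "http://0.0.0.0:4000/team/new",
    headers={"Authorization": "Bearer sk-1234", "Content-Type": "application/json"},
    json={"max_budget": 2},
)
print(resp.status_code, resp.json())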
Tracking Spend for Custom Tags
Requirements:
- Virtual Keys & a database should be set up, see virtual keys
Usage - /chat/completions requests with request tags
- Set on Key
- Set on Team
- OpenAI Python v1.0.0+
- OpenAI JS
- Curl Request
- Langchain
curl -L -X POST 'http://0.0.0.0:4000/key/generate' \
-H 'Authorization: Bearer sk-1234' \
-H 'Content-Type: application/json' \
-d '{
"metadata": {
"tags": ["tag1", "tag2", "tag3"]
}
}
'
curl -L -X POST 'http://0.0.0.0:4000/team/new' \
-H 'Authorization: Bearer sk-1234' \
-H 'Content-Type: application/json' \
-d '{
"metadata": {
"tags": ["tag1", "tag2", "tag3"]
}
}
'
Set extra_body={"metadata": { }} to the metadata you want to pass.
import openai
client = openai.OpenAI(
api_key="anything",
base_url="http://0.0.0.0:4000"
)
response = client.chat.completions.create(
model="gpt-3.5-turbo",
messages = [
{
"role": "user",
"content": "this is a test request, write a short poem"
}
],
extra_body={
"metadata": {
"tags": ["model-anthropic-claude-v2.1", "app-ishaan-prod"] # 👈 Key Change
}
}
)
print(response)
const openai = require('openai');
async function runOpenAI() {
const client = new openai.OpenAI({
apiKey: 'sk-1234',
baseURL: 'http://0.0.0.0:4000'
});
try {
const response = await client.chat.completions.create({
model: 'gpt-3.5-turbo',
messages: [
{
role: 'user',
content: "this is a test request, write a short poem"
},
],
metadata: {
tags: ["model-anthropic-claude-v2.1", "app-ishaan-prod"] // 👈 Key Change
}
});
console.log(response);
} catch (error) {
console.log("got this exception from server");
console.error(error);
}
}
// Call the asynchronous function
runOpenAI();
Pass metadata
as part of the request body
curl --location 'http://0.0.0.0:4000/chat/completions' \
--header 'Content-Type: application/json' \
--data '{
"model": "gpt-3.5-turbo",
"messages": [
{
"role": "user",
"content": "what llm are you"
}
],
"metadata": {"tags": ["model-anthropic-claude-v2.1", "app-ishaan-prod"]}
}'
from langchain.chat_models import ChatOpenAI
from langchain.prompts.chat import (
ChatPromptTemplate,
HumanMessagePromptTemplate,
SystemMessagePromptTemplate,
)
from langchain.schema import HumanMessage, SystemMessage
chat = ChatOpenAI(
openai_api_base="http://0.0.0.0:4000",
model = "gpt-3.5-turbo",
temperature=0.1,
extra_body={
"metadata": {
"tags": ["model-anthropic-claude-v2.1", "app-ishaan-prod"]
}
}
)
messages = [
SystemMessage(
content="You are a helpful assistant that im using to make a test request to."
),
HumanMessage(
content="test from litellm. tell me why it's amazing in 1 sentence"
),
]
response = chat(messages)
print(response)
Viewing Spend per tag
/spend/tags
Request Format
curl -X GET "http://0.0.0.0:4000/spend/tags" \
-H "Authorization: Bearer sk-1234"
/spend/tags
Response Format
[
{
"individual_request_tag": "model-anthropic-claude-v2.1",
"log_count": 6,
"total_spend": 0.000672
},
{
"individual_request_tag": "app-ishaan-local",
"log_count": 4,
"total_spend": 0.000448
},
{
"individual_request_tag": "app-ishaan-prod",
"log_count": 2,
"total_spend": 0.000224
}
]
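To consume this programmatically, here is a minimal Python sketch (assuming the proxy at http://0.0.0.0:4000 and the master key sk-1234) that fetches /spend/tags and prints spend per tag:
import requests

resp = requests.get(
    "http://0.0.0.0:4000/spend/tags",
    headers={"Authorization": "Bearer sk-1234"},
)
resp.raise_for_status()
for row in resp.json():
    # each entry has individual_request_tag, log_count, total_spend (see Response Format above)
    print(f'{row["individual_request_tag"]}: {row["log_count"]} requests, ${row["total_spend"]}')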
Tracking Spend with custom metadata
Requirements:
- Virtual Keys & a database should be set up, see virtual keys
Usage - /chat/completions requests with special spend logs metadata
- Set on Key
- Set on Team
- OpenAI Python v1.0.0+
- OpenAI JS
- Curl Request
- Langchain
curl -L -X POST 'http://0.0.0.0:4000/key/generate' \
-H 'Authorization: Bearer sk-1234' \
-H 'Content-Type: application/json' \
-d '{
"metadata": {
"spend_logs_metadata": {
"hello": "world"
}
}
}
'
curl -L -X POST 'http://0.0.0.0:4000/team/new' \
-H 'Authorization: Bearer sk-1234' \
-H 'Content-Type: application/json' \
-d '{
"metadata": {
"spend_logs_metadata": {
"hello": "world"
}
}
}
'
Set extra_body={"metadata": { }} to the metadata you want to pass.
import openai
client = openai.OpenAI(
api_key="anything",
base_url="http://0.0.0.0:4000"
)
# request sent to model set on litellm proxy, `litellm --model`
response = client.chat.completions.create(
model="gpt-3.5-turbo",
messages = [
{
"role": "user",
"content": "this is a test request, write a short poem"
}
],
extra_body={
"metadata": {
"spend_logs_metadata": {
"hello": "world"
}
}
}
)
print(response)
const openai = require('openai');
async function runOpenAI() {
const client = new openai.OpenAI({
apiKey: 'sk-1234',
baseURL: 'http://0.0.0.0:4000'
});
try {
const response = await client.chat.completions.create({
model: 'gpt-3.5-turbo',
messages: [
{
role: 'user',
content: "this is a test request, write a short poem"
},
],
metadata: {
spend_logs_metadata: { // 👈 Key Change
hello: "world"
}
}
});
console.log(response);
} catch (error) {
console.log("got this exception from server");
console.error(error);
}
}
// Call the asynchronous function
runOpenAI();
Pass metadata
as part of the request body
curl --location 'http://0.0.0.0:4000/chat/completions' \
--header 'Content-Type: application/json' \
--data '{
"model": "gpt-3.5-turbo",
"messages": [
{
"role": "user",
"content": "what llm are you"
}
],
"metadata": {
"spend_logs_metadata": {
"hello": "world"
}
}
}'
from langchain.chat_models import ChatOpenAI
from langchain.prompts.chat import (
ChatPromptTemplate,
HumanMessagePromptTemplate,
SystemMessagePromptTemplate,
)
from langchain.schema import HumanMessage, SystemMessage
chat = ChatOpenAI(
openai_api_base="http://0.0.0.0:4000",
model = "gpt-3.5-turbo",
temperature=0.1,
extra_body={
"metadata": {
"spend_logs_metadata": {
"hello": "world"
}
}
}
)
messages = [
SystemMessage(
content="You are a helpful assistant that im using to make a test request to."
),
HumanMessage(
content="test from litellm. tell me why it's amazing in 1 sentence"
),
]
response = chat(messages)
print(response)
Viewing Spend w/ custom metadata
/spend/logs
Request Format
# e.g.: request_id=chatcmpl-9ZKMURhVYSi9D6r6PJ9vLcayIK0Vm
curl -X GET "http://0.0.0.0:4000/spend/logs?request_id=<your-call-id>" \
-H "Authorization: Bearer sk-1234"
/spend/logs
Response Format
[
{
"request_id": "chatcmpl-9ZKMURhVYSi9D6r6PJ9vLcayIK0Vm",
"call_type": "acompletion",
"metadata": {
"user_api_key": "88dc28d0f030c55ed4ab77ed8faf098196cb1c05df778539800c9f1243fe6b4b",
"user_api_key_alias": null,
"spend_logs_metadata": { # 👈 LOGGED CUSTOM METADATA
"hello": "world"
},
"user_api_key_team_id": null,
"user_api_key_user_id": "116544810872468347480",
"user_api_key_team_alias": null
},
}
]
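A minimal Python sketch (same proxy and master key assumptions) that looks up a single request by id and reads back the logged custom metadata:
import requests

request_id = "chatcmpl-9ZKMURhVYSi9D6r6PJ9vLcayIK0Vm"  # replace with your call id
resp = requests.get(
    "http://0.0.0.0:4000/spend/logs",
    params={"request_id": request_id},
    headers={"Authorization": "Bearer sk-1234"},
)
for log in resp.json():
    # spend_logs_metadata is nested under metadata in each spend log entry
    print(log["metadata"]["spend_logs_metadata"])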
Enforce Required Params for LLM Requests
Use this when you want to enforce that all requests include certain params. For example, you may need all requests to include the user and ["metadata"]["generation_name"] params.
- Set on Config
- Set on Key
Step 1 Define all params you want to enforce on config.yaml
This means ["user"] and ["metadata"]["generation_name"] are required in all LLM requests to LiteLLM
general_settings:
master_key: sk-1234
enforced_params:
- user
- metadata.generation_name
curl -L -X POST 'http://0.0.0.0:4000/key/generate' \
-H 'Authorization: Bearer sk-1234' \
-H 'Content-Type: application/json' \
-d '{
"enforced_params": ["user", "metadata.generation_name"]
}'
Step 2 Verify if this works
- Invalid Request (No `user` passed)
- Invalid Request (No `metadata` passed)
- Valid Request
curl --location 'http://localhost:4000/chat/completions' \
--header 'Authorization: Bearer sk-5fmYeaUEbAMpwBNT-QpxyA' \
--header 'Content-Type: application/json' \
--data '{
"model": "gpt-3.5-turbo",
"messages": [
{
"role": "user",
"content": "hi"
}
]
}'
Expected Response
{"error":{"message":"Authentication Error, BadRequest please pass param=user in request body. This is a required param","type":"auth_error","param":"None","code":401}}%
curl --location 'http://localhost:4000/chat/completions' \
--header 'Authorization: Bearer sk-5fmYeaUEbAMpwBNT-QpxyA' \
--header 'Content-Type: application/json' \
--data '{
"model": "gpt-3.5-turbo",
"user": "gm",
"messages": [
{
"role": "user",
"content": "hi"
}
],
"metadata": {}
}'
Expected Response
{"error":{"message":"Authentication Error, BadRequest please pass param=[metadata][generation_name] in request body. This is a required param","type":"auth_error","param":"None","code":401}}%
curl --location 'http://localhost:4000/chat/completions' \
--header 'Authorization: Bearer sk-5fmYeaUEbAMpwBNT-QpxyA' \
--header 'Content-Type: application/json' \
--data '{
"model": "gpt-3.5-turbo",
"user": "gm",
"messages": [
{
"role": "user",
"content": "hi"
}
],
"metadata": {"generation_name": "prod-app"}
}'
Expected Response
{"id":"chatcmpl-9XALnHqkCBMBKrOx7Abg0hURHqYtY","choices":[{"finish_reason":"stop","index":0,"message":{"content":"Hello! How can I assist you today?","role":"assistant"}}],"created":1717691639,"model":"gpt-3.5-turbo-0125","object":"chat.completion","system_fingerprint":null,"usage":{"completion_tokens":9,"prompt_tokens":8,"total_tokens":17}}%
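For completeness, a minimal Python sketch (OpenAI client pointed at the proxy, using the same virtual key as above) of a request that satisfies both enforced params - user and ["metadata"]["generation_name"]:
import openai

client = openai.OpenAI(
    api_key="sk-5fmYeaUEbAMpwBNT-QpxyA",  # virtual key from the examples above
    base_url="http://localhost:4000"
)

response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "hi"}],
    user="gm",  # 👈 enforced param
    extra_body={
        "metadata": {"generation_name": "prod-app"}  # 👈 enforced param
    },
)
print(response)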
Control available public, private routes
Restrict certain endpoints of proxy
Use this when you want to:
- make an existing private route -> public
- set certain routes as admin_only routes
Usage - Define public, admin only routes
Step 1 - Set on config.yaml
Route Type | Optional | Requires Virtual Key Auth | Admin Can Access | All Roles Can Access | Description |
---|---|---|---|---|---|
public_routes | ✅ | ❌ | ✅ | ✅ | Routes that can be accessed without any authentication |
admin_only_routes | ✅ | ✅ | ✅ | ❌ | Routes that can only be accessed by Proxy Admin |
allowed_routes | ✅ | ✅ | ✅ | ✅ | Routes are exposed on the proxy. If not set then all routes exposed. |
LiteLLMRoutes.public_routes
is an ENUM corresponding to the default public routes on LiteLLM. You can see this here
general_settings:
master_key: sk-1234
public_routes: ["LiteLLMRoutes.public_routes", "/spend/calculate"] # routes that can be accessed without any auth
admin_only_routes: ["/key/generate"] # Optional - routes that can only be accessed by Proxy Admin
allowed_routes: ["/chat/completions", "/spend/calculate", "LiteLLMRoutes.public_routes"] # Optional - routes that can be accessed by anyone after Authentication
Step 2 - start proxy
litellm --config config.yaml
Step 3 - Test it
- Test `public_routes`
- Test `admin_only_routes`
- Test `allowed_routes`
curl --request POST \
--url 'http://localhost:4000/spend/calculate' \
--header 'Content-Type: application/json' \
--data '{
"model": "gpt-4",
"messages": [{"role": "user", "content": "Hey, how'\''s it going?"}]
}'
👉 Expect this endpoint to work without an Authorization / Bearer Token
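The same check as a Python sketch (using the requests library); no Authorization header is sent, since /spend/calculate is listed under public_routes:
import requests

resp = requests.post(
    "http://localhost:4000/spend/calculate",
    json={
        "model": "gpt-4",
        "messages": [{"role": "user", "content": "Hey, how's it going?"}],
    },
)
# Expect a successful response even though no Authorization / Bearer token was sent
print(resp.status_code, resp.json())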
Successful Request
curl --location 'http://0.0.0.0:4000/key/generate' \
--header 'Authorization: Bearer <your-master-key>' \
--header 'Content-Type: application/json' \
--data '{}'
Unsuccessful Request
curl --location 'http://0.0.0.0:4000/key/generate' \
--header 'Authorization: Bearer <virtual-key-from-non-admin>' \
--header 'Content-Type: application/json' \
--data '{"user_role": "internal_user"}'
Expected Response
{
"error": {
"message": "user not allowed to access this route. Route=/key/generate is an admin only route",
"type": "auth_error",
"param": "None",
"code": "403"
}
}
Successful Request
curl http://localhost:4000/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer sk-1234" \
-d '{
"model": "fake-openai-endpoint",
"messages": [
{"role": "user", "content": "Hello, Claude"}
]
}'
Unsuccessful Request
curl --location 'http://0.0.0.0:4000/embeddings' \
--header 'Content-Type: application/json' \
-H "Authorization: Bearer sk-1234" \
--data ' {
"model": "text-embedding-ada-002",
"input": ["write a litellm poem"]
}'
Expected Response
{
"error": {
"message": "Route /embeddings not allowed",
"type": "auth_error",
"param": "None",
"code": "403"
}
}
Guardrails - Secret Detection/Redaction
Use this to REDACT API keys and secrets sent in requests to an LLM.
For example, if you want to redact the value of OPENAI_API_KEY in the following request:
Incoming Request
{
"messages": [
{
"role": "user",
"content": "Hey, how's it going, API_KEY = 'sk_1234567890abcdef'",
}
]
}
Request after Moderation
{
"messages": [
{
"role": "user",
"content": "Hey, how's it going, API_KEY = '[REDACTED]'",
}
]
}
Usage
Step 1 Add this to your config.yaml
litellm_settings:
callbacks: ["hide_secrets"]
Step 2 Run litellm proxy with --detailed_debug
to see the server logs
litellm --config config.yaml --detailed_debug
Step 3 Test it with request
Send this request
curl --location 'http://localhost:4000/chat/completions' \
--header 'Authorization: Bearer sk-1234' \
--header 'Content-Type: application/json' \
--data '{
"model": "llama3",
"messages": [
{
"role": "user",
"content": "what is the value of my open ai key? openai_api_key=sk-1234998222"
}
]
}'
Expect to see the following warning on your litellm server logs
LiteLLM Proxy:WARNING: secret_detection.py:88 - Detected and redacted secrets in message: ['Secret Keyword']
You can also see the raw request sent from litellm to the API Provider
POST Request Sent from LiteLLM:
curl -X POST \
https://api.groq.com/openai/v1/ \
-H 'Authorization: Bearer gsk_mySVchjY********************************************' \
-d {
"model": "llama3-8b-8192",
"messages": [
{
"role": "user",
"content": "what is the time today, openai_api_key=[REDACTED]"
}
],
"stream": false,
"extra_body": {}
}
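A minimal Python sketch of the same test through the OpenAI client (assuming the llama3 model alias from the curl above); the hide_secrets callback should redact the key before the request reaches the provider:
import openai

client = openai.OpenAI(api_key="sk-1234", base_url="http://localhost:4000")

response = client.chat.completions.create(
    model="llama3",
    messages=[
        {
            "role": "user",
            # expect the value below to show up as [REDACTED] in the provider request
            "content": "what is the value of my open ai key? openai_api_key=sk-1234998222",
        }
    ],
)
print(response)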
Secret Detection On/Off per API Key
Use this when you need to switch guardrails on/off per API Key
Step 1 Create Key with hide_secrets off
👉 Set "permissions": {"hide_secrets": false}
with either /key/generate
or /key/update
This means the hide_secrets
guardrail is off for all requests from this API Key
- /key/generate
- /key/update
curl --location 'http://0.0.0.0:4000/key/generate' \
--header 'Authorization: Bearer sk-1234' \
--header 'Content-Type: application/json' \
--data '{
"permissions": {"hide_secrets": false}
}'
# {"permissions":{"hide_secrets":false},"key":"sk-jNm1Zar7XfNdZXp49Z1kSQ"}
curl --location 'http://0.0.0.0:4000/key/update' \
--header 'Authorization: Bearer sk-1234' \
--header 'Content-Type: application/json' \
--data '{
"key": "sk-jNm1Zar7XfNdZXp49Z1kSQ",
"permissions": {"hide_secrets": false}
}'
# {"permissions":{"hide_secrets":false},"key":"sk-jNm1Zar7XfNdZXp49Z1kSQ"}
Step 2 Test it with new key
curl --location 'http://0.0.0.0:4000/chat/completions' \
--header 'Authorization: Bearer sk-jNm1Zar7XfNdZXp49Z1kSQ' \
--header 'Content-Type: application/json' \
--data '{
"model": "llama3",
"messages": [
{
"role": "user",
"content": "does my openai key look well formatted OpenAI_API_KEY=sk-1234777"
}
]
}'
Expect to see sk-1234777
in your server logs on your callback.
The hide_secrets
guardrail check did not run on this request because api key=sk-jNm1Zar7XfNdZXp49Z1kSQ has "permissions": {"hide_secrets": false}
Content Moderation
Content Moderation with LLM Guard
Set the LLM Guard API Base in your environment
LLM_GUARD_API_BASE = "http://0.0.0.0:8192" # deployed llm guard api
Add llmguard_moderations
as a callback
litellm_settings:
callbacks: ["llmguard_moderations"]
Now you can easily test it
Make a regular /chat/completion call
Check your proxy logs for any statement with
LLM Guard:
Expected results:
LLM Guard: Received response - {"sanitized_prompt": "hello world", "is_valid": true, "scanners": { "Regex": 0.0 }}
Turn on/off per key
1. Update config
litellm_settings:
callbacks: ["llmguard_moderations"]
llm_guard_mode: "key-specific"
2. Create new key
curl --location 'http://localhost:4000/key/generate' \
--header 'Authorization: Bearer sk-1234' \
--header 'Content-Type: application/json' \
--data '{
"models": ["fake-openai-endpoint"],
"permissions": {
"enable_llm_guard_check": true # 👈 KEY CHANGE
}
}'
# Returns {..'key': 'my-new-key'}
3. Test it!
curl --location 'http://0.0.0.0:4000/v1/chat/completions' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer my-new-key' \ # 👈 TEST KEY
--data '{"model": "fake-openai-endpoint", "messages": [
{"role": "system", "content": "Be helpful"},
{"role": "user", "content": "What do you know?"}
]
}'
Turn on/off per request
1. Update config
litellm_settings:
callbacks: ["llmguard_moderations"]
llm_guard_mode: "request-specific"
2. Create new key
curl --location 'http://localhost:4000/key/generate' \
--header 'Authorization: Bearer sk-1234' \
--header 'Content-Type: application/json' \
--data '{
"models": ["fake-openai-endpoint"]
}'
# Returns {..'key': 'my-new-key'}
3. Test it!
- OpenAI Python v1.0.0+
- Curl Request
import openai
client = openai.OpenAI(
api_key="sk-1234",
base_url="http://0.0.0.0:4000"
)
# request sent to model set on litellm proxy, `litellm --model`
response = client.chat.completions.create(
model="gpt-3.5-turbo",
messages = [
{
"role": "user",
"content": "this is a test request, write a short poem"
}
],
extra_body={ # pass in any provider-specific param, if not supported by openai, https://docs.litellm.ai/docs/completion/input#provider-specific-params
"metadata": {
"permissions": {
"enable_llm_guard_check": True # 👈 KEY CHANGE
},
}
}
)
print(response)
curl --location 'http://0.0.0.0:4000/v1/chat/completions' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer my-new-key' \ # 👈 TEST KEY
--data '{"model": "fake-openai-endpoint", "messages": [
{"role": "system", "content": "Be helpful"},
{"role": "user", "content": "What do you know?"}
]
}'
Content Moderation with LlamaGuard
Currently works with Sagemaker's LlamaGuard endpoint.
How to enable this in your config.yaml:
litellm_settings:
callbacks: ["llamaguard_moderations"]
llamaguard_model_name: "sagemaker/jumpstart-dft-meta-textgeneration-llama-guard-7b"
Make sure you have the relevant keys in your environment, e.g.:
import os

os.environ["AWS_ACCESS_KEY_ID"] = ""
os.environ["AWS_SECRET_ACCESS_KEY"] = ""
os.environ["AWS_REGION_NAME"] = ""
Customize LlamaGuard prompt
To modify the unsafe categories LlamaGuard evaluates against, just create your own version of this category list.
Point your proxy to it:
callbacks: ["llamaguard_moderations"]
llamaguard_model_name: "sagemaker/jumpstart-dft-meta-textgeneration-llama-guard-7b"
llamaguard_unsafe_content_categories: /path/to/llamaguard_prompt.txt
Content Moderation with Google Text Moderation
Requires your GOOGLE_APPLICATION_CREDENTIALS to be set in your .env (same as VertexAI).
How to enable this in your config.yaml:
litellm_settings:
callbacks: ["google_text_moderation"]
Set custom confidence thresholds
Google Moderations checks the text against several categories. Source
Set global default confidence threshold
By default this is set to 0.8. But you can override this in your config.yaml.
litellm_settings:
google_moderation_confidence_threshold: 0.4
Set category-specific confidence threshold
Set a category-specific confidence threshold in your config.yaml. If none is set, the global default will be used.
litellm_settings:
toxic_confidence_threshold: 0.1
Here are the category-specific values:
Category | Setting |
---|---|
"toxic" | toxic_confidence_threshold: 0.1 |
"insult" | insult_confidence_threshold: 0.1 |
"profanity" | profanity_confidence_threshold: 0.1 |
"derogatory" | derogatory_confidence_threshold: 0.1 |
"sexual" | sexual_confidence_threshold: 0.1 |
"death_harm_and_tragedy" | death_harm_and_tragedy_threshold: 0.1 |
"violent" | violent_threshold: 0.1 |
"firearms_and_weapons" | firearms_and_weapons_threshold: 0.1 |
"public_safety" | public_safety_threshold: 0.1 |
"health" | health_threshold: 0.1 |
"religion_and_belief" | religion_and_belief_threshold: 0.1 |
"illicit_drugs" | illicit_drugs_threshold: 0.1 |
"war_and_conflict" | war_and_conflict_threshold: 0.1 |
"politics" | politics_threshold: 0.1 |
"finance" | finance_threshold: 0.1 |
"legal" | legal_threshold: 0.1 |
Swagger Docs - Custom Routes + Branding
Requires a LiteLLM Enterprise key to use. Get a free 2-week license here
Set LiteLLM Key in your environment
LITELLM_LICENSE=""
Customize Title + Description
In your environment, set:
DOCS_TITLE="TotalGPT"
DOCS_DESCRIPTION="Sample Company Description"
Customize Routes
Hide admin routes from users.
In your environment, set:
DOCS_FILTERED="True" # only shows openai routes to user
Enable Blocked User Lists
If any call is made to the proxy with this user id, it'll be rejected - use this if you want to let users opt out of AI features
litellm_settings:
callbacks: ["blocked_user_check"]
blocked_user_list: ["user_id_1", "user_id_2", ...] # can also be a .txt filepath e.g. `/relative/path/blocked_list.txt`
How to test
- OpenAI Python v1.0.0+
- Curl Request
Set user=<user_id>
to the user id of the user who might have opted out.
import openai
client = openai.OpenAI(
api_key="sk-1234",
base_url="http://0.0.0.0:4000"
)
# request sent to model set on litellm proxy, `litellm --model`
response = client.chat.completions.create(
model="gpt-3.5-turbo",
messages = [
{
"role": "user",
"content": "this is a test request, write a short poem"
}
],
user="user_id_1"
)
print(response)
curl --location 'http://0.0.0.0:4000/chat/completions' \
--header 'Content-Type: application/json' \
--data ' {
"model": "gpt-3.5-turbo",
"messages": [
{
"role": "user",
"content": "what llm are you"
}
],
"user": "user_id_1" # this is also an openai supported param
}
'
Using via API
Block all calls for a customer id
curl -X POST "http://0.0.0.0:4000/customer/block" \
-H "Authorization: Bearer sk-1234" \
-d '{
"user_ids": [<user_id>, ...]
}'
Unblock calls for a user id
curl -X POST "http://0.0.0.0:4000/user/unblock" \
-H "Authorization: Bearer sk-1234" \
-d '{
"user_ids": [<user_id>, ...]
}'
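The same block/unblock calls as a Python sketch (assuming the master key sk-1234; replace the placeholder user id with a real one):
import requests

headers = {"Authorization": "Bearer sk-1234"}

# Block all calls for a customer id
requests.post(
    "http://0.0.0.0:4000/customer/block",
    headers=headers,
    json={"user_ids": ["user_id_1"]},
)

# Unblock calls for a user id
requests.post(
    "http://0.0.0.0:4000/user/unblock",
    headers=headers,
    json={"user_ids": ["user_id_1"]},
)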
Enable Banned Keywords List
litellm_settings:
callbacks: ["banned_keywords"]
banned_keywords_list: ["hello"] # can also be a .txt file - e.g.: `/relative/path/keywords.txt`
Test this
curl --location 'http://0.0.0.0:4000/chat/completions' \
--header 'Content-Type: application/json' \
--data ' {
"model": "gpt-3.5-turbo",
"messages": [
{
"role": "user",
"content": "Hello world!"
}
]
}
'
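As a sketch, the same test in Python (assuming the config above with "hello" in banned_keywords_list); the proxy is expected to reject this request, though the exact error payload may differ:
import requests

resp = requests.post(
    "http://0.0.0.0:4000/chat/completions",
    headers={"Content-Type": "application/json"},
    json={
        "model": "gpt-3.5-turbo",
        # contains the banned keyword "hello", so the banned_keywords check should reject it
        "messages": [{"role": "user", "content": "Hello world!"}],
    },
)
print(resp.status_code, resp.text)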
Public Model Hub
Share a public page of available models for users
[BETA] AWS Key Manager - Key Decryption
This is a beta feature and subject to change.
Step 1. Add USE_AWS_KMS
to env
USE_AWS_KMS="True"
Step 2. Add LITELLM_SECRET_AWS_KMS_
to encrypted keys in env
LITELLM_SECRET_AWS_KMS_DATABASE_URL="AQICAH.."
LiteLLM will find this and use the decrypted DATABASE_URL="postgres://.." value at runtime.
Step 3. Start proxy
$ litellm
How it works
- Key Decryption runs before the server starts up. Code
- It adds the decrypted value to os.environ for the Python process.
Note: Setting an environment variable within a Python script using os.environ will not make that variable accessible via SSH sessions or any other new processes that are started independently of the Python script. Environment variables set this way only affect the current process and its child processes.
Set Max Request / Response Size on LiteLLM Proxy
Use this if you want to set a maximum request / response size for your proxy server. If a request is above the maximum size, it gets rejected and a Slack alert is triggered.
Usage
Step 1. Set max_request_size_mb and max_response_size_mb
For this example we set a very low limit on max_request_size_mb and expect it to get rejected.
In production we recommend setting max_request_size_mb / max_response_size_mb to around 32 MB.
model_list:
- model_name: fake-openai-endpoint
litellm_params:
model: openai/fake
api_key: fake-key
api_base: https://exampleopenaiendpoint-production.up.railway.app/
general_settings:
master_key: sk-1234
# Security controls
max_request_size_mb: 0.000000001 # 👈 Key Change - Max Request Size in MB. Set this very low for testing
max_response_size_mb: 100 # 👈 Key Change - Max Response Size in MB
Step 2. Test it with /chat/completions
request
curl http://localhost:4000/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer sk-1234" \
-d '{
"model": "fake-openai-endpoint",
"messages": [
{"role": "user", "content": "Hello, Claude!"}
]
}'
Expected Response from request
We expect this to fail since the request size is over max_request_size_mb
{"error":{"message":"Request size is too large. Request size is 0.0001125335693359375 MB. Max size is 1e-09 MB","type":"bad_request_error","param":"content-length","code":400}}
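The same check as a Python sketch (assuming the config above with the deliberately tiny max_request_size_mb); the proxy should reject the request with the 400 error shown:
import requests

resp = requests.post(
    "http://localhost:4000/v1/chat/completions",
    headers={"Authorization": "Bearer sk-1234"},
    json={
        "model": "fake-openai-endpoint",
        "messages": [{"role": "user", "content": "Hello, Claude!"}],
    },
)
# Expect a 400 bad_request_error because the request body exceeds max_request_size_mb
print(resp.status_code, resp.json())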