
Quick Start

Quick start CLI, Config, Docker

LiteLLM Server (LLM Gateway) manages:

  • A unified interface: call 100+ LLMs (Huggingface, Bedrock, TogetherAI, etc.) in the OpenAI ChatCompletions & Completions format
  • Cost tracking: authentication, spend tracking, and budgets via virtual keys
  • Load balancing: between multiple models and multiple deployments of the same model

$ pip install 'litellm[proxy]'

Quick Start - LiteLLM Proxy CLI​

Run the following command to start the LiteLLM proxy:

$ litellm --model huggingface/bigcode/starcoder

#INFO: Proxy running on http://0.0.0.0:4000
info

Run with --detailed_debug if you need detailed debug logs

$ litellm --model huggingface/bigcode/starcoder --detailed_debug

Test​

In a new shell, run the command below. It makes an openai.chat.completions request to the proxy. Ensure you're using openai v1.0.0+.

litellm --test

This will automatically route any requests for gpt-3.5-turbo to bigcode/starcoder, hosted on Hugging Face Inference Endpoints.
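
You can make the same request yourself with the OpenAI Python SDK. A minimal sketch, assuming the proxy is running on the default http://0.0.0.0:4000 and no virtual key is enforced:

import openai

# Point the OpenAI client at the local LiteLLM proxy instead of api.openai.com
client = openai.OpenAI(api_key="anything", base_url="http://0.0.0.0:4000")

# The proxy routes "gpt-3.5-turbo" to whatever model it was started with
response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "what llm are you"}],
)
print(response.choices[0].message.content)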

Supported LLMs​

All LiteLLM supported LLMs are supported on the Proxy. See all supported LLMs. For example, to serve an AWS Bedrock model, set your AWS credentials and start the proxy:

$ export AWS_ACCESS_KEY_ID=
$ export AWS_REGION_NAME=
$ export AWS_SECRET_ACCESS_KEY=
$ litellm --model bedrock/anthropic.claude-v2

Quick Start - LiteLLM Proxy + Config.yaml​

The config allows you to create a model list and set api_base, max_tokens, and any other litellm params. See more details about the config here

Create a Config for LiteLLM Proxy​

Example config

model_list:
  - model_name: gpt-3.5-turbo # user-facing model alias
    litellm_params: # all params accepted by litellm.completion() - https://docs.litellm.ai/docs/completion/input
      model: azure/<your-deployment-name>
      api_base: <your-azure-api-endpoint>
      api_key: <your-azure-api-key>
  - model_name: gpt-3.5-turbo
    litellm_params:
      model: azure/gpt-turbo-small-ca
      api_base: https://my-endpoint-canada-berri992.openai.azure.com/
      api_key: <your-azure-api-key>
  - model_name: vllm-model
    litellm_params:
      model: openai/<your-model-name>
      api_base: <your-vllm-api-base> # e.g. http://0.0.0.0:3000/v1
      api_key: <your-vllm-api-key|none>

Run proxy with config​

litellm --config your_config.yaml
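
Because the first two entries share the gpt-3.5-turbo alias, the proxy can spread requests across both Azure deployments. To check which aliases were loaded from the config, you can hit the /models endpoint. A minimal Python sketch, assuming the default port and no master key configured:

import requests

# List the model aliases the proxy loaded from config.yaml
resp = requests.get("http://0.0.0.0:4000/models")
print(resp.json())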

Using LiteLLM Proxy - Curl Request, OpenAI Package, Langchain​

info

LiteLLM is compatible with several SDKs - including the OpenAI SDK, Anthropic SDK, Mistral SDK, LlamaIndex, and LangChain (JS, Python)

More examples here

curl --location 'http://0.0.0.0:4000/chat/completions' \
--header 'Content-Type: application/json' \
--data '{
  "model": "gpt-3.5-turbo",
  "messages": [
    {
      "role": "user",
      "content": "what llm are you"
    }
  ]
}'
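
The same proxy also works from LangChain by treating it as an OpenAI-compatible endpoint. A minimal Python sketch, assuming the langchain-openai package is installed and the proxy is at its default address with no key enforced:

from langchain_openai import ChatOpenAI

# Treat the LiteLLM proxy as an OpenAI-compatible backend
chat = ChatOpenAI(
    model="gpt-3.5-turbo",
    base_url="http://0.0.0.0:4000",
    api_key="anything",
)
print(chat.invoke("what llm are you"))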

More Info

📖 Proxy Endpoints - Swagger Docs​

  • POST /chat/completions - chat completions endpoint to call 100+ LLMs
  • POST /completions - completions endpoint
  • POST /embeddings - embeddings endpoint for Azure, OpenAI, and Huggingface models
  • GET /models - available models on server
  • POST /key/generate - generate a key to access the proxy
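
As one example from this list, /key/generate issues virtual keys for the proxy. A minimal sketch, assuming a master key has been configured (sk-1234 below is a placeholder) since key generation is gated behind it:

import requests

# Generate a virtual key that is only allowed to call gpt-3.5-turbo
resp = requests.post(
    "http://0.0.0.0:4000/key/generate",
    headers={"Authorization": "Bearer sk-1234"},  # assumed master key
    json={"models": ["gpt-3.5-turbo"]},
)
print(resp.json())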

Debugging Proxy​

Events that occur during normal operation

litellm --model gpt-3.5-turbo --debug

Detailed information

litellm --model gpt-3.5-turbo --detailed_debug

Set Debug Level using env variables​

Events that occur during normal operation

export LITELLM_LOG=INFO

Detailed information

export LITELLM_LOG=DEBUG

No Logs

export LITELLM_LOG=None