# Custom LLM Pricing
## Overview
LiteLLM provides flexible cost tracking and pricing customization for all LLM providers:
- Custom Pricing - Override default model costs or set pricing for custom models
- Cost Per Token - Track costs based on input/output tokens (most common)
- Cost Per Second - Track costs based on runtime (e.g., Sagemaker)
- Provider Discounts - Apply percentage-based discounts to specific providers
- Base Model Mapping - Ensure accurate cost tracking for Azure deployments
By default, the response cost is accessible in the logging object via `kwargs["response_cost"]` on success (sync + async).
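For example, a custom success callback can read this value and log it wherever you need. A minimal sketch (the callback name, model, and print format are illustrative, not part of LiteLLM):

```python
import litellm
from litellm import completion

def track_cost_callback(kwargs, completion_response, start_time, end_time):
    # kwargs["response_cost"] is populated by LiteLLM on success (sync + async)
    response_cost = kwargs.get("response_cost", 0)
    print(f"response cost: ${response_cost:.6f}")

litellm.success_callback = [track_cost_callback]

response = completion(
    model="gpt-4o-mini",  # any mapped model works; this one is just an example
    messages=[{"role": "user", "content": "Hello!"}],
)
```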
LiteLLM already has pricing for 100+ models in our model cost map.
## Cost Per Second (e.g. Sagemaker)
### Usage with LiteLLM Proxy Server
Step 1: Add pricing to config.yaml
```yaml
model_list:
  - model_name: sagemaker-completion-model
    litellm_params:
      model: sagemaker/berri-benchmarking-Llama-2-70b-chat-hf-4
    model_info:
      input_cost_per_second: 0.000420
  - model_name: sagemaker-embedding-model
    litellm_params:
      model: sagemaker/berri-benchmarking-gpt-j-6b-fp16
    model_info:
      input_cost_per_second: 0.000420
```
Step 2: Start proxy
```shell
litellm /path/to/config.yaml
```
Step 3: View Spend Logs
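Outside the proxy, you can also sanity-check per-second pricing with the SDK by passing `input_cost_per_second` directly and asking `completion_cost` for the result. A rough sketch, assuming the Sagemaker endpoint above is deployed and your AWS credentials are configured:

```python
from litellm import completion, completion_cost

# input_cost_per_second overrides the cost map for this call
response = completion(
    model="sagemaker/berri-benchmarking-Llama-2-70b-chat-hf-4",
    messages=[{"role": "user", "content": "Hello, how are you?"}],
    input_cost_per_second=0.000420,
)

cost = completion_cost(completion_response=response)
print(f"cost for this call: ${cost:.6f}")
```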
## Cost Per Token (e.g. Azure)
### Usage with LiteLLM Proxy Server
```yaml
model_list:
  - model_name: azure-model
    litellm_params:
      model: azure/<your_deployment_name>
      api_key: os.environ/AZURE_API_KEY
      api_base: os.environ/AZURE_API_BASE
      api_version: os.environ/AZURE_API_VERSION
    model_info:
      input_cost_per_token: 0.000421 # 👈 ONLY to track cost per token
      output_cost_per_token: 0.000520 # 👈 ONLY to track cost per token
```
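For per-token pricing the arithmetic is simple: prompt tokens times the input rate plus completion tokens times the output rate (before any caching, tool, or discount adjustments). A hand-worked example using the rates above, with made-up usage numbers:

```python
# Example rates from the config above
input_cost_per_token = 0.000421
output_cost_per_token = 0.000520

# Hypothetical usage for one request
prompt_tokens = 1_000
completion_tokens = 200

cost = prompt_tokens * input_cost_per_token + completion_tokens * output_cost_per_token
print(f"${cost:.4f}")  # $0.5250
```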
## Provider-Specific Cost Discounts
Apply percentage-based discounts to specific providers (e.g., negotiated enterprise pricing).
### Usage with LiteLLM Proxy Server
Step 1: Add discount config to config.yaml
```yaml
# Apply 5% discount to all Vertex AI and Gemini costs
cost_discount_config:
  vertex_ai: 0.05   # 5% discount
  gemini: 0.05      # 5% discount
  openrouter: 0.05  # 5% discount
  # openai: 0.10    # 10% discount (example)
```
Step 2: Start proxy
```shell
litellm /path/to/config.yaml
```
The discount will be automatically applied to all cost calculations for the configured providers.
### How Discounts Work
- Discounts are applied after all other cost calculations (tokens, caching, tools, etc.)
- The discount is a percentage (0.05 = 5%, 0.10 = 10%, etc.)
- Discounts only apply to the configured providers
- Original cost, discount amount, and final cost are tracked in cost breakdown logs
- Discount information is returned in response headers:
  - `x-litellm-response-cost` - Final cost after discount
  - `x-litellm-response-cost-original` - Cost before discount
  - `x-litellm-response-cost-discount-amount` - Discount amount in USD
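A quick sketch of the arithmetic behind those headers, using made-up numbers for a provider configured with a 5% discount:

```python
# Cost after tokens, caching, tools, etc. have been accounted for
original_cost = 0.0123
discount = 0.05  # from cost_discount_config

discount_amount = original_cost * discount    # 0.000615
final_cost = original_cost - discount_amount  # 0.011685

# x-litellm-response-cost-original        -> 0.0123
# x-litellm-response-cost-discount-amount -> 0.000615
# x-litellm-response-cost                 -> 0.011685
```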
### Supported Providers
You can apply discounts to all LiteLLM supported providers. Common examples:
- `vertex_ai` - Google Vertex AI
- `gemini` - Google Gemini
- `openai` - OpenAI
- `anthropic` - Anthropic
- `azure` - Azure OpenAI
- `bedrock` - AWS Bedrock
- `cohere` - Cohere
- `openrouter` - OpenRouter
See the full list of providers in the LlmProviders enum.
## Override Model Cost Map
You can override our model cost map with your own custom pricing for a mapped model.
Just add a `model_info` key to your model in the config, and override the desired keys.
Example: Override Anthropic's model cost map for the `prod/claude-3-5-sonnet-20241022` model.
```yaml
model_list:
  - model_name: "prod/claude-3-5-sonnet-20241022"
    litellm_params:
      model: "anthropic/claude-3-5-sonnet-20241022"
      api_key: os.environ/ANTHROPIC_PROD_API_KEY
    model_info:
      input_cost_per_token: 0.000006
      output_cost_per_token: 0.00003
      cache_creation_input_token_cost: 0.0000075
      cache_read_input_token_cost: 0.0000006
```
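If you use the SDK directly rather than the proxy, the same kind of override can be registered in code with `litellm.register_model`. A sketch under that assumption (whether every cache-related key is honored this way is worth verifying for your LiteLLM version):

```python
import litellm

# Override the cost map entry for this model with the same keys as the config above
litellm.register_model({
    "anthropic/claude-3-5-sonnet-20241022": {
        "input_cost_per_token": 0.000006,
        "output_cost_per_token": 0.00003,
        "cache_creation_input_token_cost": 0.0000075,
        "cache_read_input_token_cost": 0.0000006,
    }
})
```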
### Additional Cost Keys
There are other keys you can use to specify costs for different scenarios and modalities:
- `input_cost_per_token_above_200k_tokens` - Cost for input tokens when context exceeds 200k tokens
- `output_cost_per_token_above_200k_tokens` - Cost for output tokens when context exceeds 200k tokens
- `cache_creation_input_token_cost_above_200k_tokens` - Cache creation cost for large contexts
- `cache_read_input_token_cost_above_200k_token` - Cache read cost for large contexts
- `input_cost_per_image` - Cost per image in multimodal requests
- `output_cost_per_reasoning_token` - Cost for reasoning tokens (e.g., OpenAI o1 models)
- `input_cost_per_audio_token` - Cost for audio input tokens
- `output_cost_per_audio_token` - Cost for audio output tokens
- `input_cost_per_video_per_second` - Cost per second of video input
- `input_cost_per_video_per_second_above_128k_tokens` - Video cost for large contexts
- `input_cost_per_character` - Character-based pricing for some providers
These keys evolve based on how new models handle multimodality. The latest version can be found at https://github.com/BerriAI/litellm/blob/main/model_prices_and_context_window.json.
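To make the tiered keys concrete, here is a hypothetical illustration of how an "above 200k" input rate changes the bill once a prompt crosses the threshold. The rates and token counts are invented, and you should confirm against the cost map how your model applies the tier:

```python
# Made-up tiered pricing, not a real model's rates
input_cost_per_token = 0.0000025
input_cost_per_token_above_200k_tokens = 0.000005

prompt_tokens = 250_000

# Once the context exceeds 200k tokens, the higher rate is the one that applies
rate = (
    input_cost_per_token_above_200k_tokens
    if prompt_tokens > 200_000
    else input_cost_per_token
)
print(f"${prompt_tokens * rate:.4f}")  # $1.2500
```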
## Set 'base_model' for Cost Tracking (e.g. Azure deployments)
Problem: Azure returns `gpt-4` in the response when `azure/gpt-4-1106-preview` is used. This leads to inaccurate cost tracking.
Solution: Set `base_model` on your config so LiteLLM uses the correct model for calculating the Azure cost.

Get the base model name from the model cost map.
Example config with base_model
```yaml
model_list:
  - model_name: azure-gpt-3.5
    litellm_params:
      model: azure/chatgpt-v-2
      api_base: os.environ/AZURE_API_BASE
      api_key: os.environ/AZURE_API_KEY
      api_version: "2023-07-01-preview"
    model_info:
      base_model: azure/gpt-4-1106-preview
```
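To double-check which prices the base model maps to, you can look it up in LiteLLM's loaded cost map from the SDK. A small sketch, using the `base_model` value from the config above as the key:

```python
import litellm

# litellm.model_cost is the model cost map LiteLLM uses for pricing
pricing = litellm.model_cost.get("azure/gpt-4-1106-preview", {})
print(pricing.get("input_cost_per_token"), pricing.get("output_cost_per_token"))
```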
## Debugging
If your custom pricing is not being used or you're seeing errors, please check the following:
- Run the proxy with `LITELLM_LOG="DEBUG"` or the `--detailed_debug` CLI flag:

  ```shell
  litellm --config /path/to/config.yaml --detailed_debug
  ```
- Check logs for this line:

  ```
  LiteLLM:DEBUG: utils.py:263 - litellm.acompletion
  ```
- Check if `input_cost_per_token` and `output_cost_per_token` are top-level keys in the `acompletion` function call:

  ```python
  acompletion(
      ...,
      input_cost_per_token=my_custom_price,
      output_cost_per_token=my_custom_price,
  )
  ```
If these keys are not present, LiteLLM will not use your custom pricing.
If the problem persists, please file an issue on GitHub.