Custom LLM Pricing

Overview

LiteLLM provides flexible cost tracking and pricing customization for all LLM providers:

  • Custom Pricing - Override default model costs or set pricing for custom models
  • Cost Per Token - Track costs based on input/output tokens (most common)
  • Cost Per Second - Track costs based on runtime (e.g., Sagemaker)
  • Provider Discounts - Apply percentage-based discounts to specific providers
  • Base Model Mapping - Ensure accurate cost tracking for Azure deployments

By default, the response cost is available in the logging object as kwargs["response_cost"] on success, for both sync and async calls.
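As a rough illustration of reading kwargs["response_cost"], the sketch below defines a success callback that keeps a running total. The names track_cost and total_spend are hypothetical, and treating a missing cost as 0.0 is an assumption of this sketch. The four-argument signature follows LiteLLM's custom callback convention.

```python
# Minimal sketch: accumulate the cost of successful calls.
# `total_spend` and `track_cost` are hypothetical names for illustration.
total_spend = {"usd": 0.0}


def track_cost(kwargs, completion_response, start_time, end_time):
    """Add the cost of one successful call to the running total."""
    # Assumption of this sketch: a missing or None cost counts as 0.0.
    cost = kwargs.get("response_cost") or 0.0
    total_spend["usd"] += cost
    return cost


# With LiteLLM installed, registration would look like:
#   import litellm
#   litellm.success_callback = [track_cost]
```

The callback is a plain function, so it can be exercised with a hand-built kwargs dict before it is wired into real traffic.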

LiteLLM already has pricing for 100+ models in our model cost map.

Cost Per Second (e.g. Sagemaker)

Usage with LiteLLM Proxy Server

Step 1: Add pricing to config.yaml

model_list:
  - model_name: sagemaker-completion-model
    litellm_params:
      model: sagemaker/berri-benchmarking-Llama-2-70b-chat-hf-4
    model_info:
      input_cost_per_second: 0.000420
  - model_name: sagemaker-embedding-model
    litellm_params:
      model: sagemaker/berri-benchmarking-gpt-j-6b-fp16
    model_info:
      input_cost_per_second: 0.000420
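To make the effect of input_cost_per_second concrete, here is a small sketch of the arithmetic, assuming the cost is simply the runtime multiplied by the configured rate. The function name and the 2.5 second runtime are hypothetical.

```python
def cost_for_runtime(runtime_seconds: float, input_cost_per_second: float) -> float:
    """Runtime-based cost: seconds of runtime times the per-second rate."""
    return runtime_seconds * input_cost_per_second


# Hypothetical request that ran for 2.5 seconds at the rate from the config above.
per_second_example = cost_for_runtime(runtime_seconds=2.5, input_cost_per_second=0.000420)
```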

Step 2: Start proxy

litellm --config /path/to/config.yaml

Step 3: View Spend Logs

Cost Per Token (e.g. Azure)

Usage with LiteLLM Proxy Server

model_list:
  - model_name: azure-model
    litellm_params:
      model: azure/<your_deployment_name>
      api_key: os.environ/AZURE_API_KEY
      api_base: os.environ/AZURE_API_BASE
      api_version: os.environ/AZURE_API_VERSION
    model_info:
      input_cost_per_token: 0.000421 # 👈 ONLY to track cost per token
      output_cost_per_token: 0.000520 # 👈 ONLY to track cost per token
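The token-based rates above combine in the usual way: prompt tokens are billed at the input rate and completion tokens at the output rate. The sketch below shows that arithmetic only. The function name and token counts are hypothetical, and it ignores caching and other modifiers.

```python
def cost_for_tokens(
    prompt_tokens: int,
    completion_tokens: int,
    input_cost_per_token: float,
    output_cost_per_token: float,
) -> float:
    """Token-based cost: input and output tokens are priced separately."""
    return (
        prompt_tokens * input_cost_per_token
        + completion_tokens * output_cost_per_token
    )


# Hypothetical request: 1000 prompt tokens and 200 completion tokens,
# priced with the rates from the config above.
per_token_example = cost_for_tokens(1000, 200, 0.000421, 0.000520)
```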

Provider-Specific Cost Discounts

Apply percentage-based discounts to specific providers (e.g., negotiated enterprise pricing).

Usage with LiteLLM Proxy Server

Step 1: Add discount config to config.yaml

# Apply a 5% discount to all Vertex AI, Gemini, and OpenRouter costs
cost_discount_config:
  vertex_ai: 0.05 # 5% discount
  gemini: 0.05 # 5% discount
  openrouter: 0.05 # 5% discount
  # openai: 0.10 # 10% discount (example)

Step 2: Start proxy

litellm --config /path/to/config.yaml

The discount will be automatically applied to all cost calculations for the configured providers.

How Discounts Work

  • Discounts are applied after all other cost calculations (tokens, caching, tools, etc.)
  • The discount is expressed as a decimal fraction (0.05 = 5%, 0.10 = 10%, etc.)
  • Discounts only apply to the configured providers
  • Original cost, discount amount, and final cost are tracked in cost breakdown logs
  • Discount information is returned in response headers:
    • x-litellm-response-cost - Final cost after discount
    • x-litellm-response-cost-original - Cost before discount
    • x-litellm-response-cost-discount-amount - Discount amount in USD
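The rules above can be sketched as a small function, assuming the discount is applied as final = original * (1 - discount). The function apply_discount is hypothetical and is not LiteLLM's internal implementation. Its result keys reuse the header names only for readability.

```python
def apply_discount(original_cost: float, provider: str, cost_discount_config: dict) -> dict:
    """Apply a provider discount after all other cost calculations."""
    # Providers that are not configured get no discount.
    discount = cost_discount_config.get(provider, 0.0)
    discount_amount = original_cost * discount
    return {
        "x-litellm-response-cost": original_cost - discount_amount,
        "x-litellm-response-cost-original": original_cost,
        "x-litellm-response-cost-discount-amount": discount_amount,
    }


# Hypothetical usage with the discount config from Step 1.
discounts = {"vertex_ai": 0.05, "gemini": 0.05, "openrouter": 0.05}
discounted = apply_discount(1.0, "vertex_ai", discounts)
undiscounted = apply_discount(1.0, "openai", discounts)
```

Here a 1.00 USD Vertex AI call ends up at roughly 0.95 USD, while the OpenAI call is left unchanged because openai is not in the config.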

Supported Providers

You can apply discounts to any provider LiteLLM supports. Common examples:

  • vertex_ai - Google Vertex AI
  • gemini - Google Gemini
  • openai - OpenAI
  • anthropic - Anthropic
  • azure - Azure OpenAI
  • bedrock - AWS Bedrock
  • cohere - Cohere
  • openrouter - OpenRouter

See the full list of providers in the LlmProviders enum.

Override Model Cost Map

You can override our model cost map with your own custom pricing for a mapped model.

Just add a model_info key to your model in the config, and override the desired keys.

Example: Override Anthropic's model cost map for the prod/claude-3-5-sonnet-20241022 model.

model_list:
  - model_name: "prod/claude-3-5-sonnet-20241022"
    litellm_params:
      model: "anthropic/claude-3-5-sonnet-20241022"
      api_key: os.environ/ANTHROPIC_PROD_API_KEY
    model_info:
      input_cost_per_token: 0.000006
      output_cost_per_token: 0.00003
      cache_creation_input_token_cost: 0.0000075
      cache_read_input_token_cost: 0.0000006
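One way to picture "override the desired keys" is a dictionary merge in which keys from model_info win over the defaults and all other keys are left alone. The sketch below assumes that behavior. The default values are placeholders, not real entries from the model cost map, and effective_pricing is a hypothetical helper.

```python
def effective_pricing(default_entry: dict, model_info: dict) -> dict:
    """Merge pricing so that keys set in model_info override the defaults."""
    return {**default_entry, **model_info}


# Placeholder defaults for illustration only (not real cost map values).
default_entry = {
    "input_cost_per_token": 0.000001,
    "output_cost_per_token": 0.000002,
    "max_tokens": 8192,
}

# Overrides taken from the config above.
model_info = {
    "input_cost_per_token": 0.000006,
    "output_cost_per_token": 0.00003,
}

merged = effective_pricing(default_entry, model_info)
```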

Additional Cost Keys

There are other keys you can use to specify costs for different scenarios and modalities:

  • input_cost_per_token_above_200k_tokens - Cost for input tokens when context exceeds 200k tokens
  • output_cost_per_token_above_200k_tokens - Cost for output tokens when context exceeds 200k tokens
  • cache_creation_input_token_cost_above_200k_tokens - Cache creation cost for large contexts
  • cache_read_input_token_cost_above_200k_tokens - Cache read cost for large contexts
  • input_cost_per_image - Cost per image in multimodal requests
  • output_cost_per_reasoning_token - Cost for reasoning tokens (e.g., OpenAI o1 models)
  • input_cost_per_audio_token - Cost for audio input tokens
  • output_cost_per_audio_token - Cost for audio output tokens
  • input_cost_per_video_per_second - Cost per second of video input
  • input_cost_per_video_per_second_above_128k_tokens - Video cost for large contexts
  • input_cost_per_character - Character-based pricing for some providers

These keys evolve based on how new models handle multimodality. The latest version can be found at https://github.com/BerriAI/litellm/blob/main/model_prices_and_context_window.json.
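As a hedged illustration of how a tiered key such as input_cost_per_token_above_200k_tokens might be chosen, the sketch below picks the higher-tier rate once the prompt exceeds 200k tokens and falls back to the base rate otherwise. Applying the tiered rate to the whole prompt rather than only the excess is an assumption of this sketch. The function name and rates are hypothetical.

```python
TIER_THRESHOLD_TOKENS = 200_000


def input_rate_for_prompt(prompt_tokens: int, model_info: dict) -> float:
    """Pick the per-token input rate for a prompt of the given size."""
    tiered_rate = model_info.get("input_cost_per_token_above_200k_tokens")
    # Use the tiered rate only if it is configured and the threshold is exceeded.
    if tiered_rate is not None and prompt_tokens > TIER_THRESHOLD_TOKENS:
        return tiered_rate
    return model_info["input_cost_per_token"]


# Placeholder rates for illustration only.
tiered_info = {
    "input_cost_per_token": 0.000001,
    "input_cost_per_token_above_200k_tokens": 0.000002,
}
small_prompt_rate = input_rate_for_prompt(50_000, tiered_info)
large_prompt_rate = input_rate_for_prompt(250_000, tiered_info)
```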

Set 'base_model' for Cost Tracking (e.g. Azure deployments)

Problem: Azure returns gpt-4 in the response when azure/gpt-4-1106-preview is used, which leads to inaccurate cost tracking.

Solution ✅: Set base_model in your config so LiteLLM uses the correct model when calculating Azure cost.

Get the base model name from the model cost map.

Example config with base_model

model_list:
  - model_name: azure-gpt-3.5
    litellm_params:
      model: azure/chatgpt-v-2
      api_base: os.environ/AZURE_API_BASE
      api_key: os.environ/AZURE_API_KEY
      api_version: "2023-07-01-preview"
    model_info:
      base_model: azure/gpt-4-1106-preview
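The effect of base_model can be sketched as a lookup-key choice: when base_model is set, pricing is looked up under that name instead of the model name Azure returns. The helper below is hypothetical and only illustrates that precedence. It is not LiteLLM's internal code.

```python
def pricing_lookup_name(response_model: str, model_info: dict) -> str:
    """Return the model name to use when looking up pricing."""
    # Prefer the configured base_model over the name in the response.
    return model_info.get("base_model") or response_model


with_base_model = pricing_lookup_name("gpt-4", {"base_model": "azure/gpt-4-1106-preview"})
without_base_model = pricing_lookup_name("gpt-4", {})
```

Without base_model, the lookup falls back to gpt-4, which is the mismatch described in the problem statement.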

Debugging

If your custom pricing is not being used or you're seeing errors, check the following:

  1. Run the proxy with LITELLM_LOG="DEBUG" or the --detailed_debug CLI flag:
     litellm --config /path/to/config.yaml --detailed_debug
  2. Check the logs for this line:
     LiteLLM:DEBUG: utils.py:263 - litellm.acompletion
  3. Check that 'input_cost_per_token' and 'output_cost_per_token' are top-level keys in the acompletion function:
     acompletion(
         ...,
         input_cost_per_token: my-custom-price,
         output_cost_per_token: my-custom-price,
     )

If these keys are not present, LiteLLM will not use your custom pricing.
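The top-level check can also be expressed as a small helper. This is a sketch that assumes you have copied the logged acompletion arguments into a Python dict. The function name is hypothetical.

```python
REQUIRED_PRICING_KEYS = ("input_cost_per_token", "output_cost_per_token")


def missing_pricing_keys(call_kwargs: dict) -> list:
    """List the pricing keys that are absent from the top level of the call."""
    return [key for key in REQUIRED_PRICING_KEYS if key not in call_kwargs]


# Keys nested under another key (here, model_info) do not count as top-level.
nested_only = {"model": "azure/chatgpt-v-2", "model_info": {"input_cost_per_token": 0.000421}}
top_level = {"model": "azure/chatgpt-v-2", "input_cost_per_token": 0.000421, "output_cost_per_token": 0.000520}

nested_missing = missing_pricing_keys(nested_only)
top_level_missing = missing_pricing_keys(top_level)
```

An empty result means both keys are present at the top level. Any key names returned are the ones to look for in your config.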

If the problem persists, please file an issue on GitHub.