This is a pre-release version.
The production version will be released on Wednesday.
Deploy this version
Docker

```shell
docker run \
  -e STORE_MODEL_IN_DB=True \
  -p 4000:4000 \
  ghcr.io/berriai/litellm:main-v1.72.6.rc
```

Pip

This version is not out yet.
TLDR

- Why Upgrade
    - Codex-mini on Claude Code: You can now use `codex-mini` (OpenAI's code assistant model) via Claude Code.
    - MCP Permissions Management: Manage permissions for MCP Servers by Keys, Teams, Organizations (entities) on LiteLLM.
    - UI: Turn on/off auto refresh on logs view.
    - Rate Limiting: Support for output token-only rate limiting.
- Who Should Read
    - Teams using the `/v1/messages` API (Claude Code)
    - Teams using MCP
    - Teams giving access to self-hosted models and setting rate limits
- Risk of Upgrade
    - Low
        - No major changes to existing functionality or package updates.
Key Highlights

MCP Permissions Management
This release brings support for managing permissions for MCP Servers by Keys, Teams, and Organizations (entities) on LiteLLM. When an MCP client attempts to list tools, LiteLLM will only return the tools the entity has permission to access.

This is useful for MCP servers that expose restricted data (e.g. a Jira MCP server) which you don't want everyone to access.

For Proxy Admins, this enables centralized management of all MCP Servers with access control. For developers, this means you'll only see the MCP tools assigned to you.
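As a rough illustration of what this looks like for a developer, here is a tool-listing request through the proxy. The endpoint path (`/mcp`), the auth header, and the key are assumptions for this sketch, not confirmed values from this release, and a real MCP client performs an initialize handshake before listing tools; the point is that the response only contains tools the calling entity has been granted.

```shell
# Hypothetical sketch: ask the LiteLLM proxy's MCP endpoint for its tool list.
# Endpoint path and auth header are assumptions - check the MCP docs for your
# deployment. A real MCP client sends an "initialize" request before this.
curl -s http://localhost:4000/mcp \
  -H "Authorization: Bearer sk-1234" \
  -H "Content-Type: application/json" \
  -d '{"jsonrpc": "2.0", "id": 1, "method": "tools/list"}'
# Keys, teams, and orgs without access to a given MCP server will not see
# that server's tools in the response.
```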
Codex-mini on Claude Code
This release brings support for calling `codex-mini` (OpenAI's code assistant model) via Claude Code.

This is done by LiteLLM enabling any Responses API model (including `o3-pro`) to be called via the `/chat/completions` and `/v1/messages` endpoints. This includes:
- Streaming calls
- Non-streaming calls
- Cost Tracking on success + failure for Responses API models
Here's how to use it today
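As a minimal sketch (assuming the proxy is running locally and a model alias named `codex-mini` has been added to it), a `/v1/messages` call, the same API Claude Code speaks, can target it directly:

```shell
# Minimal sketch: call codex-mini through LiteLLM's /v1/messages endpoint.
# Assumes the proxy is on localhost:4000 and a model named "codex-mini"
# is configured; the key is a placeholder.
curl -s http://localhost:4000/v1/messages \
  -H "Authorization: Bearer sk-1234" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "codex-mini",
    "max_tokens": 1024,
    "messages": [
      {"role": "user", "content": "Write a one-line bash command to count files in a directory."}
    ]
  }'
```

To point Claude Code itself at the proxy, setting `ANTHROPIC_BASE_URL` to the proxy URL and `ANTHROPIC_AUTH_TOKEN` to a LiteLLM key is the usual approach; how you select the model (e.g. `ANTHROPIC_MODEL=codex-mini`) depends on your setup.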
New / Updated Models

Pricing / Context Window Updates
| Provider | Model | Context Window | Input ($/1M tokens) | Output ($/1M tokens) | Type |
| --- | --- | --- | --- | --- | --- |
| VertexAI | vertex_ai/claude-opus-4 | 200K | $15.00 | $75.00 | New |
| OpenAI | gpt-4o-audio-preview-2025-06-03 | 128K | $2.50 (text), $40.00 (audio) | $10.00 (text), $80.00 (audio) | New |
| OpenAI | o3-pro | 200K | $20.00 | $80.00 | New |
| OpenAI | o3-pro-2025-06-10 | 200K | $20.00 | $80.00 | New |
| OpenAI | o3 | 200K | $2.00 | $8.00 | Updated |
| OpenAI | o3-2025-04-16 | 200K | $2.00 | $8.00 | Updated |
| Azure | azure/gpt-4o-mini-transcribe | 16K | $1.25 (text), $3.00 (audio) | $5.00 (text) | New |
| Mistral | mistral/magistral-medium-latest | 40K | $2.00 | $5.00 | New |
| Mistral | mistral/magistral-small-latest | 40K | $0.50 | $1.50 | New |
- Deepgram: `nova-3` cost-per-second pricing is now supported.
Updated Models

Bugs

- Watsonx
    - Ignore space ID on Watsonx deployments (throws JSON errors) - PR
- Ollama
    - Set tool call ID for streaming calls - PR
- Gemini (VertexAI + Google AI Studio)
- Custom LLM
- Huggingface
    - Add `/chat/completions` to endpoint URL when missing - PR
- Deepgram
    - Support async httpx calls - PR
- Anthropic
    - Append prefix (if set) to assistant content start - PR
Features

- VertexAI
- Anthropic
    - `none` tool choice param support - PR, Get Started (see the sketch after this list)
- Perplexity
    - Add `reasoning_effort` support - PR, Get Started
- Mistral
    - Add Mistral reasoning support - PR, Get Started
- SGLang
    - Map context window exceeded error for proper handling - PR
- Deepgram
    - Provider-specific params support - PR
- Azure
    - Return content safety filter results - PR
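A rough sketch of the `none` tool choice mentioned above, in standard OpenAI chat-completions format through the proxy; the model alias and key are placeholders, not values from this release:

```shell
# Minimal sketch: pass tool_choice "none" so the model answers directly
# instead of calling the declared tool. Model alias and key are assumptions.
curl -s http://localhost:4000/v1/chat/completions \
  -H "Authorization: Bearer sk-1234" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-sonnet-4",
    "messages": [{"role": "user", "content": "What is the capital of France?"}],
    "tools": [{
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Get current weather for a city",
        "parameters": {
          "type": "object",
          "properties": {"city": {"type": "string"}},
          "required": ["city"]
        }
      }
    }],
    "tool_choice": "none"
  }'
```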
LLM API Endpoints

Bugs

- Chat Completion
    - Streaming - Ensure consistent `created` across chunks - PR
Features

Spend Tracking

Bugs

- End Users
- Custom Pricing
    - Convert scientific notation str to int - PR
Management Endpoints / UI

Bugs

Features

- Leftnav
    - Show remaining Enterprise users on UI
- MCP
- Models
    - Add Deepgram models on UI
    - Model Access Group support on UI - PR
- Keys
    - Trim long user IDs - PR
- Logs
Logging / Guardrails Integrations

Bugs

- Arize
- Prometheus
    - Fix total requests increment - PR
Features

- Lasso Guardrails
    - [NEW] Lasso Guardrails support - PR
- Users
    - New `organizations` param on `/user/new` - allows adding users to orgs on creation - PR (see the sketch after this list)
- Prevent double logging when using bridge logic - PR
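A hedged sketch of the new `organizations` param referenced above; the param name comes from the note, while the proxy URL, admin key, email, and org ID are placeholders:

```shell
# Minimal sketch: create a user and attach them to an existing organization
# in one call. Requires an admin key; the org ID below is a placeholder.
curl -s http://localhost:4000/user/new \
  -H "Authorization: Bearer sk-1234" \
  -H "Content-Type: application/json" \
  -d '{
    "user_email": "dev@example.com",
    "organizations": ["org-e5bda603"]
  }'
```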
Performance / Reliability Improvements

Bugs

- Tag based routing
    - Do not consider `default` models when request specifies a tag - PR (s/o thiagosalvatore)
Features

General Proxy Improvements

Bugs

- aiohttp
    - Fixes for transfer encoding error on aiohttp transport - PR
Features

- aiohttp
- CLI
    - Make all commands show server URL - PR
- Uvicorn
    - Allow setting keep-alive timeout - PR
- Experimental Rate Limiting v2 - enable via `EXPERIMENTAL_MULTI_INSTANCE_RATE_LIMITING="True"` (a hedged deploy sketch follows this list)
- Helm
    - Support extraContainers in migrations-job.yaml - PR
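The experimental rate limiter sketch referenced in the list above simply adds the documented flag to the deploy command from the top of this page:

```shell
# Sketch: run this release with the experimental multi-instance
# rate limiting flag enabled (same image as the deploy section).
docker run \
  -e STORE_MODEL_IN_DB=True \
  -e EXPERIMENTAL_MULTI_INSTANCE_RATE_LIMITING="True" \
  -p 4000:4000 \
  ghcr.io/berriai/litellm:main-v1.72.6.rc
```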
New Contributors
- @laurien16 made their first contribution in https://github.com/BerriAI/litellm/pull/8460
- @fengbohello made their first contribution in https://github.com/BerriAI/litellm/pull/11547
- @lapinek made their first contribution in https://github.com/BerriAI/litellm/pull/11570
- @yanwork made their first contribution in https://github.com/BerriAI/litellm/pull/11586
- @dhs-shine made their first contribution in https://github.com/BerriAI/litellm/pull/11575
- @ElefHead made their first contribution in https://github.com/BerriAI/litellm/pull/11450
- @idootop made their first contribution in https://github.com/BerriAI/litellm/pull/11616
- @stevenaldinger made their first contribution in https://github.com/BerriAI/litellm/pull/11649
- @thiagosalvatore made their first contribution in https://github.com/BerriAI/litellm/pull/11454
- @vanities made their first contribution in https://github.com/BerriAI/litellm/pull/11595
- @alvarosevilla95 made their first contribution in https://github.com/BerriAI/litellm/pull/11661
Demo Instance
Here's a Demo Instance to test changes:
- Instance: https://demo.litellm.ai/
- Login Credentials:
- Username: admin
- Password: sk-1234