v1.92.0rc1 - Claude Sonnet 5, Production MCP OAuth & New Providers
Deploy this version​
- Docker
- Pip
docker run \
-e STORE_MODEL_IN_DB=True \
-p 4000:4000 \
docker.litellm.ai/berriai/litellm:1.92.0-rc.1
pip install litellm==1.92.0rc1
Key Highlights​
- Claude Sonnet 5 - first-class support across Anthropic, Amazon Bedrock (including the regional inference profiles), Vertex AI, and Azure AI, with a 1M-token context window, reasoning, computer use, PDF input, and introductory pricing through 2026-08-31.
- Production-ready MCP OAuth (On-Behalf-Of) - the token_exchange arm moves onto the v2 resolver with RFC 9728 -> RFC 8414 endpoint discovery (no IdP guessing), persisted Dynamic Client Registration, per-server outbound concurrency limits, and a
mcp_tool_searchvirtual tool for large tool catalogs. - Two new providers - Tencent (DeepSeek V4 flash/pro via TokenHub, chat and
/v1/messages) and Google Distributed Cloud (GDC) Gemini for on-prem/sovereign deployments. - Access-control hardening - admin-gating of
permissionsandallowed_routesacross the key, user, and team endpoints, AES-256-GCM at-rest credential encryption with a versioned re-encryption migration, and redaction of secrets from startup and router error logs. - Faster hot paths - spend-counter increments and pre-call budget reads are now gathered concurrently, the cost-callback deepcopy moved off the request event loop, OTel runtime imports are memoized, and Redis-cluster reconnect plus read-replica boot resilience keep the proxy serving during infra blips.
New Providers and Endpoints​
New Providers (2 new providers)​
| Provider | Supported LiteLLM Endpoints | Description |
|---|---|---|
Tencent (tencent) | Chat Completions, /v1/messages | Tencent TokenHub provider serving DeepSeek V4 (flash and pro) with reasoning, prompt caching, and Anthropic Messages support - PR #31903 |
Google Distributed Cloud - GDC (gdc) | Chat Completions | Google Distributed Cloud Gemini provider for on-prem and sovereign-cloud deployments - PR #31895 |
New Models / Updated Models​
New Model Support (13 new pricing entries across 4 models)​
| Provider | Model | Context Window | Input ($/1M tokens) | Output ($/1M tokens) | Features |
|---|---|---|---|---|---|
| Anthropic / Bedrock / Vertex / Azure AI | claude-sonnet-5 | 1M | $2.00 | $10.00 | Reasoning, vision, function calling, prompt caching, computer use, PDF input, adaptive thinking |
| Bedrock Mantle | bedrock_mantle/xai.grok-4.3 | 131K | $1.25 | $2.50 | Reasoning, vision, function calling |
| Tencent | tencent/deepseek-v4-flash | 1M | $0.14 | $0.28 | Reasoning, function calling, prompt caching |
| Tencent | tencent/deepseek-v4-pro | 1M | $0.435 | $0.87 | Reasoning, function calling, prompt caching |
Claude Sonnet 5 ships with pricing entries for Anthropic (claude-sonnet-5), Amazon Bedrock (anthropic.claude-sonnet-5 plus the us / eu / au / jp / global regional inference profiles), Vertex AI (vertex_ai/claude-sonnet-5), and Azure AI (azure_ai/claude-sonnet-5) - PR #31740, with introductory pricing applied through 2026-08-31 - PR #31917.
Features​
- Anthropic
- Amazon Bedrock
- Forward
strictandadditionalPropertiesto the ConversetoolSpec- PR #29814 - Add
xai.grok-4.3to the model cost map for Bedrock Mantle SigV4 auth - PR #31916 - SigV4/IAM auth on the Bedrock Mantle Responses API route - PR #29788
- Honor
ttlfortool_configcache injection points - PR #31929 - Map
guardrailConfigto InvokeModel guardrail headers - PR #31985
- Forward
- Google Vertex AI
- Google Gemini
- Tencent
- Add Tencent TokenHub as a provider serving DeepSeek V4 - PR #31903
- Fireworks AI
- Enable tool calling for
glm-5p1in the model cost map - PR #29697
- Enable tool calling for
- Databricks
- Split parallel tool calls so each tool message follows its
tool_calls- PR #31633
- Split parallel tool calls so each tool message follows its
- General
- Add Parasail as a JSON-configured OpenAI-compatible provider - PR #29842
Bug Fixes​
LLM API Endpoints​
Features​
- Responses API
- /v1/messages
- Drop top-level
additional_drop_paramson/v1/messages- PR #31645
- Drop top-level
- Image Generation
- Azure AI MAI-Image-2.5 image generation support - PR #29688
- A2A
- Support
a2a-sdk1.x proxy routing for 0.3 and 1.0 agents - PR #30950
- Support
- Sandbox
- Reuse the e2b container across requests when
metadata.session_idis set - PR #31688
- Reuse the e2b container across requests when
- General
Bugs​
- Responses API
- Preserve forced-function
tool_choicename in the Responses-to-Chat transform - PR #29812 - Map a system-only chat request to a system input item in the Responses bridge - PR #29817
- Merge
metadata.tagsintolitellm_metadataon the/v1/responsesroute - PR #31793 - Route per-model on GitHub Copilot
/v1/responsesbased on model info - PR #29747 - Drop unmappable Bedrock Responses tools instead of failing the request - PR #31663
- Preserve forced-function
- Realtime API
- OCR
- Preserve content, tables, and
keyValuePairsin Azure AI doc-intelligence/v1/ocr- PR #32018
- Preserve content, tables, and
- A2A
- Populate response usage in the A2A chat transformation - PR #31980
- General
Management Endpoints / UI​
Features​
- Virtual Keys & Access Control
- Admin-gate
permissionson/key/update,/key/regenerate,/user/new,/user/update, and bulk key updates - PR #31810, PR #31998, PR #32002 - Admin-gate
allowed_routespresence on/key/updateand/key/regenerate- PR #31987 - Gate non-admin
/key/generatebudget_limitsand permissions - PR #31469 - Reject team-scoped
object_permissionon personal keys for non-admins - PR #31471 - Support
object_permissionindefault_key_generate_params- PR #31776 - Reject non-finite
budget_limitswindows and hard-reject CLI session-token personal-key budgets on/key/generate- PR #31630, PR #31631 - Tighten role gating on the
/get/config/callbacksresponse and extend banned-params + admin-clear lists - PR #31745, PR #31742 - Audit default-user-settings and remaining system-wide-settings updates - PR #31753, PR #31754
- JWT auth opt-in fallback to the DB team on an unresolved claim - PR #28913
- Reject non-existent team/key/model scope entries on policy attachment create - PR #32131
- Admin-gate
- UI
- shadcn migration foundation: Tailwind v4, shadcn init, and antd cascade fix - PR #31995
- Migrate the chat UI from antd to shadcn/ui and add key-management and usage panels - PR #32074
- Rotate model credentials in a dedicated modal so a normal save can't overwrite secrets - PR #28089
- Disclaim that the Update API Key modal only rotates
api_key- PR #31805 - Add budget duration to the edit-team-member form - PR #29717
- Render provider icons on the public model hub - PR #29958
Bugs​
- Virtual Keys & Models
- Show team projects to internal users on key creation - PR #28855
- Label the default key type as "Full Access" on the key edit page - PR #29870
- Keep virtual-keys filters across delete and refresh - PR #31533
- Allow deleting a BYOK model after its team is deleted, and delete a team's BYOK models on team deletion - PR #29875, PR #29977
- Count only legacy
function_call.argumentsin the token counter - PR #31741 - Fix the typo
generic_role_mappoings->generic_role_mappings- PR #29753
- UI
- Stop the Request Logs page from overflowing horizontally and size its columns - PR #31426
- Fix the Router Settings Loadbalancing tab save - PR #31735
- Allow any git host on the skills add form - PR #31652
- Include cache token columns in the usage export - PR #32015
- Unify migrated-route URLs and migrate the API Reference page - PR #29953
- Make the workflow runs page fill full width - PR #29868
AI Integrations​
Logging​
- Prometheus
- Add an
api_providerlabel to token, latency, request, and cache metrics - PR #32126 - Add
litellm_overhead_with_guardrails_latency_metric- PR #31593 - Expose
project_aliasin custom metadata labels and expose MCP tool metadata - PR #31784, PR #31899 - Bound per-request budget metric emission with a timeout - PR #31632
- Add an
- S3
- Send
Content-MD5on PUT and support optional server-side encryption in the s3 v2 logger - PR #31928
- Send
- Microsoft Sentinel
- Resolve the audit stream from
AZURE_SENTINEL_AUDIT_STREAM_NAME- PR #32010
- Resolve the audit stream from
- FOCUS Export
- General
- Route realtime success logging through the bounded worker - PR #31733
- Restore the admin key/team
callback_vars.turn_off_message_loggingoverride - PR #31905 - Resolve
model_map_valuefor proxy custom pricing in standard logging - PR #31940 - Log hashed cache keys - PR #29890
- Add a Galileo health check for the UI callback test - PR #29908
Guardrails​
- Model Armor
- Scan file and document attachments with Model Armor - PR #31655
- CrowdStrike AIDR
- Headroom
- General
Secret Managers​
- General
- AES-256-GCM at-rest credential encryption with a versioned format and a re-encryption migration - PR #31215
Spend Tracking, Budgets and Rate Limiting​
- Budgets & Fallbacks
- Key-level
budget_fallbacksto reroute requests when a per-model budget is exceeded - PR #31783, with UI configuration on the key create/edit forms - PR #32072 - Add a
disable_budget_reservationgeneral setting - PR #29493 - Reserve team-budget raises for proxy admins and don't block
/team/updateon an unchanged budget - PR #30030, PR #29525 - Prevent duplicate budget alert emails on concurrent threshold crossings and apply
EMAIL_SIGNATUREto them - PR #32011, PR #31712
- Key-level
- Cost Tracking
- Standardize rate-limit errors with
category,rate_limit_type,model, andllm_providerfields - PR #27687 - Store the cost breakdown for
/v1/realtimesessions - PR #30069 - Track cost for unmanaged Vertex AI batch jobs - PR #31442
- Report real token usage on blocked responses - PR #31217
- Emit the
x-litellm-response-costheader on/messagesand/generateContent- PR #31675 - Recognize
*.cognitiveservices.azure.comas OpenAI-compatible in pass-through cost tracking - PR #29730 - Record agent
cost_per_queryand input tokens on the A2A native send path - PR #31979 - Count only active users toward the license seat limit - PR #31227
- Log per-token-type reasoning and cache cost breakdown - PR #31623
- Standardize rate-limit errors with
MCP Gateway​
- OAuth 2.0 (On-Behalf-Of) v2
- Migrate the token_exchange (OBO) arm to the v2 resolver and make it production-ready with discovery threading, audit hardening, and an RFC 9728 challenge - PR #31526, PR #31622
- Discover the OBO token endpoint via RFC 9728 -> RFC 8414 instead of guessing the IdP - PR #31762
- Persist the DCR
client_idso interactive OAuth token refresh works, including on-create Authorize & Fetch - PR #31912, PR #31920 - Resolve per-user OAuth identity authoritatively at the token endpoint - PR #31657
- Support
client_secret_basicfor upstream OAuth token endpoints and add a token-endpoint auth-method selector in the UI - PR #31635, PR #31739 - Gate OAuth authorize/token/register/discovery on
auth_type=oauth2- PR #31736 - Mirror the upstream token lifetime instead of forcing a 1h OBO expiry - PR #29951
- Reset OAuth state on create-server modal close so a prior server's token no longer leaks into the next add-server session - PR #30000
- Let non-creator users OAuth into OBO-mode servers and allow team access-group grants in the authorize/token check - PR #29867, PR #30041
- Server Management & Tools
- Add
mcp_tool_searchvirtual tools for large tool catalogs - PR #31777 - Bound outbound tool-call concurrency per MCP server - PR #31641
- Add an
all-proxy-mcpserverssentinel to grant teams every MCP server - PR #32012 - Roll up MCP tool spend to user counters and the usage UI - PR #31576
- Hydrate the MCP server registry from the DB on startup when
store_model_in_dbis false - PR #31775 - Emit a
tools/listCLIENT span for MCP discovery under otel_v2 - PR #31525
- Add
- Bug Fixes
- Stop one unauthenticated server from emptying the aggregate
tools/list- PR #31684 - Surface
tools/listauth failures as a 401 challenge on single-server routes - PR #31921 - Load MCP tool configuration tools via the OBO/passthrough-aware GET path - PR #29960
- Tighten role-based visibility on
/v1/mcp/server/submissions- PR #31932 - Highlight MCP cards red when the logged-in user is missing per-user env vars - PR #29856
- BYOM visibility, preview UX, and admin-settings gating - PR #31809
- Re-add the chat UI and allow a simple UI for MCP OBO auth - PR #31893
- Stop one unauthenticated server from emptying the aggregate
Performance / Loadbalancing / Reliability improvements​
- Spend & Auth hot paths
- Gather independent per-scope spend-counter increments - PR #31578
- Move the cost-callback payload deepcopy off the request event loop - PR #31579
- Gather independent pre-call budget-enforcement reads in
common_checks- PR #31604 - Memoize the per-request lazy import of OTel runtime hooks - PR #31707
- Load the virtual-keys team filter from the fast v2 endpoint - PR #31638
- Reliability
- Keep serving reads from the read replica when the primary DB is down at startup - PR #31951
- Re-establish async Redis-cluster connections after a node restart - PR #31577
- Isolate poison spend-log rows so one bad record can't drop the whole batch - PR #31705
- Stop leaking
master_keyanddatabase_urlin startup DEBUG logs - PR #31944
- Routing
Documentation Updates​
- Require a reproduction video for reported issues in the contribution guidelines - PR #30063
PR roll-up by ownership area​
PRs by ownership area (total: 218)
- LLM API Endpoints: 30
- UI: 28
- MCP: 27
- Models & Providers: 25
- Auth & Management: 23
- Other (CI / chore / tests / version bumps): 21
- Logging: 20
- Spend / Budgets / Rate Limits: 18
- Guardrails: 12
- Performance: 11
- Docs: 2
- Secret Managers: 1
New Contributors​
- @roytev made their first contribution in PR #29565
- @balcsida made their first contribution in PR #29581
- @PigeonMark made their first contribution in PR #29584
- @johngarrido made their first contribution in PR #29623
- @arnav-144p made their first contribution in PR #29753
- @Kaihuang724 made their first contribution in PR #29842
- @fengjikui made their first contribution in PR #29890
- @fernando-izar made their first contribution in PR #31632
Full Changelog​
https://github.com/BerriAI/litellm/compare/v1.91.0-rc.1...v1.92.0-rc.1