v1.90.0 - Six New Providers, OpenTelemetry v2 Parity & Streaming Reliability
Deploy this version​
- Docker
- Pip
docker run \
-e STORE_MODEL_IN_DB=True \
-p 4000:4000 \
docker.litellm.ai/berriai/litellm:1.90.0
pip install litellm==1.90.0
Key Highlights​
- Six new providers - ModelScope, LibertAI, Parasail, Pinstripes, TinyFish (search), and FastCRW (search) - plus a new e2b code-execution sandbox primitive.
- 91 new models across Fireworks AI, Scaleway, Tensormesh, LibertAI, Azure AI (including
gpt-5.5and DeepSeek V4), and Bedrock Mantle. - OpenTelemetry v2 reaches metrics parity with v1, emitting the six
gen_ai.client.*metrics, stamping input/output message content, and scoping OTLP credentials per tenant. - A broad streaming-reliability sweep: upstream connections are now released when the client disconnects mid-stream (Gemini, aiohttp), requests are cancelled cleanly, and partial spend is recorded on interrupted streams.
- Two new guardrails (Cisco AI Defense, Repello Argus) and a large Next.js App Router UI migration covering the models, teams, users, organizations, api-keys, and usage pages.
App Router Routing​
We're moving the Admin UI from query param based routing to the Nextjs App Router. The motivation is that routing now lives in the URL, so any view (a specific team, a filtered usage report, a single key) becomes a shareable link you can send to a teammate or bookmark instead of a piece of in-memory client state.
The motivation for this is twofold: it lays the groundwork for a lot of highly requested feaetures/improvements to the UI, and it lays the groundwork for contributing to LiteLLM easier and code more human reviewable for maintainers.
The biggest one here will be sharing links for different pages like specific logs pages, teams pages, and more.
New Providers and Endpoints​
New Providers (6 new providers)​
| Provider | Supported LiteLLM Endpoints | Description |
|---|---|---|
ModelScope (modelscope) | Chat Completions | OpenAI-compatible provider for ModelScope-hosted models - PR #28460 |
LibertAI (libertai) | Chat Completions, Embeddings | JSON-configured OpenAI-compatible provider; ships 12 catalog models including bge-m3 embeddings - PR #30203 |
TinyFish (tinyfish) | Search | Web search provider - PR #30634 |
FastCRW (fastcrw) | Search | Web search provider - PR #30434 |
Parasail (parasail) | Chat Completions | OpenAI-compatible provider |
Pinstripes (pinstripes) | Chat Completions | New chat provider; ships 6 catalog models |
New LLM API Endpoints​
| Capability | Description | Documentation |
|---|---|---|
| Code execution (e2b) | New sandbox / code-interpreter primitive for running model-generated code - PR #30898 | Sandbox |
New Models / Updated Models​
New Model Support (91 new models)​
| Provider | Model | Context | Input ($/1M) | Output ($/1M) | Features |
|---|---|---|---|---|---|
| Azure AI | azure_ai/gpt-5.5 | 1,050,000 | $5 | $30 | reasoning, function calling, prompt caching, pdf, vision |
| Azure AI | azure_ai/gpt-5.5-2026-04-23 | 1,050,000 | $5 | $30 | reasoning, function calling, prompt caching, pdf, vision |
| Azure AI | azure_ai/deepseek-v4-flash | 1,000,000 | $0.19 | $0.51 | reasoning, function calling |
| Azure AI | azure_ai/deepseek-v4-pro | 1,000,000 | $1.74 | $3.48 | reasoning, function calling |
| Azure AI | azure_ai/deepseek-v3.1 | 131,072 | $1.23 | $4.94 | reasoning, function calling |
| Azure AI | azure_ai/MAI-Image-2.5 | - | $5 | - | image generation |
| Azure AI | azure_ai/MAI-Image-2.5-Flash | - | $1.75 | - | image generation |
| Azure AI | azure_ai/MAI-Image-2e | - | $5 | - | image generation |
| Azure | azure/gpt-realtime-whisper | - | - | - | audio transcription |
| OpenAI | gpt-realtime-whisper | - | - | - | audio transcription |
| DeepSeek | deepseek-v4-flash / deepseek/deepseek-v4-flash | 1,000,000 | $0.14 | $0.28 | function calling, prompt caching |
| DeepSeek | deepseek-v4-pro / deepseek/deepseek-v4-pro | 1,000,000 | $0.43 | $0.87 | function calling, prompt caching |
| Mistral | mistral/mistral-medium-3-5 | 262,144 | $1.50 | $7.50 | function calling, vision |
| GitHub Copilot | github_copilot/mai-code-1-flash | 128,000 | $0.75 | $4.50 | function calling |
| Fireworks AI | 24 models incl. deepseek-v4-pro, glm-5p2, kimi-k2p6/kimi-k2p7-code, minimax-m3, qwen3p7-plus, gpt-oss-120b/gpt-oss-20b | up to 1,048,576 | $0.07-$2.80 | $0.28-$8.80 | function calling, reasoning, vision |
| Bedrock Mantle | bedrock_mantle/google.gemma-4-26b-a4b / gemma-4-31b / gemma-4-e2b | 128k-256k | $0.04-$0.14 | $0.08-$0.40 | function calling, reasoning, vision |
| LibertAI | 12 models incl. qwen3.6-35b-a3b(-thinking), gemma-4-31b-it(-thinking), deepseek-v4-flash, bge-m3 | up to 262,144 | $0.01-$0.25 | free-$1.75 | function calling, reasoning, vision, embedding |
| Pinstripes | 6 models incl. ps/minimax-m2.7, ps/qwen3.6-35b-a3b, ps/glm-4.5-air, ps/deepseek-v4-flash | up to 1,000,192 | $0.09-$0.30 | $0.20-$0.60 | function calling, reasoning |
| Scaleway | 17 models incl. qwen3.5-397b-a17b, mistral-medium-3.5-128b, gemma-4-26b-a4b-it, gpt-oss-120b, whisper-large-v3 | up to 256,000 | free-$1.50 | free-$7.50 | function calling, reasoning, vision, audio, embedding |
| Tensormesh | 10 models incl. Qwen3-Coder-480B-A35B-FP8, Qwen3.5-397B-A17B-FP8, Kimi-K2.6, DeepSeek-V4-Flash, gpt-oss-120b/gpt-oss-20b | up to 262,144 | $0.07-$1.40 | $0.28-$4.40 | function calling, reasoning, prompt caching |
| Soniox | soniox/stt-async-v5 | 8,000 | - | - | audio transcription |
| TinyFish | tinyfish/search | - | - | - | search |
The 91 new entries also include the full fireworks_ai/accounts/... model and router paths. Claude Fable 5 already shipped in v1.89.0, so it is not counted here. Full diff: model_prices_and_context_window.json.
Features​
- Anthropic
- OpenRouter
- Map reasoning
maxlevel toxhigh- PR #28881
- Map reasoning
- Bedrock
- DashScope
- Add Responses API support - PR #30286
- OCI
- Make Cohere
{{trace}}judges work (tool param types + agentic tool-calling continuation) - PR #30646
- Make Cohere
Bug Fixes​
- Anthropic
- Apply
cache_control_injection_pointson the/v1/messagespath - PR #30341 - Strip LiteLLM-injected
total_tokensfrom/v1/messagesresponses - PR #30382 - Cap cache_control injection at 4 blocks - PR #30480
- Drop orphaned
server_tool_useon multi-turn replay from generic OpenAI clients - PR #30486 - Don't leak tool
typeinto OpenAI function parameters schema - PR #30618
- Apply
- Bedrock
- Preserve
cache_controlfor ARN models in the/v1/messagesadapter - PR #29823 - Handle
role: "system"inside the messages array on/v1/messages- PR #30443 - Use a unique function-call id for Bedrock Mantle responses->chat tool calls - PR #30426
- Add SigV4 fallback to Bedrock Mantle chat completions auth - PR #30714
- Preserve
- Gemini / Vertex AI
- OpenAI-compatible
- WatsonX
- Wrap string embedding input in an array for the WatsonX API - PR #30897
- Pricing / Cost map
- Add cost mapping for
deepseek-v4-flash/deepseek-v4-pro- PR #27056 - Add
mistral-medium-3-5to the cost map - PR #29303 - Add
azure_ai/gpt-5.5to the model cost map - PR #30428 - Add GitHub Copilot MAI Code Flash pricing - PR #30415
- Sync the Fireworks AI model registry with the current platform catalog - PR #30616
- Add
soniox/stt-async-v5- PR #30672 - Correct swapped input/output token costs for
command-r7b-12-2024- PR #30413 - Add 1h cache-write cost for Anthropic Sonnet 4.5/4.6 - PR #30474
- Route Volcengine (Doubao) tiered-pricing models to the tiered cost handler - PR #30357; sort tiered thresholds numerically - PR #30375; treat a DashScope explicit
0.0tier cost as a real price - PR #30653 - Drop synthesized zero costs in
register_modelto preserve sparse entries - PR #30201
- Add cost mapping for
LLM API Endpoints​
Features​
- Responses API
- Propagate
completed_responsethroughFallbackResponsesStreamWrapperfor streaming/v1/responsescontainer ownership - PR #30213
- Propagate
- /v1/models
- Realtime
- Allow non-admin virtual keys to call GA Realtime WebRTC HTTP routes - PR #30089
- Files
- Attach existing OpenAI file ids - PR #30628
Bugs​
- General
- Token counter: handle Anthropic
tool_referenceblocks to stop dropped spend logs - PR #30302 - Streaming: guard
raise_on_model_repetitionagainst empty choices - PR #30485 - Audio: don't override an explicit
response_formatwithverbose_json- PR #30599 - Validate the resolved model in
/realtime/client_secretsfor non-transcription sessions - PR #30710
- Token counter: handle Anthropic
Management Endpoints / UI​
Features​
- App Router migration - models - PR #30677, teams - PR #30343, users - PR #30334, organizations - PR #30336, api-keys - PR #30699, usage report - PR #30694, agents + router-settings - PR #30323
- UI cleanup - remove the unreachable
/chatpage - PR #30178, dead UI components - PR #30340, orphaned pass-through-settings route - PR #30692; remove in-product survey and feedback nudges - PR #30773 - Virtual Keys - expose per-model budget usage in
/key/info- PR #30394; grace-period key rotation returns the deprecated-key lookup result on 401 - PR #30327 - Teams / Orgs - add
key_limitquery param to/team/info- PR #30006; list public team model names in/v1/models- PR #30588 - Proxy CLI Auth - add
verification_uri_completeto the CLI SSO device flow - PR #30571 - Proxy - configurable response headers and login-page hint - PR #30792; gate the "Default Credentials" hint on
/ui/loginbehind an env flag - PR #30234
Bugs​
- Access control / keys
/key/listnow does exactuser_id/key_aliasmatching by default, preventing cross-user key disclosure - PR #30593- Restrict
/customer/daily/activityto admin-only - PR #28849 org_adminsees all org teams when the UI sends its ownuser_id- PR #30247- Allow internal roles to access vector store CRUD routes - PR #30503
- Require premium only when enabling premium metadata fields - PR #30506
- Guard
check_and_fix_namespaceagainst aNonekey - PR #30435 - Warn at startup when
custom_authskipscommon_checksenforcement - PR #30665 - Resolve list-files credentials from team BYOK deployments - PR #30495; preserve
azure_ad_tokenthroughCredentialLiteLLMParamsfor/v1/files+ batches - PR #30241 - Enforce budget for models not in the cost map - PR #24949
- UI
- Stop the Virtual Keys page from an infinite render loop - PR #30397
- Source api-keys identity from
useAuthorizedto stop "User ID is not set" - PR #30903 - Render logos correctly under a custom
server_root_path- PR #31156 - Warn that team models are deleted in the delete-team modal - PR #29990
- Three small fixes - Gemini
api_base, credential form reset, Mode badge - PR #30419 - Repoint the dead usage-guide link to cost-tracking docs - PR #30859
- Proxy
- Support SMTP implicit SSL (port 465) - PR #30395
AI Integrations​
Logging​
- OpenTelemetry
- Emit the six
gen_ai.client.*metrics at v1 parity in v2 - PR #30326 - One v2 logger owns the global provider; scope tenant OTLP creds per exporter - PR #30590
- Export v2 gen_ai client metrics to the configured meter provider - PR #30549
- Stamp
gen_ai.input/output.messageson v2 spans - PR #30548 - Cap metric attribute cardinality with include/exclude lists - PR #30257
- Record the full error message on the standard exception event in v2 - PR #30380
- Accept
UPPER_SNAKE_CASEOTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENTin v2 - PR #30562
- Emit the six
- General
- Preserve
error_messageonProxyExceptionfailures in spend logs - PR #30381
- Preserve
Guardrails​
- Cisco AI Defense - new integration - PR #28249
- Repello Argus - new integration - PR #30465
- Presidio - add missing UK PII entity types - PR #30537; don't mask the live request when the guardrail is
logging_only- PR #30461 - AIM - return 400 not 500 when AIM blocks a request - PR #30573
- General
Secret Managers​
- AWS Secrets Manager - cross-region replication - PR #30368
Spend Tracking, Budgets and Rate Limiting​
- Service-tier pricing - apply the
service_tiersuffix to above-threshold cache rates and expose priority+threshold keys inModelInfo- PR #30450; price and surface the Anthropic responseservice_tierin cost tracking - PR #30558; stop non-stringservice_tierfrom silently dropping cost tracking - PR #30690, PR #30706 - Budgets - enforce budgets against authoritative DB spend when the cross-pod counter is stale - PR #30684; release a budget reservation when a request is cancelled mid-flight - PR #30522; recompute
budget_reset_atwhenbudget_durationchanges - PR #30555 - Rate limiting - prevent internal
parallel_request_limiterfields from leaking to upstream providers - PR #30545 - Spend accuracy - record partial spend on the failure row for interrupted streams - PR #30788; recover output tokens for interrupted Anthropic streams - PR #30787; stop Perplexity double-billing reasoning tokens in the manual cost fallback - PR #30488; correct cached-token usage with
ChatCompletionUsageBlock- PR #30422 - Usage aggregation - drain all daily-spend batches per flush cycle - PR #30505; show session-aggregate cost and duration in request logs - PR #30507; coalesce null aggregates for no-spend keys - PR #29945; remove timezone date expansion in daily-activity aggregation - PR #29569
MCP Gateway​
- Make the MCP gateway name and description configurable via env vars - PR #30473
- Fail closed when the scope filter resolves to no servers - PR #30353
- Re-raise instead of silently dropping MCP team permissions - PR #30477
- Drop the phantom 401 span on delegated OAuth2 tool calls - PR #30494
- Challenge delegate-auth OAuth servers with the upstream
resource_metadata- PR #31255 - Default the Linear MCP registry entry to streamable HTTP - PR #30396
- Preserve native tools in the semantic filter hook - PR #26650
Performance / Loadbalancing / Reliability improvements​
- Streaming connection hygiene - cancel the upstream Gemini request and release the httpx connection on client disconnect - PR #30075; close the upstream LLM stream when the client disconnects mid-stream - PR #30245; release the aiohttp connection when stream iteration ends abnormally - PR #30271; use
e.request_dataforlogging_objinModifyResponseExceptionstreaming passthrough - PR #30800 - Caching - add a valkey-semantic cache backend and fix semantic-cache scope keys - PR #30675; url-encode the object name in the GCS cache GET path - PR #30378; allow
use_redis_transaction_bufferwithout a Redis cache - PR #28764 - Router / fallbacks - resolve a list-unhashable crash on model alias - PR #30464; clean pattern_router state on upsert/delete - PR #29601; preserve the fallback model in SDK fallback responses - PR #28260; add
expose_router_debug_in_errors(default True) to redact internal model_group/fallback names - PR #30418 - Startup / workers - fail fast on a non-PostgreSQL
DATABASE_URLinstead of hanging - PR #30366; add--max_requests_before_restart_jitterto stagger worker restarts - PR #30601; fix the IAM refresh-engine watcher race - PR #30183; release the cron pod-lock by matchingasync_set_cacheJSON encoding - PR #30600 - Health checks - correct Bedrock embedding health checks - PR #30583; bump the health-check
max_tokensdefault to 16 for GPT-5 compatibility - PR #30708, PR #26610 - Developer experience / CI - around 30 PRs hardening the lint and type-check gates (standardizing on basedpyright, dropping mypy, ratcheting any-discipline budgets), an osv-scanner lockfile workflow, zizmor PR gating, a local fake-OpenAI test endpoint replacing the shared mock, dependency bumps, and a pinned build toolchain.
Documentation Updates​
- Add 1-click AWS/GCP Terraform deploy buttons and fix README deploy-button rendering - PR #29879
- Strengthen the coding conventions in
CLAUDE.md- PR #30333 - Clarify the Linear portion of the PR template - PR #30766
New Contributors​
@hannahmadison, @ayushh0110, @Dotify71, @munnr, @V-3604, @yrk111222, @Silvenga, @djmaze, @apshada, @HumphreySun98, @Harshxth, @tomoyat1, @S0ngRu1, @habonlaci, @moshemalawach, @nahrinoda, @Vedant-Agarwal, @lollinng, @anneheartrecord, @hdt12a1, @vineethsaivs, @krishvsoni, @rvishwas26, @santino18727-debug, @darktheorys, @songkuan-zheng, @Thijmen, @Kropiunig, @jay-tau, @KnyazSh, @koztkozt, @us, @Anuj7411, @zkryakgul, @lavish619, @EugeneLugovtsov, @Bochenski, @menardorama, @factnn, @semmons99, @nitishagar, @FadelT, @jho1-godaddy, @yucheng-berri, @ad1269, @shzdehmd, @vanika02, @Nithish-Yenaganti, @simantak-dabhade, @devYRPauli, @clpatterson, @tcconnally