Skip to main content

v1.90.0 - Six New Providers, OpenTelemetry v2 Parity & Streaming Reliability

Deploy this version​

docker run \
-e STORE_MODEL_IN_DB=True \
-p 4000:4000 \
docker.litellm.ai/berriai/litellm:1.90.0

Key Highlights​

  • Six new providers - ModelScope, LibertAI, Parasail, Pinstripes, TinyFish (search), and FastCRW (search) - plus a new e2b code-execution sandbox primitive.
  • 91 new models across Fireworks AI, Scaleway, Tensormesh, LibertAI, Azure AI (including gpt-5.5 and DeepSeek V4), and Bedrock Mantle.
  • OpenTelemetry v2 reaches metrics parity with v1, emitting the six gen_ai.client.* metrics, stamping input/output message content, and scoping OTLP credentials per tenant.
  • A broad streaming-reliability sweep: upstream connections are now released when the client disconnects mid-stream (Gemini, aiohttp), requests are cancelled cleanly, and partial spend is recorded on interrupted streams.
  • Two new guardrails (Cisco AI Defense, Repello Argus) and a large Next.js App Router UI migration covering the models, teams, users, organizations, api-keys, and usage pages.

App Router Routing​

We're moving the Admin UI from query param based routing to the Nextjs App Router. The motivation is that routing now lives in the URL, so any view (a specific team, a filtered usage report, a single key) becomes a shareable link you can send to a teammate or bookmark instead of a piece of in-memory client state.

The motivation for this is twofold: it lays the groundwork for a lot of highly requested feaetures/improvements to the UI, and it lays the groundwork for contributing to LiteLLM easier and code more human reviewable for maintainers.

The biggest one here will be sharing links for different pages like specific logs pages, teams pages, and more.

New Providers and Endpoints​

New Providers (6 new providers)​

ProviderSupported LiteLLM EndpointsDescription
ModelScope (modelscope)Chat CompletionsOpenAI-compatible provider for ModelScope-hosted models - PR #28460
LibertAI (libertai)Chat Completions, EmbeddingsJSON-configured OpenAI-compatible provider; ships 12 catalog models including bge-m3 embeddings - PR #30203
TinyFish (tinyfish)SearchWeb search provider - PR #30634
FastCRW (fastcrw)SearchWeb search provider - PR #30434
Parasail (parasail)Chat CompletionsOpenAI-compatible provider
Pinstripes (pinstripes)Chat CompletionsNew chat provider; ships 6 catalog models

New LLM API Endpoints​

CapabilityDescriptionDocumentation
Code execution (e2b)New sandbox / code-interpreter primitive for running model-generated code - PR #30898Sandbox

New Models / Updated Models​

New Model Support (91 new models)​

ProviderModelContextInput ($/1M)Output ($/1M)Features
Azure AIazure_ai/gpt-5.51,050,000$5$30reasoning, function calling, prompt caching, pdf, vision
Azure AIazure_ai/gpt-5.5-2026-04-231,050,000$5$30reasoning, function calling, prompt caching, pdf, vision
Azure AIazure_ai/deepseek-v4-flash1,000,000$0.19$0.51reasoning, function calling
Azure AIazure_ai/deepseek-v4-pro1,000,000$1.74$3.48reasoning, function calling
Azure AIazure_ai/deepseek-v3.1131,072$1.23$4.94reasoning, function calling
Azure AIazure_ai/MAI-Image-2.5-$5-image generation
Azure AIazure_ai/MAI-Image-2.5-Flash-$1.75-image generation
Azure AIazure_ai/MAI-Image-2e-$5-image generation
Azureazure/gpt-realtime-whisper---audio transcription
OpenAIgpt-realtime-whisper---audio transcription
DeepSeekdeepseek-v4-flash / deepseek/deepseek-v4-flash1,000,000$0.14$0.28function calling, prompt caching
DeepSeekdeepseek-v4-pro / deepseek/deepseek-v4-pro1,000,000$0.43$0.87function calling, prompt caching
Mistralmistral/mistral-medium-3-5262,144$1.50$7.50function calling, vision
GitHub Copilotgithub_copilot/mai-code-1-flash128,000$0.75$4.50function calling
Fireworks AI24 models incl. deepseek-v4-pro, glm-5p2, kimi-k2p6/kimi-k2p7-code, minimax-m3, qwen3p7-plus, gpt-oss-120b/gpt-oss-20bup to 1,048,576$0.07-$2.80$0.28-$8.80function calling, reasoning, vision
Bedrock Mantlebedrock_mantle/google.gemma-4-26b-a4b / gemma-4-31b / gemma-4-e2b128k-256k$0.04-$0.14$0.08-$0.40function calling, reasoning, vision
LibertAI12 models incl. qwen3.6-35b-a3b(-thinking), gemma-4-31b-it(-thinking), deepseek-v4-flash, bge-m3up to 262,144$0.01-$0.25free-$1.75function calling, reasoning, vision, embedding
Pinstripes6 models incl. ps/minimax-m2.7, ps/qwen3.6-35b-a3b, ps/glm-4.5-air, ps/deepseek-v4-flashup to 1,000,192$0.09-$0.30$0.20-$0.60function calling, reasoning
Scaleway17 models incl. qwen3.5-397b-a17b, mistral-medium-3.5-128b, gemma-4-26b-a4b-it, gpt-oss-120b, whisper-large-v3up to 256,000free-$1.50free-$7.50function calling, reasoning, vision, audio, embedding
Tensormesh10 models incl. Qwen3-Coder-480B-A35B-FP8, Qwen3.5-397B-A17B-FP8, Kimi-K2.6, DeepSeek-V4-Flash, gpt-oss-120b/gpt-oss-20bup to 262,144$0.07-$1.40$0.28-$4.40function calling, reasoning, prompt caching
Sonioxsoniox/stt-async-v58,000--audio transcription
TinyFishtinyfish/search---search

The 91 new entries also include the full fireworks_ai/accounts/... model and router paths. Claude Fable 5 already shipped in v1.89.0, so it is not counted here. Full diff: model_prices_and_context_window.json.

Features​

  • Anthropic
    • Surface compaction usage iterations data - PR #27065
    • Serve Anthropic-native /v1/models for Claude Code gateway discovery - PR #30273
  • OpenRouter
    • Map reasoning max level to xhigh - PR #28881
  • Bedrock
    • Optionally forward multimodal content blocks in AgentCore InvokeAgentRuntime - PR #28885
    • Support file content retrieval for batch output files - PR #30595
    • Make Bedrock Mantle Responses routing data-driven per model - PR #30700
  • DashScope
  • OCI
    • Make Cohere {{trace}} judges work (tool param types + agentic tool-calling continuation) - PR #30646

Bug Fixes​

  • Anthropic
    • Apply cache_control_injection_points on the /v1/messages path - PR #30341
    • Strip LiteLLM-injected total_tokens from /v1/messages responses - PR #30382
    • Cap cache_control injection at 4 blocks - PR #30480
    • Drop orphaned server_tool_use on multi-turn replay from generic OpenAI clients - PR #30486
    • Don't leak tool type into OpenAI function parameters schema - PR #30618
  • Bedrock
    • Preserve cache_control for ARN models in the /v1/messages adapter - PR #29823
    • Handle role: "system" inside the messages array on /v1/messages - PR #30443
    • Use a unique function-call id for Bedrock Mantle responses->chat tool calls - PR #30426
    • Add SigV4 fallback to Bedrock Mantle chat completions auth - PR #30714
  • Gemini / Vertex AI
    • Use get_vertex_base_url for cachedContents host - PR #29707
    • Buffer native Gemini SSE frames - PR #30225
    • Map Gemini upstream-error body code 429 to RateLimitError - PR #30417
    • Ensure checks show gemini-3-flash-preview supports responseJsonSchema - PR #30696
  • OpenAI-compatible
    • Preserve cache_control for OpenAI-compatible custom endpoints - PR #30387
    • hosted_vllm: remove thinking_blocks and convert list content to strings - PR #30475
    • Don't stack provider prefix on wildcard models with a custom prefix - PR #30360
  • WatsonX
    • Wrap string embedding input in an array for the WatsonX API - PR #30897
  • Pricing / Cost map
    • Add cost mapping for deepseek-v4-flash/deepseek-v4-pro - PR #27056
    • Add mistral-medium-3-5 to the cost map - PR #29303
    • Add azure_ai/gpt-5.5 to the model cost map - PR #30428
    • Add GitHub Copilot MAI Code Flash pricing - PR #30415
    • Sync the Fireworks AI model registry with the current platform catalog - PR #30616
    • Add soniox/stt-async-v5 - PR #30672
    • Correct swapped input/output token costs for command-r7b-12-2024 - PR #30413
    • Add 1h cache-write cost for Anthropic Sonnet 4.5/4.6 - PR #30474
    • Route Volcengine (Doubao) tiered-pricing models to the tiered cost handler - PR #30357; sort tiered thresholds numerically - PR #30375; treat a DashScope explicit 0.0 tier cost as a real price - PR #30653
    • Drop synthesized zero costs in register_model to preserve sparse entries - PR #30201

LLM API Endpoints​

Features​

  • Responses API
    • Propagate completed_response through FallbackResponsesStreamWrapper for streaming /v1/responses container ownership - PR #30213
  • /v1/models
    • Surface max_input_tokens/max_output_tokens on /v1/models - PR #30272
    • Include model group aliases in v1 model info - PR #30626
  • Realtime
    • Allow non-admin virtual keys to call GA Realtime WebRTC HTTP routes - PR #30089
  • Files
    • Attach existing OpenAI file ids - PR #30628

Bugs​

  • General
    • Token counter: handle Anthropic tool_reference blocks to stop dropped spend logs - PR #30302
    • Streaming: guard raise_on_model_repetition against empty choices - PR #30485
    • Audio: don't override an explicit response_format with verbose_json - PR #30599
    • Validate the resolved model in /realtime/client_secrets for non-transcription sessions - PR #30710

Management Endpoints / UI​

Features​

  • App Router migration - models - PR #30677, teams - PR #30343, users - PR #30334, organizations - PR #30336, api-keys - PR #30699, usage report - PR #30694, agents + router-settings - PR #30323
  • UI cleanup - remove the unreachable /chat page - PR #30178, dead UI components - PR #30340, orphaned pass-through-settings route - PR #30692; remove in-product survey and feedback nudges - PR #30773
  • Virtual Keys - expose per-model budget usage in /key/info - PR #30394; grace-period key rotation returns the deprecated-key lookup result on 401 - PR #30327
  • Teams / Orgs - add key_limit query param to /team/info - PR #30006; list public team model names in /v1/models - PR #30588
  • Proxy CLI Auth - add verification_uri_complete to the CLI SSO device flow - PR #30571
  • Proxy - configurable response headers and login-page hint - PR #30792; gate the "Default Credentials" hint on /ui/login behind an env flag - PR #30234

Bugs​

  • Access control / keys
    • /key/list now does exact user_id/key_alias matching by default, preventing cross-user key disclosure - PR #30593
    • Restrict /customer/daily/activity to admin-only - PR #28849
    • org_admin sees all org teams when the UI sends its own user_id - PR #30247
    • Allow internal roles to access vector store CRUD routes - PR #30503
    • Require premium only when enabling premium metadata fields - PR #30506
    • Guard check_and_fix_namespace against a None key - PR #30435
    • Warn at startup when custom_auth skips common_checks enforcement - PR #30665
    • Resolve list-files credentials from team BYOK deployments - PR #30495; preserve azure_ad_token through CredentialLiteLLMParams for /v1/files + batches - PR #30241
    • Enforce budget for models not in the cost map - PR #24949
  • UI
    • Stop the Virtual Keys page from an infinite render loop - PR #30397
    • Source api-keys identity from useAuthorized to stop "User ID is not set" - PR #30903
    • Render logos correctly under a custom server_root_path - PR #31156
    • Warn that team models are deleted in the delete-team modal - PR #29990
    • Three small fixes - Gemini api_base, credential form reset, Mode badge - PR #30419
    • Repoint the dead usage-guide link to cost-tracking docs - PR #30859
  • Proxy
    • Support SMTP implicit SSL (port 465) - PR #30395

AI Integrations​

Logging​

  • OpenTelemetry
    • Emit the six gen_ai.client.* metrics at v1 parity in v2 - PR #30326
    • One v2 logger owns the global provider; scope tenant OTLP creds per exporter - PR #30590
    • Export v2 gen_ai client metrics to the configured meter provider - PR #30549
    • Stamp gen_ai.input/output.messages on v2 spans - PR #30548
    • Cap metric attribute cardinality with include/exclude lists - PR #30257
    • Record the full error message on the standard exception event in v2 - PR #30380
    • Accept UPPER_SNAKE_CASE OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT in v2 - PR #30562
  • General
    • Preserve error_message on ProxyException failures in spend logs - PR #30381

Guardrails​

  • Cisco AI Defense - new integration - PR #28249
  • Repello Argus - new integration - PR #30465
  • Presidio - add missing UK PII entity types - PR #30537; don't mask the live request when the guardrail is logging_only - PR #30461
  • AIM - return 400 not 500 when AIM blocks a request - PR #30573
  • General
    • Stop re-initializing DB guardrails on every poll - PR #30542
    • Run the pre_call hook once for model-level guardrails - PR #30543
    • disable_global_guardrails overrides the team list - PR #28563
    • Surface OpenAI moderation violation_categories on guardrail traces - PR #30659

Secret Managers​

Spend Tracking, Budgets and Rate Limiting​

  • Service-tier pricing - apply the service_tier suffix to above-threshold cache rates and expose priority+threshold keys in ModelInfo - PR #30450; price and surface the Anthropic response service_tier in cost tracking - PR #30558; stop non-string service_tier from silently dropping cost tracking - PR #30690, PR #30706
  • Budgets - enforce budgets against authoritative DB spend when the cross-pod counter is stale - PR #30684; release a budget reservation when a request is cancelled mid-flight - PR #30522; recompute budget_reset_at when budget_duration changes - PR #30555
  • Rate limiting - prevent internal parallel_request_limiter fields from leaking to upstream providers - PR #30545
  • Spend accuracy - record partial spend on the failure row for interrupted streams - PR #30788; recover output tokens for interrupted Anthropic streams - PR #30787; stop Perplexity double-billing reasoning tokens in the manual cost fallback - PR #30488; correct cached-token usage with ChatCompletionUsageBlock - PR #30422
  • Usage aggregation - drain all daily-spend batches per flush cycle - PR #30505; show session-aggregate cost and duration in request logs - PR #30507; coalesce null aggregates for no-spend keys - PR #29945; remove timezone date expansion in daily-activity aggregation - PR #29569

MCP Gateway​

  • Make the MCP gateway name and description configurable via env vars - PR #30473
  • Fail closed when the scope filter resolves to no servers - PR #30353
  • Re-raise instead of silently dropping MCP team permissions - PR #30477
  • Drop the phantom 401 span on delegated OAuth2 tool calls - PR #30494
  • Challenge delegate-auth OAuth servers with the upstream resource_metadata - PR #31255
  • Default the Linear MCP registry entry to streamable HTTP - PR #30396
  • Preserve native tools in the semantic filter hook - PR #26650

Performance / Loadbalancing / Reliability improvements​

  • Streaming connection hygiene - cancel the upstream Gemini request and release the httpx connection on client disconnect - PR #30075; close the upstream LLM stream when the client disconnects mid-stream - PR #30245; release the aiohttp connection when stream iteration ends abnormally - PR #30271; use e.request_data for logging_obj in ModifyResponseException streaming passthrough - PR #30800
  • Caching - add a valkey-semantic cache backend and fix semantic-cache scope keys - PR #30675; url-encode the object name in the GCS cache GET path - PR #30378; allow use_redis_transaction_buffer without a Redis cache - PR #28764
  • Router / fallbacks - resolve a list-unhashable crash on model alias - PR #30464; clean pattern_router state on upsert/delete - PR #29601; preserve the fallback model in SDK fallback responses - PR #28260; add expose_router_debug_in_errors (default True) to redact internal model_group/fallback names - PR #30418
  • Startup / workers - fail fast on a non-PostgreSQL DATABASE_URL instead of hanging - PR #30366; add --max_requests_before_restart_jitter to stagger worker restarts - PR #30601; fix the IAM refresh-engine watcher race - PR #30183; release the cron pod-lock by matching async_set_cache JSON encoding - PR #30600
  • Health checks - correct Bedrock embedding health checks - PR #30583; bump the health-check max_tokens default to 16 for GPT-5 compatibility - PR #30708, PR #26610
  • Developer experience / CI - around 30 PRs hardening the lint and type-check gates (standardizing on basedpyright, dropping mypy, ratcheting any-discipline budgets), an osv-scanner lockfile workflow, zizmor PR gating, a local fake-OpenAI test endpoint replacing the shared mock, dependency bumps, and a pinned build toolchain.

Documentation Updates​

  • Add 1-click AWS/GCP Terraform deploy buttons and fix README deploy-button rendering - PR #29879
  • Strengthen the coding conventions in CLAUDE.md - PR #30333
  • Clarify the Linear portion of the PR template - PR #30766

New Contributors​

@hannahmadison, @ayushh0110, @Dotify71, @munnr, @V-3604, @yrk111222, @Silvenga, @djmaze, @apshada, @HumphreySun98, @Harshxth, @tomoyat1, @S0ngRu1, @habonlaci, @moshemalawach, @nahrinoda, @Vedant-Agarwal, @lollinng, @anneheartrecord, @hdt12a1, @vineethsaivs, @krishvsoni, @rvishwas26, @santino18727-debug, @darktheorys, @songkuan-zheng, @Thijmen, @Kropiunig, @jay-tau, @KnyazSh, @koztkozt, @us, @Anuj7411, @zkryakgul, @lavish619, @EugeneLugovtsov, @Bochenski, @menardorama, @factnn, @semmons99, @nitishagar, @FadelT, @jho1-godaddy, @yucheng-berri, @ad1269, @shzdehmd, @vanika02, @Nithish-Yenaganti, @simantak-dabhade, @devYRPauli, @clpatterson, @tcconnally

Full Changelog​

v1.89.0...v1.90.0