[Preview] v1.82.0 - Realtime Guardrails, Projects Management, and 10+ Performance Optimizations

Krrish Dholakia
CEO, LiteLLM
Ishaan Jaff
CTO, LiteLLM

Deploy this version

docker run \
  -e STORE_MODEL_IN_DB=True \
  -p 4000:4000 \
  ghcr.io/berriai/litellm:main-1.82.0

Key Highlights

/v1/messages routing change

This version starts routing /v1/messages requests to the /responses API by default. To opt out and continue using chat/completions, set LITELLM_USE_CHAT_COMPLETIONS_URL_FOR_ANTHROPIC_MESSAGES=true or litellm_settings.use_chat_completions_url_for_anthropic_messages: true in your config.
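
The opt-out semantics can be sketched in Python. This is an illustration only, not LiteLLM's routing code: the helper name `resolve_messages_backend`, the truthy-value parsing, and the rule that either the environment variable or the config setting is sufficient are all assumptions. Only the two flag names come from the note above.

```python
import os
from typing import Mapping, Optional

# Flag names taken from the release note.
ENV_FLAG = "LITELLM_USE_CHAT_COMPLETIONS_URL_FOR_ANTHROPIC_MESSAGES"
SETTINGS_FLAG = "use_chat_completions_url_for_anthropic_messages"


def _is_truthy(value: object) -> bool:
    """Assumption: True or the string 'true' (any case) enables the flag."""
    if isinstance(value, bool):
        return value
    return str(value).strip().lower() == "true"


def resolve_messages_backend(
    env: Optional[Mapping[str, str]] = None,
    litellm_settings: Optional[Mapping[str, object]] = None,
) -> str:
    """Hypothetical helper: name the backend that /v1/messages would be routed to."""
    env = os.environ if env is None else env
    settings = litellm_settings or {}
    opted_out = _is_truthy(env.get(ENV_FLAG, "")) or _is_truthy(
        settings.get(SETTINGS_FLAG, False)
    )
    # Default in this release is the /responses API; opting out keeps chat/completions.
    return "chat/completions" if opted_out else "responses"
```

With neither flag set the helper returns "responses", which mirrors the new default described above.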


New Models / Updated Models

New Model Support (20 new models)

| Provider | Model | Context Window | Input ($/1M tokens) | Output ($/1M tokens) | Features |
| --- | --- | --- | --- | --- | --- |
| OpenAI | gpt-5.3-codex | 272K | $1.75 | $14.00 | Reasoning, coding |
| Azure OpenAI | azure/gpt-5.3-codex | 272K | $1.75 | $14.00 | Azure deployment |
| OpenAI | gpt-audio-1.5 | 128K | $2.50 | $10.00 | Audio model |
| Azure OpenAI | azure/gpt-audio-1.5-2026-02-23 | 128K | $2.50 | $10.00 | Audio model |
| OpenAI | gpt-realtime-1.5 | 32K | $4.00 | $16.00 | Realtime model |
| Azure OpenAI | azure/gpt-realtime-1.5-2026-02-23 | 32K | $4.00 | $16.00 | Realtime model |
| Groq | groq/openai/gpt-oss-safeguard-20b | 131K | $0.075 | $0.30 | Guardrail inference |
| Google Vertex AI | vertex_ai/gemini-3.1-flash-image-preview | - | - | - | Image generation |
| Perplexity | perplexity/perplexity/sonar | - | - | - | Sonar search |
| Perplexity | perplexity/openai/gpt-5.1 | - | - | - | Hosted routing |
| Perplexity | perplexity/openai/gpt-5-mini | - | - | - | Hosted routing |
| Perplexity | perplexity/google/gemini-2.5-flash | - | - | - | Hosted routing |
| Perplexity | perplexity/google/gemini-2.5-pro | - | - | - | Hosted routing |
| Perplexity | perplexity/google/gemini-3-flash-preview | - | - | - | Hosted routing |
| Perplexity | perplexity/google/gemini-3-pro-preview | - | - | - | Hosted routing |
| Perplexity | perplexity/anthropic/claude-haiku-4-5 | - | - | - | Hosted routing |
| Perplexity | perplexity/anthropic/claude-sonnet-4-5 | - | - | - | Hosted routing |
| Perplexity | perplexity/anthropic/claude-opus-4-5 | - | - | - | Hosted routing |
| Perplexity | perplexity/anthropic/claude-opus-4-6 | - | - | - | Hosted routing |
| Perplexity | perplexity/xai/grok-4-1-fast-non-reasoning | - | - | - | Hosted routing |
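
As a worked example of the per-1M-token prices in the table above, the sketch below estimates the cost of a single request. It is a simplification, not LiteLLM's cost calculator: `estimate_cost` is a hypothetical helper, and it ignores anything the table does not list (cached tokens, audio-specific token rates, tiered pricing).

```python
from decimal import Decimal

# USD per 1M tokens as (input, output), copied from the table above.
PRICES = {
    "gpt-5.3-codex": (Decimal("1.75"), Decimal("14.00")),
    "gpt-audio-1.5": (Decimal("2.50"), Decimal("10.00")),
    "gpt-realtime-1.5": (Decimal("4.00"), Decimal("16.00")),
    "groq/openai/gpt-oss-safeguard-20b": (Decimal("0.075"), Decimal("0.30")),
}

ONE_MILLION = Decimal(1_000_000)


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> Decimal:
    """Hypothetical helper: tokens times the per-1M price, summed for input and output."""
    input_price, output_price = PRICES[model]
    return (input_price * input_tokens + output_price * output_tokens) / ONE_MILLION
```

For instance, `estimate_cost("gpt-5.3-codex", 10_000, 2_000)` evaluates to 0.0455 USD: 10K input tokens at $1.75/1M ($0.0175) plus 2K output tokens at $14.00/1M ($0.0280).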

Features

  • OpenAI

    • Day 0 support for gpt-5.3-codex on OpenAI and Azure - PR #22035
    • Add gpt-audio-1.5 model cost map - PR #22303
    • Add gpt-realtime-1.5 model cost map - PR #22304
    • Add audio as supported OpenAI param - PR #22092
    • Add prompt_cache_key and prompt_cache_retention support - PR #20397
  • Azure OpenAI

    • New Azure OpenAI models 2026-02-25 - PR #22114
  • Anthropic

    • Add v1 Anthropic Responses API transformation - PR #22087
    • Sanitize tool_use IDs in convert_to_anthropic_tool_invoke - PR #21964
    • Fix model wildcard access issue - PR #21917
  • AWS Bedrock

    • Encode model ARNs for OpenAI-compatible Bedrock imported models - PR #21701
    • Support optional regional STS endpoint in role assumption - PR #21640
    • Native structured outputs API support - PR #21222
  • Google Vertex AI

    • Add gemini-3.1-flash-image-preview to model cost map - PR #22223
    • Enable context-1m-2025-08-07 beta header for Vertex AI provider - PR #21867
  • OpenRouter

    • Add OpenRouter native models to model cost map - PR #20520
    • Add OpenRouter Opus 4.6 to model map - PR #20525
  • Mistral

    • Adjust mistral-small-2503 input/output cost per token - PR #22097
  • Groq

    • Add groq/openai/gpt-oss-safeguard-20b model pricing - PR #21951
  • AI/ML

  • Ollama

    • Thread api_base to get_model_info + graceful fallback - PR #21970
  • PublicAI

    • Fix function calling for PublicAI Apertus models - PR #21582
  • xAI

    • Add deprecation dates for grok-2-vision-1212 and grok-3-mini models - PR #20102
  • General

    • Forward auth headers of provider - PR #22070
    • Normalize camelCase thinking param keys to snake_case - PR #21762
    • Allow dimensions param passthrough for non-text-embedding-3 OpenAI models - PR #22144
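
The camelCase-to-snake_case normalization listed under General can be pictured with a short sketch. The function names and the `budgetTokens` example key are assumptions made for illustration; the notes do not show how PR #21762 implements it.

```python
import re
from typing import Any, Dict

# Zero-width match between a lowercase letter or digit and the uppercase letter after it.
_CAMEL_BOUNDARY = re.compile(r"(?<=[a-z0-9])(?=[A-Z])")


def camel_to_snake(key: str) -> str:
    """Convert e.g. 'budgetTokens' to 'budget_tokens'; snake_case input is unchanged."""
    return _CAMEL_BOUNDARY.sub("_", key).lower()


def normalize_thinking_keys(thinking: Dict[str, Any]) -> Dict[str, Any]:
    """Return a new dict with snake_case keys; the caller's dict is not modified."""
    return {camel_to_snake(key): value for key, value in thinking.items()}
```

A payload such as `{"type": "enabled", "budgetTokens": 1024}` would come out as `{"type": "enabled", "budget_tokens": 1024}`.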

Bug Fixes

  • AWS Bedrock

    • Fix converse handling for parallel_tool_calls - PR #22267
    • Restore parallel_tool_calls mapping in map_openai_params - PR #22333
    • Correct modelInput format for Converse API batch models - PR #21656
    • Prevent double UUID in create_file S3 key - PR #21650
    • Filter internal json_tool_call when mixed with real tools - PR #21107
    • Pass timeout param to Bedrock rerank HTTP client - PR #22021
  • Anthropic

    • Fix model cost map for anthropic fast and inference_geo - PR #21904
  • Image Generation

    • Propagate extra_headers to upstream image generation - PR #22026
    • Add ChatCompletionImageObject in OpenAIChatCompletionAssistantMessage - PR #22155
  • General

    • Preserve forwarding of server-side called tools - PR #22260
    • Fix free model handling from UI paths - PR #22258
    • Fix None TypeError in mapping - PR #22080

LLM API Endpoints

Features

  • Realtime API

    • Guardrails support for /v1/realtime WebSocket endpoint - PR #22152
    • Vertex AI Gemini Live via unified /realtime endpoint - PR #22153
    • Guardrails with pre_call/post_call mode on realtime WebSocket - PR #22161
    • end_session_after_n_fails + Endpoint Settings wizard step - PR #22165
    • Guardrail hook for voice transcription - PR #21976
    • Fix guardrails not firing for Gemini/Vertex AI and provider_config realtime sessions - PR #22168
    • Add logging, spend tracking support + tool tracing - PR #22105
  • Video Generation

    • Add variant parameter to video content download - PR #21955
    • Pass api_key from litellm_params to video remix handlers - PR #21965
    • Apply custom video pricing from deployment model_info - PR #21923
    • Fix passing of image and parameters in videos API - PR #22170
  • OCR

    • Enable local file support for OCR - PR #22133
  • Websearch / Tool Calling

    • Preserve thinking blocks in agentic loop follow-up messages - PR #21604
  • General

    • Add configurable upper bound for chunk processing time - PR #22209
    • Emit x-litellm-overhead-duration-ms header for streaming requests - PR #22027

Bugs

  • General
    • Fix mypy attr-defined errors on realtime websocket calls - PR #22202

Management Endpoints / UI

Features

  • Projects

    • Add Projects page with list and create flows - PR #22315
    • Add Project Details page with edit modal - PR #22360
    • Add project keys table and project dropdown on key create/edit - PR #22373
    • Add delete project action to Projects table - PR #22412
    • Add Projects Opt-In Toggle in Admin Settings - PR #22416
    • Include created_at and updated_at in /project/list response - PR #22323
    • Add tags in project - PR #22216
  • Virtual Keys + Access Groups

    • Add bidirectional team/key sync for Access Group CRUD flows - PR #22253
    • Add pagination and search to /key/aliases to prevent OOMs - PR #22137
    • Add paginated key alias selector in UI - PR #22157
    • Add project_id and access_group_id filters for key list endpoint - PR #22356
    • Add KeyInfoHeader component - PR #22047
    • Restrict Edit Settings to key owners - PR #21985
    • Fix virtual key grace period from env/UI - PR #20321
  • Agents

    • Assign virtual keys to agents - PR #22045
    • Assign tools to agents - PR #22064
    • Ensure internal users cannot create agents (RBAC enforcement) - PR #22329
  • Proxy Auth / SSO

    • OIDC discovery URLs, roles array handling, and dot-notation error hints - PR #22336
    • Add PROXY_ADMIN role to system user for key rotation - PR #21896
  • Usage / Spend Logs

    • Add user filtering to usage page - PR #22059
    • Allow using AI to understand usage patterns - PR #22042
    • Use backend request_duration_ms and make Duration sortable in Logs - PR #22122
    • Add request_duration_ms to SpendLogs - PR #22066
    • Enrich failure spend logs with key/team metadata - PR #22049
    • Show real tool names in logs for Anthropic-format tools - PR #22048
  • Models + Endpoints

    • Show proxy URL in ModelHub - PR #21660
    • Add /public/endpoints for provider endpoint support - PR #22248
  • UI Improvements

    • Add custom favicon support - PR #21653
    • Add Blog Dropdown in Navbar - PR #21859
    • Add UI banner warning for detailed debug mode - PR #21527
    • Make auth value optional for MCP Server create flow - PR #22119
    • Tool policies: auto-discover tools + policy enforcement guardrail - PR #22041
  • Health Checks

    • Add health check max tokens configuration - PR #22299
    • Limit concurrent health checks with health_check_concurrency - PR #20584
    • Fix health check model_id filtering - PR #21071
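
One common way to cap concurrent health checks is an `asyncio.Semaphore`. The sketch below shows that general technique and assumes `health_check_concurrency` is a positive integer. `run_health_checks` and the `check` callable are hypothetical names, and the notes do not confirm that PR #20584 is implemented this way.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List


async def run_health_checks(
    models: List[str],
    check: Callable[[str], Awaitable[bool]],
    health_check_concurrency: int,
) -> Dict[str, bool]:
    """Run check(model) for every model with at most N checks in flight at once."""
    semaphore = asyncio.Semaphore(health_check_concurrency)

    async def bounded(model: str) -> bool:
        # Waits here while health_check_concurrency checks are already running.
        async with semaphore:
            return await check(model)

    results = await asyncio.gather(*(bounded(model) for model in models))
    return dict(zip(models, results))
```

All checks are still scheduled up front; the semaphore only limits how many are actively calling the provider at the same moment.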

Bugs

  • Populate user_id and user_info for admin users in /user/info - PR #22239
  • Fix virtual keys pagination stale totals when filtering - PR #22222
  • Fix Spend Update Queue aggregation never triggers with default presets - PR #21963
  • Fix timezone config lookup and replace hardcoded timezone map with ZoneInfo - PR #21754
  • Fix custom auth budget issue - PR #22164
  • Fix missing OAuth session state - PR #21992
  • Fix Transport Type for OpenAPI Spec on UI - PR #22005
  • Fix Claude Code plugin schema - PR #22271
  • Add missing migration for LiteLLM_ClaudeCodePluginTable - PR #22335
  • Only tag selected deployment in access group creation - PR #21655
  • State management fixes for CheckBatchCost - PR #21921
  • Remove duplicate antd import in ToolPolicies - PR #22107

AI Integrations

Logging

Guardrails

  • Noma

    • Noma guardrails v2 based on custom guardrails framework - PR #21400
  • LakeraAI

    • Add Lakera v2 post-call hook with fixed PII masking - PR #21783
  • Presidio

    • Fix Presidio streaming and false positives - PR #21949
    • Fix Presidio streaming v3 reliability improvements - PR #22283
    • Prevent Presidio crash on non-JSON responses - PR #22084
  • Built-in Guardrails

    • Block code execution guardrail to prevent agents from executing code - PR #22154
    • Employment discrimination topic blockers for 5 protected classes - PR #21962
    • Claims agent guardrails (5 categories + policy template) - PR #22113
    • New code execution evaluation dataset - PR #22065
    • Tool policies: auto-discover tools + policy enforcement - PR #22041
  • Policy Templates

    • Singapore guardrail policies (PDPA + MAS AI Risk Management) - PR #21948
    • Prefix SG guardrail policy IDs with country code - PR #21974
    • Guardrail policy versioning - PR #21862
  • Guardrail Monitoring

    • Guardrail Monitor — measure guardrail reliability in production - PR #21944
  • Security

    • Fix unauthenticated RCE and sandbox escape in custom code guardrail - PR #22095

Prompt Management

No major prompt management changes in this release.

Secret Managers

No major secret manager changes in this release.


Spend Tracking, Budgets and Rate Limiting

  • Priority PayGo cost tracking for Gemini/Vertex AI - PR #21909
  • Add request_duration_ms to SpendLogs for latency tracking per request - PR #22066
  • Add in_flight_requests metric to /health/backlog + Prometheus - PR #22319
  • Enrich failure spend logs with key/team metadata - PR #22049
  • Add spend tracking lifecycle logging for debugging spend flows - PR #22029
  • Fix budget timezone config lookup and replace hardcoded timezone map with ZoneInfo - PR #21754
  • Fix Spend Update Queue aggregation never triggering with default presets - PR #21963
  • Avoid mutating caller-owned dicts in SpendUpdateQueue aggregation - PR #21742
  • Optimize old spendlog deletion cron job - PR #21930
  • Health check max tokens configuration - PR #22299
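
The "avoid mutating caller-owned dicts" item describes a general aggregation pitfall: if the first update for an entity is stored by reference and later updates are added into it, the caller's original dict changes. A minimal sketch of the safe pattern follows. The field names `entity_type`, `entity_id`, and `spend` are assumptions for illustration, not the actual SpendUpdateQueue schema.

```python
from typing import Dict, Iterable, List, Tuple


def aggregate_spend(updates: Iterable[dict]) -> List[dict]:
    """Sum spend per (entity_type, entity_id) without modifying the input dicts."""
    totals: Dict[Tuple[str, str], dict] = {}
    for update in updates:
        key = (update["entity_type"], update["entity_id"])
        if key not in totals:
            # Copy first, so later additions land on our own dict, not the caller's.
            totals[key] = dict(update)
        else:
            totals[key]["spend"] += update["spend"]
    return list(totals.values())
```

A shallow copy is enough here because only the top-level `spend` value is rewritten; nested values would need a deeper copy.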

MCP Gateway

  • Pass MCP auth headers from request context to tool fetch for /v1/responses and /chat/completions - PR #22291
  • Default available_on_public_internet to true for MCP server behavior consistency - PR #22331
  • Clear error messages for IP filtering / no available tools - PR #22142
  • Strip stale mcp-session-id header to prevent 400 errors across proxy workers - PR #21417
  • Skip health check for MCP with passthrough token auth - PR #21982
  • Fix missing OAuth session state - PR #21992
  • Fix Transport Type for OpenAPI Spec on UI - PR #22005
  • Add e2e test for stateless StreamableHTTP behavior - PR #22033

Performance / Loadbalancing / Reliability improvements

Streaming & hot-path

  • Streaming latency improvements — 4 targeted hot-path fixes - PR #22346
  • Skip throwaway Usage() construction in ModelResponse.__init__ - PR #21611
  • Optimize is_model_o_series_model with startswith - PR #21690
  • Use cached _safe_get_request_headers instead of per-request construction - PR #21430
  • Emit x-litellm-overhead-duration-ms header for streaming requests - PR #22027
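
For the `startswith` item, the general idea is that `str.startswith` accepts a tuple of prefixes, so one call can replace a loop or a chain of substring tests on a hot path. The sketch below shows that idea only. The prefix list and the provider-prefix handling are assumptions, and the function is not LiteLLM's `is_model_o_series_model`.

```python
# Assumed prefix list, for illustration only.
O_SERIES_PREFIXES = ("o1", "o3", "o4")


def is_o_series_model(model: str) -> bool:
    """Hypothetical check: does the model name (minus any provider prefix) start with an o-series prefix?"""
    name = model.rsplit("/", 1)[-1]  # "openai/o1-preview" becomes "o1-preview"
    return name.startswith(O_SERIES_PREFIXES)  # one call tests every prefix
```

A prefix test also avoids false positives that a substring test would produce on names such as "gpt-4o".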

Database & Redis

  • Batch 11 create_task() calls into 1 in update_database() - PR #22028
  • Redis pipeline spend updates for batched writes - PR #22044
  • Recover from prisma-query-engine zombie process - PR #21899
  • Optimize old spendlog deletion cron job - PR #21930
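
The task-batching item can be pictured as scheduling one task that runs every writer, instead of one `create_task()` call per writer. The sketch below is an assumption about the general shape, not the code in PR #22028: `schedule_batched` and the writer callables are hypothetical, and running the writers sequentially inside the single task is a choice made here for simplicity.

```python
import asyncio
from typing import Awaitable, Callable, Sequence

Writer = Callable[[], Awaitable[None]]


def schedule_batched(writers: Sequence[Writer]) -> "asyncio.Task[None]":
    """Schedule all writers under a single task. Must be called from a running event loop."""

    async def run_all() -> None:
        # One task on the event loop, regardless of how many writers there are.
        for write in writers:
            await write()

    return asyncio.create_task(run_all())
```

The trade-off in this form is that the writers no longer overlap with each other, and an exception in one writer stops the ones after it unless each call is wrapped in its own error handling.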

Router & caching

  • Add cache invalidation for _cached_get_model_group_info - PR #20376
  • Remove cache eviction close that kills in-use httpx clients - PR #22247
  • Store background task references in LLMClientCache._remove_key to prevent unawaited coroutine warnings - PR #22143
  • Fix ensure_arrival_time set before calculating queue time - PR #21918

Connection management

  • Only set enable_cleanup_closed on aiohttp when required - PR #21897
  • Prometheus child_exit cleanup for gunicorn workers - PR #22324
  • Prometheus multiprocess cleanup - PR #22221
  • Limit concurrent health checks with health_check_concurrency - PR #20584
  • Isolate get_config failures from model sync loop - PR #22224

Other

  • Semantic cache: support configurable vector dimensions - PR #21649
  • Honor MAX_STRING_LENGTH_PROMPT_IN_DB from config env vars - PR #22106
  • Enhance MidStreamFallbackError to preserve original status code and attributes - PR #22225
  • Network mock utility for testing - PR #21942
  • Add missing return type annotations to iterator protocol methods in streaming_handler - PR #21750

Security

  • Fix critical/high CVEs in OS-level libs and NPM transitive dependencies - PR #22008
  • Fix unauthenticated RCE and sandbox escape in custom code guardrail - PR #22095
  • Remove hardcoded base64 string flagged by secret scanner - PR #22125

Documentation Updates

  • Add OpenAI Agents SDK tutorial with LiteLLM Proxy - PR #21221
  • Add OpenClaw integration tutorial - PR #21605
  • Add Google GenAI SDK tutorial (JS & Python) - PR #21885
  • Add Gollem Go agent framework cookbook example - PR #21747
  • Update AssemblyAI docs with Universal-3 Pro, Speech Understanding, and LLM Gateway - PR #21130
  • Add store_model_in_db release docs - PR #21863
  • Add Credential Usage Tracking docs - PR #22112
  • Add proxy request tags docs - PR #22129
  • Add trailing slash to /mcp endpoint URLs - PR #20509
  • Add pre-PR checklist to UI contributing guide - PR #21886
  • Replace Azure OpenAI key with mock key in docs - PR #21997
  • Add performance & reliability section to v1.81.14 release notes - PR #21950
  • Update v1.81.12-stable release notes to point to stable.1 - PR #22036
  • Add security vulnerability scan report to v1.81.14 release notes - PR #22385

New Contributors

  • @janfrederickk made their first contribution in PR #21660
  • @hztBUAA made their first contribution in PR #21656
  • @LeeJuOh made their first contribution in PR #21754
  • @WhoisMonesh made their first contribution in PR #21750
  • @trevorprater made their first contribution in PR #21747
  • @edwiniac made their first contribution in PR #21870
  • @stakeswky made their first contribution in PR #21867
  • @ta-stripe made their first contribution in PR #21701
  • @ron-zhong made their first contribution in PR #21948
  • @Arindam200 made their first contribution in PR #21221
  • @Canvinus made their first contribution in PR #21964
  • @nicolopignatelli made their first contribution in PR #21951
  • @MarshHawk made their first contribution in PR #20584
  • @gavksingh made their first contribution in PR #22106
  • @roni-frantchi made their first contribution in PR #22090
  • @noahnistler made their first contribution in PR #22133
  • @dylan-duan-aai made their first contribution in PR #21130
  • @rasmi made their first contribution in PR #22322

Diff Summary

02/28/2026

  • New Models / Updated Models: 26
  • LLM API Endpoints: 14
  • Management Endpoints / UI: 38
  • AI Integrations: 25
  • Spend Tracking, Budgets and Rate Limiting: 10
  • MCP Gateway: 8
  • Performance / Loadbalancing / Reliability improvements: 22
  • Security: 3
  • Documentation Updates: 14

Full Changelog

v1.81.14.rc.1...v1.82.0