v1.82.0 - Realtime Guardrails, Projects Management, and 10+ Performance Optimizations
Deploy this version

Docker:

```shell
docker run \
  -e STORE_MODEL_IN_DB=True \
  -p 4000:4000 \
  ghcr.io/berriai/litellm:main-1.82.0-stable
```

Pip:

```shell
pip install litellm==1.82.0
```
Key Highlights
- Realtime API guardrails — Full guardrails support for `/v1/realtime` WebSocket sessions with pre/post-call enforcement, voice transcription hooks, session termination policies, and Vertex AI Gemini Live support - PR #22152, PR #22153, PR #22161, PR #22165
- Projects Management — New Projects UI with full CRUD, project-scoped virtual keys, and an admin opt-in toggle — organize teams and keys by project - PR #22315, PR #22360, PR #22373, PR #22412
- Guardrail ecosystem expansion — Noma v2, Lakera v2 post-call, Singapore regulatory policies (PDPA + MAS), employment discrimination blockers, code execution blocker, guardrail policy versioning, and production monitoring - PR #21400, PR #21783, PR #21948
- OpenAI Codex 5.3 — day 0 — Full support for `gpt-5.3-codex` on OpenAI and Azure, plus `gpt-audio-1.5` and `gpt-realtime-1.5` model coverage - PR #22035
- 10+ performance optimizations — Streaming hot-path fixes, Redis pipeline batching, database task batching, ModelResponse init skip, and router cache improvements — lower latency and CPU on every request
- `/v1/messages` → `/responses` routing — `/v1/messages` requests are now routed to the Responses API by default for OpenAI/Azure models
v1/messages routing change
This version starts routing `/v1/messages` requests to the `/responses` API by default. To opt out and continue using `chat/completions`, set the environment variable `LITELLM_USE_CHAT_COMPLETIONS_URL_FOR_ANTHROPIC_MESSAGES=true` or `litellm_settings.use_chat_completions_url_for_anthropic_messages: true` in your config.
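As a config-file sketch, the opt-out flag named above sits under `litellm_settings` in the proxy's `config.yaml`:

```yaml
# config.yaml — keep routing /v1/messages to chat/completions
litellm_settings:
  use_chat_completions_url_for_anthropic_messages: true
```

The environment-variable form has the same effect and is convenient for containerized deployments where the config file is not easily edited.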
New Models / Updated Models
New Model Support (20 new models)
| Provider | Model | Context Window | Input ($/1M tokens) | Output ($/1M tokens) | Features |
|---|---|---|---|---|---|
| OpenAI | gpt-5.3-codex | 272K | $1.75 | $14.00 | Reasoning, coding |
| Azure OpenAI | azure/gpt-5.3-codex | 272K | $1.75 | $14.00 | Azure deployment |
| OpenAI | gpt-audio-1.5 | 128K | $2.50 | $10.00 | Audio model |
| Azure OpenAI | azure/gpt-audio-1.5-2026-02-23 | 128K | $2.50 | $10.00 | Audio model |
| OpenAI | gpt-realtime-1.5 | 32K | $4.00 | $16.00 | Realtime model |
| Azure OpenAI | azure/gpt-realtime-1.5-2026-02-23 | 32K | $4.00 | $16.00 | Realtime model |
| Groq | groq/openai/gpt-oss-safeguard-20b | 131K | $0.075 | $0.30 | Guardrail inference |
| Google Vertex AI | vertex_ai/gemini-3.1-flash-image-preview | - | - | - | Image generation |
| Perplexity | perplexity/perplexity/sonar | - | - | - | Sonar search |
| Perplexity | perplexity/openai/gpt-5.1 | - | - | - | Hosted routing |
| Perplexity | perplexity/openai/gpt-5-mini | - | - | - | Hosted routing |
| Perplexity | perplexity/google/gemini-2.5-flash | - | - | - | Hosted routing |
| Perplexity | perplexity/google/gemini-2.5-pro | - | - | - | Hosted routing |
| Perplexity | perplexity/google/gemini-3-flash-preview | - | - | - | Hosted routing |
| Perplexity | perplexity/google/gemini-3-pro-preview | - | - | - | Hosted routing |
| Perplexity | perplexity/anthropic/claude-haiku-4-5 | - | - | - | Hosted routing |
| Perplexity | perplexity/anthropic/claude-sonnet-4-5 | - | - | - | Hosted routing |
| Perplexity | perplexity/anthropic/claude-opus-4-5 | - | - | - | Hosted routing |
| Perplexity | perplexity/anthropic/claude-opus-4-6 | - | - | - | Hosted routing |
| Perplexity | perplexity/xai/grok-4-1-fast-non-reasoning | - | - | - | Hosted routing |
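To route to one of the new models through the proxy, a minimal `model_list` entry could look like the sketch below. The config schema is LiteLLM's standard proxy format; the alias and the environment variable name are illustrative:

```yaml
# config.yaml — sketch of a deployment entry for a model from the table above
model_list:
  - model_name: gpt-5.3-codex            # alias that clients request
    litellm_params:
      model: openai/gpt-5.3-codex        # provider/model identifier
      api_key: os.environ/OPENAI_API_KEY # resolved from the environment
```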
Features
- New Azure OpenAI models 2026-02-25 - PR #22114
- Adjust `mistral-small-2503` input/output cost per token - PR #22097
- Add `groq/openai/gpt-oss-safeguard-20b` model pricing - PR #21951
- Update AIML model pricing - PR #22139
- Thread `api_base` to `get_model_info` + graceful fallback - PR #21970
- Fix function calling for PublicAI Apertus models - PR #21582
- Add deprecation dates for `grok-2-vision-1212` and `grok-3-mini` models - PR #20102
Bug Fixes
- Fix converse handling for `parallel_tool_calls` - PR #22267
- Restore `parallel_tool_calls` mapping in `map_openai_params` - PR #22333
- Correct `modelInput` format for Converse API batch models - PR #21656
- Prevent double UUID in `create_file` S3 key - PR #21650
- Filter internal `json_tool_call` when mixed with real tools - PR #21107
- Pass timeout param to Bedrock rerank HTTP client - PR #22021
- Fix model cost map for anthropic fast and `inference_geo` - PR #21904
LLM API Endpoints
Features
- Guardrails support for `/v1/realtime` WebSocket endpoint - PR #22152
- Vertex AI Gemini Live via unified `/realtime` endpoint - PR #22153
- Guardrails with `pre_call`/`post_call` mode on realtime WebSocket - PR #22161
- `end_session_after_n_fails` + Endpoint Settings wizard step - PR #22165
- Guardrail hook for voice transcription - PR #21976
- Fix guardrails not firing for Gemini/Vertex AI and `provider_config` realtime sessions - PR #22168
- Add logging, spend tracking support + tool tracing - PR #22105
- Enable local file support for OCR - PR #22133
- Preserve thinking blocks in agentic loop follow-up messages - PR #21604
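As a sketch of how a realtime guardrail could be wired up: the `guardrails` block below follows LiteLLM's standard config shape, but the guardrail name and provider value are illustrative, not taken from this release:

```yaml
# config.yaml — hypothetical guardrail enforced on realtime sessions
guardrails:
  - guardrail_name: "realtime-content-guard"  # illustrative name
    litellm_params:
      guardrail: lakera_v2   # any supported guardrail provider
      mode: "pre_call"       # run before the model call; "post_call" also supported
```

With a guardrail configured this way, the new realtime support means the same policy applies to `/v1/realtime` WebSocket turns as to regular completion requests.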
Bugs
- Fix mypy attr-defined errors on realtime websocket calls - PR #22202
Management Endpoints / UI
Features
- Projects
  - Add Projects page with list and create flows - PR #22315
  - Add Project Details page with edit modal - PR #22360
  - Add project keys table and project dropdown on key create/edit - PR #22373
  - Add delete project action to Projects table - PR #22412
  - Add Projects Opt-In Toggle in Admin Settings - PR #22416
  - Include `created_at` and `updated_at` in `/project/list` response - PR #22323
  - Add tags in project - PR #22216
- Virtual Keys + Access Groups
  - Add bidirectional team/key sync for Access Group CRUD flows - PR #22253
  - Add pagination and search to `/key/aliases` to prevent OOMs - PR #22137
  - Add paginated key alias selector in UI - PR #22157
  - Add `project_id` and `access_group_id` filters for key list endpoint - PR #22356
  - Add KeyInfoHeader component - PR #22047
  - Restrict Edit Settings to key owners - PR #21985
  - Fix virtual key grace period from env/UI - PR #20321
- Agents
- Proxy Auth / SSO
- Usage / Spend Logs
  - Add user filtering to usage page - PR #22059
  - Allow using AI to understand usage patterns - PR #22042
  - Use backend `request_duration_ms` and make Duration sortable in Logs - PR #22122
  - Add `request_duration_ms` to SpendLogs - PR #22066
  - Enrich failure spend logs with key/team metadata - PR #22049
  - Show real tool names in logs for Anthropic-format tools - PR #22048
- Models + Endpoints
- UI Improvements
- Health Checks
Bugs
- Populate `user_id` and `user_info` for admin users in `/user/info` - PR #22239
- Fix virtual keys pagination stale totals when filtering - PR #22222
- Fix Spend Update Queue aggregation never triggering with default presets - PR #21963
- Fix timezone config lookup and replace hardcoded timezone map with `ZoneInfo` - PR #21754
- Fix custom auth budget issue - PR #22164
- Fix missing OAuth session state - PR #21992
- Fix Transport Type for OpenAPI Spec on UI - PR #22005
- Fix Claude Code plugin schema - PR #22271
- Add missing migration for `LiteLLM_ClaudeCodePluginTable` - PR #22335
- Only tag selected deployment in access group creation - PR #21655
- State management fixes for CheckBatchCost - PR #21921
- Remove duplicate antd import in ToolPolicies - PR #22107