
[Preview] v1.78.0-stable - MCP Gateway: Control Tool Access by Team, Key

Krrish Dholakia
CEO, LiteLLM
Ishaan Jaff
CTO, LiteLLM

Deploy this version

```shell
docker run \
  -e STORE_MODEL_IN_DB=True \
  -p 4000:4000 \
  ghcr.io/berriai/litellm:v1.78.0.rc.2
```

Key Highlights

  • MCP Gateway - Control Tool Access by Team, Key - Grant teams and keys selective access to tools on the same MCP server
  • Performance Improvements - 70% Lower p99 Latency
  • GPT-5 Pro & GPT-Image-1-Mini - Day 0 support for OpenAI's GPT-5 Pro (400K context) and gpt-image-1-mini image generation
  • EnkryptAI Guardrails - New guardrail integration for content moderation
  • Tag-Based Budgets - Support for setting budgets based on request tags

MCP Gateway - Control Tool Access by Team, Key


Proxy admins can now control MCP tool access by team or key, making it easy to grant different teams selective access to tools on the same MCP server.

For example, you can give your Engineering team access to the list_repositories, create_issue, and search_code tools, while Sales gets only search_code and close_issue.
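As a rough sketch of how this could look from a client's perspective, the snippet below issues a key restricted to two tools on a hypothetical "github" MCP server via the proxy's /key/generate endpoint. The mcp_tool_permissions field name and shape are assumptions for illustration; see the Get Started guide below for the actual schema.

```python
import requests

# Hypothetical sketch: issue a key that may only call two tools on the
# "github" MCP server. `mcp_tool_permissions` is an assumed field name;
# consult the MCP Gateway docs for the real request schema.
response = requests.post(
    "http://localhost:4000/key/generate",
    headers={"Authorization": "Bearer sk-<master-key>"},  # proxy admin key
    json={
        "team_id": "sales",
        "mcp_tool_permissions": {
            "github": ["search_code", "close_issue"],
        },
    },
)
response.raise_for_status()
print(response.json()["key"])  # hand this key to the Sales team
```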

Get Started


Performance - 70% Lower p99 Latency


This release cuts p99 latency by 70% on the LiteLLM AI Gateway, making it even better for low-latency use cases.

These gains come from two key enhancements:

Reliable Sessions

Added support for shared sessions with aiohttp. The shared_session parameter is now consistently used across all calls, enabling connection pooling.
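As a minimal illustration of the pattern (not LiteLLM's actual code), the sketch below creates one aiohttp.ClientSession and reuses it across concurrent requests, so TCP connections and TLS handshakes are pooled rather than re-established on every call:

```python
import asyncio
import aiohttp

async def fetch(session: aiohttp.ClientSession, url: str) -> int:
    # Every call reuses the session's connection pool instead of
    # opening a fresh TCP/TLS connection.
    async with session.get(url) as resp:
        return resp.status

async def main() -> None:
    # One session for the whole application lifetime, shared across calls.
    async with aiohttp.ClientSession() as shared_session:
        statuses = await asyncio.gather(
            *(fetch(shared_session, "https://example.com") for _ in range(10))
        )
        print(statuses)

asyncio.run(main())
```

Creating a new session per request defeats aiohttp's connection pooling, which is the failure mode the shared_session fix removes.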

Faster Routing

A new model_name_to_deployment_indices hash map replaces O(n) list scans in _get_all_deployments() with O(1) hash lookups, boosting routing performance and scalability.
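Conceptually, the change looks like this simplified sketch (illustrative only, not the router's actual code): the index is built once when deployments change, so per-request lookups no longer scan the whole deployment list.

```python
from collections import defaultdict

# Toy deployment list standing in for the router's internal state.
deployments = [
    {"model_name": "gpt-4", "litellm_params": {"model": "azure/gpt-4-eu"}},
    {"model_name": "gpt-4", "litellm_params": {"model": "azure/gpt-4-us"}},
    {"model_name": "claude-3", "litellm_params": {"model": "bedrock/claude-3"}},
]

# Built once (and updated on add/remove) rather than rescanned per request.
model_name_to_deployment_indices: dict[str, list[int]] = defaultdict(list)
for i, deployment in enumerate(deployments):
    model_name_to_deployment_indices[deployment["model_name"]].append(i)

def get_all_deployments(model_name: str) -> list[dict]:
    # O(1) average-case hash lookup instead of an O(n) scan.
    indices = model_name_to_deployment_indices.get(model_name, [])
    return [deployments[i] for i in indices]

print(get_all_deployments("gpt-4"))  # both gpt-4 deployments, no scan
```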

As a result, performance improved across all latency percentiles:

  • Median latency: 110 ms → 100 ms (−9.1%)
  • p95 latency: 440 ms → 150 ms (−65.9%)
  • p99 latency: 810 ms → 240 ms (−70.4%)
  • Average latency: 310 ms → 111.73 ms (−64.0%)

Test Setup

Locust

  • Concurrent users: 1,000
  • Ramp-up: 500

System Specs

  • CPU: 4 vCPUs
  • Memory: 8 GB RAM
  • LiteLLM Workers: 4
  • Instances: 4
  • Database: connected

Configuration (config.yaml)

View the complete configuration: gist.github.com/AlexsanderHamir/config.yaml

Load Script (no_cache_hits.py)

View the complete load testing script: gist.github.com/AlexsanderHamir/no_cache_hits.py
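For reference, a minimal Locust user resembling this setup might look like the sketch below; the endpoint path matches the proxy's OpenAI-compatible API, while the key and model name are placeholders rather than the gist's actual values:

```python
import uuid
from locust import HttpUser, task, between

class LiteLLMUser(HttpUser):
    wait_time = between(0.5, 1)

    @task
    def chat_completion(self):
        # Unique content per request so no response is served from cache,
        # matching the spirit of no_cache_hits.py.
        self.client.post(
            "/chat/completions",
            headers={"Authorization": "Bearer sk-placeholder"},  # placeholder
            json={
                "model": "fake-openai-endpoint",  # placeholder model name
                "messages": [
                    {"role": "user", "content": f"hello {uuid.uuid4()}"}
                ],
            },
        )
```

Run it with `locust -f no_cache_hits.py --host http://localhost:4000` and the user/ramp-up settings above.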


New Models / Updated Models

New Model Support

| Provider | Model | Context Window | Input ($/1M tokens) | Output ($/1M tokens) | Features |
|---|---|---|---|---|---|
| OpenAI | gpt-5-pro | 400K | $15.00 | $120.00 | Responses API, reasoning, vision, function calling, prompt caching, web search |
| OpenAI | gpt-5-pro-2025-10-06 | 400K | $15.00 | $120.00 | Responses API, reasoning, vision, function calling, prompt caching, web search |
| OpenAI | gpt-image-1-mini | - | $2.00/img | - | Image generation and editing |
| OpenAI | gpt-realtime-mini | 128K | $0.60 | $2.40 | Realtime audio, function calling |
| Azure AI | azure_ai/Phi-4-mini-reasoning | 131K | $0.08 | $0.32 | Function calling |
| Azure AI | azure_ai/Phi-4-reasoning | 32K | $0.125 | $0.50 | Function calling, reasoning |
| Azure AI | azure_ai/MAI-DS-R1 | 128K | $1.35 | $5.40 | Reasoning, function calling |
| Bedrock | au.anthropic.claude-sonnet-4-5-20250929-v1:0 | 200K | $3.30 | $16.50 | Chat, reasoning, vision, function calling, prompt caching |
| Bedrock | global.anthropic.claude-sonnet-4-5-20250929-v1:0 | 200K | $3.00 | $15.00 | Chat, reasoning, vision, function calling, prompt caching |
| Bedrock | global.anthropic.claude-sonnet-4-20250514-v1:0 | 1M | $3.00 | $15.00 | Chat, reasoning, vision, function calling, prompt caching |
| Bedrock | cohere.embed-v4:0 | 128K | $0.12 | - | Embeddings, image input support |
| OCI | oci/cohere.command-latest | 128K | $1.56 | $1.56 | Function calling |
| OCI | oci/cohere.command-a-03-2025 | 256K | $1.56 | $1.56 | Function calling |
| OCI | oci/cohere.command-plus-latest | 128K | $1.56 | $1.56 | Function calling |
| Together AI | together_ai/moonshotai/Kimi-K2-Instruct-0905 | 262K | $1.00 | $3.00 | Function calling |
| Together AI | together_ai/Qwen/Qwen3-Next-80B-A3B-Instruct | 262K | $0.15 | $1.50 | Function calling |
| Together AI | together_ai/Qwen/Qwen3-Next-80B-A3B-Thinking | 262K | $0.15 | $1.50 | Function calling |
| Vertex AI | MedGemma models | Varies | Varies | Varies | Medical-focused Gemma models on custom endpoints |
| Watson X | 27 new foundation models | Varies | Varies | Varies | Granite, Llama, Mistral families |

Features

  • OpenAI

    • Add GPT-5 Pro model configuration and documentation - PR #15258
    • Add stop parameter to non-supported params for GPT-5 - PR #15244
    • Day 0 Support, Add gpt-image-1-mini - PR #15259
    • Add gpt-realtime-mini support - PR #15283
    • Add gpt-5-pro-2025-10-06 to model costs - PR #15344
    • Minimal fix: gpt5 models should not go on cooldown when called with temperature!=1 - PR #15330
  • Snowflake Cortex

    • Add function calling support for Snowflake Cortex REST API - PR #15221
  • Gemini

    • Fix header forwarding for Gemini/Vertex AI providers in proxy mode - PR #15231
  • Azure

    • Removed stop param from unsupported azure models - PR #15229
    • Fix(azure/responses): remove invalid status param from azure call - PR #15253
    • Add new Azure AI models with pricing details - PR #15387
    • AzureAD Default credentials - select credential type based on environment - PR #14470
  • Bedrock

    • Add Global Cross-Region Inference - PR #15210
    • Add Cohere Embed v4 support for AWS Bedrock - PR #15298
    • Fix(bedrock): include cacheWriteInputTokens in prompt_tokens calculation - PR #15292
    • Add Bedrock AU Cross-Region Inference for Claude Sonnet 4.5 - PR #15402
    • Converse → /v1/messages streaming doesn't handle parallel tool calls with Claude models - PR #15315
  • Vertex AI

    • Implement Context Caching for Vertex AI provider - PR #15226
    • Support for Vertex AI Gemma Models on Custom Endpoints - PR #15397
    • VertexAI - gemma model family support (custom endpoints) - PR #15419
    • VertexAI Gemma model family streaming support + Added MedGemma - PR #15427
  • OCI

    • Add OCI Cohere support with tool calling and streaming capabilities - PR #15365
  • Watson X

    • Add Watson X foundation model definitions to model_prices_and_context_window.json - PR #15219
    • Watsonx - Apply correct prompt templates for openai/gpt-oss model family - PR #15341
  • OpenRouter

    • Fix - (openrouter): move cache_control to content blocks for claude/gemini - PR #15345
    • Fix - OpenRouter cache_control to only apply to last content block - PR #15395
  • Together AI

    • Added Kimi-K2-Instruct-0905 and Qwen3-Next-80B-A3B Instruct/Thinking models (see the table above)

Bug Fixes

  • General
    • Bug fix: gpt-5-chat-latest has incorrect max_input_tokens value - PR #15116
    • Fix reasoning response ID - PR #15265
    • Fix issue with parsing assistant messages - PR #15320
    • Fix litellm_param based costing - PR #15336
    • Fix lint errors - PR #15406

LLM API Endpoints

Bugs

  • General
    • Fix x-litellm-cache-key header not being returned on cache hit - PR #15348

Management Endpoints / UI

Features

  • Proxy CLI Auth

    • Proxy CLI - don't store existing key in the URL, store it in the state param - PR #15290
  • Models + Endpoints

    • Make PATCH /model/{model_id}/update handle team_id consistently with POST /model/new - PR #15297
    • Feature: adds Infinity as a provider in the UI - PR #15285
    • Fix: model + endpoints page crash when config file contains router_settings.model_group_alias - PR #15308
    • Models & Endpoints Initial Refactor - PR #15435
    • Litellm UI API Reference page updates - PR #15438
  • Teams

    • Teams page: new column "Your Role" on the teams table - PR #15384
    • LiteLLM Dashboard Teams UI refactor - PR #15418
  • UI Infrastructure

    • Added prettier to autoformat frontend - PR #15215
    • Adds turbopack to the npm run dev command in UI to build faster during development - PR #15250
    • (perf) fix: Replaces bloated key list calls with lean key aliases endpoint - PR #15252
    • Potentially fixes a UI spasm issue with an expired cookie - PR #15309
    • LiteLLM UI Refactor Infrastructure - PR #15236
    • Enforces removal of unused imports from UI - PR #15416
    • Fix: usage page >> Model Activity >> spend per day graph: y-axis clipping on large spend values - PR #15389
    • Updates guardrail provider logos - PR #15421
  • Admin Settings

    • Fix: Router settings do not update despite success message - PR #15249
    • Fix: Prevents DB from accidentally overriding config file values if they are empty in DB - PR #15340
  • SSO

    • SSO - support EntraID app roles - PR #15351

Logging / Guardrail / Prompt Management Integrations

Features

Guardrails

  • EnkryptAI - New guardrail integration for content moderation


Spend Tracking, Budgets and Rate Limiting

  • Tag Management

    • Tag Management - Add support for setting tag based budgets - PR #15433
  • Dynamic Rate Limiter v3

    • QA/Fixes - Dynamic Rate Limiter v3 - final QA - PR #15311
    • Fix dynamic Rate limiter v3 - inserting litellm_model_saturation - PR #15394
  • Shared Health Check

    • Implement Shared Health Check State Across Pods - PR #15380

MCP Gateway

  • Tool Control

    • MCP Gateway - UI - Select allowed tools for Key, Teams - PR #15241
    • MCP Gateway - Backend - Allow storing allowed tools by team/key - PR #15243
    • MCP Gateway - Fine-grained Database Object Storage Control - PR #15255
    • MCP Gateway - Litellm mcp fixes team control - PR #15304
    • MCP Gateway - QA/Fixes - Ensure Team/Key level enforcement works for MCPs - PR #15305
    • Feature: Include server_name in /v1/mcp/server/health endpoint response - PR #15431
  • OpenAPI Integration

    • MCP - support converting OpenAPI specs to MCP servers - PR #15343
    • MCP - specify allowed params per tool - PR #15346
  • Configuration

    • MCP - support setting CA_BUNDLE_PATH - PR #15253
    • Fix: Ensure MCP client stays open during tool call - PR #15391
    • Remove hardcoded "public" schema in migration.sql - PR #15363

Performance / Load Balancing / Reliability Improvements

  • Router Optimizations

    • Fix - Router: add model_name index for O(1) deployment lookups - PR #15113
    • Refactor Utils: extract inner function from client - PR #15234
    • Fix Networking: remove limitations - PR #15302
  • Session Management

    • Fix - Sessions not being shared - PR #15388
    • Fix: remove panic from hot path - PR #15396
    • Fix - shared session parsing and usage issue - PR #15440
    • Fix: handle closed aiohttp sessions - PR #15442
    • Fix: prevent session leaks when recreating aiohttp sessions - PR #15443
  • SSL/TLS Performance

    • Perf: optimize SSL/TLS handshake performance with prioritized cipher - PR #15398
  • Dependencies

    • Upgrades tenacity version to 8.5.0 - PR #15303
  • Data Masking

    • Fix - SensitiveDataMasker converts lists to string - PR #15420

General AI Gateway Improvements

Security

  • General
    • Fix: redact AWS credentials when redact_user_api_key_info enabled - PR #15321

Documentation Updates

  • Deployment

    • Delete buggy docker-compose comment that caused config.yaml-based startup to fail - PR #15425

New Contributors

  • @Gal-bloch made their first contribution in PR #15219
  • @lcfyi made their first contribution in PR #15315
  • @ashengstd made their first contribution in PR #15362
  • @vkolehmainen made their first contribution in PR #15363
  • @jlan-nl made their first contribution in PR #15330
  • @BCook98 made their first contribution in PR #15402
  • @PabloGmz96 made their first contribution in PR #15425

Full Changelog