
[Preview] v1.78.5-stable - Native OCR Support

Krrish Dholakia
CEO, LiteLLM
Ishaan Jaff
CTO, LiteLLM

Deploy this version

docker run \
-e STORE_MODEL_IN_DB=True \
-p 4000:4000 \
ghcr.io/berriai/litellm:v1.78.5.rc.1

Key Highlights

  • Native OCR Endpoints - /v1/ocr endpoint support with cost tracking for Mistral OCR and Azure AI OCR
  • Global Vendor Discounts - Specify global vendor discount percentages for accurate cost tracking and reporting
  • Team Spending Reports - Team admins can now export detailed spending reports for their teams
  • Claude Haiku 4.5 - Day 0 support for Claude Haiku 4.5 across Bedrock, Vertex AI, and OpenRouter with a 200K context window
  • GPT-5-Codex - Support for GPT-5-Codex via the Responses API on OpenAI and Azure
  • Performance Improvements - Major router optimizations: O(1) model lookups, shallow copy instead of deepcopy for model aliases (10-100x faster), 30-40% faster timing calls, and O(n²) to O(n) hash generation

New Models / Updated Models

New Model Support

| Provider | Model | Context Window | Input ($/1M tokens) | Output ($/1M tokens) | Features |
|---|---|---|---|---|---|
| Anthropic | claude-haiku-4-5 | 200K | $1.00 | $5.00 | Chat, reasoning, vision, function calling, prompt caching, computer use |
| Anthropic | claude-haiku-4-5-20251001 | 200K | $1.00 | $5.00 | Chat, reasoning, vision, function calling, prompt caching, computer use |
| Bedrock | anthropic.claude-haiku-4-5-20251001-v1:0 | 200K | $1.00 | $5.00 | Chat, reasoning, vision, function calling, prompt caching |
| Bedrock | global.anthropic.claude-haiku-4-5-20251001-v1:0 | 200K | $1.00 | $5.00 | Chat, reasoning, vision, function calling, prompt caching |
| Bedrock | jp.anthropic.claude-haiku-4-5-20251001-v1:0 | 200K | $1.10 | $5.50 | Chat, reasoning, vision, function calling, prompt caching (JP Cross-Region) |
| Bedrock | us.anthropic.claude-haiku-4-5-20251001-v1:0 | 200K | $1.10 | $5.50 | Chat, reasoning, vision, function calling, prompt caching (US region) |
| Bedrock | eu.anthropic.claude-haiku-4-5-20251001-v1:0 | 200K | $1.10 | $5.50 | Chat, reasoning, vision, function calling, prompt caching (EU region) |
| Bedrock | apac.anthropic.claude-haiku-4-5-20251001-v1:0 | 200K | $1.10 | $5.50 | Chat, reasoning, vision, function calling, prompt caching (APAC region) |
| Bedrock | au.anthropic.claude-haiku-4-5-20251001-v1:0 | 200K | $1.10 | $5.50 | Chat, reasoning, vision, function calling, prompt caching (AU region) |
| Vertex AI | vertex_ai/claude-haiku-4-5@20251001 | 200K | $1.00 | $5.00 | Chat, reasoning, vision, function calling, prompt caching |
| OpenAI | gpt-5 | 272K | $1.25 | $10.00 | Chat, responses API, reasoning, vision, function calling, prompt caching |
| OpenAI | gpt-5-codex | 272K | $1.25 | $10.00 | Responses API mode |
| Azure | azure/gpt-5-codex | 272K | $1.25 | $10.00 | Responses API mode |
| Gemini | gemini-2.5-flash-image | 32K | $0.30 | $2.50 | Image generation (GA - Nano Banana) - $0.039/image |
| ZhipuAI | glm-4.6 | - | - | - | Chat completions |
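The per-1M-token prices in the table translate into a per-request cost with simple arithmetic. The minimal sketch below is a hypothetical helper, not LiteLLM's cost calculator: it copies a few rows from the table and assumes cost is linear in input and output tokens, ignoring prompt-caching discounts and per-image pricing.

```python
# Hypothetical price table: (input $/1M tokens, output $/1M tokens),
# copied from a few rows of the table above.
PRICING = {
    "claude-haiku-4-5": (1.00, 5.00),
    "us.anthropic.claude-haiku-4-5-20251001-v1:0": (1.10, 5.50),
    "gpt-5": (1.25, 10.00),
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one request from per-1M-token list prices."""
    input_price, output_price = PRICING[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


if __name__ == "__main__":
    # 10,000 input + 2,000 output tokens on claude-haiku-4-5:
    # (10_000 * 1.00 + 2_000 * 5.00) / 1_000_000 = 0.02 USD
    print(request_cost("claude-haiku-4-5", 10_000, 2_000))
```

The same request routed through the US regional Bedrock profile costs 10% more (0.022 USD), which is why the regional rows are listed separately.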

Features

  • OpenAI

    • GPT-5: return reasoning content via /chat/completions; GPT-5-Codex now works with Claude Code - PR #15441
  • Anthropic

    • Reduce claude-4-sonnet max_output_tokens to 64k - PR #15409
    • Added claude-haiku-4.5 - PR #15579
    • Add support for thinking blocks and redacted thinking blocks in Anthropic v1/messages API - PR #15501
  • Bedrock

    • Add anthropic.claude-haiku-4-5-20251001-v1:0 on Bedrock, VertexAI - PR #15581
    • Add Claude Haiku 4.5 support for Bedrock global and US regions - PR #15650
    • Add Claude Haiku 4.5 support for Bedrock Other regions - PR #15653
    • Add JP Cross-Region Inference jp.anthropic.claude-haiku-4-5-20251001 - PR #15598
    • Fix: Bedrock pricing for geo in-region vs. cross-region inference; add Global Cross-Region Inference - PR #15685
    • Fix: Support us-gov prefix for AWS GovCloud Bedrock models - PR #15626
    • Fix: GPT-OSS on Bedrock now supports native streaming; revert fake streaming - PR #15668
  • Gemini

    • Feat(pricing): Add Gemini 2.5 Flash Image (Nano Banana) in GA - PR #15557
    • Fix: Gemini 2.5 Flash Image should not have supports_web_search=true - PR #15642
    • Remove penalty params from the supported params for the Gemini preview model - PR #15503
  • Ollama

    • Fix(ollama/chat): correctly map reasoning_effort to think in requests - PR #15465
  • OpenRouter

    • Add anthropic/claude-sonnet-4.5 to OpenRouter cost map - PR #15472
    • Prompt caching for anthropic models with OpenRouter - PR #15535
    • Get completion cost directly from OpenRouter - PR #15448
    • Fix OpenRouter Claude Opus 4 model naming - PR #15495
  • CometAPI

    • Fix(cometapi): improve CometAPI provider support (embeddings, image generation, docs) - PR #15591
  • Lemonade

    • Add new models to the Lemonade provider - PR #15554
  • Watson X

    • Fix(pricing): correct pricing for various models in the watsonx model family - PR #15670
  • Vercel AI Gateway

    • Add glm-4.6 model to pricing configuration - PR #15679
  • Vertex AI

    • Add Vertex AI Discovery Engine Rerank Support - PR #15532
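Several of the Bedrock entries above differ only in the inference-profile prefix on the model ID (global., jp., us., eu., apac., au., plus us-gov. for GovCloud), and the table prices some of them differently. The sketch below shows one way to split such an ID; the helper name and the prefix set are assumptions drawn from this page, not LiteLLM's implementation.

```python
from typing import Optional, Tuple

# Assumed prefix set, taken from the model IDs and PR titles on this page.
KNOWN_PREFIXES = {"global", "us", "eu", "apac", "jp", "au", "us-gov"}


def split_bedrock_model_id(model_id: str) -> Tuple[Optional[str], str]:
    """Hypothetical helper: return (prefix or None, base model ID)."""
    head, sep, rest = model_id.partition(".")
    if sep and head in KNOWN_PREFIXES:
        # e.g. "jp.anthropic.claude-..." -> ("jp", "anthropic.claude-...")
        return head, rest
    # No recognized prefix: the ID is already a base model ID.
    return None, model_id
```

Splitting on the first dot means "us-gov" is matched as a whole token rather than being confused with "us".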
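The Ollama fix maps reasoning_effort to the think field of the request. The sketch below builds an Ollama /api/chat request body under a simplified, assumed rule (any non-empty reasoning_effort turns thinking on); LiteLLM's actual mapping may differ, for example for models that accept effort levels instead of a boolean.

```python
from typing import Any, Dict, List, Optional


def build_ollama_chat_body(
    model: str,
    messages: List[Dict[str, str]],
    reasoning_effort: Optional[str] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: build an Ollama /api/chat request body."""
    body: Dict[str, Any] = {"model": model, "messages": messages, "stream": False}
    if reasoning_effort:
        # Assumption: any effort level ("low", "medium", "high") enables thinking.
        body["think"] = True
    # With no reasoning_effort, "think" is omitted and the model default applies.
    return body
```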

Bug Fixes


LLM API Endpoints

Features

  • Responses API

    • Responses API: enable calling Anthropic/Gemini models via Responses API streaming in the OpenAI Ruby SDK; DB: sanity-check pending migrations before startup - PR #15432
    • Add support for responses mode in health check - PR #15658
  • OCR API

    • Feat: Add native litellm.ocr() functions - PR #15567
    • Feat: Add /ocr route on LiteLLM AI Gateway - Adds support for native Mistral OCR calling - PR #15571
    • Feat: Add Azure AI Mistral OCR Integration - PR #15572
    • Feat: Native /ocr endpoint support - PR #15573
    • Feat: Add Cost Tracking for /ocr endpoints - PR #15678
  • /generateContent

    • Fix: Gemini CLI - add google_routes to llm_api_routes - PR #15500
    • Fix Pydantic validation error for citationMetadata.citationSources in Google GenAI responses - PR #15592
  • Images API

    • Fix: DALL-E 2 support in the Image Edits API - PR #15604
  • Bedrock Passthrough

    • Feat: Allow calling /invoke, /converse routes through AI Gateway + models on config.yaml - PR #15618
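With the gateway running on port 4000 as in the deploy command above, a client could call the new OCR route over plain HTTP. The sketch below only builds the request with the Python standard library. It assumes a Mistral-style request body (a model plus a document object with a document_url), a /v1/ocr path, a Bearer virtual key, and the model name mistral/mistral-ocr-latest; the key sk-1234 and the document URL are placeholders, and all of these should be checked against the LiteLLM OCR docs.

```python
import json
import urllib.request


def build_ocr_request(base_url: str, api_key: str, model: str, document_url: str) -> urllib.request.Request:
    """Hypothetical helper: build a POST to the gateway's OCR route."""
    # Assumed Mistral-style body: the document is referenced by URL.
    payload = {
        "model": model,
        "document": {"type": "document_url", "document_url": document_url},
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/v1/ocr",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",  # placeholder virtual key
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    req = build_ocr_request(
        "http://localhost:4000", "sk-1234",
        "mistral/mistral-ocr-latest", "https://example.com/invoice.pdf",
    )
    # Sending requires a running gateway:
    # with urllib.request.urlopen(req) as resp:
    #     print(json.load(resp))
    print(req.get_method(), req.full_url)
```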

Bugs

  • General
    • Fix: Convert object to a correct type - PR #15634
    • Bug Fix: Tags as metadata dicts were raising exceptions - PR #15625
    • Add type hint to function_to_dict and fix typo - PR #15580

Management Endpoints / UI

Features

  • Virtual Keys

    • Docs: Key Rotations - PR #15455
    • Fix: UI - Key Max Budget Removal Error - PR #15672
    • Fix: Key Settings - Max Budget Removal Error - PR #15669
  • Teams

    • Feat: Allow Team Admins to export a report of the team spending - PR #15542
  • Passthrough

    • Feat: Passthrough - allow admin to give access to specific passthrough endpoints - PR #15401
  • SCIM v2

    • Feat(scim_v2.py): if group.id doesn't exist, use external id + Passthrough - ensure updates and deletions persist across instances - PR #15276
  • SSO

    • Feat: UI SSO - Add PKCE for OKTA SSO - PR #15608
    • Fix: Separate OAuth M2M authentication from UI SSO + Handle Introspection endpoint for Oauth2 - PR #15667
    • Fix: clean up Entra ID app roles JWT claim handling - PR #15583

Logging / Guardrail / Prompt Management Integrations

Guardrails

  • General

    • Fix apply_guardrail endpoint returning raw string instead of ApplyGuardrailResponse - PR #15436
    • Fix: Ensure guardrail memory sync after database updates - PR #15633
    • Feat: add guardrail for image generation - PR #15619
    • Feat: Add Guardrails for /v1/messages and /v1/responses API - PR #15686
  • Pillar Security

    • Feature: update the Pillar Security integration to support no-persistence mode in the LiteLLM proxy - PR #15599

Prompt Management

  • General
    • Fix a small error in the code snippet in custom_prompt_management.md - PR #15544

Spend Tracking, Budgets and Rate Limiting

  • Cost Tracking

    • Feat: Cost Tracking - specify a global vendor discount for costs - PR #15546
    • Feat: UI - Allow setting Provider Discounts on UI - PR #15550
  • Budgets
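A global vendor discount amounts to scaling the list-price cost for each provider. The minimal sketch below assumes discounts are stored as percentages keyed by provider name (the provider keys and the 10% figure are hypothetical) and that providers without an entry are billed at list price; it is not LiteLLM's cost-tracking code.

```python
from typing import Dict


def apply_vendor_discount(provider: str, list_cost_usd: float, discounts_pct: Dict[str, float]) -> float:
    """Hypothetical helper: apply a per-provider percentage discount to a list-price cost."""
    pct = discounts_pct.get(provider, 0.0)  # no entry means no discount
    if not 0.0 <= pct <= 100.0:
        raise ValueError(f"discount for {provider!r} must be between 0 and 100, got {pct}")
    return list_cost_usd * (1.0 - pct / 100.0)


if __name__ == "__main__":
    discounts = {"bedrock": 10.0}  # hypothetical negotiated discount
    # A 0.02 USD list-price Bedrock request is tracked at 0.02 * 0.9 = 0.018 USD.
    print(apply_vendor_discount("bedrock", 0.02, discounts))
```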


Performance / Loadbalancing / Reliability improvements

  • Router Optimizations

    • Perf(router): use shallow copy instead of deepcopy for model aliases - 10-100x faster than deepcopy on nested dict structures - PR #15576
    • Perf(router): optimize string concatenation in hash generation - Improves time complexity from O(n²) to O(n) - PR #15575
    • Perf(router): optimize model lookups with O(1) data structures - Replace O(n) scans with index map lookups - PR #15578
    • Perf(router): optimize model lookups with O(1) index maps - Use model_id_to_deployment_index_map and model_name_to_deployment_indices for instant lookups - PR #15574
    • Perf(router): optimize timing functions in completion hot path - Use time.perf_counter() for duration measurements and time.monotonic() for timeout calculations, providing 30-40% faster timing calls - PR #15617
  • SSL/TLS Performance

    • Feat(ssl): add configurable ECDH curve for TLS performance - Configure via the ssl_ecdh_curve setting, e.g. to disable post-quantum (PQC) key exchange on OpenSSL 3.x for better performance - PR #15617
  • Token Counter

    • Fix(token-counter): extract model_info from deployment for custom_tokenizer - PR #15680
  • Performance Metrics

  • CI/CD

    • Fix: CI/CD - Missing env key & Linter type error - PR #15606
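The O(1) lookup changes replace linear scans of the deployment list with dictionary indexes. The sketch below is a hypothetical illustration, not LiteLLM's Router code: the map names echo those mentioned in PR #15574, and the deployment shape (model_name plus model_info.id) is assumed from typical config entries.

```python
from typing import Any, Dict, List, Optional

Deployment = Dict[str, Any]


class DeploymentIndex:
    """Hypothetical sketch of index-map lookups over a list of deployments."""

    def __init__(self, model_list: List[Deployment]) -> None:
        self.model_list = model_list
        # model_info.id -> position in model_list (one deployment per id)
        self.model_id_to_deployment_index_map: Dict[str, int] = {}
        # model_name -> positions of every deployment in that model group
        self.model_name_to_deployment_indices: Dict[str, List[int]] = {}
        for idx, deployment in enumerate(model_list):
            self.model_id_to_deployment_index_map[deployment["model_info"]["id"]] = idx
            self.model_name_to_deployment_indices.setdefault(deployment["model_name"], []).append(idx)

    def get_by_id(self, model_id: str) -> Optional[Deployment]:
        # Average O(1) dict lookup instead of scanning the whole list.
        idx = self.model_id_to_deployment_index_map.get(model_id)
        return None if idx is None else self.model_list[idx]

    def get_by_name(self, model_name: str) -> List[Deployment]:
        # Work is proportional to the size of the model group, not the whole list.
        return [self.model_list[i] for i in self.model_name_to_deployment_indices.get(model_name, [])]
```

The trade-off is bookkeeping: both maps must be rebuilt or updated whenever deployments are added or removed, or lookups will return stale positions.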
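Switching model aliases from deepcopy to shallow copy works because copy.copy builds only a new top-level dict and shares the nested values, so its cost does not grow with nesting depth. The hypothetical helper below (not LiteLLM's code) shows the pattern and its one requirement: nested values must never be mutated in place afterwards.

```python
import copy
from typing import Any, Dict


def with_overrides(aliases: Dict[str, Any], overrides: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: return a new alias map with overrides applied.

    copy.copy creates a new top-level dict in time proportional to its number
    of keys; nested values are shared, not duplicated, so this is only safe
    while callers never mutate those nested values in place.
    """
    merged = copy.copy(aliases)
    merged.update(overrides)  # rebinding top-level keys leaves `aliases` untouched
    return merged
```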
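The hash-generation change concerns how the key string is assembled before hashing. The sketch below contrasts the two approaches in general terms; the function names and the key parts are hypothetical, not the router's actual hashing code.

```python
import hashlib
from typing import Iterable


def hash_key_concat(parts: Iterable[str]) -> str:
    # Repeated += may copy the accumulated string on every step, which is
    # O(n^2) in the total length in the worst case. (CPython can sometimes
    # resize in place, but that is an implementation detail.)
    key = ""
    for part in parts:
        key += part
    return hashlib.sha256(key.encode("utf-8")).hexdigest()


def hash_key_join(parts: Iterable[str]) -> str:
    # str.join computes the total length first and copies each part once: O(n).
    return hashlib.sha256("".join(parts).encode("utf-8")).hexdigest()
```

Both functions produce the same digest; only the cost of building the input differs.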
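The timing change uses time.perf_counter() for durations and time.monotonic() for timeouts. Both clocks are monotonic, so unlike time.time() they are unaffected by system clock adjustments. The hypothetical helpers below sketch that split; the speedup figure above comes from the PR and is not reproduced here.

```python
import time
from typing import Any, Callable, Tuple


def timed_call(fn: Callable[..., Any], *args: Any, **kwargs: Any) -> Tuple[Any, float]:
    """Run fn and return (result, elapsed seconds) using the high-resolution counter."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


def deadline_after(timeout_s: float) -> float:
    """Return an absolute deadline on the monotonic clock."""
    return time.monotonic() + timeout_s


def time_left(deadline: float) -> float:
    """Seconds remaining until the deadline, never negative."""
    return max(0.0, deadline - time.monotonic())
```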
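Pinning the ECDH curve restricts TLS key exchange to a single classical group, which rules out the hybrid post-quantum groups that newer OpenSSL 3.x releases offer by default. The sketch below shows how a setting like ssl_ecdh_curve could be applied with Python's ssl.SSLContext.set_ecdh_curve; whether LiteLLM uses this exact call is an assumption, and the prime256v1 default is only an example.

```python
import ssl


def make_server_context(ecdh_curve: str = "prime256v1") -> ssl.SSLContext:
    """Hypothetical helper: build a server TLS context with a fixed ECDH curve."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    # Limit ECDH key exchange to one named curve; raises ValueError
    # if OpenSSL does not recognize the name.
    ctx.set_ecdh_curve(ecdh_curve)
    # A real server would also call ctx.load_cert_chain(certfile, keyfile).
    return ctx
```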

Documentation Updates

  • Provider Documentation

  • General


New Contributors

  • @jlan-nl made their first contribution in PR #15374
  • @ImadSaddik made their first contribution in PR #15267
  • @huangyafei made their first contribution in PR #15472
  • @mubashir1osmani made their first contribution in PR #15468
  • @kowyo made their first contribution in PR #15465
  • @dhruvyad made their first contribution in PR #15448
  • @davizucon made their first contribution in PR #15544
  • @FelipeRodriguesGare made their first contribution in PR #15540
  • @ndrsfel made their first contribution in PR #15557
  • @shinharaguchi made their first contribution in PR #15598
  • @TensorNull made their first contribution in PR #15591
  • @TeddyAmkie made their first contribution in PR #15583
  • @aniketmaurya made their first contribution in PR #15580
  • @eddierichter-amd made their first contribution in PR #15554
  • @konekohana made their first contribution in PR #15535
  • @Classic298 made their first contribution in PR #15495
  • @afogel made their first contribution in PR #15599
  • @orolega made their first contribution in PR #15633
  • @LucasSugi made their first contribution in PR #15634
  • @uc4w6c made their first contribution in PR #15619
  • @Sameerlite made their first contribution in PR #15658
  • @yuneng-jiang made their first contribution in PR #15672
  • @Nikro made their first contribution in PR #15680

Full Changelog

View complete changelog on GitHub