v1.89.3 - Guardrails & Cache-Control Fixes

Update: no performance regression found

An earlier version of this note flagged a potential throughput regression. We investigated and could not confirm or reproduce any regression in the released version. The one report we received came from a deployment running custom code on top of what we shipped, and our testing points to those changes, not LiteLLM, as the likely cause.

Correctness and error rates were never affected. If you're on this version, there's nothing you need to do.

We're still monitoring incoming reports and will update this note if anything changes.

Deploy this version

Docker

docker run \
  -e STORE_MODEL_IN_DB=True \
  -p 4000:4000 \
  docker.litellm.ai/berriai/litellm:1.89.3

Pip

pip install litellm==1.89.3
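
For pip installs, it can be useful to confirm that the pinned version is what actually ended up in the environment. The following is a minimal sketch using only the standard library's importlib.metadata; the helper name installed_version_matches is an assumption for illustration and is not part of LiteLLM.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version_matches(package: str, expected: str) -> bool:
    """Return True only if `package` is installed at exactly `expected`.

    Hypothetical helper for illustration; not part of LiteLLM.
    """
    try:
        return version(package) == expected
    except PackageNotFoundError:
        # The package is not installed in this environment at all.
        return False


if __name__ == "__main__":
    # Prints True when litellm 1.89.3 is the installed distribution.
    print(installed_version_matches("litellm", "1.89.3"))
```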

v1.89.3 is a patch release on top of v1.89.2. It backports guardrail correctness fixes (a single pre-call hook for model-level guardrails, no DB re-init on every poll, 400 instead of 500 when AIM blocks a request) and caps Anthropic cache-control injection at the 4-block limit.
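
To illustrate what capping cache-control injection at 4 blocks means, here is a minimal sketch. It is not LiteLLM's implementation: the function names, the simplified message shape, and the choice to skip (rather than reject) extra injection points are all assumptions made for the example. The only facts taken from this note are that injection adds cache_control markers and that the limit is 4 blocks.

```python
from copy import deepcopy

# The 4-block limit named in this release note.
MAX_CACHE_BLOCKS = 4


def count_cache_blocks(messages):
    """Count cache_control markers on messages and on their content blocks."""
    count = 0
    for msg in messages:
        if "cache_control" in msg:
            count += 1
        content = msg.get("content")
        if isinstance(content, list):
            count += sum(
                1
                for block in content
                if isinstance(block, dict) and "cache_control" in block
            )
    return count


def inject_cache_control(messages, target_indices):
    """Hypothetical injector: mark the messages at `target_indices` as
    cacheable, stopping once the request holds MAX_CACHE_BLOCKS markers.

    Markers already present in the request count against the budget.
    The input list is not mutated.
    """
    out = deepcopy(messages)
    budget = MAX_CACHE_BLOCKS - count_cache_blocks(out)
    for idx in target_indices:
        if budget <= 0:
            break  # limit reached: leave remaining targets untouched
        if "cache_control" in out[idx]:
            continue  # already marked, nothing to add
        out[idx]["cache_control"] = {"type": "ephemeral"}
        budget -= 1
    return out
```

With six target messages and no existing markers, this sketch marks the first four and leaves the last two unchanged, so the request stays within the limit instead of being rejected upstream.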

What's Changed

fix(integrations): cap Anthropic cache_control injection at 4 blocks - PR #30480
fix(guardrails): run pre_call hook once for model-level guardrails - PR #30543
fix(guardrails): stop re-initializing DB guardrails on every poll - PR #30542
fix(guardrails): return 400 not 500 when AIM blocks a request - PR #30573
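
The 400-versus-500 change above is about classifying a guardrail block as a client-side rejection rather than a server fault. A minimal sketch of that distinction follows; GuardrailBlockedError and handle_request are hypothetical names chosen for the example and do not reflect LiteLLM's or AIM's actual classes or handlers.

```python
class GuardrailBlockedError(Exception):
    """Hypothetical error raised when a guardrail rejects a request."""


def handle_request(run_guardrail, payload):
    """Run a guardrail callable and map the outcome to an HTTP status.

    A deliberate block is the caller's problem (400); anything else
    that goes wrong is treated as a server fault (500).
    """
    try:
        run_guardrail(payload)
    except GuardrailBlockedError as exc:
        return 400, {"error": str(exc)}
    except Exception:
        return 500, {"error": "internal server error"}
    return 200, {"status": "ok"}
```

Catching the specific block error before the generic handler is what keeps an intentional rejection from being reported as an internal error.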

Full Changelog

https://github.com/BerriAI/litellm/compare/v1.89.2...v1.89.3
