v1.89.3 - Guardrails & Cache-Control Fixes
An earlier version of this note flagged a potential throughput regression. We investigated and could not confirm or reproduce any regression in the released version. The one report we received came from a deployment running custom code on top of what we shipped, and our testing points to those changes, not LiteLLM, as the likely cause.
Correctness and error rates were never affected. If you're on this version, there's nothing you need to do.
We're still monitoring incoming reports and will update this note if anything changes.
Deploy this version​
- Docker
- Pip
docker run \
-e STORE_MODEL_IN_DB=True \
-p 4000:4000 \
docker.litellm.ai/berriai/litellm:1.89.3
pip install litellm==1.89.3
v1.89.3 is a patch release on top of v1.89.2. It backports guardrail correctness fixes (a single pre-call hook for model-level guardrails, no DB re-init on every poll, 400 instead of 500 when AIM blocks a request) and caps Anthropic cache-control injection at the 4-block limit.
What's Changed​
- fix(integrations): cap Anthropic cache_control injection at 4 blocks - PR #30480
- fix(guardrails): run pre_call hook once for model-level guardrails - PR #30543
- fix(guardrails): stop re-initializing DB guardrails on every poll - PR #30542
- fix(guardrails): return 400 not 500 when AIM blocks a request - PR #30573
Full Changelog​
https://github.com/BerriAI/litellm/compare/v1.89.2...v1.89.3