
4 posts tagged with "engineering"


Migrating LiteLLM to Rust - Building the Fastest and Litest AI Gateway

Ishaan Jaffer
CTO, LiteLLM

Last Updated: June 2026

Over the past year, we have heard the same thing from our users and our community: they want the fastest, most lightweight AI gateway they can run. We have heard you. We are addressing it by moving LiteLLM to Rust and committing to sub-1ms overhead in a deployable binary that uses less than 100MB of memory. By the end of this migration, you will get a pure Rust server that can serve 100% of your AI traffic, with every hot-path operation, including auth and rate limiting, running in Rust.

Want to help us build it?

We are opening an early beta and want to work directly with teams who care about a fast, lightweight gateway. If that is you, sign up here and we will get you testing the Rust gateway in your own stack, with a direct line to our team.

The reason it matters: under real load, CPU and memory climb with concurrency, and pods get OOM-killed at the worst time. Today the LiteLLM Python proxy peaks around 359MB of memory under load, and that cost multiplies across every pod, region, and retry you run.

We are already seeing the payoff in benchmarks. The Rust gateway serves about 15x the throughput (453 to 6,782 requests per second) on about 11x less memory (359MB to 32MB), and cuts per-request overhead from about 7.5ms on the Python path to about 0.05ms, well under the 1ms we commit to.
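As a quick sanity check, the ratios quoted above can be recomputed from the raw figures. This is only arithmetic on the numbers in this post; the variable names are illustrative and not part of LiteLLM.

```python
# Benchmark figures as quoted in the paragraph above.
python_rps, rust_rps = 453, 6782                   # requests per second
python_mem_mb, rust_mem_mb = 359, 32               # memory under load, MB
python_overhead_ms, rust_overhead_ms = 7.5, 0.05   # per-request overhead, ms

throughput_gain = rust_rps / python_rps
memory_reduction = python_mem_mb / rust_mem_mb
overhead_reduction = python_overhead_ms / rust_overhead_ms

print(f"throughput: {throughput_gain:.1f}x")            # throughput: 15.0x
print(f"memory: {memory_reduction:.1f}x less")          # memory: 11.2x less
print(f"overhead: {overhead_reduction:.0f}x lower")     # overhead: 150x lower
print(f"under 1ms target: {rust_overhead_ms < 1.0}")    # under 1ms target: True
```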

What you get

You deploy a single Rust binary. It uses about 65MB of memory, gateway overhead stays under 1ms, and nothing in your setup changes: same config.yaml, same database, same client API, same providers. You keep LiteLLM's coverage of 100+ LLM providers behind one OpenAI-compatible API, with /chat/completions, /messages, /responses, and every other LLM endpoint LiteLLM supports today, now as the fastest and most lightweight LLM gateway you can self-host.

This is not a v2 and not a rewrite. There is no new major version to migrate to and nothing for you to change. The runtime under the hot path gets faster and lighter while your config stays exactly where it is.

We ship this the careful way. Each route moves to Rust only after it passes our full parity and end-to-end test suite, and it runs in production before the next route starts. Stability is the priority, and we target zero regressions on every release.

How we built a background agent to cover 30% of our backlog

Krrish Dholakia
CEO, LiteLLM
Ishaan Jaffer
CTO, LiteLLM
LiteLLM Agent Platform: agent.litellm.ai

The platform we built is open source. Check out litellm-agent-platform. The swappable harness layer is lite-harness.

Building the same thing inside your company?

Our goal was to 10x the productivity of our company with agents.

Three weeks ago we began building an agent that could own 30% of our engineering tickets. Here's what we've learnt so far.

Making the AI Gateway Resilient to Redis Failures

Ishaan Jaffer
CTO, LiteLLM

Last Updated: April 2026

Enterprise AI Gateway deployments put Redis in the hot path for nearly every request: rate limiting, cache lookups, spend tracking. When Redis is healthy, the latency contribution is single-digit milliseconds — invisible to end users. When it degrades, a production AI Gateway needs to stay up regardless.

Running LiteLLM at scale across 100+ pods means designing for failure modes before they appear. The easy case is Redis going fully down: fail fast, fall through to the database, continue serving requests. The hard case — the one that takes down gateways — is a slow Redis: still accepting connections, still responding, but timing out after 20-30 seconds per operation.
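To make the slow-Redis case concrete, here is a minimal sketch of one common mitigation: put a strict per-operation timeout around the cache call and fall through to the database when it is exceeded. This is an assumption for illustration, not necessarily how LiteLLM handles it. `slow_redis_get`, `db_get`, `get_with_fallback`, and the 50ms budget are all hypothetical names and values, and the Redis call is simulated with a sleep.

```python
import asyncio

REDIS_TIMEOUT_S = 0.05  # hypothetical per-operation budget (50ms)

async def slow_redis_get(key: str) -> str:
    # Stand-in for a degraded Redis: it still "responds", just far too late.
    # The post describes 20-30s stalls; scaled down to 2s for the demo.
    await asyncio.sleep(2.0)
    return "from-redis"

async def db_get(key: str) -> str:
    # Stand-in for the database fallback path.
    return "from-db"

async def get_with_fallback(key, redis_get=slow_redis_get, timeout_s=REDIS_TIMEOUT_S):
    try:
        # wait_for cancels the Redis call once the budget is exceeded,
        # so a slow Redis costs at most timeout_s per request.
        return await asyncio.wait_for(redis_get(key), timeout=timeout_s)
    except (asyncio.TimeoutError, ConnectionError):
        # Treat "too slow" the same as "down": fall through to the database.
        return await db_get(key)

result = asyncio.run(get_with_fallback("rate_limit:team-a"))
print(result)
```

With a timeout like this, a slow Redis degrades into the easy case: the request pays a small bounded penalty instead of holding a worker for the full stall.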

Announcing CI/CD v2 for LiteLLM

Krrish Dholakia
CEO, LiteLLM

CI/CD v2 is now live for LiteLLM.


Building on the roadmap from our security incident, CI/CD v2 introduces isolated environments, stronger security gates, and safer release separation for LiteLLM.

What changed

  • Security scans and unit tests run in isolated environments.
  • Validation and release are separated into different repositories, making it harder for an attacker to reach release credentials.
  • Trusted Publishing for PyPI releases - this means no long-lived credentials are used to publish releases.
  • Immutable Docker release tags - this means Docker release tags cannot be tampered with after they are published (Learn more). Note: work for GHCR Docker releases is planned as well.
  • Docker image signing with Cosign - all release images are signed so users can independently verify they came from us.

Verify Docker image signatures

Starting from v1.83.0-nightly, all LiteLLM Docker images published to GHCR are signed with cosign. Every release is signed with the same key introduced in commit 0112e53.

Verify using the pinned commit hash (recommended):

A commit hash is cryptographically immutable, so this is the strongest way to ensure you are using the original signing key:

cosign verify \
--key https://raw.githubusercontent.com/BerriAI/litellm/0112e53046018d726492c814b3644b7d376029d0/cosign.pub \
ghcr.io/berriai/litellm:<release-tag>

Verify using a release tag (convenience):

Tags are protected in this repository and resolve to the same key. This option is easier to read but relies on tag protection rules:

cosign verify \
--key https://raw.githubusercontent.com/BerriAI/litellm/<release-tag>/cosign.pub \
ghcr.io/berriai/litellm:<release-tag>

Replace <release-tag> with the version you are deploying (e.g. v1.83.0-stable).

Expected output:

The following checks were performed on each of these signatures:
- The cosign claims were validated
- The signatures were verified against the specified public key
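If you want to script this check in a deploy pipeline, the pinned-commit command above can be assembled programmatically. The sketch below only builds the argument list and does not run cosign. `cosign_verify_args` is a hypothetical helper, and the commit hash, key URL, and image repository are copied from the commands above.

```python
import shlex

# Values copied from the pinned-commit command above.
PINNED_COMMIT = "0112e53046018d726492c814b3644b7d376029d0"
KEY_URL = f"https://raw.githubusercontent.com/BerriAI/litellm/{PINNED_COMMIT}/cosign.pub"
IMAGE_REPO = "ghcr.io/berriai/litellm"

def cosign_verify_args(release_tag: str) -> list[str]:
    """Build the argv for the pinned-commit verification (hypothetical helper)."""
    if not release_tag or any(ch.isspace() for ch in release_tag):
        raise ValueError("release_tag must be a non-empty tag without whitespace")
    return ["cosign", "verify", "--key", KEY_URL, f"{IMAGE_REPO}:{release_tag}"]

args = cosign_verify_args("v1.83.0-stable")
print(shlex.join(args))
# To actually run the check (requires cosign on PATH):
#   import subprocess; subprocess.run(args, check=True)
```

Passing the arguments as a list rather than a shell string avoids quoting problems if the tag comes from an environment variable.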

What's next

Moving forward, we plan on:

  • Adopting the OpenSSF security criteria (a set of criteria that projects should meet to demonstrate a strong security posture - Learn more)

    • We've added Scorecard and Allstar to our GitHub
  • Adding SLSA Build Provenance to our CI/CD pipeline - this lets users independently verify that a release came from us and prevents silent modification of releases after they are published.

We hope this gives you confidence that the releases you use are safe and come from us.

The principle

The new CI/CD pipeline reflects the principles outlined below and is designed to be more secure and reliable:

  • Limit what each package can access
  • Reduce the number of sensitive environment variables
  • Avoid compromised packages
  • Prevent release tampering

How to help

Help us plan April's stability sprint - https://github.com/BerriAI/litellm/issues/24825