5 posts tagged with "ai-gateway"

LiteLLM Labs: Announcing Lite-Harness SDK — Unified API for Claude Code, Codex, and Pi AI

Krrish Dholakia
CEO, LiteLLM
Ishaan Jaffer
CTO, LiteLLM

Harnesses are the next frontier of vendor lock-in. LiteLLM was built to make it easy to swap between model providers. As model capabilities saturate, though, competition is shifting to harnesses and managed agents. To make it just as easy to move between vendors at the harness layer, we're launching the Lite-Harness SDK: a simple TypeScript and Python SDK that lets developers change harnesses the way they change models.

It exposes every harness through a unified interface that follows the Claude Agents SDK spec. If you wrote your app with the Claude Agents SDK and want to try another harness (Pi AI, Hermes, Codex, OpenCode), you can do so without rewriting your code.

Today, it supports three harnesses: Claude Code, Codex, and Pi AI. If you want us to add another harness, please file an issue here.

Here's how it works:

TypeScript Example

import { query } from "@lite-harness/sdk";

const prompt = "Fix the failing test";

// Claude Code harness
for await (const message of query({
  prompt,
  options: { harness: "claude-code", model: "claude-opus-4-8" },
})) {
  console.log(message);
}

// Codex harness
for await (const message of query({
  prompt,
  options: { harness: "codex", model: "gpt-5.5" },
})) {
  console.log(message);
}

Python Example

import asyncio

from lite_harness import query, AgentOptions

prompt = "Fix the failing test"


async def main():
    # Claude Code harness
    async for message in query(
        prompt=prompt,
        options=AgentOptions(harness="claude-code", model="claude-opus-4-8"),
    ):
        print(message)

    # Codex harness
    async for message in query(
        prompt=prompt,
        options=AgentOptions(harness="codex", model="gpt-5.5"),
    ):
        print(message)


# `async for` is only valid inside a coroutine, so run the loops via asyncio.
asyncio.run(main())

LiteLLM AI Gateway

Lite-Harness supports proxying harnesses through the LiteLLM AI Gateway. This enables easy model swapping, cost controls, and logging.

Point Lite-Harness at your gateway by setting two environment variables:

export LITELLM_API_BASE=https://litellm.your-company.com/v1
export LITELLM_API_KEY=sk-litellm-...

Then call as usual — every underlying model request routes through the gateway:

import asyncio

from lite_harness import query, AgentOptions

prompt = "Fix the failing test"


async def main():
    # Claude Code harness
    async for message in query(
        prompt=prompt,
        options=AgentOptions(harness="claude-code", model="claude-opus-4-8"),
    ):
        print(message)

    # Codex harness
    async for message in query(
        prompt=prompt,
        options=AgentOptions(harness="codex", model="gpt-5.5"),
    ):
        print(message)


# `async for` is only valid inside a coroutine, so run the loops via asyncio.
asyncio.run(main())
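Because gateway routing depends on the two environment variables above, a small pre-flight check can catch a missing variable before any agent run starts. The sketch below is only an illustration: `missing_gateway_vars` is a hypothetical helper, not part of the SDK, and it is an assumption that you would want to check this yourself, since the post does not say whether Lite-Harness validates these variables on its own.

```python
import os
from typing import List, Mapping

# Variable names taken from the export lines above.
GATEWAY_VARS = ("LITELLM_API_BASE", "LITELLM_API_KEY")


def missing_gateway_vars(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the gateway variables that are unset or empty in `env`."""
    return [name for name in GATEWAY_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_gateway_vars()
    if missing:
        print("Gateway routing not configured; missing:", ", ".join(missing))
    else:
        print("Gateway variables are set.")
```

Passing the environment as an argument keeps the check easy to exercise in tests without touching the real process environment.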

Frequently Asked Questions

Do I have to use the LiteLLM AI Gateway?

No. lite-harness works standalone — point it at provider APIs with native keys. AI Gateway integration is opt-in for teams that want central key management, budgets, fallbacks, and a single audit log across every model call.

Does swapping harnesses change agent behavior?

Yes — that's the point. Each harness keeps its native loop, tool-calling semantics, and prompt format. lite-harness unifies how you invoke them, not how they run internally. Run the same prompt across all three to see which combo lands the task best.
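Running one prompt across several harnesses could be sketched roughly as follows. This is a minimal illustration, not SDK code: `compare` and `fake_query` are hypothetical names, and `fake_query` stands in for the real `query` call so the sketch runs without the SDK installed. Only the two harness/model pairs shown earlier in this post are used. The runs are sequential on the assumption that harnesses editing the same working tree should not overlap.

```python
import asyncio
from typing import AsyncIterator, Callable, Dict, List, Sequence, Tuple

Combo = Tuple[str, str]  # (harness, model)
QueryFn = Callable[[str, str, str], AsyncIterator[object]]

# Harness/model pairs taken from the examples in this post.
COMBOS: Sequence[Combo] = (("claude-code", "claude-opus-4-8"), ("codex", "gpt-5.5"))


async def compare(
    query_fn: QueryFn, prompt: str, combos: Sequence[Combo] = COMBOS
) -> Dict[Combo, List[object]]:
    """Run `prompt` once per (harness, model) pair and collect each message stream."""
    results: Dict[Combo, List[object]] = {}
    for harness, model in combos:  # one run at a time, in the order given
        results[(harness, model)] = [
            message async for message in query_fn(prompt, harness, model)
        ]
    return results


async def fake_query(prompt: str, harness: str, model: str) -> AsyncIterator[object]:
    """Hypothetical stand-in for the SDK's query(); yields a single message."""
    yield f"[{harness}/{model}] received: {prompt}"


if __name__ == "__main__":
    for combo, messages in asyncio.run(compare(fake_query, "Fix the failing test")).items():
        print(combo, messages)
```

To use the real SDK, `query_fn` would be a thin adapter that forwards to `query(prompt=..., options=AgentOptions(harness=..., model=...))` as in the Python example above.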

Is this ready for production?

lite-harness is an early, experimental project, currently in public beta. Please join our Discord to help shape its design.

Is this available in LiteLLM OSS?

Yes. lite-harness is MIT-licensed at github.com/LiteLLM-Labs/lite-harness. LiteLLM Enterprise adds SSO/SCIM, air-gapped deployment, 24/7 SLA, and advanced guardrails on top of the AI Gateway it pairs with.

Launching LiteLLM-Rust: A Minimal Rust AI Gateway for Coding Agents

Krrish Dholakia
CEO, LiteLLM
Ishaan Jaffer
CTO, LiteLLM

Last Updated: June 2026

Today we're launching LiteLLM-Rust — a minimal Rust-based AI Gateway for coding agents.

Repo: github.com/LiteLLM-Labs/litellm-rust

It does three things:

  1. LiteLLM-compatible — your existing config.yaml and database work out of the box.
  2. Fast — targeting <1ms overhead latency on Claude Code calls.
  3. Built for autonomous agents — sandboxing via E2B and Daytona today, with durable sessions, memory, artifacts, and vault on the roadmap.

How we built a background agent to cover 30% of our backlog

Krrish Dholakia
CEO, LiteLLM
Ishaan Jaffer
CTO, LiteLLM
LiteLLM Agent Platform: agent.litellm.ai
The platform we built is open source. Check out litellm-agent-platform. The swappable harness layer is lite-harness.

Building the same thing inside your company?

Our goal was to 10x the productivity of our company with agents.

Three weeks ago we began building an agent that could own 30% of our engineering tickets. Here's what we've learnt so far.

Announcing Componentized Deployments

Yassin Kortam
Senior SWE @ LiteLLM

Last Updated: May 2026

The LiteLLM proxy container does two very different things. It's an LLM data plane (/chat/completions, /v1/messages, embeddings, passthroughs), where latency is measured in single-digit milliseconds of overhead and traffic is high-volume and bursty. It's also a management control plane (keys, teams, SSO, audit logs, and the spend/usage analytics that power the dashboard), where a single request can scan millions of rows.

Run both on the same event loop, and the slowest thing the control plane does sets the reliability floor for the fastest thing the data plane does. This post is about how we've improved LiteLLM's reliability at scale by offering a componentized deployment model.
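The shared-event-loop effect can be reproduced with a toy asyncio script. This is an illustration only, not LiteLLM code: all function names are hypothetical, and `time.sleep` stands in for any control-plane work that holds the loop without yielding. The fast request's latency is measured from the moment it is submitted, so time spent queued behind the slow task is included.

```python
import asyncio
import time


async def data_plane_request(submitted_at: float) -> float:
    """Fast handler: returns seconds elapsed since the request was submitted."""
    await asyncio.sleep(0)  # yield to the loop once, as a handler would on I/O
    return time.perf_counter() - submitted_at


async def control_plane_report(block_seconds: float) -> None:
    """Stand-in for a heavy analytics request that holds the loop without yielding."""
    time.sleep(block_seconds)  # deliberately blocking


async def shared_loop_latency(block_seconds: float) -> float:
    """Both planes on one event loop: the fast request queues behind the slow one."""
    submitted_at = time.perf_counter()
    report = asyncio.create_task(control_plane_report(block_seconds))
    fast = asyncio.create_task(data_plane_request(submitted_at))
    latency = await fast
    await report
    return latency


async def isolated_latency() -> float:
    """Data plane on its own loop: nothing is queued ahead of the fast request."""
    return await data_plane_request(time.perf_counter())


if __name__ == "__main__":
    print(f"shared loop: {asyncio.run(shared_loop_latency(0.2)):.3f}s")
    print(f"isolated:    {asyncio.run(isolated_latency()):.3f}s")
```

On a shared loop the fast request's latency is bounded below by the blocking call's duration, which is the reliability floor the paragraph above describes. Running the two planes in separate processes removes that coupling.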

Making the AI Gateway Resilient to Redis Failures

Ishaan Jaffer
CTO, LiteLLM

Last Updated: April 2026

Enterprise AI Gateway deployments put Redis in the hot path for nearly every request: rate limiting, cache lookups, spend tracking. When Redis is healthy, the latency contribution is single-digit milliseconds — invisible to end users. When it degrades, a production AI Gateway needs to stay up regardless.

Running LiteLLM at scale across 100+ pods means designing for failure modes before they appear. The easy case is Redis going fully down: fail fast, fall through to the database, continue serving requests. The hard case — the one that takes down gateways — is a slow Redis: still accepting connections, still responding, but timing out after 20-30 seconds per operation.
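One common way to bound the slow-Redis case is a short per-operation deadline with a fallback path. The sketch below illustrates that general technique and is not necessarily the approach this post goes on to describe. `get_with_deadline`, `slow_cache`, and `database` are hypothetical names, and the 50 ms default is an arbitrary placeholder rather than a recommended value.

```python
import asyncio
from typing import Awaitable, Callable, Optional

Lookup = Callable[[str], Awaitable[Optional[str]]]


async def get_with_deadline(
    key: str, cache_get: Lookup, fallback_get: Lookup, timeout_s: float = 0.05
) -> Optional[str]:
    """Try the cache, but give up after `timeout_s` and use the fallback instead."""
    try:
        # wait_for cancels the cache call once the deadline passes.
        return await asyncio.wait_for(cache_get(key), timeout=timeout_s)
    except (asyncio.TimeoutError, ConnectionError):
        # Slow cache (timeout) and dead cache (connection error) take the same path.
        return await fallback_get(key)


async def slow_cache(key: str) -> Optional[str]:
    """Simulates a degraded cache that eventually answers, far too late."""
    await asyncio.sleep(1.0)
    return f"cache:{key}"


async def database(key: str) -> Optional[str]:
    """Simulates the slower but healthy source of truth."""
    return f"db:{key}"


if __name__ == "__main__":
    print(asyncio.run(get_with_deadline("team-budget", slow_cache, database)))
```

The point of the deadline is that a degraded dependency costs each request at most `timeout_s`, instead of the 20-30 seconds per operation described above.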