# [BETA] High Availability Control Plane
Deploy a single LiteLLM UI that manages multiple independent LiteLLM proxy instances, each with its own database, Redis, and master key.
## Why This Architecture?
In the standard multi-region setup, all instances share a single database and master key. This works, but introduces a shared dependency. If the database goes down, every instance is affected.
The High Availability Control Plane takes a different approach:
| | Shared Database (Standard) | High Availability Control Plane |
|---|---|---|
| Database | Single shared DB for all instances | Each instance has its own DB |
| Redis | Shared Redis | Each instance has its own Redis |
| Master Key | Same key across all instances | Each instance has its own key |
| Failure isolation | DB outage affects all instances | Failure is isolated to one instance |
| User management | Centralized, one user table | Independent, each worker manages its own users |
| UI | One UI per admin instance | Single control plane UI manages all workers |
## Benefits
- True high availability: no shared infrastructure means no single point of failure
- Blast radius containment: a misconfiguration or outage on one worker doesn't affect others
- Regional isolation: workers can run in different regions with data residency requirements
- Simpler operations: each worker is a self-contained LiteLLM deployment
## Architecture
The control plane is a LiteLLM instance that serves the admin UI and knows about all the workers. It is not a router — it does not proxy or route any LLM requests. It exists purely so admins can switch between workers and manage them from a single UI.
Each worker is a fully independent LiteLLM proxy that handles LLM requests for its region or team. Workers have their own database, Redis, users, keys, teams, and budgets. No infrastructure is shared between workers.
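To make that concrete, here is a minimal sketch of what one worker's standalone config might look like. The file name, model entry, and environment variable names are placeholders, not part of this feature; the point is that the `master_key`, `database_url`, and Redis cache settings are all scoped to this one worker.

```yaml
# worker_a_config.yaml -- hypothetical example; each worker is an ordinary,
# self-contained LiteLLM proxy deployment
model_list:
  - model_name: gpt-4o
    litellm_params:
      model: openai/gpt-4o
      api_key: os.environ/OPENAI_API_KEY

general_settings:
  master_key: sk-worker-a                          # key unique to this worker
  database_url: os.environ/WORKER_A_DATABASE_URL   # this worker's own database

litellm_settings:
  cache: true
  cache_params:
    type: redis
    host: os.environ/WORKER_A_REDIS_HOST           # this worker's own Redis
    port: 6379
```

Started with `litellm --config worker_a_config.yaml --port 4001`, this would be the `worker-a` instance registered in the control plane config below.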
## Setup
### 1. Control Plane Configuration
The control plane needs a `worker_registry` that lists all worker instances.
```yaml
model_list: []

general_settings:
  master_key: sk-1234
  database_url: os.environ/DATABASE_URL
  worker_registry:
    - worker_id: "worker-a"
      name: "Worker A"
      url: "http://localhost:4001"
    - worker_id: "worker-b"
      name: "Worker B"
      url: "http://localhost:4002"
```
Start the control plane:
```shell
litellm --config cp_config.yaml --port 4000
```
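As a quick sanity check (assuming the default port used above), you can confirm the control plane is up via the proxy's liveliness endpoint:

```shell
curl http://localhost:4000/health/liveliness
```

Remember that this instance only serves the admin UI; LLM traffic should be sent to the workers directly (e.g. `http://localhost:4001`).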