Skip to main content

Memory Management

Store user preferences and feedback so your LLM remembers them across sessions. Scoped per user and team, with built-in access control.

Requires: LiteLLM v1.83.10+ with PostgreSQL connected. No config changes needed.

Create​

curl -X POST "http://localhost:4000/v1/memory" \
-H "Authorization: Bearer sk-1234" \
-H "Content-Type: application/json" \
-d '{
"key": "user:preferences",
"value": "Prefers concise responses. Timezone: PST.",
"metadata": {"version": 1}
}'

Read​

curl "http://localhost:4000/v1/memory/user:preferences" \
-H "Authorization: Bearer sk-1234"

Update​

curl -X PUT "http://localhost:4000/v1/memory/user:preferences" \
-H "Authorization: Bearer sk-1234" \
-H "Content-Type: application/json" \
-d '{"value": "Prefers concise responses. Timezone: EST."}'

List​

# All entries
curl "http://localhost:4000/v1/memory" \
-H "Authorization: Bearer sk-1234"

# By prefix
curl "http://localhost:4000/v1/memory?key_prefix=user:" \
-H "Authorization: Bearer sk-1234"

Delete​

curl -X DELETE "http://localhost:4000/v1/memory/user:preferences" \
-H "Authorization: Bearer sk-1234"

Access Control​

Scoping is automatic based on the API key.

RoleReadsWrites
UserOwn + team entriesOwn entries only
Team adminOwn + team entriesOwn + team entries
Proxy adminAllAll

Key Naming​

Keys are globally unique. Use prefixes to namespace and query:

user:preferences           → per-user settings
team:playbook:onboarding → shared team resources
agent:memory:scratchpad → agent working memory

Example: Per-user memory in a Slack bot​

Partition memory by Slack workspace and user so each person's preferences are isolated.

Key format: slack:{team_id}:{user_id}

import httpx

LITELLM_BASE = "http://localhost:4000"
LITELLM_KEY = "sk-1234"

def memory_key(team_id: str, user_id: str) -> str:
return f"slack:{team_id}:{user_id}"

async def get_preferences(team_id: str, user_id: str) -> str:
"""Read saved preferences. Returns "" if none exist."""
key = memory_key(team_id, user_id)
async with httpx.AsyncClient() as client:
r = await client.get(
f"{LITELLM_BASE}/v1/memory/{key}",
headers={"Authorization": f"Bearer {LITELLM_KEY}"},
)
if r.status_code == 404:
return ""
return r.json().get("value", "")

async def save_preference(team_id: str, user_id: str, note: str):
"""Append a preference. PUT upserts — creates or updates."""
key = memory_key(team_id, user_id)
existing = await get_preferences(team_id, user_id)

# Store as bullet list
bullets = [b for b in existing.split("\n") if b.strip()]
bullets.append(f"- {note}")

async with httpx.AsyncClient() as client:
await client.put(
f"{LITELLM_BASE}/v1/memory/{key}",
headers={"Authorization": f"Bearer {LITELLM_KEY}"},
json={"value": "\n".join(bullets)},
)

Inject into your system prompt each turn:

prefs = await get_preferences(team_id, user_id)

messages = [
{"role": "system", "content": f"""You are a helpful assistant.

SAVED USER PREFERENCES:
{prefs}

Follow these unless the current message contradicts them."""},
{"role": "user", "content": user_message},
]

Query all preferences for a workspace:

curl "http://localhost:4000/v1/memory?key_prefix=slack:T024BE7LD:" \
-H "Authorization: Bearer sk-1234"

Metadata​

Attach any JSON to an entry:

{
"key": "agent:findings",
"value": "Q1 API usage up 15%...",
"metadata": {"tags": ["research"], "confidence": 0.92}
}

API Reference​

Full request/response schemas, parameters, and error codes: /memory endpoint reference.

🚅
LiteLLM Enterprise
SSO/SAML, audit logs, spend tracking, multi-team management, and guardrails — built for production.
Learn more →