Memory Management
Store user preferences and feedback so your LLM remembers them across sessions. Scoped per user and team, with built-in access control.
Requires: LiteLLM v1.83.10+ with PostgreSQL connected. No config changes needed.
Create​
- curl
- Python
curl -X POST "http://localhost:4000/v1/memory" \
-H "Authorization: Bearer sk-1234" \
-H "Content-Type: application/json" \
-d '{
"key": "user:preferences",
"value": "Prefers concise responses. Timezone: PST.",
"metadata": {"version": 1}
}'
import httpx
client = httpx.Client(
base_url="http://localhost:4000",
headers={"Authorization": "Bearer sk-1234"},
)
client.post("/v1/memory", json={
"key": "user:preferences",
"value": "Prefers concise responses. Timezone: PST.",
"metadata": {"version": 1},
})
Read​
curl "http://localhost:4000/v1/memory/user:preferences" \
-H "Authorization: Bearer sk-1234"
Update​
curl -X PUT "http://localhost:4000/v1/memory/user:preferences" \
-H "Authorization: Bearer sk-1234" \
-H "Content-Type: application/json" \
-d '{"value": "Prefers concise responses. Timezone: EST."}'
List​
# All entries
curl "http://localhost:4000/v1/memory" \
-H "Authorization: Bearer sk-1234"
# By prefix
curl "http://localhost:4000/v1/memory?key_prefix=user:" \
-H "Authorization: Bearer sk-1234"
Delete​
curl -X DELETE "http://localhost:4000/v1/memory/user:preferences" \
-H "Authorization: Bearer sk-1234"
Access Control​
Scoping is automatic based on the API key.
| Role | Reads | Writes |
|---|---|---|
| User | Own + team entries | Own entries only |
| Team admin | Own + team entries | Own + team entries |
| Proxy admin | All | All |
Key Naming​
Keys are globally unique. Use prefixes to namespace and query:
user:preferences → per-user settings
team:playbook:onboarding → shared team resources
agent:memory:scratchpad → agent working memory
Example: Per-user memory in a Slack bot​
Partition memory by Slack workspace and user so each person's preferences are isolated.
Key format: slack:{team_id}:{user_id}
import httpx
LITELLM_BASE = "http://localhost:4000"
LITELLM_KEY = "sk-1234"
def memory_key(team_id: str, user_id: str) -> str:
return f"slack:{team_id}:{user_id}"
async def get_preferences(team_id: str, user_id: str) -> str:
"""Read saved preferences. Returns "" if none exist."""
key = memory_key(team_id, user_id)
async with httpx.AsyncClient() as client:
r = await client.get(
f"{LITELLM_BASE}/v1/memory/{key}",
headers={"Authorization": f"Bearer {LITELLM_KEY}"},
)
if r.status_code == 404:
return ""
return r.json().get("value", "")
async def save_preference(team_id: str, user_id: str, note: str):
"""Append a preference. PUT upserts — creates or updates."""
key = memory_key(team_id, user_id)
existing = await get_preferences(team_id, user_id)
# Store as bullet list
bullets = [b for b in existing.split("\n") if b.strip()]
bullets.append(f"- {note}")
async with httpx.AsyncClient() as client:
await client.put(
f"{LITELLM_BASE}/v1/memory/{key}",
headers={"Authorization": f"Bearer {LITELLM_KEY}"},
json={"value": "\n".join(bullets)},
)
Inject into your system prompt each turn:
prefs = await get_preferences(team_id, user_id)
messages = [
{"role": "system", "content": f"""You are a helpful assistant.
SAVED USER PREFERENCES:
{prefs}
Follow these unless the current message contradicts them."""},
{"role": "user", "content": user_message},
]
Query all preferences for a workspace:
curl "http://localhost:4000/v1/memory?key_prefix=slack:T024BE7LD:" \
-H "Authorization: Bearer sk-1234"
Metadata​
Attach any JSON to an entry:
{
"key": "agent:findings",
"value": "Q1 API usage up 15%...",
"metadata": {"tags": ["research"], "confidence": 0.92}
}
API Reference​
Full request/response schemas, parameters, and error codes: /memory endpoint reference.