LiteLLM - Local Caching

Cache `completion()` and `embedding()` calls when caching is switched on.

LiteLLM implements exact-match caching and supports the following cache types:

In-Memory Caching [Default]
Redis Caching Local
Redis Caching Hosted

Quick Start Usage - Completion

Cache keys are derived from the call's parameters, such as `model` and `messages`. Because both calls below send identical parameters, the second call is a cache hit:

import litellm
from litellm import completion
from litellm.caching.caching import Cache
litellm.cache = Cache()

# Make completion calls
response1 = completion(
    model="gpt-3.5-turbo", 
    messages=[{"role": "user", "content": "Tell me a joke."}],
    caching=True
)
response2 = completion(
    model="gpt-3.5-turbo", 
    messages=[{"role": "user", "content": "Tell me a joke."}],
    caching=True
)

# response1 == response2: response1 was cached and returned again for the second call
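
Exact-match caching means a hit only happens when the request that produced the key is identical to an earlier one. The sketch below shows one common way to build such a key: serialize the parameters deterministically and hash them. The helper `make_cache_key` is hypothetical and is not LiteLLM's internal key function; LiteLLM decides for itself which parameters go into its keys.

```python
import hashlib
import json


def make_cache_key(**params) -> str:
    """Hypothetical helper: derive a deterministic key from request parameters.

    Parameters are serialized to JSON with sorted keys so that argument order
    does not change the key, then hashed with SHA-256.
    """
    canonical = json.dumps(params, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


messages = [{"role": "user", "content": "Tell me a joke."}]

key1 = make_cache_key(model="gpt-3.5-turbo", messages=messages)
# Same parameters passed in a different order.
key2 = make_cache_key(messages=messages, model="gpt-3.5-turbo")
# One character of the prompt differs.
key3 = make_cache_key(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Tell me a joke!"}],
)

print(key1 == key2)  # True: identical parameters -> cache hit
print(key1 == key3)  # False: exact match only, so any change -> cache miss
```

Hashing keeps keys a fixed length regardless of prompt size, which matters when the same keys are later stored in an external backend such as Redis.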

Custom Key-Value Pairs

Add custom key-value pairs to your cache.

from litellm.caching.caching import Cache
cache = Cache()

cache.add_cache(cache_key="test-key", result="1234")

cache.get_cache(cache_key="test-key")
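
To make the semantics of the two calls above concrete, here is a toy dictionary-backed stand-in with the same method names. `ToyInMemoryCache` is hypothetical and only illustrates what an in-memory key-value cache does; it is not LiteLLM's `Cache` class, and the `None`-on-miss behaviour is a choice of this sketch.

```python
from typing import Any, Dict, Optional


class ToyInMemoryCache:
    """Hypothetical stand-in for an in-memory key-value cache (not LiteLLM's class)."""

    def __init__(self) -> None:
        self._store: Dict[str, Any] = {}

    def add_cache(self, result: Any, cache_key: str) -> None:
        # Store (or overwrite) the value under an explicit key.
        self._store[cache_key] = result

    def get_cache(self, cache_key: str) -> Optional[Any]:
        # Exact-match lookup: this toy returns None when the key is absent.
        return self._store.get(cache_key)


cache = ToyInMemoryCache()
cache.add_cache(cache_key="test-key", result="1234")
print(cache.get_cache(cache_key="test-key"))     # 1234
print(cache.get_cache(cache_key="missing-key"))  # None
```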

Caching with Streaming

LiteLLM can also cache streamed responses.

Usage

import litellm
from litellm import completion
from litellm.caching.caching import Cache
litellm.cache = Cache()

# Make completion calls
response1 = completion(
    model="gpt-3.5-turbo", 
    messages=[{"role": "user", "content": "Tell me a joke."}], 
    stream=True,
    caching=True)
for chunk in response1:
    print(chunk)
response2 = completion(
    model="gpt-3.5-turbo", 
    messages=[{"role": "user", "content": "Tell me a joke."}], 
    stream=True,
    caching=True)
for chunk in response2:
    print(chunk)
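
One way stream caching can work is record-and-replay: on a miss, forward each chunk to the caller while recording it, and store the recording once the stream finishes; on a hit, replay the stored chunks. The sketch below assumes that approach; it is not taken from LiteLLM's source, and `cached_stream` and `fake_model_stream` are hypothetical names.

```python
from typing import Callable, Dict, Iterator, List


def cached_stream(
    key: str,
    cache: Dict[str, List[str]],
    produce: Callable[[], Iterator[str]],
) -> Iterator[str]:
    """Hypothetical sketch: cache a stream by recording chunks, replay on a hit."""
    if key in cache:
        # Cache hit: replay the recorded chunks without calling the producer.
        yield from cache[key]
        return
    recorded: List[str] = []
    for chunk in produce():
        recorded.append(chunk)
        yield chunk
    # Only reached when the stream is fully consumed, so a partially
    # consumed stream is never cached.
    cache[key] = recorded


calls = 0


def fake_model_stream() -> Iterator[str]:
    # Stand-in for a streamed completion; counts how often it is invoked.
    global calls
    calls += 1
    yield from ["Why ", "did ", "the ", "chicken..."]


store: Dict[str, List[str]] = {}
first = list(cached_stream("joke", store, fake_model_stream))
second = list(cached_stream("joke", store, fake_model_stream))
print(first == second, calls)  # True 1
```

Storing only after the last chunk avoids caching a truncated answer if the consumer stops iterating early.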

Usage - Embedding()

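Cache keys are derived from the call's parameters, such as `model` and `input`. Because both calls below send identical parameters, the second call is a cache hit and should return noticeably faster: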

import time
import litellm
from litellm import embedding
from litellm.caching.caching import Cache
litellm.cache = Cache()

start_time = time.time()
embedding1 = embedding(model="text-embedding-ada-002", input=["hello from litellm"*5], caching=True)
end_time = time.time()
print(f"Embedding 1 response time: {end_time - start_time} seconds")

start_time = time.time()
embedding2 = embedding(model="text-embedding-ada-002", input=["hello from litellm"*5], caching=True)
end_time = time.time()
print(f"Embedding 2 response time: {end_time - start_time} seconds")
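
The timing comparison above can be reproduced offline with a simulated slow embedding function. In this sketch `slow_embed` and `cached_embed` are hypothetical stand-ins (no LiteLLM or network call is made), the `time.sleep` mimics request latency, and the call counter confirms that the second request never reached the underlying function.

```python
import time
from typing import Dict, List, Tuple

calls = 0


def slow_embed(model: str, inputs: List[str]) -> List[List[float]]:
    # Hypothetical stand-in for a remote embedding call; sleeps to mimic latency.
    global calls
    calls += 1
    time.sleep(0.2)
    return [[float(len(text))] for text in inputs]


_cache: Dict[Tuple[str, Tuple[str, ...]], List[List[float]]] = {}


def cached_embed(model: str, inputs: List[str]) -> List[List[float]]:
    # Exact-match key: the model name plus the full, ordered list of inputs.
    key = (model, tuple(inputs))
    if key not in _cache:
        _cache[key] = slow_embed(model, inputs)
    return _cache[key]


inputs = ["hello from litellm" * 5]
for attempt in (1, 2):
    start = time.perf_counter()
    vectors = cached_embed("text-embedding-ada-002", inputs)
    elapsed = time.perf_counter() - start
    print(f"Embedding {attempt} response time: {elapsed:.3f} seconds")

print(calls)  # 1: the second call was served from the cache
```

Checking a call counter is more reliable than comparing wall-clock times, which can vary from run to run.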
