HuggingFace Rerank

HuggingFace Rerank allows you to use reranking models hosted on Hugging Face infrastructure or your custom endpoints to reorder documents based on their relevance to a query.

Description: HuggingFace Rerank enables semantic reranking of documents using models hosted on Hugging Face infrastructure or custom endpoints.
Provider Route on LiteLLM: huggingface/ prefix in the model name
Provider Doc: Hugging Face Hub ↗

Quick Start

LiteLLM Python SDK

Example using LiteLLM Python SDK
import litellm
import os

# Set your HuggingFace token
os.environ["HF_TOKEN"] = "hf_xxxxxx"

# Basic rerank usage
response = litellm.rerank(
    model="huggingface/BAAI/bge-reranker-base",
    query="What is the capital of the United States?",
    documents=[
        "Carson City is the capital city of the American state of Nevada.",
        "The Commonwealth of the Northern Mariana Islands is a group of islands in the Pacific Ocean. Its capital is Saipan.",
        "Washington, D.C. is the capital of the United States.",
        "Capital punishment has existed in the United States since before it was a country.",
    ],
    top_n=3,
)

print(response)
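
The call returns the documents ranked by relevance. A minimal sketch of reading the scores, assuming the response exposes the Cohere-style results list shown under Response Format below:

# Print the ranked results; "index" refers to the position in the documents list above
for result in response.results:
    print(result["index"], result["relevance_score"])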

Custom Endpoint Usage

Using custom HuggingFace endpoint
import litellm

response = litellm.rerank(
    model="huggingface/BAAI/bge-reranker-base",
    query="hello",
    documents=["hello", "world"],
    top_n=2,
    api_base="https://my-custom-hf-endpoint.com",
    api_key="test_api_key",
)

print(response)

Async Usage

Async rerank example
import litellm
import asyncio
import os

os.environ["HF_TOKEN"] = "hf_xxxxxx"

async def async_rerank_example():
    response = await litellm.arerank(
        model="huggingface/BAAI/bge-reranker-base",
        query="What is the capital of the United States?",
        documents=[
            "Carson City is the capital city of the American state of Nevada.",
            "The Commonwealth of the Northern Mariana Islands is a group of islands in the Pacific Ocean. Its capital is Saipan.",
            "Washington, D.C. is the capital of the United States.",
            "Capital punishment has existed in the United States since before it was a country.",
        ],
        top_n=3,
    )
    print(response)

asyncio.run(async_rerank_example())
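
Because litellm.arerank is a coroutine, several queries can also be reranked concurrently with asyncio.gather. A minimal sketch, where rerank_many and the sample queries are illustrative:

import litellm
import asyncio
import os

os.environ["HF_TOKEN"] = "hf_xxxxxx"

documents = [
    "Carson City is the capital city of the American state of Nevada.",
    "Washington, D.C. is the capital of the United States.",
]

async def rerank_many(queries):
    # Start one arerank call per query and wait for all of them to finish
    tasks = [
        litellm.arerank(
            model="huggingface/BAAI/bge-reranker-base",
            query=q,
            documents=documents,
            top_n=1,
        )
        for q in queries
    ]
    return await asyncio.gather(*tasks)

responses = asyncio.run(rerank_many(["capital of the US", "capital of Nevada"]))
for r in responses:
    print(r)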

LiteLLM Proxy

1. Configure your model in config.yaml

model_list:
  - model_name: bge-reranker-base
    litellm_params:
      model: huggingface/BAAI/bge-reranker-base
      api_key: os.environ/HF_TOKEN
  - model_name: bge-reranker-large
    litellm_params:
      model: huggingface/BAAI/bge-reranker-large
      api_key: os.environ/HF_TOKEN
  - model_name: custom-reranker
    litellm_params:
      model: huggingface/BAAI/bge-reranker-base
      api_base: https://my-custom-hf-endpoint.com
      api_key: your-custom-api-key

2. Start the proxy

export HF_TOKEN="hf_xxxxxx"
litellm --config /path/to/config.yaml

# RUNNING on http://0.0.0.0:4000

3. Make rerank requests

curl http://localhost:4000/rerank \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $LITELLM_API_KEY" \
  -d '{
    "model": "bge-reranker-base",
    "query": "What is the capital of the United States?",
    "documents": [
      "Carson City is the capital city of the American state of Nevada.",
      "The Commonwealth of the Northern Mariana Islands is a group of islands in the Pacific Ocean. Its capital is Saipan.",
      "Washington, D.C. is the capital of the United States.",
      "Capital punishment has existed in the United States since before it was a country."
    ],
    "top_n": 3
  }'
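
The same request can be sent from Python. A minimal sketch using the requests library against the proxy started above; the URL and the sk-1234 key are placeholders for your own deployment:

import requests

response = requests.post(
    "http://localhost:4000/rerank",
    headers={
        "Authorization": "Bearer sk-1234",  # placeholder LiteLLM proxy key
        "Content-Type": "application/json",
    },
    json={
        "model": "bge-reranker-base",
        "query": "What is the capital of the United States?",
        "documents": [
            "Carson City is the capital city of the American state of Nevada.",
            "Washington, D.C. is the capital of the United States.",
        ],
        "top_n": 2,
    },
)

print(response.json())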

Configuration Options

Authentication

Using HuggingFace Token (Serverless)

import os
os.environ["HF_TOKEN"] = "hf_xxxxxx"

# Or pass directly
litellm.rerank(
    model="huggingface/BAAI/bge-reranker-base",
    api_key="hf_xxxxxx",
    # ... other params
)

Using Custom Endpoint

litellm.rerank(
    model="huggingface/BAAI/bge-reranker-base",
    api_base="https://your-custom-endpoint.com",
    api_key="your-custom-key",
    # ... other params
)

Response Format

The response follows the standard rerank API format:

{
  "results": [
    {
      "index": 2,
      "relevance_score": 0.999071
    },
    {
      "index": 3,
      "relevance_score": 0.7867867
    },
    {
      "index": 0,
      "relevance_score": 0.32713068
    }
  ],
  "id": "07734bd2-2473-4f07-94e1-0d9f0e6843cf",
  "meta": {
    "api_version": {
      "version": "2",
      "is_experimental": false
    },
    "billed_units": {
      "search_units": 1
    }
  }
}
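
Each index refers to the position of a document in the documents array of the request, so the scores can be joined back onto the original text. A small illustrative sketch using the example values above:

documents = [
    "Carson City is the capital city of the American state of Nevada.",
    "The Commonwealth of the Northern Mariana Islands is a group of islands in the Pacific Ocean. Its capital is Saipan.",
    "Washington, D.C. is the capital of the United States.",
    "Capital punishment has existed in the United States since before it was a country.",
]

# Results copied from the example response above
results = [
    {"index": 2, "relevance_score": 0.999071},
    {"index": 3, "relevance_score": 0.7867867},
    {"index": 0, "relevance_score": 0.32713068},
]

# Print each score next to the document it refers to
for item in results:
    print(f"{item['relevance_score']:.4f}  {documents[item['index']]}")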