MCP Semantic Tool Filter
Automatically filter MCP tools by semantic relevance. When you have many MCP tools registered, LiteLLM semantically matches the user's query against tool descriptions and sends only the most relevant tools to the LLM.
How It Works
Tool search shifts tool selection from a prompt-engineering problem to a retrieval problem. Instead of injecting a large static list of tools into every prompt, the semantic filter:
- Builds a semantic index of all available MCP tools on startup
- On each request, semantically matches the user's query against tool descriptions
- Returns only the top-K most relevant tools to the LLM
This approach improves context efficiency, increases reliability by reducing tool confusion, and enables scalability to ecosystems with hundreds or thousands of MCP tools.
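The retrieval step above can be sketched in a few lines. This is an illustrative toy, not LiteLLM's actual implementation: it embeds nothing itself (the toy 2-d vectors stand in for real embedding-model output) and simply ranks tools by cosine similarity, drops those below the threshold, and keeps the top-K.

```python
import math

def cosine(a, b):
    # Cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def filter_tools(query_emb, tool_embs, top_k=5, threshold=0.3):
    """tool_embs: mapping of tool name -> embedding vector."""
    scored = [(name, cosine(query_emb, emb)) for name, emb in tool_embs.items()]
    # Drop tools below the similarity threshold, then keep the top-K.
    scored = [(n, s) for n, s in scored if s >= threshold]
    scored.sort(key=lambda t: t[1], reverse=True)
    return [name for name, _ in scored[:top_k]]

# Toy 2-d "embeddings" stand in for real embedding-model output.
tools = {
    "github_search": [0.9, 0.1],
    "wiki_lookup": [0.5, 0.5],
    "weather": [-0.8, 0.2],
}
print(filter_tools([1.0, 0.0], tools, top_k=2))  # → ['github_search', 'wiki_lookup']
```

With a real embedding model, the query and tool descriptions would be embedded with the configured `embedding_model` instead of hand-written vectors; the ranking and cutoff logic is the same.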
Configuration
Enable semantic filtering in your LiteLLM config:
config.yaml

```yaml
litellm_settings:
  mcp_semantic_tool_filter:
    enabled: true
    embedding_model: "text-embedding-3-small"  # Model for semantic matching
    top_k: 5                                   # Max tools to return
    similarity_threshold: 0.3                  # Min similarity score
```
Configuration Options:
- `enabled` - Enable/disable semantic filtering (default: `false`)
- `embedding_model` - Model for generating embeddings (default: `"text-embedding-3-small"`)
- `top_k` - Maximum number of tools to return (default: `10`)
- `similarity_threshold` - Minimum similarity score for matches (default: `0.3`)
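Settings you omit fall back to the defaults listed above. A small sketch of how that merge behaves (the `resolve_filter_config` helper is hypothetical; only the option names and defaults come from the documentation):

```python
# Documented defaults for mcp_semantic_tool_filter.
DEFAULTS = {
    "enabled": False,
    "embedding_model": "text-embedding-3-small",
    "top_k": 10,
    "similarity_threshold": 0.3,
}

def resolve_filter_config(user_cfg):
    # User-supplied keys override defaults; missing keys keep them.
    return {**DEFAULTS, **(user_cfg or {})}

# The example config above sets enabled and top_k but not embedding_model:
cfg = resolve_filter_config({"enabled": True, "top_k": 5})
print(cfg["top_k"], cfg["embedding_model"])  # → 5 text-embedding-3-small
```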
Usage
Use MCP tools normally with the Responses API or Chat Completions. The semantic filter runs automatically:
Responses API with Semantic Filtering
```shell
curl --location 'http://localhost:4000/v1/responses' \
--header 'Content-Type: application/json' \
--header "Authorization: Bearer sk-1234" \
--data '{
    "model": "gpt-4o",
    "input": [
        {
            "role": "user",
            "content": "give me TLDR of what BerriAI/litellm repo is about",
            "type": "message"
        }
    ],
    "tools": [
        {
            "type": "mcp",
            "server_url": "litellm_proxy",
            "require_approval": "never"
        }
    ],
    "tool_choice": "required"
}'
```
Chat Completions with Semantic Filtering
```shell
curl --location 'http://localhost:4000/v1/chat/completions' \
--header 'Content-Type: application/json' \
--header "Authorization: Bearer sk-1234" \
--data '{
    "model": "gpt-4o",
    "messages": [
        {"role": "user", "content": "Search Wikipedia for LiteLLM"}
    ],
    "tools": [
        {
            "type": "mcp",
            "server_url": "litellm_proxy"
        }
    ]
}'
```
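The same Chat Completions request can be built from Python with only the standard library; the URL and API key are the placeholders from the curl examples, so actually sending the request assumes a proxy running at `localhost:4000`.

```python
import json
import urllib.request

# Identical body to the curl example above.
payload = {
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Search Wikipedia for LiteLLM"}],
    "tools": [{"type": "mcp", "server_url": "litellm_proxy"}],
}

req = urllib.request.Request(
    "http://localhost:4000/v1/chat/completions",
    data=json.dumps(payload).encode(),
    headers={
        "Content-Type": "application/json",
        "Authorization": "Bearer sk-1234",
    },
)
# response = urllib.request.urlopen(req)  # requires the proxy to be running
```

Note there is no filter-specific field in the request: the semantic filter is applied server-side based on the proxy config.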