Triton Inference Server

LiteLLM supports embedding models hosted on Triton Inference Server.


Example Call

Use the triton/ prefix in the model name to route the call to your Triton server.
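As a rough illustration of what the prefix conveys, the sketch below splits a model string into a provider part and a model name. The helper `split_provider` and the model name `my-embedding-model` are hypothetical and exist only for this example; this is not LiteLLM's actual routing code, which may work differently.

```python
from typing import Optional, Tuple


def split_provider(model: str) -> Tuple[Optional[str], str]:
    """Split 'provider/model-name' into (provider, model-name).

    Hypothetical helper for illustration only; not part of LiteLLM.
    Returns (None, model) when the string has no usable provider prefix.
    """
    provider, sep, name = model.partition("/")
    if not sep or not provider or not name:
        return None, model
    return provider, name


# "my-embedding-model" is a placeholder for the model name on your server.
provider, name = split_provider("triton/my-embedding-model")
print(provider, name)
```

Only the first slash is treated as the separator here, so a model name that itself contains slashes stays intact.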

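import litellm

# aembedding is a coroutine, so await it from inside an async function
response = await litellm.aembedding(
    model="triton/<your-triton-model>",  # the triton/ prefix routes the call to your Triton server
    api_base="https://your-triton-api-base/triton/embeddings",  # /embeddings endpoint you want litellm to call on your server
    input=["good morning from litellm"],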