# Bedrock Embedding

## Supported Embedding Models

| Provider | LiteLLM Route | AWS Documentation |
|---|---|---|
| Amazon Titan | `bedrock/amazon.*` | Amazon Titan Embeddings |
| Cohere | `bedrock/cohere.*` | Cohere Embeddings |
| TwelveLabs | `bedrock/us.twelvelabs.*` | TwelveLabs |

## Async Invoke Support
LiteLLM supports AWS Bedrock's async-invoke feature for embedding models that require asynchronous processing. This is particularly useful for large media files (video, audio) or when you need to process embeddings in the background.
### Supported Models

| Provider | Async Invoke Route | Use Case |
|---|---|---|
| TwelveLabs Marengo | `bedrock/async_invoke/us.twelvelabs.marengo-embed-2-7-v1:0` | Video, audio, image, and text embeddings |
### Required Parameters

When using async-invoke, you must provide:

| Parameter | Description | Required |
|---|---|---|
| `output_s3_uri` | S3 URI where the embedding results will be stored | ✅ Yes |
| `input_type` | Type of input: `"text"`, `"image"`, `"video"`, or `"audio"` | ✅ Yes |
| `aws_region_name` | AWS region for the request | ✅ Yes |
### Usage

#### Basic Async Invoke
```python
from litellm import embedding

# Text embedding with async-invoke
response = embedding(
    model="bedrock/async_invoke/us.twelvelabs.marengo-embed-2-7-v1:0",
    input=["Hello world from LiteLLM async invoke!"],
    aws_region_name="us-east-1",
    input_type="text",
    output_s3_uri="s3://your-bucket/async-invoke-output/"
)

print(f"Job submitted! Invocation ARN: {response._hidden_params._invocation_arn}")
```
#### Video/Audio Embedding
```python
# Video embedding (requires async-invoke)
response = embedding(
    model="bedrock/async_invoke/us.twelvelabs.marengo-embed-2-7-v1:0",
    input=["s3://your-bucket/video.mp4"],  # S3 URL for video
    aws_region_name="us-east-1",
    input_type="video",
    output_s3_uri="s3://your-bucket/async-invoke-output/"
)

print(f"Video embedding job submitted! ARN: {response._hidden_params._invocation_arn}")
```
#### Image Embedding with Base64
```python
import base64

# Load and encode the image as a data URI
with open("image.jpg", "rb") as img_file:
    img_data = base64.b64encode(img_file.read()).decode("utf-8")
img_base64 = f"data:image/jpeg;base64,{img_data}"

response = embedding(
    model="bedrock/async_invoke/us.twelvelabs.marengo-embed-2-7-v1:0",
    input=[img_base64],
    aws_region_name="us-east-1",
    input_type="image",
    output_s3_uri="s3://your-bucket/async-invoke-output/"
)
```
### Retrieving Job Information

#### Getting Job ID and Invocation ARN

The async-invoke response includes the invocation ARN in its hidden parameters:
```python
response = embedding(
    model="bedrock/async_invoke/us.twelvelabs.marengo-embed-2-7-v1:0",
    input=["Hello world"],
    aws_region_name="us-east-1",
    input_type="text",
    output_s3_uri="s3://your-bucket/async-invoke-output/"
)

# Access the invocation ARN
invocation_arn = response._hidden_params._invocation_arn
print(f"Invocation ARN: {invocation_arn}")

# Extract the job ID from the ARN (the segment after the last slash)
job_id = invocation_arn.split("/")[-1]
print(f"Job ID: {job_id}")
```
#### Checking Job Status

Use LiteLLM's `retrieve_batch` function to check whether your job is still processing:
```python
from litellm import retrieve_batch

def check_async_job_status(invocation_arn, aws_region_name="us-east-1"):
    """Check the status of an async-invoke job using the LiteLLM batch API."""
    try:
        response = retrieve_batch(
            batch_id=invocation_arn,
            custom_llm_provider="bedrock",
            aws_region_name=aws_region_name
        )
        return response
    except Exception as e:
        print(f"Error checking job status: {e}")
        return None

# Check status
status = check_async_job_status(invocation_arn, "us-east-1")
if status:
    print(f"Job Status: {status.status}")
    print(f"Output Location: {status.output_file_id}")
```
**Note**: The actual embedding results are stored in S3. The `output_file_id` from the batch status can be used to locate the results file in your S3 bucket.
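Once the job completes, you can pull the results back into Python with boto3. The sketch below is a minimal example: the key layout under `output_s3_uri` (assumed here to be `<job_id>/output.json`) and the JSON structure of the results file are assumptions, so check your bucket for the layout Bedrock actually wrote.

```python
import json

import boto3

# Hypothetical helper: download async-invoke results from S3.
# Assumes results were written under output_s3_uri as <job_id>/output.json;
# verify the actual key layout in your bucket before relying on this.
def fetch_async_results(bucket: str, prefix: str, job_id: str, region: str = "us-east-1"):
    s3 = boto3.client("s3", region_name=region)
    key = f"{prefix.rstrip('/')}/{job_id}/output.json"
    obj = s3.get_object(Bucket=bucket, Key=key)
    return json.loads(obj["Body"].read())

# Example usage with the bucket/prefix from output_s3_uri above
results = fetch_async_results("your-bucket", "async-invoke-output", job_id)
print(results)
```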
### Error Handling

#### Common Errors

| Error | Cause | Solution |
|---|---|---|
| `ValueError: output_s3_uri cannot be empty` | Missing S3 output URI | Provide a valid S3 URI |
| `ValueError: Input type 'video' requires async_invoke route` | Using video/audio without async-invoke | Use the `bedrock/async_invoke/` model prefix |
| `ValueError: input_type is required` | Missing input type parameter | Specify the `input_type` parameter |
#### Example Error Handling
```python
try:
    response = embedding(
        model="bedrock/async_invoke/us.twelvelabs.marengo-embed-2-7-v1:0",
        input=["Hello world"],
        aws_region_name="us-east-1",
        input_type="text",
        output_s3_uri="s3://your-bucket/output/"  # Required for async-invoke
    )
    print("Job submitted successfully!")
except ValueError as e:
    if "output_s3_uri cannot be empty" in str(e):
        print("Error: Please provide a valid S3 output URI")
    elif "requires async_invoke route" in str(e):
        print("Error: Use async_invoke model for video/audio inputs")
    else:
        print(f"Error: {e}")
except Exception as e:
    print(f"Unexpected error: {e}")
```
### Best Practices

- Use async-invoke for large files: Video and audio files are better processed asynchronously
- Use the LiteLLM batch API: Use `retrieve_batch()` instead of direct Bedrock API calls for status checking
- Monitor job status: Check job status periodically using the batch API to know when results are ready
- Handle errors gracefully: Implement proper error handling for network issues and job failures
- Set appropriate timeouts: Consider the processing time for large files
- Use S3 for large inputs: For video/audio, use S3 URLs instead of base64 encoding
### Limitations

- Async-invoke is currently only supported for TwelveLabs Marengo models
- Results are stored in S3 and must be retrieved separately using the output file ID
- Job status checking requires using LiteLLM's `retrieve_batch()` function
- No built-in polling mechanism in LiteLLM; you must implement your own status-checking loop (see the sketch below)
## API keys

These can be set as environment variables or passed as params to `litellm.embedding()`:
```python
import os

os.environ["AWS_ACCESS_KEY_ID"] = ""      # Access key
os.environ["AWS_SECRET_ACCESS_KEY"] = ""  # Secret access key
os.environ["AWS_REGION_NAME"] = ""        # us-east-1, us-east-2, us-west-1, us-west-2
```
## Usage

### LiteLLM Python SDK
```python
from litellm import embedding

response = embedding(
    model="bedrock/amazon.titan-embed-text-v1",
    input=["good morning from litellm"],
)
print(response)
```
### LiteLLM Proxy Server

#### 1. Setup config.yaml
```yaml
model_list:
  - model_name: titan-embed-v1
    litellm_params:
      model: bedrock/amazon.titan-embed-text-v1
      aws_access_key_id: os.environ/AWS_ACCESS_KEY_ID
      aws_secret_access_key: os.environ/AWS_SECRET_ACCESS_KEY
      aws_region_name: us-east-1
  - model_name: titan-embed-v2
    litellm_params:
      model: bedrock/amazon.titan-embed-text-v2:0
      aws_access_key_id: os.environ/AWS_ACCESS_KEY_ID
      aws_secret_access_key: os.environ/AWS_SECRET_ACCESS_KEY
      aws_region_name: us-east-1
```
#### 2. Start Proxy

```shell
litellm --config /path/to/config.yaml
```
#### 3. Use with OpenAI Python SDK
```python
import openai

client = openai.OpenAI(
    api_key="anything",
    base_url="http://0.0.0.0:4000"
)

response = client.embeddings.create(
    input=["good morning from litellm"],
    model="titan-embed-v1"
)
print(response)
```
#### 4. Use with LiteLLM Python SDK
```python
import litellm

response = litellm.embedding(
    model="titan-embed-v1",  # model alias from config.yaml
    input=["good morning from litellm"],
    api_base="http://0.0.0.0:4000",
    api_key="anything"
)
print(response)
```
## Supported AWS Bedrock Embedding Models

| Model Name | Usage | Supported Additional OpenAI params |
|---|---|---|
| Titan Embeddings V2 | `embedding(model="bedrock/amazon.titan-embed-text-v2:0", input=input)` | here |
| Titan Embeddings V1 | `embedding(model="bedrock/amazon.titan-embed-text-v1", input=input)` | here |
| Titan Multimodal Embeddings | `embedding(model="bedrock/amazon.titan-embed-image-v1", input=input)` | here |
| TwelveLabs Marengo Embed 2.7 | `embedding(model="bedrock/us.twelvelabs.marengo-embed-2-7-v1:0", input=input)` | Supports multimodal input (text, video, audio, image) |
| Cohere Embeddings - English | `embedding(model="bedrock/cohere.embed-english-v3", input=input)` | here |
| Cohere Embeddings - Multilingual | `embedding(model="bedrock/cohere.embed-multilingual-v3", input=input)` | here |
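As one example of the additional OpenAI params referenced above, Titan Embeddings V2 accepts the OpenAI `dimensions` parameter. A minimal sketch follows; the accepted dimension values (e.g. 256, 512, 1024) come from AWS's Titan V2 documentation and are worth verifying for your model version:

```python
from litellm import embedding

# Titan V2 accepts the OpenAI `dimensions` param; 256/512/1024 are the
# values AWS documents for this model (verify for your model version).
response = embedding(
    model="bedrock/amazon.titan-embed-text-v2:0",
    input=["good morning from litellm"],
    dimensions=512,
)
print(len(response.data[0]["embedding"]))  # expect 512
```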