
Bedrock Embedding

Supported Embedding Models

| Provider | LiteLLM Route | AWS Documentation |
| --- | --- | --- |
| Amazon Titan | bedrock/amazon.* | Amazon Titan Embeddings |
| Cohere | bedrock/cohere.* | Cohere Embeddings |
| TwelveLabs | bedrock/us.twelvelabs.* | TwelveLabs |

Async Invoke Support

LiteLLM supports AWS Bedrock's async-invoke feature for embedding models that require asynchronous processing, particularly useful for large media files (video, audio) or when you need to process embeddings in the background.

Supported Models

| Provider | Async Invoke Route | Use Case |
| --- | --- | --- |
| TwelveLabs Marengo | bedrock/async_invoke/us.twelvelabs.marengo-embed-2-7-v1:0 | Video, audio, image, and text embeddings |

Required Parameters

When using async-invoke, you must provide:

| Parameter | Description | Required |
| --- | --- | --- |
| output_s3_uri | S3 URI where the embedding results will be stored | ✅ Yes |
| input_type | Type of input: "text", "image", "video", or "audio" | ✅ Yes |
| aws_region_name | AWS region for the request | ✅ Yes |

Usage

Basic Async Invoke

from litellm import embedding

# Text embedding with async-invoke
response = embedding(
    model="bedrock/async_invoke/us.twelvelabs.marengo-embed-2-7-v1:0",
    input=["Hello world from LiteLLM async invoke!"],
    aws_region_name="us-east-1",
    input_type="text",
    output_s3_uri="s3://your-bucket/async-invoke-output/"
)

print(f"Job submitted! Invocation ARN: {response._hidden_params._invocation_arn}")

Video/Audio Embedding

# Video embedding (requires async-invoke)
response = embedding(
    model="bedrock/async_invoke/us.twelvelabs.marengo-embed-2-7-v1:0",
    input=["s3://your-bucket/video.mp4"],  # S3 URL for video
    aws_region_name="us-east-1",
    input_type="video",
    output_s3_uri="s3://your-bucket/async-invoke-output/"
)

print(f"Video embedding job submitted! ARN: {response._hidden_params._invocation_arn}")

Image Embedding with Base64

import base64

# Load and encode image
with open("image.jpg", "rb") as img_file:
    img_data = base64.b64encode(img_file.read()).decode('utf-8')

img_base64 = f"data:image/jpeg;base64,{img_data}"

response = embedding(
    model="bedrock/async_invoke/us.twelvelabs.marengo-embed-2-7-v1:0",
    input=[img_base64],
    aws_region_name="us-east-1",
    input_type="image",
    output_s3_uri="s3://your-bucket/async-invoke-output/"
)

Retrieving Job Information

Getting Job ID and Invocation ARN

The async-invoke response includes the invocation ARN in the hidden parameters:

response = embedding(
    model="bedrock/async_invoke/us.twelvelabs.marengo-embed-2-7-v1:0",
    input=["Hello world"],
    aws_region_name="us-east-1",
    input_type="text",
    output_s3_uri="s3://your-bucket/async-invoke-output/"
)

# Access invocation ARN
invocation_arn = response._hidden_params._invocation_arn
print(f"Invocation ARN: {invocation_arn}")

# Extract job ID from ARN (last part after the last slash)
job_id = invocation_arn.split("/")[-1]
print(f"Job ID: {job_id}")

Checking Job Status

Use LiteLLM's retrieve_batch function to check if your job is still processing:

from litellm import retrieve_batch

def check_async_job_status(invocation_arn, aws_region_name="us-east-1"):
    """Check the status of an async invoke job using LiteLLM batch API"""
    try:
        response = retrieve_batch(
            batch_id=invocation_arn,
            custom_llm_provider="bedrock",
            aws_region_name=aws_region_name
        )
        return response
    except Exception as e:
        print(f"Error checking job status: {e}")
        return None

# Check status
status = check_async_job_status(invocation_arn, "us-east-1")
if status:
    print(f"Job Status: {status.status}")
    print(f"Output Location: {status.output_file_id}")

Note: The actual embedding results are stored in S3. The output_file_id from the batch status can be used to locate the results file in your S3 bucket.
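
Once the job completes, the results file itself has to be fetched from S3. Below is a minimal download sketch using boto3; it assumes output_file_id is a full s3:// URI and that the output is JSON, so check your configured output_s3_uri location for the exact layout:

import json
import boto3

def download_async_invoke_output(output_file_id):
    """Fetch and parse the async-invoke results file from S3 (JSON layout assumed)."""
    # Expected format: s3://bucket-name/path/to/output.json
    bucket, _, key = output_file_id[len("s3://"):].partition("/")
    s3 = boto3.client("s3")
    obj = s3.get_object(Bucket=bucket, Key=key)
    return json.loads(obj["Body"].read())

# results = download_async_invoke_output(status.output_file_id)
# print(results)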

Error Handling

Common Errors

| Error | Cause | Solution |
| --- | --- | --- |
| ValueError: output_s3_uri cannot be empty | Missing S3 output URI | Provide a valid S3 URI |
| ValueError: Input type 'video' requires async_invoke route | Using video/audio without async-invoke | Use the bedrock/async_invoke/ model prefix |
| ValueError: input_type is required | Missing input type parameter | Specify the input_type parameter |

Example Error Handling

try:
    response = embedding(
        model="bedrock/async_invoke/us.twelvelabs.marengo-embed-2-7-v1:0",
        input=["Hello world"],
        aws_region_name="us-east-1",
        input_type="text",
        output_s3_uri="s3://your-bucket/output/"  # Required for async-invoke
    )
    print("Job submitted successfully!")

except ValueError as e:
    if "output_s3_uri cannot be empty" in str(e):
        print("Error: Please provide a valid S3 output URI")
    elif "requires async_invoke route" in str(e):
        print("Error: Use async_invoke model for video/audio inputs")
    else:
        print(f"Error: {e}")
except Exception as e:
    print(f"Unexpected error: {e}")

Best Practices

  1. Use async-invoke for large files: Video and audio files are better processed asynchronously
  2. Use LiteLLM batch API: Use retrieve_batch() instead of direct Bedrock API calls for status checking
  3. Monitor job status: Check job status periodically using the batch API to know when results are ready
  4. Handle errors gracefully: Implement proper error handling for network issues and job failures
  5. Set appropriate timeouts: Consider the processing time for large files
  6. Use S3 for large inputs: For video/audio, use S3 URLs instead of base64 encoding

Limitations

  • Async-invoke is currently only supported for TwelveLabs Marengo models
  • Results are stored in S3 and must be retrieved separately using the output file ID
  • Job status checking requires using LiteLLM's retrieve_batch() function
  • No built-in polling mechanism in LiteLLM (you must implement your own status-checking loop; see the sketch below)
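
A minimal polling sketch built on the check_async_job_status() helper defined above. The terminal status values ("completed", "failed", "expired") and the 30-second interval are assumptions; adjust them to whatever retrieve_batch() actually returns in your environment.

import time

def wait_for_async_job(invocation_arn, aws_region_name="us-east-1", poll_interval=30, timeout=1800):
    """Poll an async-invoke job until it reaches a terminal state or the timeout expires."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        status = check_async_job_status(invocation_arn, aws_region_name)
        # Assumed terminal statuses; verify against the object returned by retrieve_batch()
        if status and status.status in ("completed", "failed", "expired"):
            return status
        time.sleep(poll_interval)
    raise TimeoutError(f"Async invoke job did not finish within {timeout} seconds")

# status = wait_for_async_job(invocation_arn)
# if status:
#     print(status.status, status.output_file_id)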

API keys

These can be set as environment variables or passed as params to litellm.embedding()

import os
os.environ["AWS_ACCESS_KEY_ID"] = "" # Access key
os.environ["AWS_SECRET_ACCESS_KEY"] = "" # Secret access key
os.environ["AWS_REGION_NAME"] = "" # us-east-1, us-east-2, us-west-1, us-west-2

Usage

LiteLLM Python SDK

from litellm import embedding
response = embedding(
    model="bedrock/amazon.titan-embed-text-v1",
    input=["good morning from litellm"],
)
print(response)

LiteLLM Proxy Server

1. Setup config.yaml

model_list:
  - model_name: titan-embed-v1
    litellm_params:
      model: bedrock/amazon.titan-embed-text-v1
      aws_access_key_id: os.environ/AWS_ACCESS_KEY_ID
      aws_secret_access_key: os.environ/AWS_SECRET_ACCESS_KEY
      aws_region_name: us-east-1
  - model_name: titan-embed-v2
    litellm_params:
      model: bedrock/amazon.titan-embed-text-v2:0
      aws_access_key_id: os.environ/AWS_ACCESS_KEY_ID
      aws_secret_access_key: os.environ/AWS_SECRET_ACCESS_KEY
      aws_region_name: us-east-1

2. Start Proxy

litellm --config /path/to/config.yaml

3. Use with OpenAI Python SDK

import openai
client = openai.OpenAI(
    api_key="anything",
    base_url="http://0.0.0.0:4000"
)

response = client.embeddings.create(
    input=["good morning from litellm"],
    model="titan-embed-v1"
)
print(response)

4. Use with LiteLLM Python SDK

import litellm
response = litellm.embedding(
    model="titan-embed-v1",  # model alias from config.yaml
    input=["good morning from litellm"],
    api_base="http://0.0.0.0:4000",
    api_key="anything"
)
print(response)

Supported AWS Bedrock Embedding Models

| Model Name | Usage | Supported Additional OpenAI params |
| --- | --- | --- |
| Titan Embeddings V2 | embedding(model="bedrock/amazon.titan-embed-text-v2:0", input=input) | here |
| Titan Embeddings - V1 | embedding(model="bedrock/amazon.titan-embed-text-v1", input=input) | here |
| Titan Multimodal Embeddings | embedding(model="bedrock/amazon.titan-embed-image-v1", input=input) | here |
| TwelveLabs Marengo Embed 2.7 | embedding(model="bedrock/us.twelvelabs.marengo-embed-2-7-v1:0", input=input) | Supports multimodal input (text, video, audio, image) |
| Cohere Embeddings - English | embedding(model="bedrock/cohere.embed-english-v3", input=input) | here |
| Cohere Embeddings - Multilingual | embedding(model="bedrock/cohere.embed-multilingual-v3", input=input) | here |
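
For example, Titan Embeddings V2 can take the OpenAI-style dimensions parameter. A short sketch, assuming dimensions is among the additional OpenAI params linked above and that Titan V2 accepts the sizes documented by AWS (256, 512, 1024):

from litellm import embedding

response = embedding(
    model="bedrock/amazon.titan-embed-text-v2:0",
    input=["good morning from litellm"],
    dimensions=512,  # assumed to map to Titan V2's output dimension setting
)
print(len(response.data[0]["embedding"]))  # expected to match the requested dimensions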

Advanced - Drop Unsupported Params
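
A minimal sketch using the litellm.drop_params switch: when enabled, OpenAI params that a Bedrock model does not support are dropped from the request instead of raising an error.

import litellm
from litellm import embedding

litellm.drop_params = True  # drop unsupported OpenAI params instead of erroring

response = embedding(
    model="bedrock/amazon.titan-embed-text-v1",
    input=["good morning from litellm"],
    dimensions=512,  # Titan V1 has a fixed output size; with drop_params this is dropped
)
print(response)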

Advanced - Pass model/provider-specific Params
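
A hedged sketch: non-OpenAI params, such as the TwelveLabs input_type parameter used earlier on this page, are assumed to be forwarded to the underlying Bedrock request body.

from litellm import embedding

response = embedding(
    model="bedrock/us.twelvelabs.marengo-embed-2-7-v1:0",
    input=["good morning from litellm"],
    input_type="text",  # TwelveLabs-specific param, passed through to Bedrock
    aws_region_name="us-east-1",
)
print(response)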