# /invoke

Call Bedrock's `/invoke` endpoint through LiteLLM Proxy.

| Feature | Supported |
|---------|-----------|
| Cost Tracking | ✅ |
| Logging | ✅ |
| Streaming | ✅ via `/invoke-with-response-stream` |
| Load Balancing | ✅ |

## Quick Start

### 1. Setup config.yaml

```yaml
model_list:
  - model_name: my-bedrock-model
    litellm_params:
      model: bedrock/us.anthropic.claude-3-5-sonnet-20240620-v1:0
      aws_region_name: us-west-2
      aws_access_key_id: os.environ/AWS_ACCESS_KEY_ID  # reads from environment
      aws_secret_access_key: os.environ/AWS_SECRET_ACCESS_KEY
      custom_llm_provider: bedrock
```

Set AWS credentials in your environment:

```shell
export AWS_ACCESS_KEY_ID="your-access-key"
export AWS_SECRET_ACCESS_KEY="your-secret-key"
```

### 2. Start Proxy

```shell
litellm --config config.yaml

# RUNNING on http://0.0.0.0:4000
```

### 3. Call `/invoke` endpoint

```shell
curl -X POST 'http://0.0.0.0:4000/bedrock/model/my-bedrock-model/invoke' \
  -H 'Authorization: Bearer sk-1234' \
  -H 'Content-Type: application/json' \
  -d '{
    "max_tokens": 100,
    "messages": [
      {
        "role": "user",
        "content": "Hello, how are you?"
      }
    ],
    "anthropic_version": "bedrock-2023-05-31"
  }'
```
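
The same request can be made from Python; here is a minimal sketch using the `requests` library, assuming the example proxy URL and key (`sk-1234`) shown above:

```python
import requests

# Assumes the proxy is running locally with the example virtual key
url = "http://0.0.0.0:4000/bedrock/model/my-bedrock-model/invoke"
headers = {
    "Authorization": "Bearer sk-1234",
    "Content-Type": "application/json",
}
payload = {
    "max_tokens": 100,
    "messages": [{"role": "user", "content": "Hello, how are you?"}],
    "anthropic_version": "bedrock-2023-05-31",
}

response = requests.post(url, headers=headers, json=payload)
response.raise_for_status()
print(response.json())
```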

## Streaming

For streaming responses, use `/invoke-with-response-stream`:

```shell
curl -X POST 'http://0.0.0.0:4000/bedrock/model/my-bedrock-model/invoke-with-response-stream' \
  -H 'Authorization: Bearer sk-1234' \
  -H 'Content-Type: application/json' \
  -d '{
    "max_tokens": 100,
    "messages": [
      {
        "role": "user",
        "content": "Tell me a short story"
      }
    ],
    "anthropic_version": "bedrock-2023-05-31"
  }'
```
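
From Python, a minimal sketch of the same streaming call with `requests` (assuming the example proxy URL and key). Note the response body arrives in AWS's binary event-stream framing, so this prints raw chunks; an event-stream-aware client like boto3 decodes them into JSON events (see the boto3 section below):

```python
import requests

# Assumes the proxy is running locally with the example virtual key
url = "http://0.0.0.0:4000/bedrock/model/my-bedrock-model/invoke-with-response-stream"
headers = {
    "Authorization": "Bearer sk-1234",
    "Content-Type": "application/json",
}
payload = {
    "max_tokens": 100,
    "messages": [{"role": "user", "content": "Tell me a short story"}],
    "anthropic_version": "bedrock-2023-05-31",
}

with requests.post(url, headers=headers, json=payload, stream=True) as response:
    response.raise_for_status()
    for chunk in response.iter_content(chunk_size=None):
        # Raw bytes in AWS event-stream framing; use boto3 (below)
        # to receive decoded, structured events instead.
        print(chunk)
```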

## Load Balancing

Define multiple deployments with the same `model_name` for automatic load balancing:

```yaml
model_list:
  # Deployment 1 - us-west-2
  - model_name: my-bedrock-model
    litellm_params:
      model: bedrock/us.anthropic.claude-3-5-sonnet-20240620-v1:0
      aws_region_name: us-west-2
      aws_access_key_id: os.environ/AWS_ACCESS_KEY_ID
      aws_secret_access_key: os.environ/AWS_SECRET_ACCESS_KEY
      custom_llm_provider: bedrock

  # Deployment 2 - us-east-1
  - model_name: my-bedrock-model
    litellm_params:
      model: bedrock/us.anthropic.claude-3-5-sonnet-20240620-v1:0
      aws_region_name: us-east-1
      aws_access_key_id: os.environ/AWS_ACCESS_KEY_ID
      aws_secret_access_key: os.environ/AWS_SECRET_ACCESS_KEY
      custom_llm_provider: bedrock
```

The proxy automatically distributes requests across both regions.
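
If you want control over how requests are spread, a routing strategy can be set in the same config.yaml. A minimal sketch, assuming LiteLLM's standard `router_settings` block (verify strategy names against the LiteLLM router docs for your version):

```yaml
router_settings:
  routing_strategy: simple-shuffle  # default; randomly shuffles across deployments
  num_retries: 2                    # retry a failed request, potentially on another deployment
```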

## Using boto3 SDK

```python
import boto3
import json
import os

# Set dummy AWS credentials (required by boto3, but not used by LiteLLM proxy)
os.environ['AWS_ACCESS_KEY_ID'] = 'dummy'
os.environ['AWS_SECRET_ACCESS_KEY'] = 'dummy'
os.environ['AWS_BEARER_TOKEN_BEDROCK'] = "sk-1234"  # your litellm proxy api key

# Point boto3 to the LiteLLM proxy
bedrock_runtime = boto3.client(
    service_name='bedrock-runtime',
    region_name='us-west-2',
    endpoint_url='http://0.0.0.0:4000/bedrock'
)

response = bedrock_runtime.invoke_model(
    modelId='my-bedrock-model',  # Your model_name from config.yaml
    contentType='application/json',
    accept='application/json',
    body=json.dumps({
        "max_tokens": 100,
        "messages": [{"role": "user", "content": "Hello"}],
        "anthropic_version": "bedrock-2023-05-31"
    })
)

response_body = json.loads(response['body'].read())
print(response_body['content'][0]['text'])
```
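
The same client can also consume the streaming endpoint shown above. A sketch continuing from the example (reusing `bedrock_runtime` and `json`), using boto3's `invoke_model_with_response_stream`, which decodes the event-stream frames for you:

```python
# Continues the example above (reuses bedrock_runtime and json)
response = bedrock_runtime.invoke_model_with_response_stream(
    modelId='my-bedrock-model',
    contentType='application/json',
    accept='application/json',
    body=json.dumps({
        "max_tokens": 100,
        "messages": [{"role": "user", "content": "Tell me a short story"}],
        "anthropic_version": "bedrock-2023-05-31"
    })
)

# response['body'] is an EventStream; each event carries JSON-encoded bytes
for event in response['body']:
    chunk = json.loads(event['chunk']['bytes'])
    print(chunk)
```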

## More Info

For complete documentation, including Guardrails, Knowledge Bases, and Agents, see: