Azure Responses API

| Property | Details |
|---|---|
| Description | Azure OpenAI Responses API |
| `custom_llm_provider` on LiteLLM | `azure/` |
| Supported Operations | `/v1/responses` |
| Azure OpenAI Responses API | Azure OpenAI Responses API ↗ |
| Cost Tracking, Logging Support | ✅ LiteLLM will log and track cost for Responses API requests |
| Supported OpenAI Params | ✅ All OpenAI params are supported. See here |
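
Per the table, cost tracking and logging run through LiteLLM's standard callback hooks. A minimal sketch of a custom success callback, assuming the documented `success_callback` mechanism fires for `responses()` calls the same way it does for `completion()` (the `log_cost` function name is ours):

import litellm

def log_cost(kwargs, response_obj, start_time, end_time):
    # LiteLLM passes call metadata (model, computed cost, ...) via kwargs.
    print("model:", kwargs.get("model"), "cost:", kwargs.get("response_cost"))

# Register the callback; it runs after each successful call.
litellm.success_callback = [log_cost]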

Usage

Create a model response

Non-streaming

import os

import litellm

# Non-streaming response
response = litellm.responses(
    model="azure/o1-pro",
    input="Tell me a three sentence bedtime story about a unicorn.",
    max_output_tokens=100,
    api_key=os.getenv("AZURE_RESPONSES_OPENAI_API_KEY"),
    api_base="https://litellm8397336933.openai.azure.com/",
    api_version="2023-03-15-preview",
)

print(response)
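
From async code, the same call is available through LiteLLM's async counterpart, `litellm.aresponses` (mirroring `acompletion`); a minimal sketch reusing the parameters above, assuming the same parameter set applies:

import asyncio
import os

import litellm

async def main():
    # Same call as above, but awaitable for use inside an event loop.
    response = await litellm.aresponses(
        model="azure/o1-pro",
        input="Tell me a three sentence bedtime story about a unicorn.",
        max_output_tokens=100,
        api_key=os.getenv("AZURE_RESPONSES_OPENAI_API_KEY"),
        api_base="https://litellm8397336933.openai.azure.com/",
        api_version="2023-03-15-preview",
    )
    print(response)

asyncio.run(main())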

Streaming

import os

import litellm

# Streaming response
response = litellm.responses(
    model="azure/o1-pro",
    input="Tell me a three sentence bedtime story about a unicorn.",
    stream=True,
    api_key=os.getenv("AZURE_RESPONSES_OPENAI_API_KEY"),
    api_base="https://litellm8397336933.openai.azure.com/",
    api_version="2023-03-15-preview",
)

for event in response:
    print(event)
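
Streamed events follow the OpenAI Responses streaming shapes, where each event carries a `type` field. Assuming that shape, you can keep only the incremental text (the `response.output_text.delta` event type comes from the OpenAI spec, not from the output above):

for event in response:
    # Text arrives incrementally in output_text delta events; other event
    # types (response.created, response.completed, ...) carry lifecycle
    # metadata and can be skipped when you only want the story text.
    if event.type == "response.output_text.delta":
        print(event.delta, end="", flush=True)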

Azure Codex Models

Codex models use Azure's new /v1/preview API which provides ongoing access to the latest features with no need to update api-version each month.

LiteLLM will send your requests to the /v1/preview endpoint when you set api_version="preview".

Non-streaming

import os

import litellm

# Non-streaming response with Codex models
response = litellm.responses(
    model="azure/codex-mini",
    input="Tell me a three sentence bedtime story about a unicorn.",
    max_output_tokens=100,
    api_key=os.getenv("AZURE_RESPONSES_OPENAI_API_KEY"),
    api_base="https://litellm8397336933.openai.azure.com",
    api_version="preview",  # 👈 key difference
)

print(response)

Streaming

import os

import litellm

# Streaming response with Codex models
response = litellm.responses(
    model="azure/codex-mini",
    input="Tell me a three sentence bedtime story about a unicorn.",
    stream=True,
    api_key=os.getenv("AZURE_RESPONSES_OPENAI_API_KEY"),
    api_base="https://litellm8397336933.openai.azure.com",
    api_version="preview",  # 👈 key difference
)

for event in response:
    print(event)

Calling via /chat/completions

You can also call the Azure Responses API via the /chat/completions endpoint: set the model to `azure/responses/<deployment-name>` and LiteLLM routes the request to the Responses API.

from litellm import completion
import os

os.environ["AZURE_API_BASE"] = "https://my-endpoint-sweden-berri992.openai.azure.com/"
os.environ["AZURE_API_VERSION"] = "2023-03-15-preview"
os.environ["AZURE_API_KEY"] = "my-api-key"

response = completion(
    model="azure/responses/my-custom-o1-pro",
    messages=[{"role": "user", "content": "Hello world"}],
)

print(response)
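
Since the call goes through `completion()`, the return value is a standard chat completion object, so the usual accessor applies:

# The reply text sits in the standard chat-completion location.
print(response.choices[0].message.content)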