Snowflake Cortex
LiteLLM supports all models on the Snowflake Cortex REST API, including models from Anthropic (Claude), OpenAI (GPT), Meta (Llama), Mistral, DeepSeek, and Snowflake.
| Property | Details |
|---|---|
| Description | Snowflake Cortex REST API provides access to leading frontier LLMs through OpenAI-compatible and Anthropic-compatible endpoints. All inference runs within Snowflake's security perimeter. |
| Provider Route on LiteLLM | snowflake/ |
| Provider Docs | Cortex REST API ↗ |
| API Endpoints | Chat Completions: https://{account}.snowflakecomputing.com/api/v2/cortex/v1/chat/completions; Messages: https://{account}.snowflakecomputing.com/api/v2/cortex/v1/messages; Legacy: https://{account}.snowflakecomputing.com/api/v2/cortex/inference:complete |
| Supported OpenAI Endpoints | /chat/completions, /completions, /embeddings |
Tip: LiteLLM supports all Snowflake Cortex models. Prefix the model name with snowflake/ (that is, model=snowflake/<model-name>) when sending LiteLLM requests.
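As a minimal sketch of the prefix rule in the tip, the helper below turns a bare Cortex model name into the string passed as model=. The function name to_litellm_model is hypothetical and not part of LiteLLM.

```python
PROVIDER_PREFIX = "snowflake/"


def to_litellm_model(model_name: str) -> str:
    """Return the LiteLLM model string for a Cortex model name.

    Hypothetical helper for illustration; LiteLLM itself only needs the
    final "snowflake/<model-name>" string.
    """
    name = model_name.strip()
    if not name:
        raise ValueError("model name must not be empty")
    # Avoid double-prefixing when the caller already passed snowflake/<model-name>.
    if name.startswith(PROVIDER_PREFIX):
        return name
    return PROVIDER_PREFIX + name
```

The result can be passed directly as the model argument of completion().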
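Authentication
Snowflake Cortex REST API supports three authentication methods. The examples below cover two of them, a programmatic access token (PAT) and a key-pair JWT, and then show how to pass either credential as a parameter.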
Programmatic Access Token (PAT) (Recommended)
This is the simplest approach. Generate a PAT in Snowsight under User Menu → My Profile → Programmatic Access Tokens.
import os
from litellm import completion
# The PAT is passed with a "pat/" prefix; the base URL points at the Cortex v1 API.
os.environ["SNOWFLAKE_API_KEY"] = "pat/<your-programmatic-access-token>"
os.environ["SNOWFLAKE_API_BASE"] = "https://<account>.snowflakecomputing.com/api/v2/cortex/v1"
response = completion(
    model="snowflake/claude-sonnet-4-5",
    messages=[{"role": "user", "content": "Hello!"}],
)
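The two environment values above follow fixed patterns, so they can be derived from an account identifier and a raw token. The sketch below assumes exactly the formats shown on this page; cortex_api_base and pat_api_key are hypothetical helper names.

```python
def cortex_api_base(account: str) -> str:
    """Build the Cortex v1 base URL for an account identifier.

    Hypothetical helper; the URL pattern is the one used on this page.
    """
    account = account.strip()
    if not account or "/" in account:
        raise ValueError("expected a bare account identifier, e.g. <orgname>-<account_name>")
    return f"https://{account}.snowflakecomputing.com/api/v2/cortex/v1"


def pat_api_key(token: str) -> str:
    """Return the api_key value for a PAT, adding the pat/ prefix if missing."""
    token = token.strip()
    if not token:
        raise ValueError("token must not be empty")
    return token if token.startswith("pat/") else f"pat/{token}"
```

The return values can be assigned to SNOWFLAKE_API_BASE and SNOWFLAKE_API_KEY respectively.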
JWT (Key-Pair Authentication)
Generate a JWT from a Snowflake key pair. See Key-pair authentication.
import os
from litellm import completion
# With JWT auth, LiteLLM takes the account identifier instead of a full base URL.
os.environ["SNOWFLAKE_JWT"] = "<your-jwt-token>"
os.environ["SNOWFLAKE_ACCOUNT_ID"] = "<orgname>-<account_name>"
response = completion(
    model="snowflake/claude-sonnet-4-5",
    messages=[{"role": "user", "content": "Hello!"}],
)
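For readers who have not built a Snowflake JWT before, the sketch below assembles the claims such a token usually carries. It is an assumption based on Snowflake's key-pair authentication documentation, not something this page specifies: the issuer is taken to be ACCOUNT.USER.SHA256:fingerprint, the subject ACCOUNT.USER, and the lifetime is kept under one hour. The function name and its parameters are hypothetical, and the claims still need to be signed with RS256 using a JWT library.

```python
import time


def snowflake_jwt_claims(account, user, public_key_fp, lifetime_s=3540, now=None):
    """Assemble JWT claims for Snowflake key-pair auth (illustrative sketch).

    Assumptions, to be checked against Snowflake's docs:
    - public_key_fp is already in the form "SHA256:<base64 digest>"
    - account and user are upper-cased, and a region suffix after the
      first "." in a legacy account locator is dropped
    """
    issued_at = int(time.time()) if now is None else int(now)
    account_part = account.split(".", 1)[0].upper()
    qualified_user = f"{account_part}.{user.upper()}"
    return {
        "iss": f"{qualified_user}.{public_key_fp}",
        "sub": qualified_user,
        "iat": issued_at,
        "exp": issued_at + lifetime_s,
    }
```

With PyJWT installed, the claims could then be signed with jwt.encode(claims, private_key_pem, algorithm="RS256") and the result used as SNOWFLAKE_JWT.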
Pass credentials as parameters
from litellm import completion
# Using PAT: pass the prefixed token and the Cortex base URL.
response = completion(
    model="snowflake/claude-sonnet-4-5",
    messages=[{"role": "user", "content": "Hello!"}],
    api_key="pat/<your-pat-token>",
    api_base="https://<account>.snowflakecomputing.com/api/v2/cortex/v1",
)
# Using JWT: pass the token and the account identifier.
response = completion(
    model="snowflake/claude-sonnet-4-5",
    messages=[{"role": "user", "content": "Hello!"}],
    api_key="<your-jwt-token>",
    account_id="<orgname>-<account_name>",
)
For all authentication options, see Authenticating to Cortex REST API.
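When an application has to support both credential types, the choice between the two parameter sets above can be isolated in one place. The following is a sketch only; snowflake_auth_kwargs is a hypothetical helper that returns the keyword arguments shown in the examples above.

```python
def snowflake_auth_kwargs(*, pat=None, api_base=None, jwt=None, account_id=None):
    """Return completion() keyword arguments for PAT or JWT auth.

    Hypothetical helper; it only mirrors the two parameter sets shown
    on this page and performs no network calls.
    """
    if pat and jwt:
        raise ValueError("pass either pat or jwt, not both")
    if pat:
        if not api_base:
            raise ValueError("PAT auth needs api_base")
        key = pat if pat.startswith("pat/") else f"pat/{pat}"
        return {"api_key": key, "api_base": api_base}
    if jwt:
        if not account_id:
            raise ValueError("JWT auth needs account_id")
        return {"api_key": jwt, "account_id": account_id}
    raise ValueError("no credentials supplied")
```

The result is meant to be unpacked into the call, for example completion(model=..., messages=..., **snowflake_auth_kwargs(pat="<your-pat-token>", api_base="<base-url>")).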
Usage
SDK
import os
from litellm import completion
os.environ["SNOWFLAKE_API_KEY"] = "pat/<your-pat>"
os.environ["SNOWFLAKE_API_BASE"] = "https://<account>.snowflakecomputing.com/api/v2/cortex/v1"
response = completion(
    model="snowflake/claude-sonnet-4-5",
    messages=[{"role": "user", "content": "What is Snowflake Cortex?"}],
)
# Print the text of the first choice.
print(response.choices[0].message.content)
PROXY
1. Config
model_list:
  - model_name: claude-sonnet
    litellm_params:
      model: snowflake/claude-sonnet-4-5
      api_key: pat/<your-pat>
      api_base: https://<account>.snowflakecomputing.com/api/v2/cortex/v1
  - model_name: llama4-maverick
    litellm_params:
      model: snowflake/llama4-maverick
      api_key: pat/<your-pat>
      api_base: https://<account>.snowflakecomputing.com/api/v2/cortex/v1
2. Start proxy
litellm --config /path/to/config.yaml
3. Test
curl --location 'http://0.0.0.0:4000/chat/completions' \
  --header 'Content-Type: application/json' \
  --data '{
    "model": "claude-sonnet",
    "messages": [
      {"role": "user", "content": "What is Snowflake Cortex?"}
    ]
  }'
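The same test request can be built from Python with only the standard library. This is a sketch under the assumptions of the curl example: the proxy listens on http://0.0.0.0:4000 and needs no Authorization header (a proxy configured with a key would need one). build_proxy_request is a hypothetical helper name.

```python
import json
import urllib.request


def build_proxy_request(model, prompt, base_url="http://0.0.0.0:4000"):
    """Build (but do not send) a chat completion request for the proxy."""
    body = json.dumps(
        {
            "model": model,
            "messages": [{"role": "user", "content": prompt}],
        }
    ).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Sending is left to the caller so the sketch has no side effects:
#   req = build_proxy_request("claude-sonnet", "What is Snowflake Cortex?")
#   with urllib.request.urlopen(req) as resp:
#       print(json.load(resp)["choices"][0]["message"]["content"])
```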
Supported OpenAI Parameters
temperature, max_tokens, top_p, stream, response_format, tools, tool_choice
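To make the list concrete, the sketch below splits a set of request options into those named above and everything else. The names SUPPORTED_PARAMS and split_supported are hypothetical; LiteLLM also offers a drop_params setting for handling unsupported parameters, which this example does not use.

```python
from typing import Any, Dict, Tuple

# The parameter names listed on this page for the snowflake/ route.
SUPPORTED_PARAMS = frozenset(
    {
        "temperature",
        "max_tokens",
        "top_p",
        "stream",
        "response_format",
        "tools",
        "tool_choice",
    }
)


def split_supported(params: Dict[str, Any]) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Return (supported, unsupported) subsets of the given parameters."""
    supported = {k: v for k, v in params.items() if k in SUPPORTED_PARAMS}
    unsupported = {k: v for k, v in params.items() if k not in SUPPORTED_PARAMS}
    return supported, unsupported
```

Logging the second dictionary before a call is one way to notice options that would not take effect.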
Streaming
SDK
import os
from litellm import completion
os.environ["SNOWFLAKE_API_KEY"] = "pat/<your-pat>"
os.environ["SNOWFLAKE_API_BASE"] = "https://<account>.snowflakecomputing.com/api/v2/cortex/v1"
response = completion(
    model="snowflake/claude-sonnet-4-5",
    messages=[{"role": "user", "content": "Write a haiku about data."}],
    stream=True,
)
# Each chunk carries an incremental delta; skip chunks without text.
for chunk in response:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
PROXY
curl --location 'http://0.0.0.0:4000/chat/completions' \
  --header 'Content-Type: application/json' \
  --data '{
    "model": "claude-sonnet",
    "messages": [{"role": "user", "content": "Write a haiku about data."}],
    "stream": true
  }'
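If the full text is needed after streaming finishes, the deltas can be accumulated instead of printed. The sketch below assumes chunks shaped like the ones in the SDK example (choices[0].delta.content); collect_stream and fake_chunk are hypothetical names, and fake_chunk exists only so the helper can be tried without a network call.

```python
from types import SimpleNamespace


def collect_stream(chunks):
    """Join the text deltas of a streamed response into one string."""
    parts = []
    for chunk in chunks:
        # Some chunks may carry no choices or no text (for example a final chunk).
        if not chunk.choices:
            continue
        piece = chunk.choices[0].delta.content
        if piece:
            parts.append(piece)
    return "".join(parts)


def fake_chunk(text):
    """Build a stand-in chunk with the same attribute shape, for local testing."""
    return SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(content=text))])
```

With a real stream, the call would be collect_stream(response) in place of the for loop.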
Tool / Function Calling
Supported on Claude models and select others; the Function Calling column under Supported Models marks which ones. LiteLLM automatically transforms the OpenAI tool format to Snowflake's tool_spec format.
SDK
import os
from litellm import completion
os.environ["SNOWFLAKE_API_KEY"] = "pat/<your-pat>"
os.environ["SNOWFLAKE_API_BASE"] = "https://<account>.snowflakecomputing.com/api/v2/cortex/v1"
# Tool definition in the OpenAI function-calling format.
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {"type": "string", "description": "City name"}
                },
                "required": ["location"],
            },
        },
    }
]
response = completion(
    model="snowflake/claude-sonnet-4-5",
    messages=[{"role": "user", "content": "What's the weather in San Francisco?"}],
    tools=tools,
    tool_choice="auto",
)
# tool_calls holds the function name and JSON-encoded arguments chosen by the model.
print(response.choices[0].message.tool_calls)
PROXY
model_list:
  - model_name: claude-sonnet
    litellm_params:
      model: snowflake/claude-sonnet-4-5
      api_key: pat/<your-pat>
      api_base: https://<account>.snowflakecomputing.com/api/v2/cortex/v1
curl --location 'http://0.0.0.0:4000/chat/completions' \
  --header 'Content-Type: application/json' \
  --data '{
    "model": "claude-sonnet",
    "messages": [{"role": "user", "content": "What is the weather in SF?"}],
    "tools": [{
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Get weather for a location",
        "parameters": {
          "type": "object",
          "properties": {"location": {"type": "string"}},
          "required": ["location"]
        }
      }
    }],
    "tool_choice": "auto"
  }'
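Once the model returns a tool call, the application still has to run the function and send the result back. The sketch below shows one way to do that step, assuming tool calls in the OpenAI shape (id, function.name, function.arguments as a JSON string). get_weather here is a hypothetical local stand-in, and TOOL_REGISTRY and run_tool_call are illustrative names.

```python
import json
from types import SimpleNamespace


def get_weather(location: str) -> str:
    """Hypothetical local implementation; a real one would call a weather API."""
    return f"Sunny in {location}"


# Maps tool names advertised to the model onto local callables.
TOOL_REGISTRY = {"get_weather": get_weather}


def run_tool_call(tool_call) -> dict:
    """Execute one tool call and return the follow-up "tool" message."""
    name = tool_call.function.name
    if name not in TOOL_REGISTRY:
        raise KeyError(f"model requested unknown tool: {name}")
    # Arguments arrive as a JSON-encoded string.
    args = json.loads(tool_call.function.arguments or "{}")
    result = TOOL_REGISTRY[name](**args)
    return {"role": "tool", "tool_call_id": tool_call.id, "content": str(result)}
```

The returned message would be appended to the conversation, after the assistant message that contained the tool call, before calling completion() again.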
Thinking / Reasoning
Claude 3.7 Sonnet, Claude 4 Opus, and DeepSeek R1 on Cortex support extended thinking. LiteLLM translates reasoning_effort to the provider's thinking parameter.
| reasoning_effort | budget_tokens |
|---|---|
| "low" | 1024 |
| "medium" | 2048 |
| "high" | 4096 |
from litellm import completion
response = completion(
    model="snowflake/claude-3-7-sonnet",
    messages=[{"role": "user", "content": "Solve: what is 127 * 389?"}],
    reasoning_effort="low",  # translated to a thinking budget of 1024 tokens
)
print(response.choices[0].message.content)
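The reasoning_effort table can be read as a simple lookup. The sketch below restates it in code for illustration; it is not LiteLLM's internal implementation, and REASONING_BUDGETS and thinking_budget are hypothetical names.

```python
# Values copied from the reasoning_effort table on this page.
REASONING_BUDGETS = {"low": 1024, "medium": 2048, "high": 4096}


def thinking_budget(reasoning_effort: str) -> int:
    """Return the budget_tokens value the table lists for a reasoning_effort level."""
    try:
        return REASONING_BUDGETS[reasoning_effort]
    except KeyError:
        allowed = ", ".join(sorted(REASONING_BUDGETS))
        raise ValueError(
            f"unknown reasoning_effort {reasoning_effort!r}; expected one of: {allowed}"
        ) from None
```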
Prompt Caching
Snowflake Cortex supports prompt caching to reduce costs:
- OpenAI models: Implicit caching for prompts ≥ 1,024 tokens (no code changes needed)
- Claude models: Explicit caching via cache_control breakpoints
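For Claude models, a cache_control breakpoint is typically attached to a content block. The sketch below builds such a message list using the block shape that LiteLLM documents for Anthropic models; whether the snowflake/ route forwards these blocks unchanged is an assumption to verify against the Cortex docs. build_cached_messages is a hypothetical helper.

```python
def build_cached_messages(shared_context: str, question: str) -> list:
    """Build messages with a cache breakpoint after the shared context.

    Assumption: the Anthropic-style {"type": "ephemeral"} marker is what
    the Claude models on Cortex expect.
    """
    return [
        {
            "role": "system",
            "content": [
                {
                    "type": "text",
                    "text": shared_context,
                    # Everything up to and including this block is eligible for caching.
                    "cache_control": {"type": "ephemeral"},
                }
            ],
        },
        {"role": "user", "content": question},
    ]
```

The list would be passed as messages= to completion(); the long, reused text goes in shared_context so that later calls can hit the cache.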
Cached input tokens are billed at 10% of the regular input rate (90% discount) when ≥ 1,024 tokens are cached.
See Cortex REST API Billing & Cost Analysis for details.
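The effect of the 10% cached rate can be worked through with a short calculation. The sketch below assumes that fewer than 1,024 cached tokens earn no discount and takes the price per million input tokens as an input rather than a real Snowflake price; input_cost is a hypothetical helper.

```python
CACHED_RATE_MULTIPLIER = 0.10  # cached input tokens billed at 10% of the regular rate
MIN_CACHED_TOKENS = 1024       # threshold stated on this page


def input_cost(total_input_tokens: int, cached_tokens: int, price_per_million: float) -> float:
    """Estimate input cost for one request, given how many tokens were cached."""
    if cached_tokens < 0 or cached_tokens > total_input_tokens:
        raise ValueError("cached_tokens must be between 0 and total_input_tokens")
    # Assumption: below the threshold, all tokens are billed at the regular rate.
    if cached_tokens < MIN_CACHED_TOKENS:
        cached_tokens = 0
    uncached_tokens = total_input_tokens - cached_tokens
    billable_tokens = uncached_tokens + cached_tokens * CACHED_RATE_MULTIPLIER
    return billable_tokens * price_per_million / 1_000_000
```

For example, with 10,000 input tokens of which 8,000 are cached, the billable amount is 2,000 + 800 = 2,800 token-equivalents, which is 28% of the uncached cost.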
Embeddings
import os
from litellm import embedding
os.environ["SNOWFLAKE_API_KEY"] = "pat/<your-pat>"
os.environ["SNOWFLAKE_API_BASE"] = "https://<account>.snowflakecomputing.com/api/v2/cortex/v1"
response = embedding(
    model="snowflake/snowflake-arctic-embed-l-v2.0",
    input=["Snowflake Cortex provides LLM inference"],
)
# Show the first five dimensions of the first embedding vector.
print(response.data[0]["embedding"][:5])
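A common next step with embeddings is comparing two vectors. The sketch below computes cosine similarity with the standard library only; it is independent of LiteLLM and works on any two equal-length lists of floats, such as two response.data[i]["embedding"] values.

```python
import math


def cosine_similarity(a, b) -> float:
    """Return the cosine similarity of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)
```

Values close to 1 indicate vectors pointing in nearly the same direction, and values near 0 indicate unrelated directions.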
Supported Models
All models are available through the snowflake/ prefix.
For current model availability, rate limits, and pricing, see the official Cortex REST API docs and Service Consumption Table.
Chat Completion Models
| Model | litellm model name | Function Calling | Vision | Prompt Caching |
|---|---|---|---|---|
| Claude Sonnet 4.5 | snowflake/claude-sonnet-4-5 | ✅ | ✅ | ✅ |
| Claude Sonnet 4.6 | snowflake/claude-sonnet-4-6 | ✅ | ✅ | ✅ |
| Claude 4 Sonnet | snowflake/claude-4-sonnet | ✅ | ✅ | ✅ |
| Claude 4 Opus | snowflake/claude-4-opus | ✅ | ✅ | ✅ |
| Claude Haiku 4.5 | snowflake/claude-haiku-4-5 | ✅ | ✅ | ✅ |
| Claude 3.7 Sonnet | snowflake/claude-3-7-sonnet | ✅ | ✅ | ✅ |
| Claude 3.5 Sonnet | snowflake/claude-3-5-sonnet | ✅ | ✅ | ✅ |
| OpenAI GPT-4.1 | snowflake/openai-gpt-4.1 | ✅ | ✅ | ✅ |
| OpenAI GPT-5 | snowflake/openai-gpt-5 | ✅ | ✅ | ✅ |
| OpenAI GPT-5 Mini | snowflake/openai-gpt-5-mini | ✅ | | |
| OpenAI GPT-5 Nano | snowflake/openai-gpt-5-nano | ✅ | | |
| DeepSeek R1 | snowflake/deepseek-r1 | | | |
| Mistral Large 2 | snowflake/mistral-large2 | ✅ | | |
| Llama 3.1 8B | snowflake/llama3.1-8b | | | |
| Llama 3.1 70B | snowflake/llama3.1-70b | ✅ | | |
| Llama 3.1 405B | snowflake/llama3.1-405b | ✅ | | |
| Llama 3.3 70B | snowflake/llama3.3-70b | ✅ | | |
| Llama 4 Maverick | snowflake/llama4-maverick | ✅ | | |
| Snowflake Llama 3.3 70B | snowflake/snowflake-llama-3.3-70b | ✅ | | |
Embedding Models
| Model | litellm model name |
|---|---|
| Snowflake Arctic Embed L v2.0 | snowflake/snowflake-arctic-embed-l-v2.0 |
| Snowflake Arctic Embed M v2.0 | snowflake/snowflake-arctic-embed-m-v2.0 |