Snowflake Cortex

LiteLLM supports all models on the Snowflake Cortex REST API, including models from Anthropic (Claude), OpenAI (GPT), Meta (Llama), Mistral, DeepSeek, and Snowflake.

Description: Snowflake Cortex REST API provides access to leading frontier LLMs through OpenAI-compatible and Anthropic-compatible endpoints. All inference runs within Snowflake's security perimeter.
Provider route on LiteLLM: snowflake/
Provider docs: Cortex REST API
API endpoints:
  • Chat Completions: https://{account}.snowflakecomputing.com/api/v2/cortex/v1/chat/completions
  • Messages: https://{account}.snowflakecomputing.com/api/v2/cortex/v1/messages
  • Legacy: https://{account}.snowflakecomputing.com/api/v2/cortex/inference:complete
Supported OpenAI endpoints: /chat/completions, /completions, /embeddings
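All three endpoints share the same host, so they can be derived from one account identifier. The helper below is a hypothetical sketch, not part of LiteLLM; it only formats the URLs listed above, and the "api_base" entry is the value the examples on this page put in SNOWFLAKE_API_BASE.

```python
def cortex_endpoints(account: str) -> dict[str, str]:
    """Hypothetical helper: build the Cortex REST API URLs for one account."""
    account = account.strip()
    if not account:
        raise ValueError("account identifier must be non-empty")
    host = f"https://{account}.snowflakecomputing.com"
    return {
        # OpenAI-compatible base URL (used as SNOWFLAKE_API_BASE on this page)
        "api_base": f"{host}/api/v2/cortex/v1",
        "chat_completions": f"{host}/api/v2/cortex/v1/chat/completions",
        # Anthropic-compatible endpoint
        "messages": f"{host}/api/v2/cortex/v1/messages",
        # Legacy Cortex endpoint
        "legacy_complete": f"{host}/api/v2/cortex/inference:complete",
    }
```

For example, cortex_endpoints("myorg-myaccount")["api_base"] gives https://myorg-myaccount.snowflakecomputing.com/api/v2/cortex/v1.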

Tip: LiteLLM supports all Snowflake Cortex models. Prefix the model name with snowflake/ (model="snowflake/<model-name>") when sending LiteLLM requests.

Authentication

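Snowflake Cortex REST API supports three authentication methods: programmatic access tokens (PAT), key-pair JWT, and OAuth. The examples below cover PAT and JWT.

Programmatic Access Token (PAT)

This is the simplest approach. Generate a PAT in Snowsight under User Menu → My Profile → Programmatic Access Tokens.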

import os
from litellm import completion

os.environ["SNOWFLAKE_API_KEY"] = "pat/<your-programmatic-access-token>"
os.environ["SNOWFLAKE_API_BASE"] = "https://<account>.snowflakecomputing.com/api/v2/cortex/v1"

response = completion(
    model="snowflake/claude-sonnet-4-5",
    messages=[{"role": "user", "content": "Hello!"}],
)
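To see roughly what the PAT example above sends over the wire (for instance, to reproduce it with another HTTP client), the sketch below builds the request without sending it. build_pat_chat_request is a hypothetical helper, not LiteLLM's API, and several details are assumptions: that the pat/ prefix is a LiteLLM-side marker stripped before sending, that the token travels as a bearer token together with Snowflake's X-Snowflake-Authorization-Token-Type header, and that the model is sent without the snowflake/ routing prefix.

```python
import json

PAT_PREFIX = "pat/"  # LiteLLM-side marker for a programmatic access token (assumption)


def build_pat_chat_request(api_key, api_base, model, messages):
    """Hypothetical sketch: return (url, headers, body) for a chat request."""
    if not api_key.startswith(PAT_PREFIX):
        raise ValueError("expected a key of the form 'pat/<token>'")
    token = api_key[len(PAT_PREFIX):]
    url = api_base.rstrip("/") + "/chat/completions"
    headers = {
        "Authorization": f"Bearer {token}",
        # Tells Snowflake which kind of token is in the Authorization header (assumption)
        "X-Snowflake-Authorization-Token-Type": "PROGRAMMATIC_ACCESS_TOKEN",
        "Content-Type": "application/json",
        "Accept": "application/json",
    }
    # Drop LiteLLM's routing prefix before sending (assumption)
    body = json.dumps({"model": model.removeprefix("snowflake/"), "messages": messages})
    return url, headers, body
```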

JWT (Key-Pair Authentication)

Generate a JWT from a Snowflake key pair. See Key-pair authentication in the Snowflake documentation.

import os
from litellm import completion

os.environ["SNOWFLAKE_JWT"] = "<your-jwt-token>"
os.environ["SNOWFLAKE_ACCOUNT_ID"] = "<orgname>-<account_name>"

response = completion(
    model="snowflake/claude-sonnet-4-5",
    messages=[{"role": "user", "content": "Hello!"}],
)
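The value in SNOWFLAKE_JWT is a signed token whose claims identify your account, user, and public key. The sketch below builds only the unsigned claims, following Snowflake's key-pair scheme (iss = ACCOUNT.USER.SHA256:<fingerprint>, sub = ACCOUNT.USER, lifetime of at most one hour). Signing with RS256 and your private key needs a library such as PyJWT and is deliberately left out. The function names are hypothetical, and public_key_der stands for your DER-encoded public key, which you are assumed to have exported already.

```python
import base64
import hashlib
import time
from typing import Optional


def public_key_fingerprint(public_key_der: bytes) -> str:
    """Base64-encoded SHA-256 digest of the DER public key, with Snowflake's prefix."""
    digest = hashlib.sha256(public_key_der).digest()
    return "SHA256:" + base64.b64encode(digest).decode("ascii")


def snowflake_jwt_claims(account: str, user: str, public_key_der: bytes,
                         now: Optional[int] = None, lifetime_s: int = 3600) -> dict:
    """Hypothetical sketch: unsigned JWT payload for key-pair authentication."""
    if lifetime_s > 3600:
        raise ValueError("key-pair JWTs may be valid for at most one hour")
    issued_at = int(time.time()) if now is None else now
    # Account identifier and user name are upper-cased in the claims
    qualified_user = f"{account.upper()}.{user.upper()}"
    return {
        "iss": f"{qualified_user}.{public_key_fingerprint(public_key_der)}",
        "sub": qualified_user,
        "iat": issued_at,
        "exp": issued_at + lifetime_s,
    }
```

Because the token expires, long-running services need to regenerate it and update SNOWFLAKE_JWT (or the api_key argument) before the exp time.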

Pass credentials as parameters

Instead of setting environment variables, you can pass the credentials directly to completion():

from litellm import completion

# Using PAT
response = completion(
    model="snowflake/claude-sonnet-4-5",
    messages=[{"role": "user", "content": "Hello!"}],
    api_key="pat/<your-pat-token>",
    api_base="https://<account>.snowflakecomputing.com/api/v2/cortex/v1",
)

# Using JWT
response = completion(
    model="snowflake/claude-sonnet-4-5",
    messages=[{"role": "user", "content": "Hello!"}],
    api_key="<your-jwt-token>",
    account_id="<orgname>-<account_name>",
)

For all authentication options, see Authenticating to Cortex REST API.

Usage

from litellm import completion
import os

os.environ["SNOWFLAKE_API_KEY"] = "pat/<your-pat>"
os.environ["SNOWFLAKE_API_BASE"] = "https://<account>.snowflakecomputing.com/api/v2/cortex/v1"

response = completion(
    model="snowflake/claude-sonnet-4-5",
    messages=[{"role": "user", "content": "What is Snowflake Cortex?"}],
)
print(response.choices[0].message.content)

Supported OpenAI Parameters

temperature, max_tokens, top_p, stream, response_format, tools, tool_choice
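If you build requests programmatically, it can help to check them against the list above before calling completion(). The sketch below is hypothetical and only splits a parameter dict into supported and unsupported parts; it does not reproduce LiteLLM's own handling of unsupported parameters (for example, dropping them when drop_params=True is set).

```python
# Parameters listed above for the snowflake/ provider
SNOWFLAKE_SUPPORTED_PARAMS = frozenset({
    "temperature", "max_tokens", "top_p", "stream",
    "response_format", "tools", "tool_choice",
})


def split_params(params: dict) -> tuple[dict, dict]:
    """Hypothetical helper: return (supported, unsupported) parameter dicts."""
    supported = {k: v for k, v in params.items() if k in SNOWFLAKE_SUPPORTED_PARAMS}
    unsupported = {k: v for k, v in params.items() if k not in SNOWFLAKE_SUPPORTED_PARAMS}
    return supported, unsupported
```

For example, split_params({"temperature": 0.2, "presence_penalty": 0.5}) keeps temperature and sets presence_penalty aside, so you can decide whether to drop it or fail early.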

Streaming

from litellm import completion
import os

os.environ["SNOWFLAKE_API_KEY"] = "pat/<your-pat>"
os.environ["SNOWFLAKE_API_BASE"] = "https://<account>.snowflakecomputing.com/api/v2/cortex/v1"

response = completion(
    model="snowflake/claude-sonnet-4-5",
    messages=[{"role": "user", "content": "Write a haiku about data."}],
    stream=True,
)

for chunk in response:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
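When you need the full text as well as live output, the streamed deltas can be collected into one string. The sketch below uses fake chunks built with SimpleNamespace that only mimic the chunk.choices[0].delta.content shape used above; with a real stream you would pass the response object instead. It also skips chunks with no choices or no content, since role-only, final, or usage-only chunks may carry neither.

```python
from types import SimpleNamespace


def collect_stream(chunks) -> str:
    """Join the text deltas of a chat-completion stream."""
    parts = []
    for chunk in chunks:
        if not chunk.choices:
            continue  # e.g. a usage-only chunk
        content = chunk.choices[0].delta.content
        if content:  # skip None / empty deltas
            parts.append(content)
    return "".join(parts)


def fake_chunk(text):
    """Stand-in for a streamed chunk, for demonstration only."""
    return SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(content=text))])


print(collect_stream([fake_chunk("Rows "), fake_chunk(None), fake_chunk("in snow")]))  # Rows in snow
```

LiteLLM also provides litellm.stream_chunk_builder for rebuilding a complete response object from a list of chunks, which is useful when you need more than the text.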

Tool / Function Calling

Tool calling is supported on Claude and select other models (see the Function Calling column under Supported Models). LiteLLM automatically transforms the OpenAI tool format into Snowflake's tool_spec format.
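The transformation described above can be sketched as follows. This is an illustration under an assumption, not LiteLLM's code: the target shape is taken to be the tool_spec object used by Snowflake's legacy inference:complete endpoint ({"tool_spec": {"type": "generic", "name", "description", "input_schema"}}), and LiteLLM's real transformation may differ in details. openai_tool_to_tool_spec is a hypothetical name.

```python
def openai_tool_to_tool_spec(tool: dict) -> dict:
    """Hypothetical sketch: convert one OpenAI-format tool into a Cortex tool_spec."""
    if tool.get("type") != "function":
        raise ValueError(f"unsupported tool type: {tool.get('type')!r}")
    fn = tool["function"]
    spec = {
        "type": "generic",
        "name": fn["name"],
        # The JSON Schema for the arguments moves from "parameters" to "input_schema"
        "input_schema": fn.get("parameters", {"type": "object", "properties": {}}),
    }
    if "description" in fn:
        spec["description"] = fn["description"]
    return {"tool_spec": spec}
```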

from litellm import completion
import os

os.environ["SNOWFLAKE_API_KEY"] = "pat/<your-pat>"
os.environ["SNOWFLAKE_API_BASE"] = "https://<account>.snowflakecomputing.com/api/v2/cortex/v1"

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {"type": "string", "description": "City name"}
                },
                "required": ["location"],
            },
        },
    }
]

response = completion(
    model="snowflake/claude-sonnet-4-5",
    messages=[{"role": "user", "content": "What's the weather in San Francisco?"}],
    tools=tools,
    tool_choice="auto",
)

print(response.choices[0].message.tool_calls)
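The model only requests a tool call; your code runs the tool and sends the result back in a follow-up request. The sketch below shows that second half in the OpenAI message format: each call's JSON arguments are parsed, a local function runs, and a role="tool" message carrying the matching tool_call_id is built. get_weather here is a hypothetical stand-in returning a placeholder string, and the tool calls are plain dicts; with the response objects above, use attribute access (call.id, call.function.name, call.function.arguments) instead.

```python
import json


def get_weather(location: str) -> str:
    """Hypothetical local tool; returns a placeholder result."""
    return f"Sunny in {location}"


LOCAL_TOOLS = {"get_weather": get_weather}


def tool_result_messages(tool_calls: list[dict]) -> list[dict]:
    """Run each requested tool and build the matching tool messages."""
    messages = []
    for call in tool_calls:
        fn = LOCAL_TOOLS[call["function"]["name"]]
        args = json.loads(call["function"]["arguments"])  # arguments arrive as a JSON string
        messages.append({
            "role": "tool",
            "tool_call_id": call["id"],
            "content": fn(**args),
        })
    return messages
```

The follow-up request then sends the original user message, the assistant message containing tool_calls, and these tool messages, so the model can write its final answer.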

Thinking / Reasoning

Claude 3.7 Sonnet, Claude 4 Opus, and DeepSeek R1 on Cortex support extended thinking. LiteLLM translates reasoning_effort to the provider's thinking parameter.

| reasoning_effort | budget_tokens |
|---|---|
| "low" | 1024 |
| "medium" | 2048 |
| "high" | 4096 |

from litellm import completion

# Assumes SNOWFLAKE_API_KEY and SNOWFLAKE_API_BASE are set as in the Usage example.
response = completion(
    model="snowflake/claude-3-7-sonnet",
    messages=[{"role": "user", "content": "Solve: what is 127 * 389?"}],
    reasoning_effort="low",
)
print(response.choices[0].message.content)
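The reasoning_effort table above amounts to a small lookup. The sketch below is hypothetical and assumes the provider-side parameter follows Anthropic's thinking shape, {"type": "enabled", "budget_tokens": N}; LiteLLM's actual mapping may add or change fields.

```python
# Budgets from the reasoning_effort table above
BUDGET_TOKENS = {"low": 1024, "medium": 2048, "high": 4096}


def thinking_param(reasoning_effort: str) -> dict:
    """Hypothetical sketch: map reasoning_effort to an Anthropic-style thinking block."""
    try:
        budget = BUDGET_TOKENS[reasoning_effort]
    except KeyError:
        raise ValueError(f"reasoning_effort must be one of {sorted(BUDGET_TOKENS)}") from None
    return {"type": "enabled", "budget_tokens": budget}
```

If you also set max_tokens, keep it above the thinking budget; Anthropic's API requires budget_tokens to be smaller than max_tokens.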

Prompt Caching

Snowflake Cortex supports prompt caching to reduce costs:

  • OpenAI models: Implicit caching for prompts ≥ 1,024 tokens (no code changes needed)
  • Claude models: Explicit caching via cache_control breakpoints

Cached input tokens are billed at 10% of the regular input rate (90% discount) when ≥ 1,024 tokens are cached.

See Cortex REST API Billing & Cost Analysis for details.
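The sketch below shows two pieces of the caching story. First, a Claude system message with an explicit cache_control breakpoint, written in the content-block format LiteLLM accepts for Anthropic models; everything up to and including the marked block becomes the cacheable prefix. Second, the input-cost arithmetic from the paragraph above, under the assumption that fewer than 1,024 cached tokens earn no discount. The rate is a hypothetical parameter, the placeholder text is illustrative, and input_cost is a hypothetical helper; check the Service Consumption Table for real prices.

```python
LONG_CONTEXT = "<long, reused prompt prefix>"  # placeholder text

# Claude: mark the end of the reusable prefix with a cache_control breakpoint
cached_system_message = {
    "role": "system",
    "content": [
        {
            "type": "text",
            "text": LONG_CONTEXT,
            "cache_control": {"type": "ephemeral"},
        }
    ],
}

MIN_CACHED_TOKENS = 1024
CACHED_RATE_FRACTION = 0.10  # cached input billed at 10% of the regular rate


def input_cost(total_tokens: int, cached_tokens: int, rate_per_million: float) -> float:
    """Hypothetical sketch: input cost with the cache discount applied."""
    if not 0 <= cached_tokens <= total_tokens:
        raise ValueError("cached_tokens must be between 0 and total_tokens")
    if cached_tokens < MIN_CACHED_TOKENS:
        cached_tokens = 0  # below the threshold, everything is billed at the full rate
    uncached = total_tokens - cached_tokens
    return (uncached + cached_tokens * CACHED_RATE_FRACTION) * rate_per_million / 1_000_000
```

As a worked example with a hypothetical rate of 3.0 per million input tokens, a 10,000-token prompt with 8,000 cached tokens costs (2,000 + 0.1 × 8,000) × 3.0 / 1,000,000 = 0.0084, compared with 0.03 without caching.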

Embeddings

from litellm import embedding
import os

os.environ["SNOWFLAKE_API_KEY"] = "pat/<your-pat>"
os.environ["SNOWFLAKE_API_BASE"] = "https://<account>.snowflakecomputing.com/api/v2/cortex/v1"

response = embedding(
    model="snowflake/snowflake-arctic-embed-l-v2.0",
    input=["Snowflake Cortex provides LLM inference"],
)
print(response.data[0]["embedding"][:5])
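A common next step is comparing embeddings, for example to rank documents against a query. The sketch below computes cosine similarity in plain Python. In practice the vectors would come from response.data[i]["embedding"] as above (passing several strings in input returns one vector per string); short literal vectors are used here so the example runs on its own.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cannot compare a zero vector")
    return dot / (norm_a * norm_b)
```

Values near 1.0 indicate texts with similar meaning; for large collections, a vector store or NumPy is a better fit than this pure-Python loop.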

Supported Models

All models are available through the snowflake/ prefix.

Tip: For current model availability, rate limits, and pricing, see the official Cortex REST API docs and Service Consumption Table.

Chat Completion Models

| Model | LiteLLM model name | Function Calling | Vision | Prompt Caching |
|---|---|---|---|---|
| Claude Sonnet 4.5 | snowflake/claude-sonnet-4-5 | ✅ | ✅ | ✅ |
| Claude Sonnet 4.6 | snowflake/claude-sonnet-4-6 | ✅ | ✅ | ✅ |
| Claude 4 Sonnet | snowflake/claude-4-sonnet | ✅ | ✅ | ✅ |
| Claude 4 Opus | snowflake/claude-4-opus | ✅ | ✅ | ✅ |
| Claude Haiku 4.5 | snowflake/claude-haiku-4-5 | ✅ | ✅ | ✅ |
| Claude 3.7 Sonnet | snowflake/claude-3-7-sonnet | ✅ | ✅ | ✅ |
| Claude 3.5 Sonnet | snowflake/claude-3-5-sonnet | ✅ | ✅ | ✅ |
| OpenAI GPT-4.1 | snowflake/openai-gpt-4.1 | ✅ | ✅ | ✅ |
| OpenAI GPT-5 | snowflake/openai-gpt-5 | ✅ | ✅ | ✅ |
| OpenAI GPT-5 Mini | snowflake/openai-gpt-5-mini | ✅ | | |
| OpenAI GPT-5 Nano | snowflake/openai-gpt-5-nano | ✅ | | |
| DeepSeek R1 | snowflake/deepseek-r1 | | | |
| Mistral Large 2 | snowflake/mistral-large2 | ✅ | | |
| Llama 3.1 8B | snowflake/llama3.1-8b | | | |
| Llama 3.1 70B | snowflake/llama3.1-70b | ✅ | | |
| Llama 3.1 405B | snowflake/llama3.1-405b | ✅ | | |
| Llama 3.3 70B | snowflake/llama3.3-70b | ✅ | | |
| Llama 4 Maverick | snowflake/llama4-maverick | ✅ | | |
| Snowflake Llama 3.3 70B | snowflake/snowflake-llama-3.3-70b | ✅ | | |

Embedding Models

| Model | LiteLLM model name |
|---|---|
| Snowflake Arctic Embed L v2.0 | snowflake/snowflake-arctic-embed-l-v2.0 |
| Snowflake Arctic Embed M v2.0 | snowflake/snowflake-arctic-embed-m-v2.0 |