OpenTelemetry - Tracing LLMs with any observability tool

OpenTelemetry is a CNCF standard for observability. It connects LiteLLM to any OpenTelemetry-compatible observability tool, such as Jaeger, Zipkin, Datadog, New Relic, Traceloop, Levo AI, and others.

Change in v1.81.0

From v1.81.0, the request/response is set as attributes on the parent Received Proxy Server Request span by default; there is no separate litellm_request span unless you opt in. To restore nested litellm_request spans, set USE_OTEL_LITELLM_REQUEST_SPAN=true. See Span Hierarchy for the full picture and Why don't I see a litellm_request span? for when to flip the flag.

Getting Started

Install the OpenTelemetry SDK:

uv add opentelemetry-api opentelemetry-sdk opentelemetry-exporter-otlp

Set the environment variables (different providers may require different variables):

OTEL_EXPORTER="otlp_http"
OTEL_ENDPOINT="https://api.traceloop.com"
OTEL_HEADERS="Authorization=Bearer%20<your-api-key>"

Then add one line of code to instantly log your LLM responses across all providers with OpenTelemetry:

litellm.callbacks = ["otel"]
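
Putting it together, a minimal end-to-end sketch (the endpoint, key, and model below are placeholders for your own setup):

import os
import litellm

# Exporter settings must be in place before the first logged request.
os.environ["OTEL_EXPORTER"] = "otlp_http"
os.environ["OTEL_ENDPOINT"] = "https://api.traceloop.com"
os.environ["OTEL_HEADERS"] = "Authorization=Bearer%20<your-api-key>"

litellm.callbacks = ["otel"]

# Every call below is now traced, regardless of provider.
response = litellm.completion(
    model="gpt-4o",
    messages=[{"role": "user", "content": "What is OpenTelemetry?"}],
)
print(response.choices[0].message.content)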

Span Hierarchy

Every LLM request handled by the LiteLLM Proxy produces a tree of spans rooted at Received Proxy Server Request. Conditional spans below are only emitted when their controlling flag is set or their feature is in use.

Received Proxy Server Request                      (SpanKind.SERVER, root)
│
├── litellm_request (INTERNAL, only when USE_OTEL_LITELLM_REQUEST_SPAN=true)
│   ├── raw_gen_ai_request (INTERNAL: provider request/response, content-capture-gated)
│   └── guardrail (INTERNAL: one per executed guardrail)
│
├── raw_gen_ai_request (INTERNAL: when litellm_request is collapsed into the root)
├── guardrail (INTERNAL: when litellm_request is collapsed into the root)
│
├── auth, router, self, proxy_pre_call, (INTERNAL: service-hook spans, see below)
│   redis, postgres, batch_write_to_db
│
└── Failed Proxy Server Request (INTERNAL: only on exception)

In semconv mode (OTEL_SEMCONV_STABILITY_OPT_IN=gen_ai_latest_experimental), when an LLM-call span is created its name becomes {operation} {model} (e.g. chat gpt-4) with SpanKind.CLIENT, and raw_gen_ai_request is suppressed. The same USE_OTEL_LITELLM_REQUEST_SPAN gating decides whether the span is emitted at all. See Opt-In to Latest GenAI Semantic Conventions.

The SDK (no proxy) emits litellm_request as the root if no parent context exists; there is no Received Proxy Server Request span in pure-SDK use.

Span name reference

| Span name | Span kind | Parent | When emitted |
|---|---|---|---|
| Received Proxy Server Request | SERVER | root (or external traceparent if present) | Once per HTTP request to the LiteLLM Proxy |
| litellm_request | INTERNAL | proxy root (proxy) or root (SDK) | When USE_OTEL_LITELLM_REQUEST_SPAN=true (proxy) or no parent context exists (SDK). In semconv mode replaced by {operation} {model} |
| raw_gen_ai_request | INTERNAL | litellm_request if present, else proxy root | One per upstream provider call. Carries provider-native request/response under llm.{provider}.*. Suppressed in semconv mode and when message content capture is disabled |
| guardrail | INTERNAL (OpenInference kind = guardrail) | litellm_request if present, else proxy root | One span per guardrail execution (pre-call, during-call, or post-call) |
| Failed Proxy Server Request | INTERNAL | proxy root | When the proxy raises an exception before completing the request |
| {route} (e.g. /user/info, /key/info) | INTERNAL | proxy root | Management-endpoint calls (non-LLM proxy routes) |
| auth, router, self, proxy_pre_call, redis, postgres, batch_write_to_db, reset_budget_job, pod_lock_manager | INTERNAL | proxy root | Service-hook spans (see below) |

Service-hook spans (a.k.a. "infrastructure" spans)

LiteLLM has a separate hook (async_service_success_hook / async_service_failure_hook) that records timing for internal subsystems like the router, auth checks, Redis, Postgres, and the proxy pre-call pipeline. When the OTEL integration is active and a parent span is in context, each of these hooks creates an INTERNAL child span.

The span name is the ServiceTypes enum value (auth, router, self, proxy_pre_call, redis, postgres, …). The full set is defined in litellm/types/services.py. self is the LiteLLM SDK itself (e.g. timing of make_openai_chat_completion_request); router may appear multiple times per request (once for async_get_available_deployment, once for the wrapping acompletion).

Each service-hook span carries:

| Attribute | Value |
|---|---|
| service | The service enum value (e.g. "router", "redis") |
| call_type | The specific operation (e.g. "async_get_available_deployment", "acompletion", "add_litellm_data_to_request") |
| error | Set on failure spans only |
| (custom event_metadata) | Whatever the caller attached |

These spans are operational/infrastructure spans, not GenAI semantic spans. They are useful for SRE-level debugging (where is time being spent inside LiteLLM?) but they do not carry gen_ai.* attributes. If you only want AI-semantic spans in your backend, filter on the presence of gen_ai.system (or on span name).

There is currently no env var that disables individual service-hook spans. If you need them filtered, do it at the OTLP collector / backend layer (e.g. via a tail-based sampler that drops by name).
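
If you control the exporting process itself, another option is to filter by span name before export. A minimal sketch, assuming you construct your own exporter; the wrapper class and drop list here are illustrative, not part of LiteLLM:

from opentelemetry.sdk.trace.export import SpanExporter, SpanExportResult

class DropByNameExporter(SpanExporter):
    """Wrap another exporter and silently drop spans whose name is listed."""

    def __init__(self, inner: SpanExporter, drop_names):
        self._inner = inner
        self._drop = set(drop_names)

    def export(self, spans) -> SpanExportResult:
        kept = [s for s in spans if s.name not in self._drop]
        if not kept:
            return SpanExportResult.SUCCESS
        return self._inner.export(kept)

    def shutdown(self) -> None:
        self._inner.shutdown()

# e.g. DropByNameExporter(OTLPSpanExporter(...), {"redis", "postgres", "batch_write_to_db"})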

Why don't I see a litellm_request span?

Behavior changed in v1.81.0. By default, USE_OTEL_LITELLM_REQUEST_SPAN=false and the proxy collapses the litellm_request span into the parent Received Proxy Server Request span; its gen_ai.* attributes are set on the parent instead. This:

  • Avoids duplicating attributes across parent and child.
  • Reduces span count (and storage cost) by ~1 span per request.
  • Keeps the trace shallow when a parent context already exists.

To restore the older nested behavior, where every LLM call gets its own litellm_request span as a child of the proxy root span, set:

USE_OTEL_LITELLM_REQUEST_SPAN=true

This is the right setting if:

  • One HTTP request makes multiple litellm.completion calls: under the default behavior, the last call's attributes overwrite earlier ones on the shared parent.
  • You want a clean parent for raw_gen_ai_request and guardrail children that is not the HTTP request span.
  • Your backend's UI is built around AI-semantic span names like litellm_request.

This is not a regression; the change is intentional. The flag is read fresh on every request, so it can be flipped without restarting.

In semconv mode (OTEL_SEMCONV_STABILITY_OPT_IN=gen_ai_latest_experimental), the same USE_OTEL_LITELLM_REQUEST_SPAN gating still decides whether the LLM-call span is emitted; semconv mode only changes the span's name (to {operation} {model}), kind (to CLIENT), and child structure.
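
Because the flag is read per request, it can even be toggled at runtime in SDK use; a trivial, illustrative sketch:

import os

# The next request emits nested litellm_request spans; no restart needed.
os.environ["USE_OTEL_LITELLM_REQUEST_SPAN"] = "true"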

Context propagation (W3C traceparent)

LiteLLM honors the W3C Trace Context header. If your client (or upstream gateway) sends a traceparent header, LiteLLM creates the Received Proxy Server Request span as a child of that external trace, so LiteLLM's spans appear inline inside whatever distributed trace your application already has.

Parent-context resolution order (highest priority first):

  1. Explicit litellm_parent_otel_span in the request metadata.
  2. Inbound traceparent HTTP header (extracted via TraceContextTextMapPropagator).
  3. The currently-active span in the OTEL global context (thread-local).
  4. None: LiteLLM's span is the root.

To force every LiteLLM trace to be its own root regardless of inbound headers or active context, set OTEL_IGNORE_CONTEXT_PROPAGATION=true.
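
For example, a client can parent the proxy's spans under its own trace by injecting a traceparent header. A sketch using the OpenAI SDK pointed at the proxy (URL, key, and span name are placeholders; assumes a TracerProvider is configured in the client process):

from openai import OpenAI
from opentelemetry import trace
from opentelemetry.trace.propagation.tracecontext import TraceContextTextMapPropagator

tracer = trace.get_tracer("my-app")
client = OpenAI(base_url="http://localhost:4000", api_key="sk-1234")

with tracer.start_as_current_span("checkout-flow"):
    headers = {}
    TraceContextTextMapPropagator().inject(headers)  # writes the traceparent header
    # LiteLLM's Received Proxy Server Request span becomes a child of checkout-flow.
    client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "hi"}],
        extra_headers=headers,
    )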

Running Multiple OpenTelemetry Handlers

You can run more than one OpenTelemetry handler in the same process, for example a generic OTLP exporter alongside a backend-specific subclass. Set skip_set_global=True on every handler past the first so each one gets its own private TracerProvider, MeterProvider, and LoggerProvider. Spans, metrics, and log events then flow only through that handler's exporter.

import litellm
from litellm.integrations.opentelemetry import OpenTelemetry, OpenTelemetryConfig

# Primary handler. Claims the global TracerProvider.
primary = OpenTelemetry(config=OpenTelemetryConfig(
    exporter="otlp_http",
    endpoint="https://your-collector/v1/traces",
))

# Secondary handler. Has its own private providers.
secondary = OpenTelemetry(config=OpenTelemetryConfig(
    exporter="otlp_http",
    endpoint="https://second-collector/v1/traces",
    skip_set_global=True,
))

litellm.callbacks = [primary, secondary]

Init order does not matter. Both handlers receive their own spans regardless of which is constructed first.

Cross-collector behavior (e.g. LangSmith + generic OTEL)

When two OTEL-emitting integrations are active at once (for instance, a customized LangSmith OTEL handler plus the generic otel exporter), both honor the same traceparent propagation rules and the same parent-resolution order described in Context propagation. As long as one handler uses skip_set_global=True, both will:

  • See the same trace_id for a given request.
  • Emit the same span hierarchy (Received Proxy Server Request → litellm_request if enabled → raw_gen_ai_request / guardrail).
  • Differ only in which exporter they ship spans to.

If a customized LangSmith OTEL handler is configured to mount litellm_request only when the request carries a traceparent (otherwise no-op), the generic OTEL handler still emits its full hierarchy. The two views remain readable independently because the span names and attributes are identical.

Capturing Message Content

LiteLLM uses the standard OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT environment variable to control whether prompts and completions are captured, and where:

# Do not capture message content
OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT=NO_CONTENT

# Capture content on span attributes only
OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT=SPAN_ONLY

# Capture content on event attributes only
OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT=EVENT_ONLY

# Capture content on both spans and events
OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT=SPAN_AND_EVENT

A boolean form is also accepted: true maps to EVENT_ONLY, false maps to NO_CONTENT.

Per-handler content policy

When running multiple OpenTelemetry handlers, set capture_message_content on each OpenTelemetryConfig so the handlers can have different content policies. For example, send full prompts to a debugging backend while stripping content from a compliance-focused OTLP collector:

import litellm
from litellm.integrations.opentelemetry import OpenTelemetry, OpenTelemetryConfig

stripped = OpenTelemetry(config=OpenTelemetryConfig(
    exporter="otlp_http",
    endpoint="https://compliance-collector/v1/traces",
    capture_message_content="NO_CONTENT",
))

verbose = OpenTelemetry(config=OpenTelemetryConfig(
    exporter="otlp_http",
    endpoint="https://debug-collector/v1/traces",
    capture_message_content="SPAN_AND_EVENT",
    skip_set_global=True,
))

litellm.callbacks = [stripped, verbose]

Resolution order (highest priority first):

  1. litellm.turn_off_message_logging=True forces NO_CONTENT (dynamic kill-switch; overrides everything below).
  2. OpenTelemetryConfig.capture_message_content (per-handler field, sampled at handler init).
  3. OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT env var (sampled at handler init).
  4. The legacy per-instance message_logging flag, which defaults to True and maps to SPAN_AND_EVENT.

Opt-In to Latest GenAI Semantic Conventions

Set OTEL_SEMCONV_STABILITY_OPT_IN=gen_ai_latest_experimental to emit spans that follow the latest OpenTelemetry GenAI semantic conventions:

OTEL_SEMCONV_STABILITY_OPT_IN=gen_ai_latest_experimental

This changes the LLM-call span name, kind, and structure, suppresses the non-standard raw_gen_ai_request child span, adds the gen_ai.provider.name attribute alongside gen_ai.system, populates additional request and cache-token attributes when present, and consolidates the per-message events into a single gen_ai.client.inference.operation.details event. See the Spans Reference and Attributes Reference below for the per-row differences.

OpenTelemetryConfig.semconv_stability is the programmatic equivalent. The flag is comma-separable per the OTEL spec.
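
A minimal programmatic sketch (the endpoint is a placeholder):

import litellm
from litellm.integrations.opentelemetry import OpenTelemetry, OpenTelemetryConfig

handler = OpenTelemetry(config=OpenTelemetryConfig(
    exporter="otlp_http",
    endpoint="https://your-collector/v1/traces",
    semconv_stability="gen_ai_latest_experimental",
))
litellm.callbacks = [handler]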

Redacting Messages and Response Content from OpenTelemetry Logging

Redact Messages and Responses from all OpenTelemetry Logging

Set litellm.turn_off_message_logging=True. This prevents messages and responses from being logged to OpenTelemetry; request metadata is still logged.

Redact Messages and Responses from specific OpenTelemetry Logging

In the metadata passed with a completion or embedding call, you can set specific keys to mask the messages and responses for that call:

  • Setting mask_input to True masks the input from being logged for this call.
  • Setting mask_output to True masks the output from being logged for this call.

Be aware that if you are continuing an existing trace, and you set update_trace_keys to include either input or output and you set the corresponding mask_input or mask_output, then that trace will have its existing input and/or output replaced with a redacted message.
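
For instance, a single call can be redacted while the rest of the traffic keeps full content (a sketch; the model is a placeholder):

import litellm

response = litellm.completion(
    model="gpt-4o",
    messages=[{"role": "user", "content": "text containing PII"}],
    metadata={
        "mask_input": True,   # redact the prompt for this call only
        "mask_output": True,  # redact the completion for this call only
    },
)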

Troubleshooting

I don't see a litellm_request span

Expected behavior under the v1.81.0+ default (USE_OTEL_LITELLM_REQUEST_SPAN=false): the proxy root span absorbs the LLM-call attributes and there is no separate litellm_request span. To restore nested spans, set USE_OTEL_LITELLM_REQUEST_SPAN=true. See Why don't I see a litellm_request span?.

If you're in semconv mode, the LLM-call span exists but is renamed to {operation} {model} (e.g. chat gpt-4); search by gen_ai.system rather than by the literal name litellm_request.

I only see infrastructure spans (router, auth, redis, proxy_pre_call)

Those are service-hook spans. They're emitted alongside the AI-semantic spans (raw_gen_ai_request, guardrail, and litellm_request if enabled), not instead of them. If you genuinely don't see any gen_ai.* attributes anywhere in your trace:

  1. Verify litellm.callbacks (or litellm_settings.callbacks) includes "otel".
  2. Verify the request actually hit a /chat/completions (or other LLM) route; management endpoints (/key/info, /user/info, …) won't have gen_ai.* attributes.
  3. Check whether litellm.turn_off_message_logging=true and/or mask_input/mask_output are set; they suppress message and raw-provider attributes.
  4. Set USE_OTEL_LITELLM_REQUEST_SPAN=true so the LLM attributes land on a span named litellm_request instead of being mixed in with HTTP request attributes on Received Proxy Server Request.

Trace LiteLLM Proxy user/key/org/team information on failed requests

LiteLLM emits metadata.user_api_key_* attributes (key hash, key alias, org ID, user ID, team ID) on both successful and failed requests. They appear on the litellm_request span when present, otherwise on Received Proxy Server Request.

Not seeing traces land on your integration

If you don't see traces landing on your integration, set OTEL_DEBUG="True" in your LiteLLM environment and try again.

export OTEL_DEBUG="True"

This will emit any logging issues to the console. Common causes:

  • OTEL_EXPORTER_OTLP_ENDPOINT points to an HTTPS endpoint but the protocol is grpc (or vice-versa).
  • OTEL_HEADERS is missing the auth header your backend expects.
  • A firewall/sidecar is dropping outbound OTLP traffic on 4317/4318.
  • For gRPC, grpcio isn't installed (uv add "litellm[grpc]").

Spans are getting truncated or dropped

OTLP exporters batch spans. Very large gen_ai.input.messages/gen_ai.output.messages (e.g. multi-megabyte prompts) can exceed default OTLP attribute size limits at the collector. Either:

  • Move large payloads off-trace (set litellm.turn_off_message_logging=true and rely on Spend Logs / cold storage, referenced via metadata.cold_storage_object_key).
  • Raise the collector's max_attribute_value_length and OTLP receiver max_recv_msg_size_mib.

Configuration Reference

All flags below are read from environment variables unless noted. Boolean flags accept true/false (case-insensitive).

Exporter & resource

| Variable | Default | Purpose |
|---|---|---|
| OTEL_EXPORTER (alias: OTEL_EXPORTER_OTLP_PROTOCOL) | console | Exporter type. Common values: console, otlp_http, otlp_grpc, http/json, http/protobuf, grpc |
| OTEL_ENDPOINT (alias: OTEL_EXPORTER_OTLP_ENDPOINT) | none | OTLP endpoint URL |
| OTEL_HEADERS (alias: OTEL_EXPORTER_OTLP_HEADERS) | none | Comma-separated key=value,key2=value2 header list |
| OTEL_SERVICE_NAME | litellm | Resource attribute service.name |
| OTEL_ENVIRONMENT_NAME | production | Resource attribute deployment.environment |
| OTEL_MODEL_ID | OTEL_SERVICE_NAME | Resource attribute model_id |
| OTEL_TRACER_NAME | litellm | Tracer name |
| LITELLM_METER_NAME | litellm | Meter name (when metrics enabled) |
| LITELLM_LOGGER_NAME | litellm | Logger name (when events enabled) |
| OTEL_LOGS_EXPORTER | none | Logs exporter (e.g. console) when events are enabled |

Span / metric / event toggles

| Variable | Default | Effect |
|---|---|---|
| USE_OTEL_LITELLM_REQUEST_SPAN | false | Force litellm_request to always be emitted as a child of the proxy root span. See Why don't I see a litellm_request span? |
| OTEL_SEMCONV_STABILITY_OPT_IN | unset | Set to gen_ai_latest_experimental to switch to the latest GenAI semantic conventions. Comma-separable per OTEL spec |
| OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT | unset; falls back to legacy message_logging (default True, i.e. SPAN_AND_EVENT) | NO_CONTENT / SPAN_ONLY / EVENT_ONLY / SPAN_AND_EVENT. Boolean form accepted (true maps to EVENT_ONLY, false to NO_CONTENT) |
| LITELLM_OTEL_INTEGRATION_ENABLE_METRICS | false | Enable OTLP metrics (TTFT, TPOT, response duration, cost, token usage, operation duration) |
| LITELLM_OTEL_INTEGRATION_ENABLE_EVENTS | false | Enable OTLP semantic logs (gen_ai.content.prompt/gen_ai.content.completion, or gen_ai.client.inference.operation.details in semconv mode) |
| OTEL_IGNORE_CONTEXT_PROPAGATION | false | If true, ignore inbound traceparent headers and any active span; every LiteLLM trace becomes its own root |
| OTEL_DEBUG / DEBUG_OTEL | false | Print exporter and span-creation diagnostics to stderr |
| litellm.turn_off_message_logging (Python global / litellm_settings.turn_off_message_logging) | false | Kill-switch for content capture. Suppresses llm.{provider}.* raw request/response, gen_ai.input.messages, gen_ai.output.messages, and gen_ai.content.* log events. Overrides per-handler capture_message_content |

Per-request redaction (request metadata)

Per-request keys you can pass in metadata to redact a single call without disabling logging globally.

| Key | Effect |
|---|---|
| mask_input | When true, redacts the input messages on this request |
| mask_output | When true, redacts the output messages on this request |
| update_trace_keys | Controls which trace keys (input, output) get replaced when continuing an existing trace |
| generation_name | Overrides the raw_gen_ai_request span's name with this value |

OpenTelemetryConfig programmatic equivalents

| Field | Default | Purpose |
|---|---|---|
| exporter | console | Same as OTEL_EXPORTER |
| endpoint | none | Same as OTEL_ENDPOINT |
| headers | none | Same as OTEL_HEADERS |
| enable_metrics | false | Same as LITELLM_OTEL_INTEGRATION_ENABLE_METRICS |
| enable_events | false | Same as LITELLM_OTEL_INTEGRATION_ENABLE_EVENTS |
| capture_message_content | env var | Per-handler override; same value space as OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT |
| semconv_stability | env var | Same as OTEL_SEMCONV_STABILITY_OPT_IN |
| skip_set_global | false | Don't claim the process-global TracerProvider/MeterProvider/LoggerProvider |
| ignore_context_propagation | false | Same as OTEL_IGNORE_CONTEXT_PROPAGATION |

Appendix: Spans, Metrics, and Attributes Reference

This appendix enumerates every span, metric, and AI-semantic attribute LiteLLM emits, including how each changes when semconv mode is enabled.

Spans Reference

The LLM-call span is the AI-semantic core. Its name, kind, and supporting child spans depend on whether semconv mode is active.

| Span | Kind | Default mode | Semconv mode |
|---|---|---|---|
| Proxy request frame | SERVER | Received Proxy Server Request | Received Proxy Server Request (unchanged) |
| LLM-call span | INTERNAL (default) / CLIENT (semconv) | litellm_request (only when USE_OTEL_LITELLM_REQUEST_SPAN=true; otherwise attributes land on the proxy frame span) | {operation} {model} (e.g. chat gpt-4, embeddings text-embedding-3-small); same USE_OTEL_LITELLM_REQUEST_SPAN gating as default mode |
| Raw provider payload | INTERNAL | raw_gen_ai_request (when message content capture is permitted) | not emitted (data lives on the LLM-call span and the consolidated event) |
| Guardrail check | INTERNAL | one span per guardrail invocation, named per guardrail | unchanged |
| Management endpoint | INTERNAL | one span per proxy admin call, named per endpoint | unchanged |

Operation names emitted in semconv mode: chat (default), embeddings (when call type contains embedding), text_completion (when call type contains text_completion).

Events Reference

Events land on the LiteLLM-managed LoggerProvider when enable_events=True on the config.

| Event | Default mode | Semconv mode |
|---|---|---|
| Per-message prompt | gen_ai.content.prompt (one event per input message) | replaced by the consolidated event |
| Per-choice completion | gen_ai.content.completion (one event per choice) | replaced by the consolidated event |
| Consolidated inference details | not emitted | gen_ai.client.inference.operation.details (one event per call, carrying gen_ai.input.messages and gen_ai.output.messages arrays per the spec) |

Metrics Reference

LiteLLM emits the following histograms when enable_metrics=True is set on the OpenTelemetryConfig. Metric names match the OTEL GenAI semantic conventions.

| Metric | Unit | Description |
|---|---|---|
| gen_ai.client.operation.duration | s | End-to-end operation duration including LiteLLM overhead. |
| gen_ai.client.token.usage | {token} | Token usage. Records two histograms per call (label gen_ai.token.type is "input" or "output"). |
| gen_ai.client.token.cost | USD | Computed request cost. |
| gen_ai.client.response.time_to_first_token | s | Time from request start to first streamed token (streaming requests only). |
| gen_ai.client.response.time_per_output_token | s | Average time per output token (generation time / completion tokens). |
| gen_ai.client.response.duration | s | LLM API generation time, excluding LiteLLM overhead. |

Common labels on every histogram: gen_ai.operation.name, gen_ai.system, gen_ai.request.model, gen_ai.framework="litellm".

| Common metric ask | Metric |
|---|---|
| TTFT | gen_ai.client.response.time_to_first_token |
| TPS | derived as 1 / gen_ai.client.response.time_per_output_token |
| Token usage | gen_ai.client.token.usage (split by gen_ai.token.type) |
| Vendor/model latency (excludes overhead) | gen_ai.client.response.duration |
| Vendor/model latency (includes overhead) | gen_ai.client.operation.duration |
Vendor/model latency (includes overhead)gen_ai.client.operation.duration

Spans → Derived Metrics

Even with metrics off, every metric below can be derived from spans. This is what most dashboards do.

| Metric | How to derive from spans |
|---|---|
| TTFT (Time to First Token) | Streaming requests only. Use the dedicated gen_ai.client.response.time_to_first_token metric, or capture completion_start_time from request kwargs via a custom callback. |
| TPOT (Time per Output Token) | Use the gen_ai.client.response.time_per_output_token metric, or derive as gen_ai.client.response.duration ÷ gen_ai.usage.output_tokens. |
| Total response duration | gen_ai.client.response.duration metric, or end_time − start_time of the LLM-call span (or proxy root span minus LiteLLM overhead; see hidden_params.litellm_overhead_time_ms). |
| Vendor (provider) latency | Duration of the raw_gen_ai_request span (default mode): pure time spent waiting on the upstream provider. In semconv mode, use gen_ai.client.response.duration. |
| LiteLLM overhead | hidden_params.litellm_overhead_time_ms on the proxy root span. Or Received Proxy Server Request.duration − raw_gen_ai_request.duration. |
| Token usage | gen_ai.usage.input_tokens, gen_ai.usage.output_tokens, gen_ai.usage.total_tokens on the LLM span (or gen_ai.client.token.usage metric). |
| Cost | gen_ai.cost.total_cost (and the rest of gen_ai.cost.*) on the LLM span; or gen_ai.client.token.cost metric. |
| Guardrail evaluation time | Duration of each guardrail span. Disambiguate by guardrail_name and guardrail_mode. |
| Router / auth / Redis / DB latency | Duration of the corresponding service-hook span (router, auth, redis, postgres, …). |
| Retry / fallback count | hidden_params.x-litellm-attempted-retries and hidden_params.x-litellm-attempted-fallbacks on the proxy root span. |
| Streaming? | llm.is_streaming attribute ("True"/"False"). |
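
As a concrete sketch of span-based derivation, LiteLLM overhead can be computed from two finished spans of a single trace (assumes default mode, where raw_gen_ai_request is emitted; OTel SDK span timestamps are in nanoseconds):

def litellm_overhead_ms(spans) -> float:
    """Proxy root duration minus upstream-provider duration, in milliseconds.

    `spans` is any iterable of finished ReadableSpan objects from one trace,
    e.g. collected with an InMemorySpanExporter in a test.
    """
    by_name = {s.name: s for s in spans}
    root = by_name["Received Proxy Server Request"]
    raw = by_name["raw_gen_ai_request"]

    def duration_ms(s):
        return (s.end_time - s.start_time) / 1e6  # ns -> ms

    return duration_ms(root) - duration_ms(raw)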

Attributes Reference

Attributes set on the LLM-call span. Names follow OTEL GenAI semconv.

| Attribute | Default mode | Semconv mode |
|---|---|---|
| gen_ai.operation.name | the litellm call_type (e.g. acompletion) | the semconv operation (chat, embeddings, text_completion) |
| gen_ai.system | provider name (e.g. openai) | unchanged |
| gen_ai.provider.name | not set | provider name (the renamed Required attribute per spec) |
| gen_ai.framework | litellm | litellm |
| gen_ai.request.model | model | model |
| gen_ai.request.max_tokens, temperature, top_p | when set in the request | when set in the request |
| gen_ai.request.frequency_penalty, presence_penalty, top_k, seed, stop_sequences, stream, choice.count | not set | when set in the request |
| gen_ai.response.model, gen_ai.response.id, gen_ai.response.finish_reasons | when present in the response | unchanged |
| gen_ai.usage.input_tokens, gen_ai.usage.output_tokens, gen_ai.usage.total_tokens | when present | unchanged |
| gen_ai.usage.cache_creation.input_tokens, gen_ai.usage.cache_read.input_tokens | not set | when present in the response |
| gen_ai.input.messages, gen_ai.output.messages, gen_ai.system_instructions | when message content capture permits, JSON-encoded array of {role, parts: [...]} objects | unchanged |
| gen_ai.cost.input_cost, output_cost, total_cost (and related cost breakdown attrs) | LiteLLM-specific cost attributes | unchanged |

gen_ai.cost.* (cost breakdown, all modes)

LiteLLM expands every key from standard_logging_payload["cost_breakdown"] as gen_ai.cost.{key}. Currently observed keys:

| Attribute | Meaning |
|---|---|
| gen_ai.cost.input_cost | Prompt token cost (USD) |
| gen_ai.cost.output_cost | Completion token cost (USD) |
| gen_ai.cost.total_cost | Charged total (USD) |
| gen_ai.cost.tool_usage_cost | Cost attributable to tool/function calls |
| gen_ai.cost.original_cost | Pre-discount cost |
| gen_ai.cost.discount_percent, gen_ai.cost.discount_amount | Discount applied |
| gen_ai.cost.margin_percent, gen_ai.cost.margin_fixed_amount, gen_ai.cost.margin_total_amount | Margin components |

litellm.* (proxy root and LLM span)

| Attribute | Value |
|---|---|
| litellm.call_id | Unique per litellm.completion invocation. Use this to correlate trace data with LiteLLM Spend Logs and the LiteLLM UI |
| litellm.request.type | Same as call_type (e.g. acompletion, aembedding, aimage_generation) |

llm.* (proxy root and LLM span)

| Attribute | Value |
|---|---|
| llm.request.type | LiteLLM call_type |
| llm.is_streaming | "True" or "False" |
| llm.user | user parameter, if set |

llm.{provider}.* (raw provider request/response, default mode only)

Set only on raw_gen_ai_request, to avoid attribute duplication. For each key in the raw provider request body, LiteLLM emits llm.{provider}.{key}. Same for the raw response body.

Examples observed for openai:

llm.openai.messages
llm.openai.model
llm.openai.temperature
llm.openai.max_tokens
llm.openai.id
llm.openai.object
llm.openai.created
llm.openai.choices
llm.openai.usage
llm.openai.system_fingerprint
llm.openai.service_tier
llm.openai.extra_body

For Anthropic, replace openai with anthropic (llm.anthropic.messages, llm.anthropic.stop_reason, etc.). Same pattern for every other provider.

These attributes are suppressed when OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT=NO_CONTENT, when litellm.turn_off_message_logging=true, or in semconv mode (where the entire raw_gen_ai_request span is suppressed).

metadata.* (proxy root, sometimes LLM span)

LiteLLM iterates standard_logging_payload["metadata"] and emits each entry as metadata.{key}. Common keys (not exhaustive):

| Attribute | Meaning |
|---|---|
| metadata.user_api_key_hash | SHA hash of the virtual key used |
| metadata.user_api_key_alias | Virtual-key alias |
| metadata.user_api_key_team_id, metadata.user_api_key_team_alias | Team identifiers |
| metadata.user_api_key_org_id, metadata.user_api_key_org_alias | Organization identifiers |
| metadata.user_api_key_user_id, metadata.user_api_key_user_email | LiteLLM internal user identifiers |
| metadata.user_api_key_end_user_id | End-user passed in request |
| metadata.user_api_key_project_id, metadata.user_api_key_project_alias | Project identifiers |
| metadata.user_api_key_spend, metadata.user_api_key_max_budget, metadata.user_api_key_budget_reset_at | Budget state |
| metadata.user_api_key_request_route | Route hit (e.g. /v1/chat/completions) |
| metadata.requester_ip_address, metadata.user_agent | Client identifiers |
| metadata.requester_metadata, metadata.requester_custom_headers | Headers and request context |
| metadata.applied_guardrails | List of guardrails that ran on this request |
| metadata.mcp_tool_call_metadata, metadata.vector_store_request_metadata | MCP and vector-store request info |
| metadata.usage_object | Full token-usage object |
| metadata.spend_logs_metadata | Custom metadata persisted to Spend Logs |
| metadata.cold_storage_object_key | When request payloads are offloaded to cold storage |
| metadata.user_api_key_auth_metadata | Extra auth context |

Plus hidden_params: a single attribute holding a JSON-serialized dict that includes litellm_overhead_time_ms, api_base, response_cost, additional_headers, model_id, x-litellm-attempted-retries, x-litellm-attempted-fallbacks, etc.
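
Because hidden_params arrives as a single JSON string, consumers typically decode it before use; a minimal sketch (the literal values below are illustrative):

import json

def read_hidden_params(span_attributes: dict) -> dict:
    raw = span_attributes.get("hidden_params")
    return json.loads(raw) if raw else {}

hp = read_hidden_params({
    "hidden_params": '{"litellm_overhead_time_ms": 42, "x-litellm-attempted-retries": 0}'
})
print(hp["litellm_overhead_time_ms"])  # 42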

Guardrail span attributes

Set on each guardrail child span:

| Attribute | Value |
|---|---|
| openinference.span.kind | "guardrail" (per OpenInference convention) |
| guardrail_name | E.g. "presidio-pii", "lakera", "aporia" |
| guardrail_mode | "pre_call", "during_call", "post_call", etc. |
| masked_entity_count | If the guardrail masked entities |
| guardrail_response | The guardrail's response/action |

The span's start_time/end_time come from the guardrail's own timing, so the span duration equals the guardrail evaluation time.

There is no separate guardrail_pre/guardrail_post span name today; both are emitted as guardrail and disambiguated via the guardrail_mode attribute.

Service-hook span attributes

See Service-hook spans. Each carries service, call_type, optional error, plus any custom event metadata the caller attached.

Exception attributes

On Failed Proxy Server Request (and on any LLM-call span on failure):

| Attribute | Value |
|---|---|
| exception | str(original_exception) |
| Span status | StatusCode.ERROR |

Resource attributes (every span)

| Attribute | Default | Override |
|---|---|---|
| service.name | litellm | OTEL_SERVICE_NAME |
| deployment.environment | production | OTEL_ENVIRONMENT_NAME |
| model_id | matches service.name | OTEL_MODEL_ID |
| telemetry.sdk.{language,name,version} | set by SDK | (none) |

Stability

Span names, metric names, and the attribute set above are stable across LiteLLM patch releases. The LLM-call span's name and kind differ between default mode and semconv mode, and migrate only via the documented opt-in flag rather than between releases.

Support

For LiteLLM OTEL integration questions, file an issue at BerriAI/litellm. For OpenLLMetry / Traceloop semantic-convention questions, see Slack or email dev@traceloop.com.