Skip to main content

Vertex AI OCR

Overviewโ€‹

PropertyDetails
DescriptionVertex AI OCR provides document intelligence capabilities powered by Mistral, enabling text extraction from PDFs and images
Provider Route on LiteLLMvertex_ai/
Supported Operations/ocr
Link to Provider DocVertex AI โ†—

Extract text from documents and images using Vertex AI's OCR models, powered by Mistral.

Quick Startโ€‹

LiteLLM SDKโ€‹

SDK Usage
import litellm
import os

# Set environment variables
os.environ["VERTEXAI_PROJECT"] = "your-project-id"
os.environ["VERTEXAI_LOCATION"] = "us-central1"

# OCR with PDF URL
response = litellm.ocr(
model="vertex_ai/mistral-ocr-2505",
document={
"type": "document_url",
"document_url": "https://example.com/document.pdf"
}
)

# Access extracted text
for page in response.pages:
print(page.text)

LiteLLM PROXYโ€‹

proxy_config.yaml
model_list:
- model_name: vertex-ocr
litellm_params:
model: vertex_ai/mistral-ocr-2505
vertex_project: os.environ/VERTEXAI_PROJECT
vertex_location: os.environ/VERTEXAI_LOCATION
vertex_credentials: path/to/service-account.json # Optional
model_info:
mode: ocr

Start Proxy

litellm --config proxy_config.yaml

Call OCR via Proxy

cURL Request
curl -X POST http://localhost:4000/ocr \
-H "Content-Type: application/json" \
-H "Authorization: Bearer your-api-key" \
-d '{
"model": "vertex-ocr",
"document": {
"type": "document_url",
"document_url": "https://arxiv.org/pdf/2201.04234"
}
}'

Authenticationโ€‹

Vertex AI OCR supports multiple authentication methods:

Service Account JSONโ€‹

Service Account Auth
response = litellm.ocr(
model="vertex_ai/mistral-ocr-2505",
document={"type": "document_url", "document_url": "https://..."},
vertex_project="your-project-id",
vertex_location="us-central1",
vertex_credentials="path/to/service-account.json"
)

Application Default Credentialsโ€‹

Default Credentials
# Relies on GOOGLE_APPLICATION_CREDENTIALS environment variable
response = litellm.ocr(
model="vertex_ai/mistral-ocr-2505",
document={"type": "document_url", "document_url": "https://..."},
vertex_project="your-project-id",
vertex_location="us-central1"
)

Document Typesโ€‹

Vertex AI OCR supports both PDFs and images.

PDF Documentsโ€‹

PDF OCR
response = litellm.ocr(
model="vertex_ai/mistral-ocr-2505",
document={
"type": "document_url",
"document_url": "https://example.com/document.pdf"
},
vertex_project="your-project-id",
vertex_location="us-central1"
)

Image Documentsโ€‹

Image OCR
response = litellm.ocr(
model="vertex_ai/mistral-ocr-2505",
document={
"type": "image_url",
"image_url": "https://example.com/image.png"
},
vertex_project="your-project-id",
vertex_location="us-central1"
)

Base64 Encoded Documentsโ€‹

Base64 PDF
import base64

# Read and encode PDF
with open("document.pdf", "rb") as f:
pdf_base64 = base64.b64encode(f.read()).decode()

response = litellm.ocr(
model="vertex_ai/mistral-ocr-2505",
document={
"type": "document_url",
"document_url": f"data:application/pdf;base64,{pdf_base64}"
},
vertex_project="your-project-id",
vertex_location="us-central1"
)

Supported Parametersโ€‹

All Parameters
response = litellm.ocr(
model="vertex_ai/mistral-ocr-2505",
document={ # Required: Document to process
"type": "document_url",
"document_url": "https://..."
},
vertex_project="your-project-id", # Required: GCP project ID
vertex_location="us-central1", # Optional: Defaults to us-central1
vertex_credentials="path/to/key.json", # Optional: Service account key
include_image_base64=True, # Optional: Include base64 images
pages=[0, 1, 2], # Optional: Specific pages to process
image_limit=10 # Optional: Limit number of images
)

Response Formatโ€‹

Response Structure
# Response has the following structure
response.pages # List of pages with extracted text
response.model # Model used
response.object # "ocr"
response.usage_info # Token usage information

# Access page content
for page in response.pages:
print(f"Page {page.page_number}:")
print(page.text)

Async Supportโ€‹

Async Usage
import litellm

response = await litellm.aocr(
model="vertex_ai/mistral-ocr-2505",
document={
"type": "document_url",
"document_url": "https://example.com/document.pdf"
},
vertex_project="your-project-id",
vertex_location="us-central1"
)

Cost Trackingโ€‹

LiteLLM automatically tracks costs for Vertex AI OCR:

  • Cost per page: $0.0005 (based on $1.50 per 1,000 pages)
View Cost
response = litellm.ocr(
model="vertex_ai/mistral-ocr-2505",
document={"type": "document_url", "document_url": "https://..."},
vertex_project="your-project-id"
)

# Access cost information
print(f"Cost: ${response._hidden_params.get('response_cost', 0)}")

Important Notesโ€‹

URL Conversion

Vertex AI OCR endpoints don't have internet access. LiteLLM automatically converts public URLs to base64 data URIs before sending requests to Vertex AI.

Regional Availability

Mistral OCR is available in multiple regions. Specify vertex_location to use a region closer to your data:

  • us-central1 (default)
  • europe-west1
  • asia-southeast1

Supported Modelsโ€‹

  • mistral-ocr-2505 - Latest Mistral OCR model on Vertex AI

Use the Vertex AI provider prefix: vertex_ai/<model-name>