
Claude on Vertex AI

Anthropic's Claude models are available through Vertex AI.

The Vertex API for accessing Claude is nearly identical to the Messages API and supports all of the same options, with two key differences:

  • In Vertex, model is not passed in the request body. Instead, it is specified in the Google Cloud endpoint URL.
  • In Vertex, anthropic_version is passed in the request body (rather than as a header), and must be set to the value vertex-2023-10-16.
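For illustration, both differences show up in the shape of a raw request. The sketch below builds the endpoint URL and body by hand (stdlib only; the URL pattern is the standard Vertex AI rawPredict form, and MY_PROJECT_ID is a placeholder):

```python
import json

# Placeholders for illustration only.
project_id = "MY_PROJECT_ID"
region = "global"
model = "claude-sonnet-4-5@20250929"

# Difference 1: the model is part of the endpoint URL, not the request body.
url = (
    f"https://aiplatform.googleapis.com/v1/projects/{project_id}"
    f"/locations/{region}/publishers/anthropic/models/{model}:rawPredict"
)

# Difference 2: anthropic_version goes in the body, set to this exact value.
body = {
    "anthropic_version": "vertex-2023-10-16",
    "max_tokens": 100,
    "messages": [{"role": "user", "content": "Hey Claude!"}],
}

print(url)
print(json.dumps(body, indent=2))
```

Note there is no model key in the body at all; Vertex infers the model from the URL path.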

Vertex is also supported by Anthropic's official client SDKs. This guide walks you through making a request to Claude on Vertex AI using one of them.

Note that this guide assumes you already have a GCP project that is able to use Vertex AI. See Anthropic Claude models on Vertex AI for more information on the setup required and a full walkthrough.

Install an SDK for accessing Vertex AI

First, install Anthropic's client SDK for your language of choice.
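For example, the Python SDK ships Vertex support as an optional extra, and the TypeScript Vertex client is a separate package:

```shell
# Python: install the SDK with the Vertex extra
pip install -U 'anthropic[vertex]'

# TypeScript: the Vertex client is its own package
npm install @anthropic-ai/vertex-sdk
```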

Accessing Vertex AI

Model availability

Note that Anthropic model availability varies by region. Search for "Claude" in the Vertex AI Model Garden or go to Anthropic Claude models for the latest information.

API model IDs

Model               | Vertex AI API model ID
--------------------|---------------------------
Claude Opus 4.7     | claude-opus-4-7
Claude Opus 4.6     | claude-opus-4-6
Claude Opus 4.5     | claude-opus-4-5@20251101
Claude Opus 4.1     | claude-opus-4-1@20250805
Claude Opus 4 ⚠️     | claude-opus-4@20250514
Claude Sonnet 4.6   | claude-sonnet-4-6
Claude Sonnet 4.5   | claude-sonnet-4-5@20250929
Claude Sonnet 4 ⚠️   | claude-sonnet-4@20250514
Claude Sonnet 3.7 ⚠️ | claude-3-7-sonnet@20250219
Claude Haiku 4.5    | claude-haiku-4-5@20251001
Claude Haiku 3.5 ⚠️  | claude-3-5-haiku@20241022

Models marked ⚠️ are deprecated.
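If your application refers to models by display name, a lookup table mirroring the table above keeps the Vertex-specific IDs in one place. This is a hypothetical helper, not part of any SDK; the names and IDs are copied from the table:

```python
# Display name -> Vertex AI API model ID (from the table above).
VERTEX_MODEL_IDS = {
    "Claude Opus 4.7": "claude-opus-4-7",
    "Claude Opus 4.6": "claude-opus-4-6",
    "Claude Opus 4.5": "claude-opus-4-5@20251101",
    "Claude Opus 4.1": "claude-opus-4-1@20250805",
    "Claude Opus 4": "claude-opus-4@20250514",
    "Claude Sonnet 4.6": "claude-sonnet-4-6",
    "Claude Sonnet 4.5": "claude-sonnet-4-5@20250929",
    "Claude Sonnet 4": "claude-sonnet-4@20250514",
    "Claude Sonnet 3.7": "claude-3-7-sonnet@20250219",
    "Claude Haiku 4.5": "claude-haiku-4-5@20251001",
    "Claude Haiku 3.5": "claude-3-5-haiku@20241022",
}

def vertex_model_id(display_name: str) -> str:
    """Resolve a display name to its Vertex AI model ID."""
    try:
        return VERTEX_MODEL_IDS[display_name]
    except KeyError:
        raise ValueError(f"Unknown model: {display_name!r}") from None

print(vertex_model_id("Claude Sonnet 4.5"))  # claude-sonnet-4-5@20250929
```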

Making requests

Before running requests, you may need to run gcloud auth application-default login to authenticate with GCP.

The following examples show how to generate text from Claude on Vertex AI:

from anthropic import AnthropicVertex

project_id = "MY_PROJECT_ID"
region = "global"

client = AnthropicVertex(project_id=project_id, region=region)

message = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=100,
    messages=[
        {
            "role": "user",
            "content": "Hey Claude!",
        }
    ],
)
print(message)

See the client SDKs and the official Vertex AI docs for more details.

Claude is also available through Amazon Bedrock, Claude Platform on AWS, and Microsoft Foundry.

Activity logging

Vertex provides a request-response logging service that lets you log the prompts and completions associated with your usage.

Anthropic recommends logging your activity on at least a 30-day rolling basis so you can understand it and investigate any potential misuse.

Turning on this service does not give Google or Anthropic any access to your content.

Feature support

For all currently supported features on Vertex AI, see API features overview.

Context window

Claude Opus 4.7, Claude Opus 4.6, and Claude Sonnet 4.6 have a 1M-token context window on Vertex AI. Other Claude models, including Sonnet 4.5 and Sonnet 4 (deprecated), have a 200k-token context window.

Vertex AI limits request payloads to 30 MB. When sending large documents or many images, you may reach this limit before the token limit.
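Because the 30 MB cap applies to the serialized request, a pre-flight size check can catch oversized payloads before the API rejects them. A minimal sketch (the limit value comes from the note above; the helper name is illustrative):

```python
import json

MAX_REQUEST_BYTES = 30 * 1024 * 1024  # Vertex AI's 30 MB payload cap

def check_payload_size(body: dict) -> int:
    """Return the serialized request size in bytes, raising if over the cap."""
    size = len(json.dumps(body).encode("utf-8"))
    if size > MAX_REQUEST_BYTES:
        raise ValueError(f"Request is {size} bytes, over the 30 MB limit")
    return size

body = {
    "anthropic_version": "vertex-2023-10-16",
    "max_tokens": 100,
    "messages": [{"role": "user", "content": "Hey Claude!"}],
}
print(check_payload_size(body))
```

This is most useful when attaching base64-encoded PDFs or images, which inflate quickly relative to their token count.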

Global, multi-region, and regional endpoints

Vertex AI offers three endpoint types:

  • Global endpoints: Dynamic routing for maximum availability
  • Multi-region endpoints: Dynamic routing within a geographic area (for example, the United States or the European Union) for data residency with high availability
  • Regional endpoints: Guaranteed data routing through specific geographic regions

Regional and multi-region endpoints carry a 10% pricing premium over global endpoints. The premium applies only to Claude Sonnet 4.5 and later models; older models (Claude Sonnet 4 (deprecated), Opus 4 (deprecated), and earlier) keep their existing pricing structures.

When to use each option

Global endpoints (recommended):

  • Provide maximum availability and uptime
  • Dynamically route requests to regions with available capacity
  • Carry no pricing premium
  • Work best for applications where data residency is flexible
  • Support only pay-as-you-go traffic (provisioned throughput requires regional endpoints)

Multi-region endpoints:

  • Dynamically route requests across regions within a geographic area (currently us and eu)
  • Fit cases where you need data residency within a broad geography but want higher availability than a single region
  • Carry a 10% pricing premium over global endpoints
  • Support only pay-as-you-go traffic (provisioned throughput requires regional endpoints)

Regional endpoints:

  • Route traffic through specific geographic regions
  • Are required for single-region data residency, strict compliance mandates, or provisioned throughput
  • Support both pay-as-you-go and provisioned throughput
  • Carry a 10% pricing premium, reflecting the infrastructure cost of dedicated regional capacity

Implementation

Using global endpoints (recommended):

Set the region parameter to "global" when initializing the client:

from anthropic import AnthropicVertex

project_id = "MY_PROJECT_ID"
region = "global"

client = AnthropicVertex(project_id=project_id, region=region)

message = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=100,
    messages=[
        {
            "role": "user",
            "content": "Hey Claude!",
        }
    ],
)
print(message)

Using multi-region endpoints:

Set the region parameter to a multi-region identifier: "us" for the United States or "eu" for the European Union. The SDK routes requests to the corresponding multi-region endpoint (https://aiplatform.us.rep.googleapis.com or https://aiplatform.eu.rep.googleapis.com), which dynamically balances traffic across regions within that geography.

from anthropic import AnthropicVertex

project_id = "MY_PROJECT_ID"
region = "us"  # Multi-region identifier: "us" or "eu"

client = AnthropicVertex(project_id=project_id, region=region)

message = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=100,
    messages=[
        {
            "role": "user",
            "content": "Hey Claude!",
        }
    ],
)
print(message)

Using regional endpoints:

Specify an individual region, such as "us-east1" or "europe-west1":

from anthropic import AnthropicVertex

project_id = "MY_PROJECT_ID"
region = "us-east1"  # An individual Google Cloud region

client = AnthropicVertex(project_id=project_id, region=region)

message = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=100,
    messages=[
        {
            "role": "user",
            "content": "Hey Claude!",
        }
    ],
)
print(message)
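Across the three examples, only the region string changes; under the hood the client targets a different host for each endpoint type. A stdlib sketch of that mapping (the multi-region hosts are quoted above; the global and regional patterns are assumed from the standard Vertex AI URL scheme):

```python
def vertex_host(region: str) -> str:
    """Return the assumed Vertex AI API host for a region value."""
    if region == "global":
        return "https://aiplatform.googleapis.com"
    if region in ("us", "eu"):  # multi-region identifiers
        return f"https://aiplatform.{region}.rep.googleapis.com"
    # Individual regions get a region-prefixed host, e.g. us-east1.
    return f"https://{region}-aiplatform.googleapis.com"

print(vertex_host("global"))    # https://aiplatform.googleapis.com
print(vertex_host("us"))        # https://aiplatform.us.rep.googleapis.com
print(vertex_host("us-east1"))  # https://us-east1-aiplatform.googleapis.com
```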

Claude Mythos Preview is a research preview available to invited customers on Vertex AI. For more information, see Project Glasswing.

Additional resources

  • Vertex AI pricing: cloud.google.com/vertex-ai/generative-ai/pricing
  • Claude models documentation: Claude on Vertex AI
  • Google blog post: Global endpoint for Claude models
  • Anthropic pricing details: Cloud platform pricing
