We offer three service tiers:

- Standard: The default service tier for all API requests. Requests in this tier are prioritized alongside all other requests and observe best-effort availability.
- Priority Tier: Requests in this tier are prioritized over all other requests to Anthropic. This prioritization helps minimize "server overloaded" errors, even during peak times.
- Batch: Best for asynchronous workflows that can wait, or that benefit from being outside your standard capacity.
For more information, see Get started with Priority Tier below.
When handling a request, Anthropic decides whether to assign it to Priority Tier in the following scenarios:

- Your organization has sufficient Priority Tier input token capacity remaining
- Your organization has sufficient Priority Tier output token capacity remaining
Anthropic counts usage against Priority Tier capacity as follows:

Input Tokens

- For US-only inference (inference_geo: "us") requests, input tokens are 1.1 tokens per token
- For long-context requests, input tokens are 2 tokens per token
- All other input tokens are 1 token per token

Output Tokens

- For US-only inference (inference_geo: "us") requests, output tokens are 1.1 tokens per token
- All other output tokens are 1 token per token

Otherwise, requests proceed at standard tier.
These burndown rates reflect the relative pricing of each token type. For example, US-only inference is priced at 1.1x, so each token consumed with inference_geo: "us" draws down 1.1 tokens from your Priority Tier capacity. Multipliers stack — a long-context request with US-only inference draws down input tokens at 2.2 tokens per token (2 × 1.1).
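To make the arithmetic concrete, here is a minimal sketch of the burndown calculation. The multiplier values come from the rates above; the function itself and its parameter names are hypothetical, not part of the API.

```python
# Hypothetical helper illustrating how burndown multipliers stack.
# Only the input-side long-context multiplier is specified above, so this
# sketch applies the 2x factor to input tokens only.
def priority_burndown(input_tokens: int, output_tokens: int,
                      us_only: bool = False, long_context: bool = False) -> tuple[float, float]:
    input_rate = output_rate = 1.0
    if us_only:        # inference_geo: "us" is priced at 1.1x
        input_rate *= 1.1
        output_rate *= 1.1
    if long_context:   # long-context input burns at 2x
        input_rate *= 2.0
    return input_tokens * input_rate, output_tokens * output_rate

# A long-context, US-only request burns input at 2 x 1.1 = 2.2 tokens per token:
print(priority_burndown(1000, 200, us_only=True, long_context=True))
# -> approximately (2200.0, 220.0)
```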
Requests assigned Priority Tier pull from both the Priority Tier capacity and the regular rate limits. If servicing the request would exceed the rate limits, the request is declined.
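Since a declined request surfaces to the client as a rate limit error, you may want to catch it and retry with backoff. The sketch below assumes the decline arrives as a standard 429, which the Python SDK raises as anthropic.RateLimitError; the retry policy itself is illustrative, not a recommendation from this page.

```python
import time

import anthropic

client = anthropic.Anthropic()

# Retry with exponential backoff if the request is declined against rate limits.
for attempt in range(3):
    try:
        message = client.messages.create(
            model="claude-opus-4-6",
            max_tokens=1024,
            messages=[{"role": "user", "content": "Hello, Claude!"}],
        )
        break
    except anthropic.RateLimitError:
        time.sleep(2 ** attempt)  # 1s, 2s, 4s
```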
You can control which service tiers can be used for a request by setting the service_tier parameter:
```python
import anthropic

client = anthropic.Anthropic()

message = client.messages.create(
    model="claude-opus-4-6",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello, Claude!"}],
    service_tier="auto",  # Automatically use Priority Tier when available, fall back to standard
)
```

The service_tier parameter accepts the following values:
"auto" (default) - Uses the Priority Tier capacity if available, falling back to your other capacity if not"standard_only" - Only use standard tier capacity, useful if you don't want to use your Priority Tier capacityThe response usage object also includes the service tier assigned to the request:
```json
{
  "usage": {
    "input_tokens": 410,
    "cache_creation_input_tokens": 0,
    "cache_read_input_tokens": 0,
    "output_tokens": 585,
    "service_tier": "priority"
  }
}
```

This allows you to determine which service tier was assigned to the request.
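Continuing the earlier example, you can branch on this field in the Python SDK (assuming an SDK version recent enough to expose service_tier on the usage object):

```python
# Check which tier actually served the request (requires an SDK version
# that surfaces usage.service_tier on the returned Message).
if message.usage.service_tier == "priority":
    print("Served from Priority Tier capacity")
else:
    print(f"Served at {message.usage.service_tier} tier")
```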
When you request service_tier="auto" with a model that has a Priority Tier commitment, these response headers provide insight:
```
anthropic-priority-input-tokens-limit: 10000
anthropic-priority-input-tokens-remaining: 9618
anthropic-priority-input-tokens-reset: 2025-01-12T23:11:59Z
anthropic-priority-output-tokens-limit: 10000
anthropic-priority-output-tokens-remaining: 6000
anthropic-priority-output-tokens-reset: 2025-01-12T23:12:21Z
```

You can use the presence of these headers to detect if your request was eligible for Priority Tier, even if it was over the limit.
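One way to read these headers in the Python SDK is through its raw-response interface, sketched below. This assumes the SDK's with_raw_response accessor; the header names are as shown above.

```python
import anthropic

client = anthropic.Anthropic()

# The raw-response wrapper exposes HTTP headers alongside the parsed message.
raw = client.messages.with_raw_response.create(
    model="claude-opus-4-6",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello, Claude!"}],
    service_tier="auto",
)
message = raw.parse()  # the usual Message object

# Presence of the priority headers means the request was eligible for Priority Tier.
remaining = raw.headers.get("anthropic-priority-input-tokens-remaining")
if remaining is not None:
    print(f"Priority input tokens remaining: {remaining}")
```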
You may want to commit to Priority Tier capacity if you want higher availability for your critical workloads, even during peak times.
Committing to Priority Tier involves deciding:

- How many input tokens per minute of capacity to purchase
- How many output tokens per minute of capacity to purchase
- Which models to commit capacity for
The ratio of input to output tokens you purchase matters. Sizing your Priority Tier capacity to align with your actual traffic patterns helps you maximize utilization of your purchased tokens.
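As an illustration, you might estimate that ratio from historical usage before sizing a commitment. The sketch below is purely hypothetical; the record format is illustrative, not an API response.

```python
# Hypothetical sizing sketch: estimate your input:output token ratio from
# historical per-request usage records (illustrative data, not an API).
records = [
    {"input_tokens": 410, "output_tokens": 585},
    {"input_tokens": 1200, "output_tokens": 300},
]

total_in = sum(r["input_tokens"] for r in records)
total_out = sum(r["output_tokens"] for r in records)

# Purchase Priority Tier input and output capacity in roughly this ratio.
print(f"input:output ratio = {total_in / total_out:.2f}:1")
```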
Priority Tier is supported by a subset of our models. Check the model overview page for more details on which models support Priority Tier.
To begin using Priority Tier:

1. Commit to Priority Tier capacity for the models you plan to use.
2. Set the service_tier parameter to "auto" in your API requests.
3. Monitor the response usage object and the priority headers to confirm how your requests are being served.