    Service tiers

    Different tiers of service allow you to balance availability, performance, and predictable costs based on your application's needs.

    We offer three service tiers:

    • Priority Tier: Best for workflows deployed in production where time, availability, and predictable pricing are important
    • Standard: Default tier for both piloting and scaling everyday use cases
    • Batch: Best for asynchronous workflows which can wait or benefit from being outside your normal capacity

    Standard Tier

    The standard tier is the default service tier for all API requests. Requests in this tier are prioritized alongside all other requests and observe best-effort availability.

    Priority Tier

    Requests in this tier are prioritized over all other requests to Anthropic. This prioritization helps minimize "server overloaded" errors, even during peak times.

    For more information, see Get started with Priority Tier.

    How requests get assigned tiers

    When handling a request, Anthropic assigns it to Priority Tier when both of the following are true:

    • Your organization has sufficient Priority Tier input-tokens-per-minute capacity
    • Your organization has sufficient Priority Tier output-tokens-per-minute capacity

    Anthropic counts usage against Priority Tier capacity as follows:

    Input Tokens

    • Cache reads as 0.1 tokens per token read from the cache
    • Cache writes as 1.25 tokens per token written to the cache with a 5-minute TTL
    • Cache writes as 2.00 tokens per token written to the cache with a 1-hour TTL
    • For long-context (>200k input tokens) requests, input tokens are 2 tokens per token
    • All other input tokens are 1 token per token

    Output Tokens

    • For long-context (>200k input tokens) requests, output tokens are 1.5 tokens per token
    • All other output tokens are 1 token per token

    Otherwise, requests proceed at standard tier.

    Requests assigned Priority Tier pull from both the Priority Tier capacity and the regular rate limits. If servicing the request would exceed the rate limits, the request is declined.
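    The accounting rules above can be sketched as a small helper. This is an illustrative estimate, not an official formula: the function name and parameters are made up for this example, and it applies the long-context multiplier only to regular input tokens, which the rules above do not fully specify for cache tokens.

    ```python
    def priority_tier_usage(input_tokens, output_tokens,
                            cache_read_tokens=0,
                            cache_write_5m_tokens=0,
                            cache_write_1h_tokens=0,
                            long_context=False):
        """Estimate tokens counted against Priority Tier capacity.

        Multipliers follow the rules above: cache reads 0.1x,
        5-minute cache writes 1.25x, 1-hour cache writes 2.0x;
        long-context (>200k input tokens) requests count input
        at 2x and output at 1.5x.
        """
        input_mult = 2.0 if long_context else 1.0
        output_mult = 1.5 if long_context else 1.0
        weighted_input = (
            input_tokens * input_mult
            + cache_read_tokens * 0.1
            + cache_write_5m_tokens * 1.25
            + cache_write_1h_tokens * 2.0
        )
        weighted_output = output_tokens * output_mult
        return weighted_input, weighted_output
    ```

    For example, a request with 1,000 input tokens and 2,000 cache-read tokens counts as only 1,200 input tokens against your Priority Tier input capacity, which is why heavy cache use stretches a fixed commitment further.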

    Using service tiers

    You can control which service tiers can be used for a request by setting the service_tier parameter:

    import anthropic

    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

    message = client.messages.create(
        model="claude-sonnet-4-5",
        max_tokens=1024,
        messages=[{"role": "user", "content": "Hello, Claude!"}],
        service_tier="auto",  # use Priority Tier when available, fall back to standard
    )

    The service_tier parameter accepts the following values:

    • "auto" (default) - Uses the Priority Tier capacity if available, falling back to your other capacity if not
    • "standard_only" - Only use standard tier capacity, useful if you don't want to use your Priority Tier capacity

    The response usage object also includes the service tier assigned to the request:

    {
      "usage": {
        "input_tokens": 410,
        "cache_creation_input_tokens": 0,
        "cache_read_input_tokens": 0,
        "output_tokens": 585,
        "service_tier": "priority"
      }
    }

    This allows you to determine which service tier was assigned to the request.

    When requesting service_tier="auto" with a model with a Priority Tier commitment, these response headers provide insights:

    anthropic-priority-input-tokens-limit: 10000
    anthropic-priority-input-tokens-remaining: 9618
    anthropic-priority-input-tokens-reset: 2025-01-12T23:11:59Z
    anthropic-priority-output-tokens-limit: 10000
    anthropic-priority-output-tokens-remaining: 6000
    anthropic-priority-output-tokens-reset: 2025-01-12T23:12:21Z

    You can use the presence of these headers to detect if your request was eligible for Priority Tier, even if it was over the limit.
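    One way to act on those headers in client code is a small parser like the following. This is a sketch that assumes the headers arrive as a plain dict of strings; the header names are the ones listed above, while the function name and return shape are illustrative.

    ```python
    from datetime import datetime
    from typing import Optional

    def priority_capacity(headers: dict) -> Optional[dict]:
        """Summarize Priority Tier capacity from response headers.

        Returns None when the headers are absent, i.e. the request
        was not eligible for Priority Tier.
        """
        prefix = "anthropic-priority-"
        if prefix + "input-tokens-limit" not in headers:
            return None
        return {
            "input_remaining": int(headers[prefix + "input-tokens-remaining"]),
            "output_remaining": int(headers[prefix + "output-tokens-remaining"]),
            # Reset timestamps are ISO 8601 with a trailing "Z"
            "input_reset": datetime.fromisoformat(
                headers[prefix + "input-tokens-reset"].replace("Z", "+00:00")
            ),
        }

    # Example using the header values shown above
    headers = {
        "anthropic-priority-input-tokens-limit": "10000",
        "anthropic-priority-input-tokens-remaining": "9618",
        "anthropic-priority-input-tokens-reset": "2025-01-12T23:11:59Z",
        "anthropic-priority-output-tokens-limit": "10000",
        "anthropic-priority-output-tokens-remaining": "6000",
        "anthropic-priority-output-tokens-reset": "2025-01-12T23:12:21Z",
    }
    cap = priority_capacity(headers)
    ```

    A `None` result means the headers were missing and the request was not eligible for Priority Tier; a non-`None` result with low remaining values tells you the request was eligible but close to (or over) the committed capacity.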

    Get started with Priority Tier

    You may want to commit to Priority Tier capacity if you are interested in:

    • Higher availability: Target 99.5% uptime with prioritized computational resources
    • Cost Control: Predictable spend and discounts for longer commitments
    • Flexible overflow: Automatically falls back to standard tier when you exceed your committed capacity

    Committing to Priority Tier will involve deciding:

    • A number of input tokens per minute
    • A number of output tokens per minute
    • A commitment duration (1, 3, 6, or 12 months)
    • A specific model version

    The ratio of input to output tokens you purchase matters. Sizing your Priority Tier capacity to align with your actual traffic patterns helps you maximize utilization of your purchased tokens.
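    When sizing a commitment, a simple starting point is to compute per-minute rates from a window of past traffic. This is plain arithmetic, not an official sizing tool; the function name and the traffic numbers are illustrative.

    ```python
    def per_minute_rates(total_input_tokens: int,
                         total_output_tokens: int,
                         minutes: int):
        """Average input and output tokens per minute over a window."""
        return (total_input_tokens / minutes, total_output_tokens / minutes)

    # e.g. 60M input and 12M output tokens observed over a 24-hour window
    itm, otm = per_minute_rates(60_000_000, 12_000_000, 24 * 60)
    ```

    Here the observed input:output ratio is 5:1, so purchasing Priority Tier input and output capacity in roughly that ratio keeps both limits utilized instead of stranding one of them.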

    Supported models

    Priority Tier is supported by:

    • Claude Opus 4.1
    • Claude Opus 4
    • Claude Sonnet 4
    • Claude Sonnet 3.7
    • Claude Haiku 3.5

    Check the model overview page for more details on our models.

    How to access Priority Tier

    To begin using Priority Tier:

    1. Contact sales to complete provisioning
    2. (Optional) Update your API requests to set the service_tier parameter to "auto"
    3. Monitor your usage through the response headers and the Claude Console
    © 2025 ANTHROPIC PBC
