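Fast mode (research preview)

Higher output speed for Claude Opus 4.6, delivering significantly faster token generation for latency-sensitive and agentic workflows.

Fast mode provides significantly faster output token generation for Claude Opus 4.6. Setting speed: "fast" in an API request yields up to 2.5x higher output tokens per second from the same model, at premium pricing.

Fast mode is currently in research preview. Join the waitlist to request access. Availability is limited while we gather feedback.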

Supported models

Fast mode is supported on the following models:

Claude Opus 4.6 (claude-opus-4-6)

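How fast mode works

Fast mode runs the same model with a faster inference configuration. There is no change to intelligence or capabilities.

Up to 2.5x higher output tokens per second compared to standard speed
Speed benefits are focused on output tokens per second (OTPS), not time to first token (TTFT)
Same model weights and behavior (not a different model)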
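To see why the OTPS versus TTFT distinction matters, the following back-of-the-envelope sketch models total response time as TTFT plus streaming time. The function name and all numbers (a 1 second TTFT, 50 output tokens per second at standard speed) are hypothetical values chosen for illustration, not measurements, and the 2.5x factor is the documented upper bound rather than a guaranteed speedup.

```python
def estimate_response_seconds(ttft_seconds: float, output_tokens: int, otps: float) -> float:
    """Rough model: total time = time to first token + output_tokens / OTPS."""
    return ttft_seconds + output_tokens / otps


# Hypothetical baseline numbers, for illustration only.
TTFT_SECONDS = 1.0
STANDARD_OTPS = 50.0
FAST_OTPS = STANDARD_OTPS * 2.5  # best case: "up to 2.5x"

# Long output: streaming time dominates, so fast mode helps a lot.
long_standard = estimate_response_seconds(TTFT_SECONDS, 2000, STANDARD_OTPS)
long_fast = estimate_response_seconds(TTFT_SECONDS, 2000, FAST_OTPS)

# Short output: TTFT dominates, so the gain is small.
short_standard = estimate_response_seconds(TTFT_SECONDS, 20, STANDARD_OTPS)
short_fast = estimate_response_seconds(TTFT_SECONDS, 20, FAST_OTPS)

print(f"2000 tokens: {long_standard:.2f}s -> {long_fast:.2f}s")
print(f"  20 tokens: {short_standard:.2f}s -> {short_fast:.2f}s")
```

Under these assumed numbers, a 2000-token response drops from 41 s to 17 s, while a 20-token response only drops from 1.4 s to 1.16 s, because TTFT is unchanged.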

Basic usage

curl https://api.anthropic.com/v1/messages \
    --header "x-api-key: $ANTHROPIC_API_KEY" \
    --header "anthropic-version: 2023-06-01" \
    --header "anthropic-beta: fast-mode-2026-02-01" \
    --header "content-type: application/json" \
    --data '{
        "model": "claude-opus-4-6",
        "max_tokens": 4096,
        "speed": "fast",
        "messages": [{
            "role": "user",
            "content": "Refactor this module to use dependency injection"
        }]
    }'
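For readers who prefer Python without an SDK, the same request can be sketched with the standard library. The URL, headers, and body fields are copied from the curl example above; the helper names build_fast_request and send_request are hypothetical, and nothing is sent unless send_request is called explicitly.

```python
import json
import urllib.request

API_URL = "https://api.anthropic.com/v1/messages"


def build_fast_request(prompt: str, api_key: str, max_tokens: int = 4096) -> urllib.request.Request:
    """Build (but do not send) a Messages API request with speed set to "fast"."""
    body = {
        "model": "claude-opus-4-6",
        "max_tokens": max_tokens,
        "speed": "fast",
        "messages": [{"role": "user", "content": prompt}],
    }
    headers = {
        "x-api-key": api_key,
        "anthropic-version": "2023-06-01",
        "anthropic-beta": "fast-mode-2026-02-01",  # beta header from the curl example
        "content-type": "application/json",
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers=headers,
        method="POST",
    )


def send_request(request: urllib.request.Request) -> dict:
    """Send the request and return the decoded JSON response body."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

A call such as send_request(build_fast_request("Refactor this module to use dependency injection", api_key)) would issue the same request as the curl command, assuming api_key holds a key with fast mode access.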

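Pricing

Fast mode is priced at 6x the standard Opus rate for prompts ≤ 200K tokens and 12x the standard Opus rate for prompts > 200K tokens. The following table shows fast mode pricing for Claude Opus 4.6:

Context window	Input	Output
≤ 200K input tokens	$30 / MTok	$150 / MTok
> 200K input tokens	$60 / MTok	$225 / MTok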

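Fast mode pricing stacks with other pricing modifiers:

Prompt caching multipliers apply on top of fast mode pricing
Data residency multipliers apply on top of fast mode pricing

For complete pricing details, see the pricing page.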
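As a worked example of the table above, the sketch below estimates the cost of a single fast mode request. The function name is hypothetical, and it assumes the tier is selected by the request's input token count and applies to both input and output tokens, as the table layout suggests. Prompt caching and data residency multipliers are deliberately left out because their values are not given here.

```python
# Rates in USD per million tokens (MTok), copied from the fast mode pricing table.
TIER_THRESHOLD_INPUT_TOKENS = 200_000
RATES_UP_TO_200K = {"input": 30.0, "output": 150.0}
RATES_OVER_200K = {"input": 60.0, "output": 225.0}


def fast_mode_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one fast mode request, before any other modifiers."""
    # Assumption: the tier is chosen by the input token count of the request.
    if input_tokens <= TIER_THRESHOLD_INPUT_TOKENS:
        rates = RATES_UP_TO_200K
    else:
        rates = RATES_OVER_200K
    return (input_tokens * rates["input"] + output_tokens * rates["output"]) / 1_000_000


# 523 input tokens and 1842 output tokens, both billed at the lower tier.
print(f"${fast_mode_cost_usd(523, 1842):.5f}")
```

For 523 input and 1842 output tokens this gives 523 × $30/MTok + 1842 × $150/MTok ≈ $0.29, which shows that output tokens dominate the cost of short prompts.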

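Rate limits

Fast mode has a dedicated rate limit that is separate from the standard Opus rate limit. Unlike standard speed, which has separate limits for ≤ 200K and > 200K input tokens, fast mode uses a single rate limit that covers the full context range. When your fast mode rate limit is exceeded, the API returns a 429 error with a retry-after header indicating when capacity will be available.

Responses include headers that indicate your fast mode rate limit status:

Header	Description
`anthropic-fast-input-tokens-limit`	Maximum fast mode input tokens per minute
`anthropic-fast-input-tokens-remaining`	Remaining fast mode input tokens
`anthropic-fast-input-tokens-reset`	Time when the fast mode input token limit resets
`anthropic-fast-output-tokens-limit`	Maximum fast mode output tokens per minute
`anthropic-fast-output-tokens-remaining`	Remaining fast mode output tokens
`anthropic-fast-output-tokens-reset`	Time when the fast mode output token limit resets

For tier-specific rate limits, see the rate limits page.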
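One way to use these headers is to collect them into a small status dictionary after each response. The sketch below is illustrative: the function name is hypothetical, the header values in the example are made up, and the reset values are kept as raw strings because their exact format is not specified on this page.

```python
from typing import Dict, Mapping, Union

FAST_HEADER_PREFIX = "anthropic-fast-"


def fast_rate_limit_status(headers: Mapping[str, str]) -> Dict[str, Union[int, str]]:
    """Extract the anthropic-fast-* rate limit headers from a response's headers."""
    status: Dict[str, Union[int, str]] = {}
    for name, value in headers.items():
        key = name.lower()  # HTTP header names are case-insensitive
        if not key.startswith(FAST_HEADER_PREFIX):
            continue
        field = key[len(FAST_HEADER_PREFIX):]  # e.g. "input-tokens-remaining"
        # Limits and remaining counts are token counts; reset times stay as strings.
        status[field] = value if field.endswith("-reset") else int(value)
    return status


# Made-up header values for illustration.
example_headers = {
    "anthropic-fast-input-tokens-limit": "100000",
    "anthropic-fast-input-tokens-remaining": "99477",
    "anthropic-fast-output-tokens-limit": "20000",
    "anthropic-fast-output-tokens-remaining": "18158",
    "content-type": "application/json",
}
print(fast_rate_limit_status(example_headers))
```

A client could compare output-tokens-remaining against the max_tokens of its next request and choose standard speed up front when fast mode capacity is nearly exhausted.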

Check which speed was used

The response's usage object includes a speed field indicating which speed was used, either "fast" or "standard":

curl https://api.anthropic.com/v1/messages \
    --header "x-api-key: $ANTHROPIC_API_KEY" \
    --header "anthropic-version: 2023-06-01" \
    --header "anthropic-beta: fast-mode-2026-02-01" \
    --header "content-type: application/json" \
    --data '{
        "model": "claude-opus-4-6",
        "max_tokens": 1024,
        "speed": "fast",
        "messages": [{"role": "user", "content": "Hello"}]
    }'

{
  "id": "msg_01XFDUDYJgAACzvnptvVoYEL",
  "type": "message",
  "role": "assistant",
  ...
  "usage": {
    "input_tokens": 523,
    "output_tokens": 1842,
    "speed": "fast"
  }
}
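A client can read this field to confirm that a request ran at the speed it asked for. The helper names below are hypothetical, and the sketch assumes that a request without a speed field counts as standard speed.

```python
from typing import Optional


def speed_used(response_body: dict) -> Optional[str]:
    """Return usage.speed ("fast" or "standard"), or None if the field is absent."""
    return response_body.get("usage", {}).get("speed")


def ran_at_requested_speed(request_body: dict, response_body: dict) -> bool:
    """Compare the speed requested with the speed reported in the response."""
    # Assumption: omitting "speed" from the request means standard speed.
    requested = request_body.get("speed", "standard")
    return speed_used(response_body) == requested


# Trimmed version of the sample response above.
sample_response = {
    "id": "msg_01XFDUDYJgAACzvnptvVoYEL",
    "type": "message",
    "role": "assistant",
    "usage": {"input_tokens": 523, "output_tokens": 1842, "speed": "fast"},
}
print(speed_used(sample_response))
```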

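To track fast mode usage and costs across your organization, see the Usage and Cost API.

Retries and fallback

Automatic retries

When the fast mode rate limit is exceeded, the API returns a 429 error with a retry-after header. The Anthropic SDKs automatically retry these requests up to 2 times by default (configurable via max_retries), waiting the server-specified delay before each retry. Because fast mode uses continuous token replenishment, the retry-after delay is typically short, and requests succeed once capacity is available.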
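The retry behavior described above can be sketched for clients that do not use an SDK. This is not the SDK's actual implementation: the send callable and its (status, headers, body) return shape are hypothetical, header names are assumed to be lowercase, and retry-after is assumed to carry a number of seconds.

```python
import time
from typing import Callable, Mapping, Tuple

Response = Tuple[int, Mapping[str, str], str]  # (status code, headers, body)


def send_with_retry_after(
    send: Callable[[], Response],
    max_retries: int = 2,
    sleep: Callable[[float], None] = time.sleep,
) -> Response:
    """Call send(), retrying on 429 after waiting the server-specified delay."""
    retries_done = 0
    while True:
        status, headers, body = send()
        if status != 429 or retries_done >= max_retries:
            return status, headers, body
        # Assumption: retry-after is given in seconds; default to no wait if absent.
        delay_seconds = float(headers.get("retry-after", "0"))
        sleep(delay_seconds)
        retries_done += 1
```

Passing max_retries=0 makes the function return the first 429 immediately, which mirrors the fail-fast setting used when falling back to standard speed. The sleep parameter exists only so the wait can be replaced in tests.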

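Fall back to standard speed

If you would rather fall back to standard speed than wait for fast mode capacity, catch the rate limit error and retry without speed: "fast". Set max_retries to 0 on the initial fast request to skip automatic retries and fail immediately on a rate limit error.

Falling back from fast mode to standard speed results in a prompt cache miss. Requests at different speeds do not share cached prefixes.

Because setting max_retries to 0 also disables retries for other transient errors (overloaded, internal server errors), the example below re-issues the original request with default retries in those cases.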

import anthropic

client = anthropic.Anthropic()


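def create_message_with_fast_fallback(max_retries=None, max_attempts=3, **params):
    # The Python SDK takes per-request retry overrides through with_options();
    # max_retries=None keeps the client's default retry behavior.
    request_client = (
        client if max_retries is None else client.with_options(max_retries=max_retries)
    )
    try:
        return request_client.beta.messages.create(**params)
    except anthropic.RateLimitError:
        # Fast mode capacity is exhausted: retry once at standard speed
        # with default retries.
        if params.get("speed") == "fast":
            del params["speed"]
            return create_message_with_fast_fallback(**params)
        raise
    except (
        anthropic.InternalServerError,
        anthropic.OverloadedError,
        anthropic.APIConnectionError,
    ):
        # Transient error: re-issue the same request with default retries,
        # up to max_attempts times in total.
        if max_attempts > 1:
            return create_message_with_fast_fallback(
                max_attempts=max_attempts - 1, **params
            )
        raise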


message = create_message_with_fast_fallback(
    model="claude-opus-4-6",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello"}],
    betas=["fast-mode-2026-02-01"],
    speed="fast",
    max_retries=0,
)

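Considerations

Prompt caching: Switching between fast and standard speed invalidates the prompt cache. Requests at different speeds do not share cached prefixes.
Supported models: Fast mode is currently supported on Opus 4.6 only. Sending speed: "fast" with an unsupported model returns an error.
TTFT: Fast mode's benefits are focused on output tokens per second (OTPS), not time to first token (TTFT).
Batch API: Fast mode is not available with the Batch API.
Priority Tier: Fast mode is not available with Priority Tier.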

Next steps

Pricing

View detailed fast mode pricing information.

Rate limits

View rate limit tiers for fast mode.

Effort parameter

Control token usage with the effort parameter.
