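Fast mode (research preview)

Higher output speed for Claude Opus 4.6, with significantly faster token generation for latency-sensitive and agentic workflows.

Setting speed: "fast" in an API request gives you up to 2.5x higher output tokens per second from the same model, at premium pricing.

Fast mode is currently in research preview. Join the waitlist to request access. Availability is limited while we gather feedback.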

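Supported models

Fast mode is supported on the following models:

Claude Opus 4.6 (claude-opus-4-6)

How fast mode works

Fast mode runs the same model with a faster inference configuration. There is no change to intelligence or capabilities.

Up to 2.5x higher output tokens per second compared to standard speed
Speed benefits are focused on output tokens per second (OTPS), not time to first token (TTFT)
Same model weights and behavior (not a different model)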
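
Because the gain applies to OTPS and not to TTFT, the end-to-end speedup of a request depends on how much output it generates. The sketch below illustrates this with a simple latency model (total time = TTFT + output tokens / OTPS). The TTFT and standard OTPS figures are hypothetical placeholders, not measured values, and 2.5x is treated as the best case.

```python
def end_to_end_seconds(ttft_s: float, output_tokens: int, otps: float) -> float:
    """Total response time: time to first token plus generation time."""
    return ttft_s + output_tokens / otps


# Hypothetical figures, chosen only to illustrate the arithmetic.
TTFT_S = 2.0                      # assumed identical in both modes
STANDARD_OTPS = 40.0              # assumed standard output tokens per second
FAST_OTPS = STANDARD_OTPS * 2.5   # best case stated for fast mode

for n in (100, 1000, 4000):
    std = end_to_end_seconds(TTFT_S, n, STANDARD_OTPS)
    fast = end_to_end_seconds(TTFT_S, n, FAST_OTPS)
    print(f"{n:>5} output tokens: standard {std:.1f}s, "
          f"fast {fast:.1f}s, speedup {std / fast:.2f}x")
```

Under these assumptions, short responses see a speedup well below 2.5x because TTFT dominates, while long responses approach it.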

Basic usage

curl https://api.anthropic.com/v1/messages \
    --header "x-api-key: $ANTHROPIC_API_KEY" \
    --header "anthropic-version: 2023-06-01" \
    --header "anthropic-beta: fast-mode-2026-02-01" \
    --header "content-type: application/json" \
    --data '{
        "model": "claude-opus-4-6",
        "max_tokens": 4096,
        "speed": "fast",
        "messages": [{
            "role": "user",
            "content": "Refactor this module to use dependency injection"
        }]
    }'

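Pricing

Fast mode is priced at 6x standard Opus rates for prompts ≤ 200K tokens, and 12x standard Opus rates for prompts > 200K tokens. The following table shows fast mode pricing for Claude Opus 4.6:

Context window	Input	Output
≤ 200K input tokens	$30 / MTok	$150 / MTok
> 200K input tokens	$60 / MTok	$225 / MTok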
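
As a rough illustration of the table above, the following sketch estimates the cost of a single fast mode request. It assumes the whole request is billed at the tier selected by its input token count, and it ignores prompt caching and data residency multipliers. The function and constant names are hypothetical.

```python
# Fast mode prices from the table above, in USD per million tokens.
FAST_MODE_PRICES = {
    "le_200k": {"input": 30.0, "output": 150.0},
    "gt_200k": {"input": 60.0, "output": 225.0},
}
LONG_CONTEXT_THRESHOLD = 200_000  # input tokens


def fast_mode_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request; the tier is chosen by input size (assumption)."""
    tier = "gt_200k" if input_tokens > LONG_CONTEXT_THRESHOLD else "le_200k"
    prices = FAST_MODE_PRICES[tier]
    total = input_tokens * prices["input"] + output_tokens * prices["output"]
    return total / 1_000_000


# 523 input tokens and 1,842 output tokens fall in the <= 200K tier.
print(f"${fast_mode_cost_usd(523, 1842):.5f}")
# 250,000 input tokens fall in the > 200K tier.
print(f"${fast_mode_cost_usd(250_000, 1000):.3f}")
```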

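Fast mode pricing stacks with other pricing modifiers:

Prompt caching multipliers apply on top of fast mode pricing
Data residency multipliers apply on top of fast mode pricing

For complete pricing details, see the pricing page.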

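Rate limits

Fast mode has a dedicated rate limit that is separate from standard Opus rate limits. Unlike standard speed, which has separate limits for ≤ 200K and > 200K input tokens, fast mode uses a single rate limit that covers the full context range. When your fast mode rate limit is exceeded, the API returns a 429 error with a retry-after header indicating when capacity will be available.

The response includes headers that indicate your fast mode rate limit status:

Header	Description
`anthropic-fast-input-tokens-limit`	Maximum fast mode input tokens per minute
`anthropic-fast-input-tokens-remaining`	Remaining fast mode input tokens
`anthropic-fast-input-tokens-reset`	Time when the fast mode input token limit resets
`anthropic-fast-output-tokens-limit`	Maximum fast mode output tokens per minute
`anthropic-fast-output-tokens-remaining`	Remaining fast mode output tokens
`anthropic-fast-output-tokens-reset`	Time when the fast mode output token limit resets

For tier-specific rate limits, see the rate limits page.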
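
The headers listed above can be collected into a small status structure. The sketch below is one possible way to do so. It assumes the limit and remaining values are plain integers, and it keeps the reset value as a raw string because its format is not specified here. The helper name and the sample header values are hypothetical.

```python
from typing import Mapping, Optional


def parse_fast_rate_limits(headers: Mapping[str, str]) -> dict:
    """Collect the anthropic-fast-* rate limit headers into a nested dict."""
    # HTTP header names are case-insensitive, so normalize them first.
    lowered = {name.lower(): value for name, value in headers.items()}
    status = {}
    for kind in ("input", "output"):
        entry: dict = {}
        for field in ("limit", "remaining", "reset"):
            raw: Optional[str] = lowered.get(f"anthropic-fast-{kind}-tokens-{field}")
            if raw is None:
                entry[field] = None            # header absent from the response
            elif field == "reset":
                entry[field] = raw             # format not assumed; kept verbatim
            else:
                entry[field] = int(raw)        # assumption: integer token counts
        status[kind] = entry
    return status


# Hypothetical header values for illustration only.
sample = {
    "Anthropic-Fast-Input-Tokens-Limit": "100000",
    "Anthropic-Fast-Input-Tokens-Remaining": "99477",
    "Anthropic-Fast-Output-Tokens-Remaining": "18158",
}
print(parse_fast_rate_limits(sample))
```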

Checking which speed was used

The usage object in the response includes a speed field that indicates which speed was used, either "fast" or "standard":

curl https://api.anthropic.com/v1/messages \
    --header "x-api-key: $ANTHROPIC_API_KEY" \
    --header "anthropic-version: 2023-06-01" \
    --header "anthropic-beta: fast-mode-2026-02-01" \
    --header "content-type: application/json" \
    --data '{
        "model": "claude-opus-4-6",
        "max_tokens": 1024,
        "speed": "fast",
        "messages": [{"role": "user", "content": "Hello"}]
    }'

{
  "id": "msg_01XFDUDYJgAACzvnptvVoYEL",
  "type": "message",
  "role": "assistant",
  ...
  "usage": {
    "input_tokens": 523,
    "output_tokens": 1842,
    "speed": "fast"
  }
}

To track fast mode usage and costs across your organization, see the Usage and Cost API.
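
For quick client-side bookkeeping, the speed field can also be read from each response body. The sketch below totals output tokens per speed across a list of JSON response bodies. The helper name is hypothetical, and the "unknown" bucket for responses without a speed field is an assumption of this example, not API behavior.

```python
import json
from collections import Counter
from typing import Iterable


def output_tokens_by_speed(response_bodies: Iterable[str]) -> dict:
    """Sum usage.output_tokens per usage.speed value across response bodies."""
    totals: Counter = Counter()
    for body in response_bodies:
        usage = json.loads(body).get("usage", {})
        speed = usage.get("speed", "unknown")   # assumption: bucket for a missing field
        totals[speed] += usage.get("output_tokens", 0)
    return dict(totals)


# Trimmed-down response bodies containing only the usage object.
bodies = [
    '{"usage": {"input_tokens": 523, "output_tokens": 1842, "speed": "fast"}}',
    '{"usage": {"input_tokens": 523, "output_tokens": 300, "speed": "standard"}}',
]
print(output_tokens_by_speed(bodies))
```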

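Retries and fallback

Automatic retries

When the fast mode rate limit is exceeded, the API returns a 429 error with a retry-after header. The Anthropic SDKs automatically retry these requests up to 2 times by default (configurable via max_retries), waiting for the server-specified delay before each retry. Because fast mode uses continuous token replenishment, the retry-after delay is typically short, and requests succeed as soon as capacity is available.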
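
The retry behavior described above can be sketched without the SDK as a small loop that honors retry-after. This is an illustration of the idea, not the SDK's actual implementation. It assumes retry-after carries a number of seconds, and the 1-second default for a missing header is an arbitrary choice. The send callable and its (status, headers, body) return shape are hypothetical.

```python
import time
from typing import Callable, Mapping, Tuple

Response = Tuple[int, Mapping[str, str], str]  # (status, headers, body)


def send_with_retry_after(
    send: Callable[[], Response],
    max_retries: int = 2,
    sleep: Callable[[float], None] = time.sleep,
) -> Response:
    """Call send(); on a 429, wait for retry-after and retry up to max_retries times."""
    attempt = 0
    while True:
        status, headers, body = send()
        if status != 429 or attempt >= max_retries:
            return status, headers, body
        # Assumption: retry-after is expressed in seconds.
        delay = float(headers.get("retry-after", "1"))
        sleep(delay)
        attempt += 1


# Demo with a fake sender: two rate-limited responses, then a success.
fake_responses = iter([
    (429, {"retry-after": "0.5"}, ""),
    (429, {"retry-after": "0.25"}, ""),
    (200, {}, "ok"),
])
waits = []
result = send_with_retry_after(lambda: next(fake_responses), sleep=waits.append)
print(result[0], waits)
```

Passing a custom sleep function, as in the demo, keeps the loop testable without real delays.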

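Falling back to standard speed

If you would rather fall back to standard speed than wait for fast mode capacity, catch the rate limit error and retry without speed: "fast". Set max_retries to 0 on the initial fast request to skip automatic retries and fail immediately on a rate limit error.

Falling back from fast to standard speed results in a prompt cache miss. Requests at different speeds do not share cached prefixes.

Because setting max_retries to 0 also disables retries for other transient errors (overloaded, internal server error), the example below re-issues the original request with default retries in those cases.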

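import anthropic

client = anthropic.Anthropic()


def create_message_with_fast_fallback(
    max_retries=anthropic.DEFAULT_MAX_RETRIES, max_attempts=3, **params
):
    try:
        # max_retries is a client option rather than a create() argument,
        # so it is applied through with_options().
        return client.with_options(max_retries=max_retries).beta.messages.create(
            **params
        )
    except anthropic.RateLimitError:
        if params.get("speed") == "fast":
            # Fast mode capacity is exhausted: drop the speed parameter and
            # retry at standard speed with the default retry count.
            del params["speed"]
            return create_message_with_fast_fallback(**params)
        raise
    except (
        anthropic.InternalServerError,
        anthropic.OverloadedError,
        anthropic.APIConnectionError,
    ):
        # Other transient errors: re-issue the same request with default
        # retries, for at most max_attempts attempts in total.
        if max_attempts > 1:
            return create_message_with_fast_fallback(
                max_attempts=max_attempts - 1, **params
            )
        raise


message = create_message_with_fast_fallback(
    model="claude-opus-4-6",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello"}],
    betas=["fast-mode-2026-02-01"],
    speed="fast",
    max_retries=0,  # fail immediately on a fast mode rate limit
)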

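Considerations

Prompt caching: Switching between fast and standard speed invalidates the prompt cache. Requests at different speeds do not share cached prefixes.
Supported models: Fast mode is currently supported on Opus 4.6 only. Sending speed: "fast" with an unsupported model returns an error.
TTFT: Fast mode's benefits are focused on output tokens per second (OTPS), not time to first token (TTFT).
Batch API: Fast mode is not available with the Batch API.
Priority Tier: Fast mode is not available with Priority Tier.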

Next steps

Pricing

View detailed fast mode pricing information.

Rate limits

Check rate limit tiers for fast mode.

Effort parameter

Control token usage with the effort parameter.
