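Building with extended thinking

Extended thinking gives Claude enhanced reasoning capabilities for complex tasks, while providing varying levels of transparency into its step-by-step thought process before it delivers its final answer.

For Claude Opus 4.6, we recommend using adaptive thinking (thinking: {type: "adaptive"}) with the effort parameter instead of the manual thinking mode described on this page. The manual thinking: {type: "enabled", budget_tokens: N} configuration is deprecated on Opus 4.6 and will be removed in a future model release.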

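Supported models

Extended thinking is supported in the following models:

Claude Opus 4.6 (claude-opus-4-6): adaptive thinking recommended; manual mode (type: "enabled") is still accepted but deprecated
Claude Opus 4.5 (claude-opus-4-5-20251101)
Claude Opus 4.1 (claude-opus-4-1-20250805)
Claude Opus 4 (claude-opus-4-20250514)
Claude Sonnet 4.6 (claude-sonnet-4-6): supports both manual extended thinking with interleaved mode and adaptive thinking
Claude Sonnet 4.5 (claude-sonnet-4-5-20250929)
Claude Sonnet 4 (claude-sonnet-4-20250514)
Claude Sonnet 3.7 (claude-3-7-sonnet-20250219) (deprecated)
Claude Haiku 4.5 (claude-haiku-4-5-20251001)

API behavior differs between Claude Sonnet 3.7 and Claude 4 models, but the API shapes remain exactly the same.

For more information, see Differences in thinking across model versions.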

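How extended thinking works

When extended thinking is turned on, Claude creates thinking content blocks in which it outputs its internal reasoning. Claude incorporates insights from this reasoning before crafting its final response.

The API response includes thinking content blocks, followed by text content blocks.

Here is an example of the default response format: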

{
  "content": [
    {
      "type": "thinking",
      "thinking": "Let me analyze this step by step...",
      "signature": "WaUjzkypQ2mUEVM36O2TxuC06KN8xyfbJwyem2dw3URve/op91XWHOEBLLqIOMfFG/UvLEczmEsUjavL...."
    },
    {
      "type": "text",
      "text": "Based on my analysis..."
    }
  ]
}

For more information about the response format of extended thinking, see the Messages API Reference.
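As a rough illustration of consuming this response shape, the sketch below separates the reasoning from the final answer. It assumes the response has already been parsed into a Python dict shaped like the JSON above; the helper name split_content_blocks is hypothetical and not part of any SDK.

```python
from typing import Any


def split_content_blocks(response: dict[str, Any]) -> tuple[list[str], list[str]]:
    """Return (thinking_texts, answer_texts) from a parsed Messages API response."""
    thinking: list[str] = []
    answers: list[str] = []
    for block in response.get("content", []):
        if block.get("type") == "thinking":
            thinking.append(block.get("thinking", ""))
        elif block.get("type") == "text":
            answers.append(block.get("text", ""))
        # Other block types (for example tool_use) are ignored in this sketch.
    return thinking, answers
```

Keep the original content list around if the conversation continues with tools, because thinking blocks must be passed back unmodified.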

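How to use extended thinking

An example request that uses extended thinking in the Messages API appears near the end of this page.

To turn on extended thinking, add a thinking object with the type parameter set to enabled and budget_tokens set to the token budget you want to allow for extended thinking. For Claude Opus 4.6, we recommend using type: "adaptive" instead; see Adaptive thinking for details. Although type: "enabled" with budget_tokens is still supported on Opus 4.6, it is deprecated and will be removed in a future release.

The budget_tokens parameter determines the maximum number of tokens Claude is allowed to use for its internal reasoning process. In Claude 4 and later models, this limit applies to the full thinking tokens, not to the summarized output. Larger budgets can improve response quality by enabling more thorough analysis of complex problems, although Claude may not use the entire allocated budget, especially at ranges above 32k.

budget_tokens is deprecated on Claude Opus 4.6 and will be removed in a future model release. We recommend using adaptive thinking with the effort parameter to control thinking depth.

Claude Opus 4.6 supports up to 128K output tokens. Earlier models support up to 64K output tokens.

budget_tokens must be set to a value less than max_tokens. However, when using interleaved thinking with tools, you can exceed this limit, because the token limit becomes your entire context window (200k tokens).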
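The constraints above can be checked on the client before a request is sent. The following is a minimal sketch, not an official helper: build_thinking_request and its interleaved flag are hypothetical names, and the 1,024-token minimum is taken from the best-practices section later on this page.

```python
MIN_THINKING_BUDGET = 1024  # minimum budget stated in the best-practices section


def build_thinking_request(model: str, prompt: str, max_tokens: int,
                           budget_tokens: int, interleaved: bool = False) -> dict:
    """Build a Messages API request body with manual extended thinking enabled."""
    if budget_tokens < MIN_THINKING_BUDGET:
        raise ValueError(f"budget_tokens must be at least {MIN_THINKING_BUDGET}")
    # Without interleaved thinking, the budget must stay below max_tokens.
    if not interleaved and budget_tokens >= max_tokens:
        raise ValueError("budget_tokens must be less than max_tokens")
    return {
        "model": model,
        "max_tokens": max_tokens,
        "thinking": {"type": "enabled", "budget_tokens": budget_tokens},
        "messages": [{"role": "user", "content": prompt}],
    }
```

For example, build_thinking_request("claude-sonnet-4-6", "Hello", 16000, 10000) produces the same body shape as the curl example on this page.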

Summarized thinking

With extended thinking enabled, the Messages API for Claude 4 models returns a summary of Claude's full thinking process. Summarized thinking provides the full intelligence benefits of extended thinking, while preventing misuse.

Here are some important considerations for summarized thinking:

You're charged for the full thinking tokens generated by the original request, not the summary tokens.
The billed output token count will not match the count of tokens you see in the response.
The first few lines of thinking output are more verbose, providing detailed reasoning that's particularly helpful for prompt engineering purposes.
As Anthropic seeks to improve the extended thinking feature, summarization behavior is subject to change.
Summarization preserves the key ideas of Claude's thinking process with minimal added latency, enabling a streamable user experience and easy migration from Claude Sonnet 3.7 to Claude 4 and later models.
Summarization is processed by a different model than the one you target in your requests. The thinking model does not see the summarized output.

Claude Sonnet 3.7 continues to return full thinking output.

In rare cases where you need access to full thinking output for Claude 4 models, contact our sales team.

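Streaming thinking

You can stream extended thinking responses using server-sent events (SSE).

When streaming is enabled for extended thinking, you receive thinking content via thinking_delta events.

For more documentation on streaming via the Messages API, see Streaming Messages.

The streaming request near the end of this page shows how to request a stream with thinking.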


Example streaming output:

event: message_start
data: {"type": "message_start", "message": {"id": "msg_01...", "type": "message", "role": "assistant", "content": [], "model": "claude-sonnet-4-6", "stop_reason": null, "stop_sequence": null}}

event: content_block_start
data: {"type": "content_block_start", "index": 0, "content_block": {"type": "thinking", "thinking": ""}}

event: content_block_delta
data: {"type": "content_block_delta", "index": 0, "delta": {"type": "thinking_delta", "thinking": "I need to find the GCD of 1071 and 462 using the Euclidean algorithm.\n\n1071 = 2 × 462 + 147"}}

event: content_block_delta
data: {"type": "content_block_delta", "index": 0, "delta": {"type": "thinking_delta", "thinking": "\n462 = 3 × 147 + 21\n147 = 7 × 21 + 0\n\nSo GCD(1071, 462) = 21"}}

// Additional thinking deltas...

event: content_block_delta
data: {"type": "content_block_delta", "index": 0, "delta": {"type": "signature_delta", "signature": "EqQBCgIYAhIM1gbcDa9GJwZA2b3hGgxBdjrkzLoky3dl1pkiMOYds..."}}

event: content_block_stop
data: {"type": "content_block_stop", "index": 0}

event: content_block_start
data: {"type": "content_block_start", "index": 1, "content_block": {"type": "text", "text": ""}}

event: content_block_delta
data: {"type": "content_block_delta", "index": 1, "delta": {"type": "text_delta", "text": "The greatest common divisor of 1071 and 462 is **21**."}}

// Additional text deltas...

event: content_block_stop
data: {"type": "content_block_stop", "index": 1}

event: message_delta
data: {"type": "message_delta", "delta": {"stop_reason": "end_turn", "stop_sequence": null}}

event: message_stop
data: {"type": "message_stop"}

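When streaming with thinking enabled, you may notice that text sometimes arrives in larger chunks alternating with smaller, token-by-token delivery. This is expected behavior, especially for thinking content.

The streaming system needs to process content in batches for optimal performance, which can produce this "chunky" delivery pattern, with possible delays between streaming events. We are continuously working to improve this experience, and future updates will focus on making thinking content stream more smoothly.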
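To make the event sequence above concrete, here is a small sketch that folds already-parsed stream events back into complete content blocks. It assumes each SSE data payload has been decoded into a dict; accumulate_stream is a hypothetical helper, not an SDK function.

```python
def accumulate_stream(events: list[dict]) -> list[dict]:
    """Rebuild content blocks from content_block_start and content_block_delta events."""
    blocks: dict[int, dict] = {}
    for event in events:
        kind = event.get("type")
        if kind == "content_block_start":
            blocks[event["index"]] = dict(event["content_block"])
        elif kind == "content_block_delta":
            block = blocks[event["index"]]
            delta = event["delta"]
            if delta["type"] == "thinking_delta":
                block["thinking"] = block.get("thinking", "") + delta["thinking"]
            elif delta["type"] == "text_delta":
                block["text"] = block.get("text", "") + delta["text"]
            elif delta["type"] == "signature_delta":
                # The signature arrives just before content_block_stop.
                block["signature"] = delta["signature"]
    return [blocks[i] for i in sorted(blocks)]
```

Because chunk sizes vary, the sketch makes no assumption about how much text each delta carries.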

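Extended thinking with tool use

Extended thinking can be used alongside tool use, allowing Claude to reason through tool selection and tool result processing.

When using extended thinking with tool use, be aware of the following limitations:

Tool choice limitation: Tool use with thinking only supports tool_choice: {"type": "auto"} (the default) or tool_choice: {"type": "none"}. Using tool_choice: {"type": "any"} or tool_choice: {"type": "tool", "name": "..."} results in an error, because these options force tool use, which is incompatible with extended thinking.
Preserving thinking blocks: During tool use, you must pass thinking blocks back to the API for the last assistant message. Include the complete, unmodified blocks to maintain reasoning continuity.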

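Toggling thinking modes in conversations

You cannot toggle thinking in the middle of an assistant turn, including during tool use loops. The entire assistant turn should operate in a single thinking mode:

If thinking is enabled, the final assistant turn should start with a thinking block.
If thinking is disabled, the final assistant turn should not contain any thinking blocks.

From the model's perspective, tool use loops are part of the assistant turn. An assistant turn does not end until Claude finishes its full response, which may include multiple tool calls and results.

For example, the following sequence is all part of a single assistant turn: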

User: "What's the weather in Paris?"
Assistant: [thinking] + [tool_use: get_weather]
User: [tool_result: "20°C, sunny"]
Assistant: [text: "The weather in Paris is 20°C and sunny"]

Even though there are multiple API messages, the tool use loop is conceptually part of one continuous assistant response.

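Graceful thinking degradation

When a mid-turn thinking conflict occurs (such as toggling thinking on or off during a tool use loop), the API automatically disables thinking for that request. To preserve model quality and stay on-distribution, the API may:

Strip thinking blocks from the conversation when they would create an invalid turn structure
Disable thinking for the current request when the conversation history is incompatible with thinking being enabled

This means that attempting to toggle thinking mid-turn does not cause an error, but thinking is silently disabled for that request. To confirm whether thinking was active, check for the presence of thinking blocks in the response.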
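Because the degradation is silent, a simple check on the response is the only signal. The sketch below assumes a response parsed into a dict; thinking_was_active is a hypothetical helper, and counting redacted_thinking blocks as evidence of thinking is an assumption of this example.

```python
THINKING_BLOCK_TYPES = ("thinking", "redacted_thinking")


def thinking_was_active(response: dict) -> bool:
    """Return True if the response contains at least one thinking-type block."""
    return any(
        block.get("type") in THINKING_BLOCK_TYPES
        for block in response.get("content", [])
    )
```

An application could log a warning whenever it requested thinking but this check returns False.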

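Practical guidance

Best practice: Plan your thinking strategy at the start of each turn rather than trying to toggle mid-turn.

Example: Toggling thinking after completing a turn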

User: "What's the weather?"
Assistant: [tool_use] (thinking disabled)
User: [tool_result]
Assistant: [text: "It's sunny"]
User: "What about tomorrow?"
Assistant: [thinking] + [text: "..."] (thinking enabled - new turn)

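By completing the assistant turn before toggling thinking, you ensure that thinking is actually enabled for the new request.

Toggling thinking modes also invalidates prompt caching for message history. For more details, see the Extended thinking with prompt caching section.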

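Preserving thinking blocks

During tool use, you must pass thinking blocks back to the API, and you must include the complete, unmodified blocks. This is critical for maintaining the model's reasoning flow and conversation integrity.

While you can omit thinking blocks from prior assistant role turns, we suggest always passing back all thinking blocks to the API for any multi-turn conversation. The API will:

Automatically filter the provided thinking blocks
Use the relevant thinking blocks necessary to preserve the model's reasoning
Only bill input tokens for the blocks shown to Claude

When toggling thinking modes during a conversation, remember that the entire assistant turn (including tool use loops) must operate in a single thinking mode. For more details, see Toggling thinking modes in conversations.

When Claude invokes a tool, it pauses the construction of its response to wait for external information. When the tool result is returned, Claude continues building that existing response. This makes preserving thinking blocks during tool use necessary, for two reasons:

Reasoning continuity: The thinking blocks capture Claude's step-by-step reasoning that led to the tool request. When you post tool results, including the original thinking ensures Claude can continue its reasoning from where it left off.
Context maintenance: While tool results appear as user messages in the API structure, they are part of a continuous reasoning flow. Preserving thinking blocks maintains this conceptual flow across multiple API calls. For more information on context management, see our guide on context windows.

Important: When providing thinking blocks, the entire sequence of consecutive thinking blocks must match the output generated by the model during the original request; you cannot rearrange or modify the sequence of these blocks.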
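The following sketch shows one way to continue a conversation after a tool call while keeping the assistant's blocks intact. It assumes a single tool_use block per assistant message and dict-shaped blocks; append_tool_result is a hypothetical helper name.

```python
import copy


def append_tool_result(messages: list[dict], assistant_content: list[dict],
                       tool_output: str) -> list[dict]:
    """Return a new message list with the assistant turn and its tool result appended."""
    tool_use = next(b for b in assistant_content if b["type"] == "tool_use")
    return messages + [
        # Pass every block back exactly as received: same order, same fields,
        # including the thinking block and its signature.
        {"role": "assistant", "content": copy.deepcopy(assistant_content)},
        {
            "role": "user",
            "content": [
                {
                    "type": "tool_result",
                    "tool_use_id": tool_use["id"],
                    "content": tool_output,
                }
            ],
        },
    ]
```

The deep copy guards against later code mutating the original blocks, which would make the thinking sequence differ from what the model generated.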

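Interleaved thinking

Extended thinking with tool use in Claude 4 models supports interleaved thinking, which enables Claude to think between tool calls and reason in a more sophisticated way after receiving tool results.

With interleaved thinking, Claude can:

Reason about the results of a tool call before deciding what to do next
Chain multiple tool calls with reasoning steps in between
Make more nuanced decisions based on intermediate results

Model support:

Claude Opus 4.6: Interleaved thinking is enabled automatically when using adaptive thinking; no beta header is needed. The interleaved-thinking-2025-05-14 beta header is deprecated on Opus 4.6 and is safely ignored if included.
Claude Sonnet 4.6: Supports the interleaved-thinking-2025-05-14 beta header with manual extended thinking (thinking: {type: "enabled"}). You can also use adaptive thinking, which enables interleaved thinking automatically.
Other Claude 4 models (Opus 4.5, Opus 4.1, Opus 4, Sonnet 4.5, Sonnet 4): Add the beta header interleaved-thinking-2025-05-14 to your API request to enable interleaved thinking.

Here are some important considerations for interleaved thinking:

With interleaved thinking, budget_tokens can exceed the max_tokens parameter, because it represents the total budget across all thinking blocks within one assistant turn.
Interleaved thinking is only supported for tools used via the Messages API.
Direct calls to the Claude API let you pass interleaved-thinking-2025-05-14 in requests to any model; on models that do not support interleaved thinking the header has no effect (on Opus 4.6 it is deprecated and safely ignored).
On third-party platforms (for example, Amazon Bedrock and Vertex AI), if you pass interleaved-thinking-2025-05-14 to any model other than Claude Sonnet 4.6, Claude Opus 4.5, Claude Opus 4.1, Opus 4, Sonnet 4.5, or Sonnet 4, your request will fail.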
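Since the header is only useful on some models and can make requests fail on third-party platforms, it may help to decide per model whether to send it. The sketch below encodes the model list from this section using the model IDs listed under Supported models; interleaved_thinking_headers is a hypothetical helper, and anthropic-beta is the request header used for beta flags.

```python
INTERLEAVED_BETA = "interleaved-thinking-2025-05-14"

# Models for which this page says the beta header enables interleaved thinking.
BETA_HEADER_MODELS = {
    "claude-sonnet-4-6",
    "claude-opus-4-5-20251101",
    "claude-opus-4-1-20250805",
    "claude-opus-4-20250514",
    "claude-sonnet-4-5-20250929",
    "claude-sonnet-4-20250514",
}


def interleaved_thinking_headers(model: str) -> dict[str, str]:
    """Return the extra headers to send for interleaved thinking on this model."""
    if model in BETA_HEADER_MODELS:
        return {"anthropic-beta": INTERLEAVED_BETA}
    # Opus 4.6 interleaves automatically with adaptive thinking; other models
    # do not support the header, so nothing is sent.
    return {}
```

Omitting the header for unlisted models is the conservative choice, because it behaves the same on the Claude API and on third-party platforms.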

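Extended thinking with prompt caching

Prompt caching with thinking has several important considerations:

Extended thinking tasks often take longer than 5 minutes to complete. Consider using the 1-hour cache duration to maintain cache hits across longer thinking sessions and multi-step workflows.

Thinking block context removal

Thinking blocks from previous turns are removed from context, which can affect cache breakpoints
When continuing conversations with tool use, thinking blocks are cached and count as input tokens when read from the cache
This creates a tradeoff: while thinking blocks do not visibly consume context window space, they still count toward your input token usage when cached
If thinking is disabled and you pass thinking content in the current tool use turn, the thinking content is stripped and thinking remains disabled for that request

Cache invalidation patterns

Changes to thinking parameters (enabling or disabling thinking, or changing the budget allocation) invalidate message cache breakpoints
Interleaved thinking amplifies cache invalidation, because thinking blocks can occur between multiple tool calls
System prompts and tools remain cached despite thinking parameter changes or block removal

Although thinking blocks are removed for caching and context calculations, they must be preserved when continuing conversations with tool use, especially with interleaved thinking.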

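Understanding thinking block caching behavior

When using extended thinking with tool use, thinking blocks exhibit specific caching behavior that affects token counting:

How it works:

Caching only occurs when you make a subsequent request that includes tool results
When the subsequent request is made, the previous conversation history (including thinking blocks) can be cached
These cached thinking blocks count as input tokens in your usage metrics when read from the cache
When a non-tool-result user block is included, all previous thinking blocks are ignored and stripped from context

Detailed example flow:

Request 1:

User: "What's the weather in Paris?"

Response 1:

[thinking_block_1] + [tool_use block 1]

Request 2:

User: ["What's the weather in Paris?"],
Assistant: [thinking_block_1] + [tool_use block 1],
User: [tool_result_1, cache=True]

Response 2:

[thinking_block_2] + [text block 2]

Request 2 writes a cache of the request content (not the response). The cache includes the original user message, the first thinking block, the tool use block, and the tool result.

Request 3:

User: ["What's the weather in Paris?"],
Assistant: [thinking_block_1] + [tool_use block 1],
User: [tool_result_1, cache=True],
Assistant: [thinking_block_2] + [text block 2],
User: [Text response, cache=True]

For Claude Opus 4.5 and later (including Claude Opus 4.6), all previous thinking blocks are kept by default. For older models, because a non-tool-result user block was included, all previous thinking blocks are ignored. On those older models, this request is processed the same as:

User: ["What's the weather in Paris?"],
Assistant: [tool_use block 1],
User: [tool_result_1, cache=True],
Assistant: [text block 2],
User: [Text response, cache=True]

Key points:

This caching behavior happens automatically, even without explicit cache_control markers
This behavior is consistent whether you use regular thinking or interleaved thinking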
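To make the Request 3 transformation easier to reason about, here is a conceptual model of it in Python. This is only an illustration of the behavior described above, not the API's actual implementation; effective_history and its preserves_thinking flag are hypothetical, and the message shapes are simplified.

```python
THINKING_TYPES = ("thinking", "redacted_thinking")


def is_tool_result_only(message: dict) -> bool:
    """True if a user message consists solely of tool_result blocks."""
    content = message["content"]
    return (
        isinstance(content, list)
        and len(content) > 0
        and all(block.get("type") == "tool_result" for block in content)
    )


def effective_history(messages: list[dict], preserves_thinking: bool = False) -> list[dict]:
    """Model which blocks the model sees, given the final user message."""
    # Thinking is kept while the tool loop continues, or on models that
    # preserve thinking by default (Opus 4.5 and later, per this page).
    if preserves_thinking or is_tool_result_only(messages[-1]):
        return messages
    stripped = []
    for message in messages:
        content = message["content"]
        if message["role"] == "assistant" and isinstance(content, list):
            kept = [b for b in content if b.get("type") not in THINKING_TYPES]
            stripped.append({**message, "content": kept})
        else:
            stripped.append(message)
    return stripped
```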

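Max tokens and context window size with extended thinking

In older Claude models (prior to Claude Sonnet 3.7), if the sum of prompt tokens and max_tokens exceeded the model's context window, the system automatically adjusted max_tokens to fit within the context limit. This meant you could set a large max_tokens value and the system would silently reduce it as needed.

With Claude 3.7 and 4 models, max_tokens (which includes your thinking budget when thinking is enabled) is enforced as a strict limit. The system now returns a validation error if prompt tokens + max_tokens exceeds the context window size.

You can read our guide on context windows for a more thorough deep dive.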
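The strict limit can be mirrored on the client so that oversized requests are caught early. A minimal sketch, assuming you already have a prompt token count (for example from the token counting API) and a 200k-token context window as mentioned earlier on this page; remaining_context is a hypothetical helper.

```python
def remaining_context(prompt_tokens: int, max_tokens: int,
                      context_window: int = 200_000) -> int:
    """Return the unused context, or raise if the request would fail validation."""
    total = prompt_tokens + max_tokens
    if total > context_window:
        raise ValueError(
            f"prompt tokens + max_tokens = {total} exceeds the "
            f"context window of {context_window}"
        )
    return context_window - total
```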

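The context window with extended thinking

When calculating context window usage with thinking enabled, there are some considerations to be aware of:

Thinking blocks from previous turns are stripped and do not count towards your context window
Thinking in the current turn counts towards your max_tokens limit for that turn

The diagram below demonstrates the specialized token management when extended thinking is enabled:

Figure: Context window diagram with extended thinking

The effective context window is calculated as: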

context window =
  (current input tokens - previous thinking tokens) +
  (thinking tokens + encrypted thinking tokens + text output tokens)

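We recommend using the token counting API to get accurate token counts for your specific use case, especially when working with multi-turn conversations that include thinking.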

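The context window with extended thinking and tool use

When using extended thinking with tool use, thinking blocks must be explicitly preserved and returned with the tool results.

The effective context window calculation for extended thinking with tool use becomes: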

context window =
  (current input tokens + previous thinking tokens + tool use tokens) +
  (thinking tokens + encrypted thinking tokens + text output tokens)

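The diagram below illustrates token management for extended thinking with tool use:

Figure: Context window diagram with extended thinking and tool use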
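The two context window formulas on this page differ only in how previous thinking tokens and tool use tokens are treated. The sketch below writes both out as plain arithmetic; the function and parameter names are hypothetical, and the token counts in the usage note are made-up illustrative values.

```python
def context_without_tools(current_input: int, previous_thinking: int,
                          thinking: int, encrypted_thinking: int,
                          text_output: int) -> int:
    """Previous thinking is stripped, so it is subtracted from the input."""
    return (current_input - previous_thinking) + (
        thinking + encrypted_thinking + text_output
    )


def context_with_tools(current_input: int, previous_thinking: int,
                       tool_use: int, thinking: int,
                       encrypted_thinking: int, text_output: int) -> int:
    """With tool use, previous thinking is preserved and counts toward context."""
    return (current_input + previous_thinking + tool_use) + (
        thinking + encrypted_thinking + text_output
    )
```

With 10,000 current input tokens, 2,000 previous thinking tokens, 3,000 new thinking tokens, and 500 text output tokens, the first formula gives 11,500; adding 300 tool use tokens, the second gives 15,800.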

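Managing tokens with extended thinking

Given the context window and max_tokens behavior of Claude 3.7 and 4 models with extended thinking, you may need to:

Monitor and manage your token usage more actively
Adjust max_tokens values as your prompt length changes
Potentially use the token counting endpoints more frequently
Be aware that previous thinking blocks do not accumulate in your context window

This change was made to provide more predictable and transparent behavior, especially as maximum token limits have increased significantly.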

Thinking encryption

Full thinking content is encrypted and returned in the signature field. This field is used to verify that thinking blocks were generated by Claude when passed back to the API.

It is only strictly necessary to send back thinking blocks when using tools with extended thinking. Otherwise you can omit thinking blocks from previous turns, or let the API strip them for you if you pass them back.

If sending back thinking blocks, we recommend passing everything back as you received it for consistency and to avoid potential issues.

Here are some important considerations on thinking encryption:

When streaming responses, the signature is added via a signature_delta inside a content_block_delta event just before the content_block_stop event.
signature values are significantly longer in Claude 4 models than in previous models.
The signature field is an opaque field and should not be interpreted or parsed - it exists solely for verification purposes.
signature values are compatible across platforms (Claude APIs, Amazon Bedrock, and Vertex AI). Values generated on one platform will be compatible with another.

Thinking redaction

Occasionally Claude's internal reasoning will be flagged by our safety systems. When this occurs, we encrypt some or all of the thinking block and return it to you as a redacted_thinking block. redacted_thinking blocks are decrypted when passed back to the API, allowing Claude to continue its response without losing context.

When building customer-facing applications that use extended thinking:

Be aware that redacted thinking blocks contain encrypted content that isn't human-readable
Consider providing a simple explanation like: "Some of Claude's internal reasoning has been automatically encrypted for safety reasons. This doesn't affect the quality of responses."
If showing thinking blocks to users, you can filter out redacted blocks while preserving normal thinking blocks
Be transparent that using extended thinking features may occasionally result in some reasoning being encrypted
Implement appropriate error handling to gracefully manage redacted thinking without breaking your UI

Here's an example showing both normal and redacted thinking blocks:

{
  "content": [
    {
      "type": "thinking",
      "thinking": "Let me analyze this step by step...",
      "signature": "WaUjzkypQ2mUEVM36O2TxuC06KN8xyfbJwyem2dw3URve/op91XWHOEBLLqIOMfFG/UvLEczmEsUjavL...."
    },
    {
      "type": "redacted_thinking",
      "data": "EmwKAhgBEgy3va3pzix/LafPsn4aDFIT2Xlxh0L5L8rLVyIwxtE3rAFBa8cr3qpPkNRj2YfWXGmKDxH4mPnZ5sQ7vB9URj2pLmN3kF8/dW5hR7xJ0aP1oLs9yTcMnKVf2wRpEGjH9XZaBt4UvDcPrQ..."
    },
    {
      "type": "text",
      "text": "Based on my analysis..."
    }
  ]
}

Seeing redacted thinking blocks in your output is expected behavior. The model can still use this redacted reasoning to inform its responses while maintaining safety guardrails.
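As one possible way to follow the UI guidance above, the sketch below shows normal thinking to the user, hides redacted blocks, and appends the suggested explanation when anything was hidden. render_thinking_for_display is a hypothetical helper; the original content list should still be passed back to the API unmodified.

```python
REDACTION_NOTICE = (
    "Some of Claude's internal reasoning has been automatically encrypted "
    "for safety reasons. This doesn't affect the quality of responses."
)


def render_thinking_for_display(content: list[dict]) -> list[str]:
    """Return human-readable thinking lines, replacing redacted blocks with a notice."""
    lines: list[str] = []
    saw_redacted = False
    for block in content:
        if block.get("type") == "thinking":
            lines.append(block["thinking"])
        elif block.get("type") == "redacted_thinking":
            # Encrypted data is not human-readable, so it is never displayed.
            saw_redacted = True
    if saw_redacted:
        lines.append(REDACTION_NOTICE)
    return lines
```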

If you need to test redacted thinking handling in your application, you can use this special test string as your prompt: ANTHROPIC_MAGIC_STRING_TRIGGER_REDACTED_THINKING_46C9A13E193C177646C7398A98432ECCCE4C1253D5E2D82641AC0E52CC2876CB

When passing thinking and redacted_thinking blocks back to the API in a multi-turn conversation, you must include the complete unmodified block back to the API for the last assistant turn. This is critical for maintaining the model's reasoning flow. We suggest always passing back all thinking blocks to the API. For more details, see the Preserving thinking blocks section.

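Differences in thinking across model versions

The Messages API handles thinking differently across Claude Sonnet 3.7 and Claude 4 models, primarily in redaction and summarization behavior.

See the table below for a condensed comparison:

Feature	Claude Sonnet 3.7	Claude 4 models (pre-Opus 4.5)	Claude Opus 4.5	Claude Sonnet 4.6	Claude Opus 4.6 (adaptive thinking)
Thinking output	Returns full thinking output	Returns summarized thinking	Returns summarized thinking	Returns summarized thinking	Returns summarized thinking
Interleaved thinking	Not supported	Supported with the `interleaved-thinking-2025-05-14` beta header	Supported with the `interleaved-thinking-2025-05-14` beta header	Supported with the `interleaved-thinking-2025-05-14` beta header, or automatic with adaptive thinking	Automatic with adaptive thinking (beta header not supported)
Thinking block preservation	Not preserved across turns	Not preserved across turns	Preserved by default		Preserved by default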

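Thinking block preservation in Claude Opus 4.5 and later

Starting with Claude Opus 4.5 (and continuing in Claude Opus 4.6), thinking blocks from previous assistant turns are preserved in model context by default. This differs from earlier models, which remove thinking blocks from prior turns.

Benefits of thinking block preservation:

Cache optimization: When using tools, preserved thinking blocks enable cache hits, because they are passed back with tool results and cached incrementally across the assistant turn, which saves tokens in multi-step workflows
No intelligence impact: Preserving thinking blocks has no negative effect on model performance

Important considerations:

Context usage: Long conversations consume more context space, because thinking blocks are retained in context
Automatic behavior: This is the default behavior for Claude Opus 4.5 and later models (including Opus 4.6); no code changes or beta headers are required
Backward compatibility: To take advantage of this feature, continue passing complete, unmodified thinking blocks back to the API as you would for tool use

For earlier models (Claude Sonnet 4.5, Opus 4.1, and so on), thinking blocks from previous turns are still removed from context. The existing behavior described in the Extended thinking with prompt caching section applies to those models.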

Pricing

For complete pricing information including base rates, cache writes, cache hits, and output tokens, see the pricing page.

The thinking process incurs charges for:

Tokens used during thinking (output tokens)
Thinking blocks from the last assistant turn included in subsequent requests (input tokens)
Standard text output tokens

When extended thinking is enabled, a specialized system prompt is automatically included to support this feature.

When using summarized thinking:

Input tokens: Tokens in your original request (excludes thinking tokens from previous turns)
Output tokens (billed): The original thinking tokens that Claude generated internally
Output tokens (visible): The summarized thinking tokens you see in the response
No charge: Tokens used to generate the summary

The billed output token count will not match the visible token count in the response. You are billed for the full thinking process, not the summary you see.

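Best practices and considerations for extended thinking

Working with thinking budgets

**Budget optimization:** The minimum budget is 1,024 tokens. We suggest starting at the minimum and increasing the thinking budget incrementally to find the optimal range for your use case. Higher token counts enable more comprehensive reasoning, with diminishing returns depending on the task. Increasing the budget can improve response quality at the cost of increased latency. For critical tasks, test different settings to find the best balance. Note that the thinking budget is a target rather than a strict limit; actual token usage may vary by task.
**Starting points:** For complex tasks, start with a larger thinking budget (16k+ tokens) and adjust based on your needs.
**Large budgets:** For thinking budgets above 32k, we recommend using batch processing to avoid networking issues. Requests that push the model to think above 32k tokens are long-running and may run into system timeouts and open connection limits.
**Token usage tracking:** Monitor thinking token usage to optimize cost and performance.

Performance considerations

**Response times:** Be prepared for potentially longer response times due to the additional processing the reasoning requires. Factor in that generating thinking blocks may increase overall response time.
**Streaming requirements:** The SDKs require streaming when max_tokens is greater than 21,333, to avoid HTTP timeouts on long-running requests. This is a client-side validation, not an API restriction. If you do not need to process events incrementally, use .stream() with .get_final_message() (Python) or .finalMessage() (TypeScript) to get the complete Message object without handling individual events; see Streaming Messages for details. When streaming, be prepared to handle both thinking and text content blocks as they arrive.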

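Feature compatibility

Thinking is not compatible with temperature or top_k modifications, or with forced tool use.
When thinking is enabled, you can set top_p to values between 0.95 and 1.
You cannot pre-fill responses when thinking is enabled.
Changes to the thinking budget invalidate cached prompt prefixes that include messages. However, cached system prompts and tool definitions continue to work when thinking parameters change.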
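These compatibility rules can be collected into a simple pre-flight check. The sketch below is illustrative only: thinking_compatibility_errors is a hypothetical helper, it treats any explicit temperature or top_k value as a modification, and it treats a trailing assistant message as a pre-filled response.

```python
def thinking_compatibility_errors(request: dict) -> list[str]:
    """List the settings in a request body that conflict with thinking."""
    errors: list[str] = []
    thinking_type = request.get("thinking", {}).get("type")
    if thinking_type not in ("enabled", "adaptive"):
        return errors  # Thinking is off, so none of these rules apply.
    if "temperature" in request:
        errors.append("temperature cannot be modified with thinking")
    if "top_k" in request:
        errors.append("top_k cannot be modified with thinking")
    top_p = request.get("top_p")
    if top_p is not None and not 0.95 <= top_p <= 1:
        errors.append("top_p must be between 0.95 and 1 with thinking")
    messages = request.get("messages", [])
    if messages and messages[-1].get("role") == "assistant":
        errors.append("responses cannot be pre-filled with thinking")
    return errors
```

An empty list means no conflicts were detected by this sketch; the API remains the source of truth for validation.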

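Usage guidelines

**Task selection:** Use extended thinking for particularly complex tasks that benefit from step-by-step reasoning, such as math, coding, and analysis.
**Context handling:** You do not need to remove previous thinking blocks yourself. The Claude API automatically ignores thinking blocks from previous turns and does not include them when calculating context usage (on Claude Opus 4.5 and later, previous thinking blocks are preserved by default, as described in the model differences section).
**Prompt engineering:** To get the most out of Claude's thinking capabilities, review our extended thinking prompting tips.

Next steps

Try the extended thinking cookbook.

The two requests below are the basic and streaming examples referred to earlier on this page: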

curl https://api.anthropic.com/v1/messages \
     --header "x-api-key: $ANTHROPIC_API_KEY" \
     --header "anthropic-version: 2023-06-01" \
     --header "content-type: application/json" \
     --data \
'{
    "model": "claude-sonnet-4-6",
    "max_tokens": 16000,
    "thinking": {
        "type": "enabled",
        "budget_tokens": 10000
    },
    "messages": [
        {
            "role": "user",
            "content": "Are there an infinite number of prime numbers such that n mod 4 == 3?"
        }
    ]
}'

curl https://api.anthropic.com/v1/messages \
     --header "x-api-key: $ANTHROPIC_API_KEY" \
     --header "anthropic-version: 2023-06-01" \
     --header "content-type: application/json" \
     --data \
'{
    "model": "claude-sonnet-4-6",
    "max_tokens": 16000,
    "stream": true,
    "thinking": {
        "type": "enabled",
        "budget_tokens": 10000
    },
    "messages": [
        {
            "role": "user",
            "content": "What is the greatest common divisor of 1071 and 462?"
        }
    ]
}'
