This feature is eligible for Zero Data Retention (ZDR). When your organization has a ZDR arrangement, data sent through this feature is not stored after the API response is returned.
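Adaptive thinking is the recommended way to use extended thinking on Claude Opus 4.7, Claude Opus 4.6, and Claude Sonnet 4.6, and it is the default mode on Claude Mythos Preview (applied automatically when thinking is unset). Instead of requiring you to set a thinking token budget manually, adaptive thinking lets Claude decide dynamically when and how much to use extended thinking, based on the complexity of each request. On Claude Opus 4.7, adaptive thinking is the only supported thinking mode; manual thinking: {type: "enabled", budget_tokens: N} is no longer accepted.
For many workloads, especially bimodal tasks and long-horizon agentic workflows, adaptive thinking can deliver better performance than extended thinking with a fixed budget_tokens. No beta header is required.
If your workload needs predictable latency or precise control over thinking costs, extended thinking with budget_tokens still works on Claude Opus 4.6 and Claude Sonnet 4.6, but it is deprecated and no longer recommended. See the warning below.
Adaptive thinking is supported on the following models: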
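- Claude Mythos Preview (claude-mythos-preview): adaptive thinking is the default; thinking: {type: "disabled"} is not supported.
- Claude Opus 4.7 (claude-opus-4-7): adaptive thinking is the only supported thinking mode. Thinking is off unless you explicitly set thinking: {type: "adaptive"} in the request; manual thinking: {type: "enabled"} is rejected with a 400 error.
- Claude Opus 4.6 (claude-opus-4-6)
- Claude Sonnet 4.6 (claude-sonnet-4-6)
thinking.type: "enabled" and budget_tokens are deprecated on Opus 4.6 and Sonnet 4.6 and will be removed in a future model release. Use thinking.type: "adaptive" with the effort parameter instead. Existing budget_tokens configurations still work but are no longer recommended; plan to migrate.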
Older models (Sonnet 4.5, Opus 4.5, and so on) do not support adaptive thinking and require thinking.type: "enabled" with budget_tokens.
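The support matrix above can be sketched as a small lookup. This is a minimal illustration, not part of any SDK: the helper name thinking_config and the default budget of 8000 tokens are assumptions, and any model ID not listed above is simply treated as an older model.

```python
# Hypothetical helper: picks a thinking configuration from the support matrix above.
ADAPTIVE_MODELS = {
    "claude-mythos-preview",  # adaptive is the default
    "claude-opus-4-7",        # adaptive is the only mode; off unless requested
    "claude-opus-4-6",        # manual mode still works but is deprecated
    "claude-sonnet-4-6",      # manual mode still works but is deprecated
}


def thinking_config(model: str, budget_tokens: int = 8000) -> dict:
    """Return a thinking config that the given model is described as accepting."""
    if model in ADAPTIVE_MODELS:
        return {"type": "adaptive"}
    # Older models: manual mode with an explicit budget (the default is an assumption).
    return {"type": "enabled", "budget_tokens": budget_tokens}
```

The returned dictionary is meant to be passed as the thinking argument of a Messages API request.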
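In adaptive mode, thinking is optional for the model. Claude evaluates the complexity of each request and decides whether and how much to use extended thinking. At the default effort level (high), Claude almost always thinks. At lower effort levels, Claude may skip thinking for simpler problems.
Adaptive thinking also automatically enables interleaved thinking. This means Claude can think between tool calls, which makes it especially effective for agentic workflows.
Set thinking.type to "adaptive" in your API request.
You can combine adaptive thinking with the effort parameter to guide how much thinking Claude does. The effort level acts as soft guidance for Claude's thinking allocation: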
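| Effort level | Thinking behavior |
|---|---|
| max | Claude always thinks, with no constraints on thinking depth. Available on Claude Mythos Preview, Claude Opus 4.7, Claude Opus 4.6, and Claude Sonnet 4.6. |
| xhigh | Claude always thinks deeply, with extended exploration. Available on Claude Opus 4.7. |
| high (default) | Claude always thinks. Provides deep reasoning on complex tasks. |
| medium | Claude uses moderate thinking. May skip thinking for very simple queries. |
| low | Claude minimizes thinking. Skips thinking for simple tasks where speed matters most. |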
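As a rough sketch of how the table might be applied, the hypothetical helper below builds keyword arguments for a request and rejects combinations the table does not list. The function name adaptive_request is an assumption; only the xhigh restriction is enforced, since the table states availability for max and xhigh only.

```python
# Effort levels from the table above; the helper itself is hypothetical.
EFFORT_LEVELS = ("low", "medium", "high", "xhigh", "max")


def adaptive_request(model: str, prompt: str, effort: str = "high",
                     max_tokens: int = 16000) -> dict:
    """Build keyword arguments for a Messages API call with adaptive thinking."""
    if effort not in EFFORT_LEVELS:
        raise ValueError(f"unknown effort level: {effort!r}")
    # The table lists xhigh as available on Claude Opus 4.7 only.
    if effort == "xhigh" and model != "claude-opus-4-7":
        raise ValueError("xhigh is listed only for claude-opus-4-7")
    return {
        "model": model,
        "max_tokens": max_tokens,
        "thinking": {"type": "adaptive"},
        "output_config": {"effort": effort},
        "messages": [{"role": "user", "content": prompt}],
    }
```

With the Python SDK, the result could be unpacked into the call, for example client.messages.create(**adaptive_request("claude-sonnet-4-6", "What is 2 + 2?", effort="low")).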
Adaptive thinking works seamlessly with streaming. Thinking blocks are streamed via thinking_delta events, just as in manual thinking mode.
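| Mode | Configuration | Availability | When to use |
|---|---|---|---|
| Adaptive | thinking: {type: "adaptive"} | Claude Mythos Preview (default), Opus 4.7 (only mode), Opus 4.6, Sonnet 4.6 | Claude decides when and how much to use extended thinking. Use effort to guide it. |
| Manual | thinking: {type: "enabled", budget_tokens: N} | All models except Claude Opus 4.7 (rejected). Deprecated on Opus 4.6 and Sonnet 4.6 (consider adaptive mode instead). | When you need precise control over thinking token spend. |
| Disabled | Omit the thinking parameter or pass {type: "disabled"} | All models except Claude Mythos Preview | When you don't need extended thinking and want the lowest latency. |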
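Adaptive thinking is available on Claude Mythos Preview, Claude Opus 4.7, Opus 4.6, and Sonnet 4.6. On Mythos Preview, adaptive thinking is the default and applies automatically when thinking is unset. On Claude Opus 4.7, adaptive thinking is the only supported mode, and type: "enabled" with budget_tokens is rejected. Older models support only type: "enabled" with budget_tokens. On Opus 4.6 and Sonnet 4.6, type: "enabled" with budget_tokens still works but is deprecated.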
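Interleaved thinking availability by mode:
- … works with the interleaved-thinking-2025-05-14 beta header.
With adaptive thinking, previous assistant turns do not need to start with a thinking block. This is more flexible than manual mode, where the API enforces that thinking-enabled turns begin with a thinking block.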
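Consecutive requests that use adaptive thinking preserve prompt cache breakpoints. However, switching between adaptive and enabled/disabled thinking modes breaks cache breakpoints for messages. System prompts and tool definitions stay cached regardless of mode changes.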
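The triggering behavior of adaptive thinking is promptable. If Claude thinks more or less often than you want, you can add guidance to your system prompt: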
Extended thinking adds latency and should only be used when it
will meaningfully improve answer quality — typically for problems
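that require multi-step reasoning. When in doubt, respond directly.
Steering Claude to think less often may reduce quality on tasks that benefit from reasoning. Measure the impact on your specific workload before deploying prompt-based tuning to production. Consider testing lower effort levels first.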
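One way to wire such guidance into a request is to append it to an existing system prompt. The sketch below is illustrative only: the helper with_thinking_guidance, the base system prompt, and the user message are all made up for the example, and the guidance text is adapted from the prompt shown above.

```python
# Guidance adapted from the system prompt example above.
THINK_LESS_GUIDANCE = (
    "Extended thinking adds latency and should only be used when it "
    "will meaningfully improve answer quality, typically for problems "
    "that require multi-step reasoning. When in doubt, respond directly."
)


def with_thinking_guidance(base_system: str, guidance: str = THINK_LESS_GUIDANCE) -> str:
    """Append thinking guidance to an existing system prompt (hypothetical helper)."""
    if not base_system.strip():
        return guidance
    return f"{base_system.rstrip()}\n\n{guidance}"


# Keyword arguments for a Messages API call; the prompts are placeholders.
request = {
    "model": "claude-sonnet-4-6",
    "max_tokens": 16000,
    "thinking": {"type": "adaptive"},
    "system": with_thinking_guidance("You are a concise support assistant."),
    "messages": [{"role": "user", "content": "What are your opening hours?"}],
}
```

Keeping the guidance in one constant makes it easier to compare runs with and without it when measuring the effect on a workload.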
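Use max_tokens as a hard limit on total output (thinking plus response text). The effort parameter provides additional soft guidance on how much thinking Claude allocates. Together, the two give you effective control over cost.
At the high and max effort levels, Claude may think more extensively and is more likely to exhaust the max_tokens budget. If you observe stop_reason: "max_tokens" in responses, consider increasing max_tokens to give the model more room, or lowering the effort level.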
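The "increase max_tokens" option could be automated along these lines. This is a sketch under stated assumptions: create_with_headroom is a hypothetical wrapper, the doubling strategy and the 64000 cap are arbitrary choices, and the stand-in fake_create replaces client.messages.create so the example runs offline.

```python
from types import SimpleNamespace


def create_with_headroom(create, request: dict, cap: int = 64000, max_attempts: int = 3):
    """Call create(**request); if the response stops on max_tokens, double the limit and retry."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    request = dict(request)  # avoid mutating the caller's dict
    for _ in range(max_attempts):
        response = create(**request)
        if response.stop_reason != "max_tokens" or request["max_tokens"] >= cap:
            return response
        request["max_tokens"] = min(request["max_tokens"] * 2, cap)
    return response  # still truncated after the final attempt


# Stand-in for client.messages.create: pretends 16000 tokens are enough.
attempted_limits = []


def fake_create(**kwargs):
    attempted_limits.append(kwargs["max_tokens"])
    truncated = kwargs["max_tokens"] < 16000
    return SimpleNamespace(stop_reason="max_tokens" if truncated else "end_turn")


result = create_with_headroom(fake_create, {"max_tokens": 4000}, cap=32000, max_attempts=5)
```

Each retry is billed as a full request, so lowering the effort level may be the cheaper fix when truncation is frequent.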
The following concepts apply to all models that support extended thinking, whether you use adaptive or manual mode.
With extended thinking enabled, the Messages API for Claude 4 models returns a summary of Claude's full thinking process. Summarized thinking provides the full intelligence benefits of extended thinking, while preventing misuse. This is the default behavior on Claude 4 models when the display field on the thinking configuration is unset or set to "summarized". On Claude Opus 4.7 and Claude Mythos Preview, display defaults to "omitted" instead, so you must set display: "summarized" explicitly to receive summarized thinking.
Here are some important considerations for summarized thinking:
In rare cases where you need access to full thinking output for Claude 4 models, contact our sales team.
The display field on the thinking configuration controls how thinking content is returned in API responses. It accepts two values:
- "summarized": Thinking blocks contain summarized thinking text. See Summarized thinking for details. This is the default on Claude Opus 4.6, Claude Sonnet 4.6, and earlier Claude 4 models.
- "omitted": Thinking blocks are returned with an empty thinking field. The signature field still carries the encrypted full thinking for multi-turn continuity (see Thinking encryption). This is the default on Claude Opus 4.7 and Claude Mythos Preview.
Setting display: "omitted" is useful when your application doesn't surface thinking content to users. The primary benefit is faster time-to-first-text-token when streaming: the server skips streaming thinking tokens entirely and delivers only the signature, so the final text response begins streaming sooner.
Here are some important considerations for omitted thinking:
- … signature to reconstruct the original thinking for prompt construction (see Preserving thinking blocks). Any text you place in the thinking field of a round-tripped omitted block is ignored.
- display is invalid with thinking.type: "disabled" (there is nothing to display).
- With thinking.type: "adaptive", if the model skips thinking for a simple request, no thinking block is produced regardless of display.
The signature field is identical whether display is "summarized" or "omitted". Switching display values between turns in a conversation is supported.
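Client code that reads responses may need to handle all three cases described above: summarized thinking, omitted thinking (an empty thinking field), and no thinking block at all. The sketch below is one way to do that; split_response is a hypothetical helper, and the stand-in blocks only mimic the type, thinking, signature, and text attributes mentioned in this section.

```python
from types import SimpleNamespace


def split_response(content):
    """Return (visible_text, summaries, omitted_count) for a list of content blocks."""
    text_parts, summaries, omitted = [], [], 0
    for block in content:
        if block.type == "thinking":
            if block.thinking:
                summaries.append(block.thinking)  # display: "summarized"
            else:
                omitted += 1  # display: "omitted": empty text, signature only
        elif block.type == "text":
            text_parts.append(block.text)
    return "".join(text_parts), summaries, omitted


# Stand-in blocks so the sketch runs offline; the signature value is a placeholder.
content = [
    SimpleNamespace(type="thinking", thinking="", signature="opaque-signature"),
    SimpleNamespace(type="text", text="Paris."),
]
visible, summaries, omitted = split_response(content)
```

Because adaptive mode may skip thinking, the loop does not assume that the first block is a thinking block.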
On Claude Opus 4.7, thinking.display defaults to "omitted". Thinking blocks still appear in the response stream, but their thinking field is empty unless you explicitly opt in. This is a silent change from the "summarized" default on Claude Opus 4.6. To restore summarized thinking text on Claude Opus 4.7, explicitly set thinking.display to "summarized":
```python
thinking = {
    "type": "adaptive",
    "display": "summarized",
}
```
For code examples and streaming behavior with display: "omitted", see Controlling thinking display on the extended thinking page. The examples there use type: "enabled"; for adaptive thinking, use:
```python
thinking = {"type": "adaptive", "display": "omitted"}
```
Full thinking content is encrypted and returned in the signature field. This field is used to verify that thinking blocks were generated by Claude when passed back to the API.
It is only strictly necessary to send back thinking blocks when using tools with extended thinking. Otherwise you can omit thinking blocks from previous turns. If you pass them back, whether the API keeps or strips them depends on the model: Opus 4.5+ and Sonnet 4.6+ keep them in context by default; earlier Opus/Sonnet models and all Haiku models strip them. See context editing to configure this.
If sending back thinking blocks, we recommend passing everything back as you received it for consistency and to avoid potential issues.
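For the tool-use case, passing everything back as received might look like the sketch below. The helper continue_after_tool_use, the get_weather tool, the IDs, and the signature value are all placeholders; the point is only that the assistant content, thinking blocks included, is appended without modification before the tool result.

```python
def continue_after_tool_use(messages, assistant_content, tool_use_id, tool_output):
    """Return a new message list that replays the assistant turn and adds the tool result."""
    return messages + [
        # Pass the assistant content back exactly as received, thinking blocks included.
        {"role": "assistant", "content": assistant_content},
        {
            "role": "user",
            "content": [
                {"type": "tool_result", "tool_use_id": tool_use_id, "content": tool_output}
            ],
        },
    ]


# Placeholder conversation; in practice assistant_content comes from the API response.
history = [{"role": "user", "content": "What's the weather in Paris?"}]
assistant_content = [
    {"type": "thinking", "thinking": "", "signature": "opaque-signature"},
    {"type": "tool_use", "id": "toolu_example", "name": "get_weather", "input": {"city": "Paris"}},
]
next_messages = continue_after_tool_use(history, assistant_content, "toolu_example", "18°C, clear")
```

The helper returns a new list rather than mutating history, which keeps earlier turns byte-for-byte stable across requests.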
Here are some important considerations on thinking encryption:
- When streaming responses, the signature is added via a signature_delta inside a content_block_delta event just before the content_block_stop event.
- signature values are significantly longer in Claude 4 models than in previous models.
- The signature field is an opaque field and should not be interpreted or parsed.
- signature values are compatible across platforms (Claude APIs, Amazon Bedrock, and Vertex AI). Values generated on one platform will be compatible with another.
For complete pricing information including base rates, cache writes, cache hits, and output tokens, see the pricing page.
The thinking process incurs charges for:
When extended thinking is enabled, a specialized system prompt is automatically included to support this feature.
When using summarized thinking:
When using display: "omitted":
- … thinking field is empty)
The billed output token count will not match the visible token count in the response. You are billed for the full thinking process, not the thinking content visible in the response.
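The extended thinking page covers several topics in more detail, including mode-specific code examples:
- … limitations on tool_choice.
- … switching between adaptive and enabled/disabled modes breaks cache breakpoints for messages (system prompts and tool definitions stay cached).
- … how max_tokens interacts with context window limits.
```python
import anthropic

# Basic request: enable adaptive thinking and print each content block.
client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=16000,
    thinking={"type": "adaptive"},
    messages=[
        {
            "role": "user",
            "content": "Explain why the sum of two even numbers is always even.",
        }
    ],
)

for block in response.content:
    if block.type == "thinking":
        # On Claude Opus 4.7, display defaults to "omitted", so this text is
        # empty unless the request sets display: "summarized".
        print(f"\nThinking: {block.thinking}")
    elif block.type == "text":
        print(f"\nResponse: {block.text}")
```
```python
import anthropic

# Adaptive thinking combined with the effort parameter.
client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=16000,
    thinking={"type": "adaptive"},
    output_config={"effort": "medium"},
    messages=[{"role": "user", "content": "What is the capital of France?"}],
)

# The first block may be a thinking block, so select text blocks by type
# instead of indexing response.content[0].
for block in response.content:
    if block.type == "text":
        print(block.text)
```
```python
import anthropic

# Streaming: thinking arrives as thinking_delta events, text as text_delta events.
client = anthropic.Anthropic()

with client.messages.stream(
    model="claude-opus-4-7",
    max_tokens=16000,
    thinking={"type": "adaptive"},
    messages=[
        {
            "role": "user",
            "content": "What is the greatest common divisor of 1071 and 462?",
        }
    ],
) as stream:
    for event in stream:
        if event.type == "content_block_start":
            print(f"\nStarting {event.content_block.type} block...")
        elif event.type == "content_block_delta":
            if event.delta.type == "thinking_delta":
                print(event.delta.thinking, end="", flush=True)
            elif event.delta.type == "text_delta":
                print(event.delta.text, end="", flush=True)
```
Learn more about extended thinking, including manual mode, tool use, and prompt caching.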