能力

自适应思考

让 Claude 通过自适应思考模式动态决定何时思考以及思考多少。

自适应思考是在 Claude Opus 4.6 上使用扩展思考的推荐方式。自适应思考不需要手动设置思考 token 预算，而是让 Claude 根据每个请求的复杂程度动态决定何时思考以及思考多少。

自适应思考比使用固定 budget_tokens 的扩展思考能更可靠地带来更好的性能，我们建议切换到自适应思考以从 Opus 4.6 获得最智能的响应。无需 beta 头部。

支持的模型

自适应思考在以下模型上受支持：

Claude Opus 4.6 (claude-opus-4-6)

thinking.type: "enabled" 和 budget_tokens 在 Opus 4.6 上已弃用，将在未来的模型版本中移除。请改用 thinking.type: "adaptive" 配合 effort 参数。

较旧的模型（Sonnet 4.5、Opus 4.5 等）不支持自适应思考，需要使用 thinking.type: "enabled" 配合 budget_tokens。

自适应思考的工作原理

在自适应模式下，思考对模型来说是可选的。Claude 会评估每个请求的复杂程度，并决定是否思考以及思考多少。在默认 effort 级别（high）下，Claude 几乎总是会思考。在较低的 effort 级别下，Claude 可能会跳过对较简单问题的思考。

自适应思考还会自动启用交错思考。这意味着 Claude 可以在工具调用之间进行思考，使其在智能体工作流中特别有效。

如何使用自适应思考

在 API 请求中将 thinking.type 设置为 "adaptive"：

curl https://api.anthropic.com/v1/messages \
     --header "x-api-key: $ANTHROPIC_API_KEY" \
     --header "anthropic-version: 2023-06-01" \
     --header "content-type: application/json" \
     --data \
'{
    "model": "claude-opus-4-6",
    "max_tokens": 16000,
    "thinking": {
        "type": "adaptive"
    },
    "messages": [
        {
            "role": "user",
            "content": "Explain why the sum of two even numbers is always even."
        }
    ]
}'

自适应思考与 effort 参数

您可以将自适应思考与 effort 参数结合使用，以引导 Claude 进行多少思考。effort 级别作为 Claude 思考分配的软性指导：

Effort 级别	思考行为
`max`	Claude 始终思考，对思考深度没有限制。仅限 Opus 4.6——在其他模型上使用 `max` 的请求将返回错误。
`high`（默认）	Claude 始终思考。对复杂任务提供深度推理。
`medium`	Claude 使用适度思考。对于非常简单的查询可能会跳过思考。
`low`	Claude 最小化思考。对于速度最重要的简单任务跳过思考。

import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-opus-4-6",
    max_tokens=16000,
    thinking={
        "type": "adaptive"
    },
    output_config={
        "effort": "medium"
    },
    messages=[{
        "role": "user",
        "content": "What is the capital of France?"
    }]
)

print(response.content[0].text)

自适应思考的流式传输

自适应思考与流式传输无缝配合。思考块通过 thinking_delta 事件进行流式传输，与手动思考模式相同：

import anthropic

client = anthropic.Anthropic()

with client.messages.stream(
    model="claude-opus-4-6",
    max_tokens=16000,
    thinking={"type": "adaptive"},
    messages=[{"role": "user", "content": "What is the greatest common divisor of 1071 and 462?"}],
) as stream:
    for event in stream:
        if event.type == "content_block_start":
            print(f"\nStarting {event.content_block.type} block...")
        elif event.type == "content_block_delta":
            if event.delta.type == "thinking_delta":
                print(event.delta.thinking, end="", flush=True)
            elif event.delta.type == "text_delta":
                print(event.delta.text, end="", flush=True)

自适应思考 vs 手动思考 vs 禁用思考

模式	配置	可用性	何时使用
自适应	`thinking: {type: "adaptive"}`	Opus 4.6	Claude 决定何时思考以及思考多少。使用 `effort` 进行引导。
手动	`thinking: {type: "enabled", budget_tokens: N}`	所有模型。在 Opus 4.6 上已弃用——请改用自适应模式。	当您需要精确控制思考 token 开销时。
禁用	省略 `thinking` 参数	所有模型	当您不需要扩展思考并希望获得最低延迟时。

自适应思考目前在 Opus 4.6 上可用。较旧的模型仅支持 type: "enabled" 配合 budget_tokens。在 Opus 4.6 上，type: "enabled" 配合 budget_tokens 仍然被接受但已弃用——我们建议改用自适应思考配合 effort 参数。

重要注意事项

验证变更

使用自适应思考时，之前的助手轮次不需要以思考块开头。这比手动模式更灵活，手动模式中 API 强制要求启用思考的轮次以思考块开头。

提示缓存

使用 adaptive 思考的连续请求会保留提示缓存断点。但是，在 adaptive 和 enabled/disabled 思考模式之间切换会破坏消息的缓存断点。无论模式如何变化，系统提示和工具定义都会保持缓存。

调整思考行为

自适应思考的触发行为是可通过提示引导的。如果 Claude 思考的频率比您期望的更多或更少，您可以在系统提示中添加指导：

Extended thinking adds latency and should only be used when it
will meaningfully improve answer quality — typically for problems
that require multi-step reasoning. When in doubt, respond directly.

引导 Claude 减少思考频率可能会降低受益于推理的任务的质量。在将基于提示的调整部署到生产环境之前，请衡量对您特定工作负载的影响。考虑先使用较低的 effort 级别进行测试。

成本控制

使用 max_tokens 作为总输出（思考 + 响应文本）的硬性限制。effort 参数提供关于 Claude 分配多少思考的额外软性指导。两者结合使用，可以有效控制成本。

在 high 和 max effort 级别下，Claude 可能会进行更广泛的思考，更有可能耗尽 max_tokens 预算。如果您在响应中观察到 stop_reason: "max_tokens"，请考虑增加 max_tokens 以给模型更多空间，或降低 effort 级别。

处理思考块

以下概念适用于所有支持扩展思考的模型，无论您使用自适应模式还是手动模式。

思考摘要

With extended thinking enabled, the Messages API for Claude 4 models returns a summary of Claude's full thinking process. Summarized thinking provides the full intelligence benefits of extended thinking, while preventing misuse.

Here are some important considerations for summarized thinking:

You're charged for the full thinking tokens generated by the original request, not the summary tokens.
The billed output token count will not match the count of tokens you see in the response.
The first few lines of thinking output are more verbose, providing detailed reasoning that's particularly helpful for prompt engineering purposes.
As Anthropic seeks to improve the extended thinking feature, summarization behavior is subject to change.
Summarization preserves the key ideas of Claude's thinking process with minimal added latency, enabling a streamable user experience and easy migration from Claude Sonnet 3.7 to Claude 4 and later models.
Summarization is processed by a different model than the one you target in your requests. The thinking model does not see the summarized output.

Claude Sonnet 3.7 continues to return full thinking output.

In rare cases where you need access to full thinking output for Claude 4 models, contact our sales team.

思考加密

Full thinking content is encrypted and returned in the signature field. This field is used to verify that thinking blocks were generated by Claude when passed back to the API.

It is only strictly necessary to send back thinking blocks when using tools with extended thinking. Otherwise you can omit thinking blocks from previous turns, or let the API strip them for you if you pass them back.

If sending back thinking blocks, we recommend passing everything back as you received it for consistency and to avoid potential issues.

Here are some important considerations on thinking encryption:

When streaming responses, the signature is added via a signature_delta inside a content_block_delta event just before the content_block_stop event.
signature values are significantly longer in Claude 4 models than in previous models.
The signature field is an opaque field and should not be interpreted or parsed - it exists solely for verification purposes.
signature values are compatible across platforms (Claude APIs, Amazon Bedrock, and Vertex AI). Values generated on one platform will be compatible with another.

思考编辑

Occasionally Claude's internal reasoning will be flagged by our safety systems. When this occurs, we encrypt some or all of the thinking block and return it to you as a redacted_thinking block. redacted_thinking blocks are decrypted when passed back to the API, allowing Claude to continue its response without losing context.

When building customer-facing applications that use extended thinking:

Be aware that redacted thinking blocks contain encrypted content that isn't human-readable
Consider providing a simple explanation like: "Some of Claude's internal reasoning has been automatically encrypted for safety reasons. This doesn't affect the quality of responses."
If showing thinking blocks to users, you can filter out redacted blocks while preserving normal thinking blocks
Be transparent that using extended thinking features may occasionally result in some reasoning being encrypted
Implement appropriate error handling to gracefully manage redacted thinking without breaking your UI

Here's an example showing both normal and redacted thinking blocks:

{
  "content": [
    {
      "type": "thinking",
      "thinking": "Let me analyze this step by step...",
      "signature": "WaUjzkypQ2mUEVM36O2TxuC06KN8xyfbJwyem2dw3URve/op91XWHOEBLLqIOMfFG/UvLEczmEsUjavL...."
    },
    {
      "type": "redacted_thinking",
      "data": "EmwKAhgBEgy3va3pzix/LafPsn4aDFIT2Xlxh0L5L8rLVyIwxtE3rAFBa8cr3qpPkNRj2YfWXGmKDxH4mPnZ5sQ7vB9URj2pLmN3kF8/dW5hR7xJ0aP1oLs9yTcMnKVf2wRpEGjH9XZaBt4UvDcPrQ..."
    },
    {
      "type": "text",
      "text": "Based on my analysis..."
    }
  ]
}

Seeing redacted thinking blocks in your output is expected behavior. The model can still use this redacted reasoning to inform its responses while maintaining safety guardrails.

If you need to test redacted thinking handling in your application, you can use this special test string as your prompt: ANTHROPIC_MAGIC_STRING_TRIGGER_REDACTED_THINKING_46C9A13E193C177646C7398A98432ECCCE4C1253D5E2D82641AC0E52CC2876CB

When passing thinking and redacted_thinking blocks back to the API in a multi-turn conversation, you must include the complete unmodified block back to the API for the last assistant turn. This is critical for maintaining the model's reasoning flow. We suggest always passing back all thinking blocks to the API. For more details, see the Preserving thinking blocks section.

定价

For complete pricing information including base rates, cache writes, cache hits, and output tokens, see the pricing page.

The thinking process incurs charges for:

Tokens used during thinking (output tokens)
Thinking blocks from the last assistant turn included in subsequent requests (input tokens)
Standard text output tokens

When extended thinking is enabled, a specialized system prompt is automatically included to support this feature.

When using summarized thinking:

Input tokens: Tokens in your original request (excludes thinking tokens from previous turns)
Output tokens (billed): The original thinking tokens that Claude generated internally
Output tokens (visible): The summarized thinking tokens you see in the response
No charge: Tokens used to generate the summary

The billed output token count will not match the visible token count in the response. You are billed for the full thinking process, not the summary you see.

其他主题

扩展思考页面通过特定模式的代码示例更详细地涵盖了以下主题：

工具使用与思考：自适应思考适用相同的规则——在工具调用之间保留思考块，并注意思考激活时 tool_choice 的限制。
提示缓存：使用自适应思考时，使用相同思考模式的连续请求会保留缓存断点。在 adaptive 和 enabled/disabled 模式之间切换会破坏消息的缓存断点（系统提示和工具定义保持缓存）。
上下文窗口：思考 token 如何与 max_tokens 和上下文窗口限制交互。

后续步骤

扩展思考

了解更多关于扩展思考的信息，包括手动模式、工具使用和提示缓存。

Effort 参数

使用 effort 参数控制 Claude 响应的彻底程度。

Was this page helpful?

能力

自适应思考

让 Claude 通过自适应思考模式动态决定何时思考以及思考多少。

支持的模型

自适应思考在以下模型上受支持：

Claude Opus 4.6 (claude-opus-4-6)

thinking.type: "enabled" 和 budget_tokens 在 Opus 4.6 上已弃用，将在未来的模型版本中移除。请改用 thinking.type: "adaptive" 配合 effort 参数。

较旧的模型（Sonnet 4.5、Opus 4.5 等）不支持自适应思考，需要使用 thinking.type: "enabled" 配合 budget_tokens。

自适应思考的工作原理

自适应思考还会自动启用交错思考。这意味着 Claude 可以在工具调用之间进行思考，使其在智能体工作流中特别有效。

如何使用自适应思考

在 API 请求中将 thinking.type 设置为 "adaptive"：

curl https://api.anthropic.com/v1/messages \
     --header "x-api-key: $ANTHROPIC_API_KEY" \
     --header "anthropic-version: 2023-06-01" \
     --header "content-type: application/json" \
     --data \
'{
    "model": "claude-opus-4-6",
    "max_tokens": 16000,
    "thinking": {
        "type": "adaptive"
    },
    "messages": [
        {
            "role": "user",
            "content": "Explain why the sum of two even numbers is always even."
        }
    ]
}'

自适应思考与 effort 参数

您可以将自适应思考与 effort 参数结合使用，以引导 Claude 进行多少思考。effort 级别作为 Claude 思考分配的软性指导：

Effort 级别	思考行为
`max`	Claude 始终思考，对思考深度没有限制。仅限 Opus 4.6——在其他模型上使用 `max` 的请求将返回错误。
`high`（默认）	Claude 始终思考。对复杂任务提供深度推理。
`medium`	Claude 使用适度思考。对于非常简单的查询可能会跳过思考。
`low`	Claude 最小化思考。对于速度最重要的简单任务跳过思考。

import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-opus-4-6",
    max_tokens=16000,
    thinking={
        "type": "adaptive"
    },
    output_config={
        "effort": "medium"
    },
    messages=[{
        "role": "user",
        "content": "What is the capital of France?"
    }]
)

print(response.content[0].text)

自适应思考的流式传输

自适应思考与流式传输无缝配合。思考块通过 thinking_delta 事件进行流式传输，与手动思考模式相同：

import anthropic

client = anthropic.Anthropic()

with client.messages.stream(
    model="claude-opus-4-6",
    max_tokens=16000,
    thinking={"type": "adaptive"},
    messages=[{"role": "user", "content": "What is the greatest common divisor of 1071 and 462?"}],
) as stream:
    for event in stream:
        if event.type == "content_block_start":
            print(f"\nStarting {event.content_block.type} block...")
        elif event.type == "content_block_delta":
            if event.delta.type == "thinking_delta":
                print(event.delta.thinking, end="", flush=True)
            elif event.delta.type == "text_delta":
                print(event.delta.text, end="", flush=True)

自适应思考 vs 手动思考 vs 禁用思考

模式	配置	可用性	何时使用
自适应	`thinking: {type: "adaptive"}`	Opus 4.6	Claude 决定何时思考以及思考多少。使用 `effort` 进行引导。
手动	`thinking: {type: "enabled", budget_tokens: N}`	所有模型。在 Opus 4.6 上已弃用——请改用自适应模式。	当您需要精确控制思考 token 开销时。
禁用	省略 `thinking` 参数	所有模型	当您不需要扩展思考并希望获得最低延迟时。

重要注意事项

验证变更

使用自适应思考时，之前的助手轮次不需要以思考块开头。这比手动模式更灵活，手动模式中 API 强制要求启用思考的轮次以思考块开头。

提示缓存

调整思考行为

自适应思考的触发行为是可通过提示引导的。如果 Claude 思考的频率比您期望的更多或更少，您可以在系统提示中添加指导：

Extended thinking adds latency and should only be used when it
will meaningfully improve answer quality — typically for problems
that require multi-step reasoning. When in doubt, respond directly.

成本控制

处理思考块

以下概念适用于所有支持扩展思考的模型，无论您使用自适应模式还是手动模式。

思考摘要

Here are some important considerations for summarized thinking:

You're charged for the full thinking tokens generated by the original request, not the summary tokens.
The billed output token count will not match the count of tokens you see in the response.
The first few lines of thinking output are more verbose, providing detailed reasoning that's particularly helpful for prompt engineering purposes.
As Anthropic seeks to improve the extended thinking feature, summarization behavior is subject to change.
Summarization preserves the key ideas of Claude's thinking process with minimal added latency, enabling a streamable user experience and easy migration from Claude Sonnet 3.7 to Claude 4 and later models.
Summarization is processed by a different model than the one you target in your requests. The thinking model does not see the summarized output.

Claude Sonnet 3.7 continues to return full thinking output.

In rare cases where you need access to full thinking output for Claude 4 models, contact our sales team.

思考加密

Full thinking content is encrypted and returned in the signature field. This field is used to verify that thinking blocks were generated by Claude when passed back to the API.

If sending back thinking blocks, we recommend passing everything back as you received it for consistency and to avoid potential issues.

Here are some important considerations on thinking encryption:

When streaming responses, the signature is added via a signature_delta inside a content_block_delta event just before the content_block_stop event.
signature values are significantly longer in Claude 4 models than in previous models.
The signature field is an opaque field and should not be interpreted or parsed - it exists solely for verification purposes.
signature values are compatible across platforms (Claude APIs, Amazon Bedrock, and Vertex AI). Values generated on one platform will be compatible with another.

思考编辑

When building customer-facing applications that use extended thinking:

Be aware that redacted thinking blocks contain encrypted content that isn't human-readable
Consider providing a simple explanation like: "Some of Claude's internal reasoning has been automatically encrypted for safety reasons. This doesn't affect the quality of responses."
If showing thinking blocks to users, you can filter out redacted blocks while preserving normal thinking blocks
Be transparent that using extended thinking features may occasionally result in some reasoning being encrypted
Implement appropriate error handling to gracefully manage redacted thinking without breaking your UI

Here's an example showing both normal and redacted thinking blocks:

{
  "content": [
    {
      "type": "thinking",
      "thinking": "Let me analyze this step by step...",
      "signature": "WaUjzkypQ2mUEVM36O2TxuC06KN8xyfbJwyem2dw3URve/op91XWHOEBLLqIOMfFG/UvLEczmEsUjavL...."
    },
    {
      "type": "redacted_thinking",
      "data": "EmwKAhgBEgy3va3pzix/LafPsn4aDFIT2Xlxh0L5L8rLVyIwxtE3rAFBa8cr3qpPkNRj2YfWXGmKDxH4mPnZ5sQ7vB9URj2pLmN3kF8/dW5hR7xJ0aP1oLs9yTcMnKVf2wRpEGjH9XZaBt4UvDcPrQ..."
    },
    {
      "type": "text",
      "text": "Based on my analysis..."
    }
  ]
}

Seeing redacted thinking blocks in your output is expected behavior. The model can still use this redacted reasoning to inform its responses while maintaining safety guardrails.

定价

For complete pricing information including base rates, cache writes, cache hits, and output tokens, see the pricing page.

The thinking process incurs charges for:

Tokens used during thinking (output tokens)
Thinking blocks from the last assistant turn included in subsequent requests (input tokens)
Standard text output tokens

When extended thinking is enabled, a specialized system prompt is automatically included to support this feature.

When using summarized thinking:

Input tokens: Tokens in your original request (excludes thinking tokens from previous turns)
Output tokens (billed): The original thinking tokens that Claude generated internally
Output tokens (visible): The summarized thinking tokens you see in the response
No charge: Tokens used to generate the summary

The billed output token count will not match the visible token count in the response. You are billed for the full thinking process, not the summary you see.

其他主题

扩展思考页面通过特定模式的代码示例更详细地涵盖了以下主题：

工具使用与思考：自适应思考适用相同的规则——在工具调用之间保留思考块，并注意思考激活时 tool_choice 的限制。
提示缓存：使用自适应思考时，使用相同思考模式的连续请求会保留缓存断点。在 adaptive 和 enabled/disabled 模式之间切换会破坏消息的缓存断点（系统提示和工具定义保持缓存）。
上下文窗口：思考 token 如何与 max_tokens 和上下文窗口限制交互。

后续步骤

扩展思考

了解更多关于扩展思考的信息，包括手动模式、工具使用和提示缓存。

Effort 参数

使用 effort 参数控制 Claude 响应的彻底程度。

Was this page helpful?