Was this page helpful?
This feature is eligible for Zero Data Retention (ZDR). When your organization has a ZDR arrangement, data sent through this feature is not stored after the API response is returned.
Adaptive thinking is the recommended way to use extended thinking with Claude Opus 4.7, Claude Opus 4.6, and Claude Sonnet 4.6, and is the default mode on Claude Mythos Preview (where it auto-applies whenever thinking is unset). Instead of manually setting a thinking token budget, adaptive thinking lets Claude dynamically determine when and how much to use extended thinking based on the complexity of each request. On Claude Opus 4.7, adaptive thinking is the only supported thinking mode; manual thinking: {type: "enabled", budget_tokens: N} is no longer accepted.
Adaptive thinking can drive better performance than extended thinking with a fixed budget_tokens for many workloads, especially bimodal tasks and long-horizon agentic workflows. No beta header is required.
If your workload requires predictable latency or precise control over thinking costs, extended thinking with budget_tokens is still functional on Claude Opus 4.6 and Claude Sonnet 4.6 but is deprecated and no longer recommended. See the warning below.
Adaptive thinking is supported on the following models:
claude-mythos-preview), adaptive thinking is the default; thinking: {type: "disabled"} is not supportedclaude-opus-4-7), adaptive thinking is the only supported thinking mode. Thinking is off unless you explicitly set thinking: {type: "adaptive"} in your request; manual thinking: {type: "enabled"} is rejected with a 400 error.claude-opus-4-6)claude-sonnet-4-6)thinking.type: "enabled" and budget_tokens are deprecated on Opus 4.6 and Sonnet 4.6 and will be removed in a future model release. Use thinking.type: "adaptive" with the effort parameter instead. Existing budget_tokens configurations are still functional but no longer recommended; plan to migrate.
Older models (Sonnet 4.5, Opus 4.5, etc.) do not support adaptive thinking and require thinking.type: "enabled" with budget_tokens.
In adaptive mode, thinking is optional for the model. Claude evaluates the complexity of each request and determines whether and how much to use extended thinking. At the default effort level (high), Claude almost always thinks. At lower effort levels, Claude may skip thinking for simpler problems.
Adaptive thinking also automatically enables interleaved thinking. This means Claude can think between tool calls, making it especially effective for agentic workflows.
Set thinking.type to "adaptive" in your API request:
You can combine adaptive thinking with the effort parameter to guide how much thinking Claude does. The effort level acts as soft guidance for Claude's thinking allocation:
| Effort level | Thinking behavior |
|---|---|
max | Claude always thinks with no constraints on thinking depth. Available on Claude Mythos Preview, Claude Opus 4.7, Claude Opus 4.6, and Claude Sonnet 4.6. |
xhigh | Claude always thinks deeply with extended exploration. Available on Claude Opus 4.7. |
high (default) | Claude always thinks. Provides deep reasoning on complex tasks. |
medium | Claude uses moderate thinking. May skip thinking for very simple queries. |
low | Claude minimizes thinking. Skips thinking for simple tasks where speed matters most. |
Adaptive thinking works seamlessly with streaming. Thinking blocks are streamed via thinking_delta events just like manual thinking mode:
| Mode | Config | Availability | When to use |
|---|---|---|---|
| Adaptive | thinking: {type: "adaptive"} | Claude Mythos Preview (default), Opus 4.7 (only mode), Opus 4.6, Sonnet 4.6 | Claude determines when and how much to use extended thinking. Use effort to guide. |
| Manual | thinking: {type: "enabled", budget_tokens: N} | All models except Claude Opus 4.7 (rejected). Deprecated on Opus 4.6 and Sonnet 4.6 (consider adaptive mode instead). | When you need precise control over thinking token spend. |
| Disabled | Omit thinking parameter or pass {type: "disabled"} | All models except Claude Mythos Preview | When you don't need extended thinking and want the lowest latency. |
Adaptive thinking is available on Claude Mythos Preview, Claude Opus 4.7, Opus 4.6, and Sonnet 4.6. On Mythos Preview, adaptive thinking is the default and applies automatically whenever thinking is unset. On Claude Opus 4.7, adaptive thinking is the only supported mode and type: "enabled" with budget_tokens is rejected. Older models only support type: "enabled" with budget_tokens. On Opus 4.6 and Sonnet 4.6, type: "enabled" with budget_tokens is still functional but deprecated.
Interleaved thinking availability by mode:
interleaved-thinking-2025-05-14 beta header.When using adaptive thinking, previous assistant turns don't need to start with thinking blocks. This is more flexible than manual mode, where the API enforces that thinking-enabled turns begin with a thinking block.
Consecutive requests using adaptive thinking preserve prompt cache breakpoints. However, switching between adaptive and enabled/disabled thinking modes breaks cache breakpoints for messages. System prompts and tool definitions remain cached regardless of mode changes.
Adaptive thinking's triggering behavior is promptable. If Claude is thinking more or less often than you'd like, you can add guidance to your system prompt:
Extended thinking adds latency and should only be used when it
will meaningfully improve answer quality — typically for problems
that require multi-step reasoning. When in doubt, respond directly.Steering Claude to think less often may reduce quality on tasks that benefit from reasoning. Measure the impact on your specific workloads before deploying prompt-based tuning to production. Consider testing with lower effort levels first.
Use max_tokens as a hard limit on total output (thinking + response text). The effort parameter provides additional soft guidance on how much thinking Claude allocates. Together, these give you effective control over cost.
At high and max effort levels, Claude may think more extensively and can be more likely to exhaust the max_tokens budget. If you observe stop_reason: "max_tokens" in responses, consider increasing max_tokens to give the model more room, or lowering the effort level.
The following concepts apply to all models that support extended thinking, regardless of whether you use adaptive or manual mode.
With extended thinking enabled, the Messages API for Claude 4 models returns a summary of Claude's full thinking process. Summarized thinking provides the full intelligence benefits of extended thinking, while preventing misuse. This is the default behavior on Claude 4 models when the display field on the thinking configuration is unset or set to "summarized". On Claude Opus 4.7 and Claude Mythos Preview, display defaults to "omitted" instead, so you must set display: "summarized" explicitly to receive summarized thinking.
Here are some important considerations for summarized thinking:
Claude Sonnet 3.7 continues to return full thinking output.
In rare cases where you need access to full thinking output for Claude 4 models, contact our sales team.
The display field on the thinking configuration controls how thinking content is returned in API responses. It accepts two values:
"summarized": Thinking blocks contain summarized thinking text. See Summarized thinking for details. This is the default on Claude Opus 4.6, Claude Sonnet 4.6, and earlier Claude 4 models."omitted": Thinking blocks are returned with an empty thinking field. The signature field still carries the encrypted full thinking for multi-turn continuity (see Thinking encryption). This is the default on Claude Opus 4.7 and Claude Mythos Preview.Setting display: "omitted" is useful when your application doesn't surface thinking content to users. The primary benefit is faster time-to-first-text-token when streaming: The server skips streaming thinking tokens entirely and delivers only the signature, so the final text response begins streaming sooner.
Here are some important considerations for omitted thinking:
signature to reconstruct the original thinking for prompt construction (see Preserving thinking blocks). Any text you place in the thinking field of a round-tripped omitted block is ignored.display is invalid with thinking.type: "disabled" (there is nothing to display).thinking.type: "adaptive" and the model skips thinking for a simple request, no thinking block is produced regardless of display.The signature field is identical whether display is "summarized" or "omitted". Switching display values between turns in a conversation is supported.
On Claude Opus 4.7, thinking.display defaults to "omitted". Thinking blocks still appear in the response stream, but their thinking field is empty unless you explicitly opt in. This is a silent change from Claude Opus 4.6, where the default was "summarized". To restore summarized thinking text on Claude Opus 4.7, set thinking.display to "summarized" explicitly:
thinking = {
"type": "adaptive",
"display": "summarized",
}For code examples and streaming behavior with display: "omitted", see Controlling thinking display on the extended thinking page. The examples there use type: "enabled"; with adaptive thinking, use:
thinking = {"type": "adaptive", "display": "omitted"}Full thinking content is encrypted and returned in the signature field. This field is used to verify that thinking blocks were generated by Claude when passed back to the API.
It is only strictly necessary to send back thinking blocks when using tools with extended thinking. Otherwise you can omit thinking blocks from previous turns. If you pass them back, whether the API keeps or strips them depends on the model: Opus 4.5+ and Sonnet 4.6+ keep them in context by default; earlier Opus/Sonnet models and all Haiku models strip them. See context editing to configure this.
If sending back thinking blocks, we recommend passing everything back as you received it for consistency and to avoid potential issues.
Here are some important considerations on thinking encryption:
signature_delta inside a content_block_delta event just before the content_block_stop event.signature values are significantly longer in Claude 4 models than in previous models.signature field is an opaque field and should not be interpreted or parsed.signature values are compatible across platforms (Claude APIs, Amazon Bedrock, and Vertex AI). Values generated on one platform will be compatible with another.For complete pricing information including base rates, cache writes, cache hits, and output tokens, see the pricing page.
The thinking process incurs charges for:
When extended thinking is enabled, a specialized system prompt is automatically included to support this feature.
When using summarized thinking:
When using display: "omitted":
thinking field is empty)The billed output token count will not match the visible token count in the response. You are billed for the full thinking process, not the thinking content visible in the response.
The extended thinking page covers several topics in more detail with mode-specific code examples:
tool_choice limitations when thinking is active.adaptive and enabled/disabled modes breaks cache breakpoints for messages (system prompts and tool definitions remain cached).max_tokens and context window limits.client = anthropic.Anthropic()
response = client.messages.create(
model="claude-opus-4-7",
max_tokens=16000,
thinking={"type": "adaptive"},
messages=[
{
"role": "user",
"content": "Explain why the sum of two even numbers is always even.",
}
],
)
for block in response.content:
if block.type == "thinking":
print(f"\nThinking: {block.thinking}")
elif block.type == "text":
print(f"\nResponse: {block.text}")client = anthropic.Anthropic()
response = client.messages.create(
model="claude-opus-4-7",
max_tokens=16000,
thinking={"type": "adaptive"},
output_config={"effort": "medium"},
messages=[{"role": "user", "content": "What is the capital of France?"}],
)
print(response.content[0].text)client = anthropic.Anthropic()
with client.messages.stream(
model="claude-opus-4-7",
max_tokens=16000,
thinking={"type": "adaptive"},
messages=[
{
"role": "user",
"content": "What is the greatest common divisor of 1071 and 462?",
}
],
) as stream:
for event in stream:
if event.type == "content_block_start":
print(f"\nStarting {event.content_block.type} block...")
elif event.type == "content_block_delta":
if event.delta.type == "thinking_delta":
print(event.delta.thinking, end="", flush=True)
elif event.delta.type == "text_delta":
print(event.delta.text, end="", flush=True)Learn more about extended thinking, including manual mode, tool use, and prompt caching.