    Adaptive thinking

    Let Claude dynamically determine when and how much to use extended thinking with adaptive thinking mode.

    Adaptive thinking is the recommended way to use extended thinking with Claude Opus 4.6 and Sonnet 4.6. Instead of manually setting a thinking token budget, adaptive thinking lets Claude dynamically determine when and how much to use extended thinking based on the complexity of each request.

    Adaptive thinking can drive better performance than extended thinking with a fixed budget_tokens for many workloads, especially bimodal tasks and long-horizon agentic workflows. No beta header is required.

    For workloads where predictable latency and token usage matter, or where you need precise control over thinking costs, extended thinking with budget_tokens continues to be fully supported. Consider testing both modes on your specific workloads to determine which works best.

    Supported models

    Adaptive thinking is supported on the following models:

    • Claude Opus 4.6 (claude-opus-4-6)
    • Claude Sonnet 4.6 (claude-sonnet-4-6)

    thinking.type: "enabled" and budget_tokens are deprecated on Opus 4.6 and Sonnet 4.6 and will be removed in a future model release. Use thinking.type: "adaptive" with the effort parameter instead. If you are already using extended thinking with budget_tokens, it continues to work and no immediate changes are required.

    Older models (Sonnet 4.5, Opus 4.5, etc.) do not support adaptive thinking and require thinking.type: "enabled" with budget_tokens.
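
The model-dependent choice described above can be captured in a small helper. The following is a minimal sketch, not part of the API: the function name `thinking_config_for`, the `ADAPTIVE_MODELS` set, and the default budget of 10,000 tokens are illustrative assumptions. It also assumes the model passed in supports extended thinking at all.

```python
# Model IDs taken from the supported-models list above.
ADAPTIVE_MODELS = {"claude-opus-4-6", "claude-sonnet-4-6"}


def thinking_config_for(model: str, budget_tokens: int = 10_000) -> dict:
    """Return a `thinking` request field suited to the given model.

    Hypothetical helper: the name and the default budget are illustrative.
    """
    if model in ADAPTIVE_MODELS:
        # Opus 4.6 and Sonnet 4.6: let Claude decide how much to think.
        return {"type": "adaptive"}
    # Older models: manual mode with an explicit thinking budget.
    return {"type": "enabled", "budget_tokens": budget_tokens}
```

The returned dict is meant to be passed as the `thinking` field of a Messages API request.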

    How adaptive thinking works

    In adaptive mode, thinking is optional for the model. Claude evaluates the complexity of each request and determines whether and how much to use extended thinking. At the default effort level (high), Claude almost always thinks. At lower effort levels, Claude may skip thinking for simpler problems.

    Adaptive thinking also automatically enables interleaved thinking. This means Claude can think between tool calls, making it especially effective for agentic workflows.

    How to use adaptive thinking

    Set thinking.type to "adaptive" in your API request:

    curl https://api.anthropic.com/v1/messages \
         --header "x-api-key: $ANTHROPIC_API_KEY" \
         --header "anthropic-version: 2023-06-01" \
         --header "content-type: application/json" \
         --data \
    '{
        "model": "claude-opus-4-6",
        "max_tokens": 16000,
        "thinking": {
            "type": "adaptive"
        },
        "messages": [
            {
                "role": "user",
                "content": "Explain why the sum of two even numbers is always even."
            }
        ]
    }'

    Adaptive thinking with the effort parameter

    You can combine adaptive thinking with the effort parameter to guide how much thinking Claude does. The effort level acts as soft guidance for Claude's thinking allocation:

    | Effort level | Thinking behavior |
    | --- | --- |
    | max | Claude always thinks with no constraints on thinking depth. Opus 4.6 only. Requests using max on other models return an error. |
    | high (default) | Claude always thinks. Provides deep reasoning on complex tasks. |
    | medium | Claude uses moderate thinking. May skip thinking for very simple queries. |
    | low | Claude minimizes thinking. Skips thinking for simple tasks where speed matters most. |

    The following request combines adaptive thinking with medium effort:

    curl https://api.anthropic.com/v1/messages \
         --header "x-api-key: $ANTHROPIC_API_KEY" \
         --header "anthropic-version: 2023-06-01" \
         --header "content-type: application/json" \
         --data \
    '{
        "model": "claude-opus-4-6",
        "max_tokens": 16000,
        "thinking": {
            "type": "adaptive"
        },
        "output_config": {
            "effort": "medium"
        },
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
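
Because `max` effort is rejected on models other than Opus 4.6, it can help to catch the mismatch before a request is sent. The sketch below builds the same request body as the curl example and adds a client-side check. The helper name `build_adaptive_request` and the two constant sets are assumptions for illustration; the API performs its own validation regardless.

```python
VALID_EFFORTS = {"low", "medium", "high", "max"}
# Per the table above, "max" is accepted on Opus 4.6 only.
MAX_EFFORT_MODELS = {"claude-opus-4-6"}


def build_adaptive_request(model, prompt, effort="high", max_tokens=16000):
    """Build a Messages API request body with adaptive thinking and effort.

    Hypothetical helper: it mirrors the curl example and fails early on
    effort values the API would reject.
    """
    if effort not in VALID_EFFORTS:
        raise ValueError(f"unknown effort level: {effort!r}")
    if effort == "max" and model not in MAX_EFFORT_MODELS:
        raise ValueError(f"effort 'max' is not supported on {model}")
    return {
        "model": model,
        "max_tokens": max_tokens,
        "thinking": {"type": "adaptive"},
        "output_config": {"effort": effort},
        "messages": [{"role": "user", "content": prompt}],
    }
```

Serializing the result with `json.dumps` yields a body equivalent to the `--data` payload shown above.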

    Streaming with adaptive thinking

    Adaptive thinking works with streaming. Thinking blocks are streamed via thinking_delta events, just as in manual thinking mode:

    import anthropic

    # The client reads the API key from the ANTHROPIC_API_KEY environment variable.
    client = anthropic.Anthropic()
    
    with client.messages.stream(
        model="claude-opus-4-6",
        max_tokens=16000,
        thinking={"type": "adaptive"},
        messages=[
            {
                "role": "user",
                "content": "What is the greatest common divisor of 1071 and 462?",
            }
        ],
    ) as stream:
        for event in stream:
            if event.type == "content_block_start":
                # A new content block (thinking or text) is starting.
                print(f"\nStarting {event.content_block.type} block...")
            elif event.type == "content_block_delta":
                # Thinking arrives as thinking_delta, response text as text_delta.
                if event.delta.type == "thinking_delta":
                    print(event.delta.thinking, end="", flush=True)
                elif event.delta.type == "text_delta":
                    print(event.delta.text, end="", flush=True)

    Adaptive vs manual vs disabled thinking

    | Mode | Config | Availability | When to use |
    | --- | --- | --- | --- |
    | Adaptive | thinking: {type: "adaptive"} | Opus 4.6, Sonnet 4.6 | Claude determines when and how much to use extended thinking. Use effort to guide. |
    | Manual | thinking: {type: "enabled", budget_tokens: N} | All models. Deprecated on Opus 4.6 and Sonnet 4.6 (consider adaptive mode instead). | When you need precise control over thinking token spend. |
    | Disabled | Omit thinking parameter or pass {type: "disabled"} | All models | When you don't need extended thinking and want the lowest latency. |

    Adaptive thinking is available on Opus 4.6 and Sonnet 4.6. Older models only support type: "enabled" with budget_tokens. On both Opus 4.6 and Sonnet 4.6, type: "enabled" with budget_tokens is still accepted but deprecated.

    Interleaved thinking availability by mode:

    • Adaptive mode: Interleaved thinking is automatically enabled on both Opus 4.6 and Sonnet 4.6.
    • Manual mode on Sonnet 4.6: Interleaved thinking is supported via the interleaved-thinking-2025-05-14 beta header.
    • Manual mode on Opus 4.6: Interleaved thinking is not available. If your agentic workflow requires thinking between tool calls on Opus 4.6, use adaptive mode.
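
The three cases above can be expressed as a lookup that returns any extra HTTP headers a request needs. This is a sketch under stated assumptions: the function name is hypothetical, it covers only the two models and two modes listed, and it assumes the beta value is sent in the `anthropic-beta` request header.

```python
INTERLEAVED_BETA = "interleaved-thinking-2025-05-14"
COVERED_MODELS = {"claude-opus-4-6", "claude-sonnet-4-6"}


def interleaved_thinking_headers(model: str, thinking_type: str) -> dict:
    """Return extra headers needed to get interleaved thinking.

    Hypothetical helper covering only the cases listed above.
    """
    if model not in COVERED_MODELS:
        raise ValueError(f"{model} is not covered by this sketch")
    if thinking_type == "adaptive":
        # Interleaved thinking is enabled automatically in adaptive mode.
        return {}
    if thinking_type == "enabled":
        if model == "claude-sonnet-4-6":
            # Manual mode on Sonnet 4.6 needs the beta header.
            return {"anthropic-beta": INTERLEAVED_BETA}
        # Manual mode on Opus 4.6 has no interleaved thinking.
        raise ValueError(
            "interleaved thinking is not available in manual mode on "
            "Opus 4.6; use adaptive mode instead"
        )
    raise ValueError(f"unsupported thinking type: {thinking_type!r}")
```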

    Important considerations

    Validation changes

    When using adaptive thinking, previous assistant turns don't need to start with thinking blocks. This is more flexible than manual mode, where the API enforces that thinking-enabled turns begin with a thinking block.

    Prompt caching

    Consecutive requests using adaptive thinking preserve prompt cache breakpoints. However, switching between adaptive and enabled/disabled thinking modes breaks cache breakpoints for messages. System prompts and tool definitions remain cached regardless of mode changes.
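
One way to notice a cache-breaking switch before it happens is to compare the `thinking` field of consecutive requests. The sketch below is illustrative only: both function names are hypothetical, and an omitted `thinking` field is treated as disabled.

```python
def thinking_mode(thinking):
    """Return the thinking mode of a request's `thinking` field.

    A missing field (None) is treated as disabled.
    """
    if thinking is None:
        return "disabled"
    return thinking["type"]


def crosses_adaptive_boundary(previous, current):
    """Return True when exactly one of two consecutive requests is adaptive.

    Per the passage above, such a switch breaks cache breakpoints for
    messages, while system prompts and tool definitions stay cached.
    """
    was_adaptive = thinking_mode(previous) == "adaptive"
    is_adaptive = thinking_mode(current) == "adaptive"
    return was_adaptive != is_adaptive
```

The check says nothing about switches between enabled and disabled modes, which the passage above does not cover.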

    Tuning thinking behavior

    Adaptive thinking's triggering behavior is promptable. If Claude is thinking more or less often than you'd like, you can add guidance to your system prompt:

    Extended thinking adds latency and should only be used when it
    will meaningfully improve answer quality — typically for problems
    that require multi-step reasoning. When in doubt, respond directly.

    Steering Claude to think less often may reduce quality on tasks that benefit from reasoning. Measure the impact on your specific workloads before deploying prompt-based tuning to production. Consider testing with lower effort levels first.

    Cost control

    Use max_tokens as a hard limit on total output (thinking + response text). The effort parameter provides additional soft guidance on how much thinking Claude allocates. Together, these give you effective control over cost.

    At high and max effort levels, Claude may think more extensively and can be more likely to exhaust the max_tokens budget. If you observe stop_reason: "max_tokens" in responses, consider increasing max_tokens to give the model more room, or lowering the effort level.
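
The two remedies for truncated responses can be combined into a simple retry policy. The sketch below operates on plain dicts shaped like the request and response bodies. The function name, the doubling strategy, and the `token_cap` default of 64,000 are assumptions chosen for illustration, not documented limits.

```python
import copy

# Effort levels ordered from least to most thinking.
EFFORT_LADDER = ["low", "medium", "high", "max"]


def adjust_after_truncation(request, response, token_cap=64000):
    """Return an adjusted copy of `request` if `response` was truncated.

    Returns None when the response did not stop on max_tokens.
    Hypothetical policy: double max_tokens while under `token_cap`,
    otherwise lower the effort level by one step.
    """
    if response.get("stop_reason") != "max_tokens":
        return None
    retry = copy.deepcopy(request)  # leave the caller's request untouched
    doubled = request["max_tokens"] * 2
    if doubled <= token_cap:
        retry["max_tokens"] = doubled
        return retry
    # No room to grow: step effort down instead ("high" is the default).
    effort = request.get("output_config", {}).get("effort", "high")
    step = EFFORT_LADDER.index(effort)
    if step == 0:
        raise RuntimeError("already at lowest effort and at the token cap")
    retry.setdefault("output_config", {})["effort"] = EFFORT_LADDER[step - 1]
    return retry
```

Doubling first favors answer quality over cost; a cost-sensitive application might reverse the order and lower effort before raising `max_tokens`.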

    Working with thinking blocks

    The following concepts apply to all models that support extended thinking, regardless of whether you use adaptive or manual mode.

    Summarized thinking

    With extended thinking enabled, the Messages API for Claude 4 models returns a summary of Claude's full thinking process. Summarized thinking provides the full intelligence benefits of extended thinking, while preventing misuse. This is the default behavior when the display field on the thinking configuration is unset or set to "summarized".

    Here are some important considerations for summarized thinking:

    • You're charged for the full thinking tokens generated by the original request, not the summary tokens.
    • The billed output token count will not match the count of tokens you see in the response.
    • The first few lines of thinking output are more verbose, providing detailed reasoning that's particularly helpful for prompt engineering purposes.
    • As Anthropic seeks to improve the extended thinking feature, summarization behavior is subject to change.
    • Summarization preserves the key ideas of Claude's thinking process with minimal added latency, enabling a streamable user experience and easy migration from Claude Sonnet 3.7 to Claude 4 and later models.
    • Summarization is processed by a different model than the one you target in your requests. The thinking model does not see the summarized output.

    Claude Sonnet 3.7 continues to return full thinking output.

    In rare cases where you need access to full thinking output for Claude 4 models, contact our sales team.

    Controlling thinking display

    The display field on the thinking configuration controls how thinking content is returned in API responses. It accepts two values:

    • "summarized" (default): Thinking blocks contain summarized thinking text. See Summarized thinking for details.
    • "omitted": Thinking blocks are returned with an empty thinking field. The signature field still carries the encrypted full thinking for multi-turn continuity (see Thinking encryption).

    Setting display: "omitted" is useful when your application doesn't surface thinking content to users. The primary benefit is faster time-to-first-text-token when streaming: the server skips streaming thinking tokens entirely and delivers only the signature, so the final text response begins streaming sooner.

    No SDK currently includes display in its type definitions. The Python SDK forwards unrecognized dict keys to the API at runtime; passing display in the thinking dict works transparently. The TypeScript SDK requires a type assertion. The C#, Go, Java, PHP, and Ruby SDKs require a direct HTTP request until native support lands.

    Here are some important considerations for omitted thinking:

    • You're still charged for the full thinking tokens. Omitting reduces latency, not cost.
    • If you pass thinking blocks back in multi-turn conversations, pass them unchanged. The server decrypts the signature to reconstruct the original thinking for prompt construction (see Preserving thinking blocks). Any text you place in the thinking field of a round-tripped omitted block is ignored.
    • display is invalid with thinking.type: "disabled" (there is nothing to display).
    • When using thinking.type: "adaptive" and the model skips thinking for a simple request, no thinking block is produced regardless of display.

    The signature field is identical whether display is "summarized" or "omitted". Switching display values between turns in a conversation is supported.
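
Client code that reads responses therefore has to handle three shapes: no thinking block at all, a thinking block with an empty `thinking` field, and a thinking block with summarized text. The sketch below classifies a response's `content` list, represented as plain dicts. The function name and the three state labels are hypothetical.

```python
def classify_thinking(content):
    """Classify the thinking in a response `content` list.

    Returns a (state, text) pair, where state is one of:
      "none"       - no thinking block (for example, adaptive mode skipped it)
      "omitted"    - thinking blocks present but their text is empty
      "summarized" - thinking blocks carry summarized text
    and text is the concatenated response text.
    """
    thinking_blocks = [b for b in content if b.get("type") == "thinking"]
    text = "".join(b["text"] for b in content if b.get("type") == "text")
    if not thinking_blocks:
        state = "none"
    elif all(b.get("thinking", "") == "" for b in thinking_blocks):
        state = "omitted"
    else:
        state = "summarized"
    return state, text
```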

    For code examples and streaming behavior with display: "omitted", see Controlling thinking display on the extended thinking page. The examples there use type: "enabled"; with adaptive thinking, use:

    thinking = {"type": "adaptive", "display": "omitted"}
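
The constraints on `display` can also be checked client-side before a request is sent. This is a minimal sketch with a hypothetical helper name; it encodes only the rules stated on this page and does not replace the API's own validation.

```python
def make_thinking_config(thinking_type, display=None, budget_tokens=None):
    """Build a `thinking` request field, validating `display` usage.

    Hypothetical helper encoding the rules described above.
    """
    if thinking_type not in {"adaptive", "enabled", "disabled"}:
        raise ValueError(f"unknown thinking type: {thinking_type!r}")
    config = {"type": thinking_type}
    if thinking_type == "enabled":
        # Manual mode requires an explicit thinking budget.
        if budget_tokens is None:
            raise ValueError("manual mode requires budget_tokens")
        config["budget_tokens"] = budget_tokens
    if display is not None:
        if thinking_type == "disabled":
            # Nothing to display when thinking is off.
            raise ValueError("display is invalid with type 'disabled'")
        if display not in {"summarized", "omitted"}:
            raise ValueError(f"unknown display value: {display!r}")
        config["display"] = display
    return config
```

For example, `make_thinking_config("adaptive", display="omitted")` produces the dict shown above.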

    Thinking encryption

    Full thinking content is encrypted and returned in the signature field. This field is used to verify that thinking blocks were generated by Claude when passed back to the API.

    It is only strictly necessary to send back thinking blocks when using tools with extended thinking. Otherwise you can omit thinking blocks from previous turns, or let the API strip them for you if you pass them back.

    If sending back thinking blocks, we recommend passing everything back as you received it for consistency and to avoid potential issues.
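
In a tool-use loop, passing everything back as received amounts to appending the assistant's `content` list to the conversation unmodified, followed by the tool results. The sketch below works on plain dicts shaped like Messages API bodies. The function name and the `tool_results` mapping (tool use ID to result string) are assumptions for illustration.

```python
import copy


def append_tool_round_trip(messages, assistant_content, tool_results):
    """Return a new message list for the follow-up request.

    The assistant turn, including thinking blocks and their signatures,
    is copied as-is. `tool_results` maps each tool_use id to its result.
    """
    updated = list(messages)
    # Pass the assistant content back unchanged; do not edit or reorder it.
    updated.append(
        {"role": "assistant", "content": copy.deepcopy(assistant_content)}
    )
    updated.append(
        {
            "role": "user",
            "content": [
                {"type": "tool_result", "tool_use_id": use_id, "content": result}
                for use_id, result in tool_results.items()
            ],
        }
    )
    return updated
```

The deep copy guards against later code mutating the blocks that were received, which would no longer match their signatures.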

    Here are some important considerations on thinking encryption:

    • When streaming responses, the signature is added via a signature_delta inside a content_block_delta event just before the content_block_stop event.
    • signature values are significantly longer in Claude 4 models than in previous models.
    • The signature field is an opaque field and should not be interpreted or parsed.
    • signature values are compatible across platforms (Claude APIs, Amazon Bedrock, and Vertex AI). Values generated on one platform will be compatible with another.
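
To pass thinking blocks back after streaming, the client has to reassemble each block from its deltas, including the signature. The sketch below does this for events represented as plain dicts that mirror the streamed JSON (`content_block_start` and `content_block_delta` events carrying an `index`). The function name is hypothetical, and SDK event objects would need attribute access instead of dict lookups.

```python
def accumulate_blocks(events):
    """Rebuild content blocks from a sequence of stream events.

    Handles thinking_delta, signature_delta, and text_delta. The signature
    is stored verbatim and never parsed, since it is opaque.
    """
    blocks = {}
    for event in events:
        if event["type"] == "content_block_start":
            # Start from the initial block payload sent by the server.
            blocks[event["index"]] = dict(event["content_block"])
        elif event["type"] == "content_block_delta":
            block = blocks[event["index"]]
            delta = event["delta"]
            if delta["type"] == "thinking_delta":
                block["thinking"] = block.get("thinking", "") + delta["thinking"]
            elif delta["type"] == "signature_delta":
                block["signature"] = block.get("signature", "") + delta["signature"]
            elif delta["type"] == "text_delta":
                block["text"] = block.get("text", "") + delta["text"]
    # Return blocks in their original order.
    return [blocks[index] for index in sorted(blocks)]
```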

    Pricing

    For complete pricing information including base rates, cache writes, cache hits, and output tokens, see the pricing page.

    The thinking process incurs charges for:

    • Tokens used during thinking (output tokens)
    • Thinking blocks from the last assistant turn included in subsequent requests (input tokens)
    • Standard text output tokens

    When extended thinking is enabled, a specialized system prompt is automatically included to support this feature.

    When using summarized thinking:

    • Input tokens: Tokens in your original request (excludes thinking tokens from previous turns)
    • Output tokens (billed): The original thinking tokens that Claude generated internally
    • Output tokens (visible): The summarized thinking tokens you see in the response
    • No charge: Tokens used to generate the summary

    When using display: "omitted":

    • Input tokens: Tokens in your original request (same as summarized)
    • Output tokens (billed): The original thinking tokens that Claude generated internally (same as summarized)
    • Output tokens (visible): Zero thinking tokens (the thinking field is empty)

    The billed output token count will not match the visible token count in the response. You are billed for the full thinking process, not the thinking content visible in the response.
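
Because billing follows the full thinking process, cost estimates should be based on the `usage` counts returned with the response rather than on the visible text. The sketch below shows the arithmetic. The function name is hypothetical, the per-million-token rates are caller-supplied placeholders rather than actual prices, and cache reads and writes are ignored for simplicity.

```python
def estimate_cost_usd(usage, input_price_per_mtok, output_price_per_mtok):
    """Estimate request cost from a response `usage` dict.

    `output_tokens` already includes the full thinking tokens, so no
    separate term for thinking is needed. Prices are per million tokens
    and must be taken from the pricing page.
    """
    input_cost = usage["input_tokens"] / 1_000_000 * input_price_per_mtok
    output_cost = usage["output_tokens"] / 1_000_000 * output_price_per_mtok
    return input_cost + output_cost
```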

    Additional topics

    The extended thinking page covers several topics in more detail with mode-specific code examples:

    • Tool use with thinking: The same rules apply for adaptive thinking: preserve thinking blocks between tool calls and be aware of tool_choice limitations when thinking is active.
    • Prompt caching: With adaptive thinking, consecutive requests using the same thinking mode preserve cache breakpoints. Switching between adaptive and enabled/disabled modes breaks cache breakpoints for messages (system prompts and tool definitions remain cached).
    • Context windows: How thinking tokens interact with max_tokens and context window limits.

    Next steps

    • Extended thinking: Learn more about extended thinking, including manual mode, tool use, and prompt caching.
    • Effort parameter: Control how thoroughly Claude responds with the effort parameter.
