Adaptive thinking

Adaptive thinking is the recommended way to use extended thinking with Claude Opus 4.6. Instead of manually setting a thinking token budget, adaptive thinking lets Claude decide dynamically when and how much to think, based on the complexity of each request.

Adaptive thinking reliably drives better performance than extended thinking with a fixed budget_tokens, and we recommend moving to adaptive thinking to get the most intelligent responses from Opus 4.6. No beta header is required.

Supported models

Adaptive thinking is supported on the following models:

Claude Opus 4.6 (claude-opus-4-6)

thinking.type: "enabled" and budget_tokens are deprecated on Opus 4.6 and will be removed in a future model release. Use thinking.type: "adaptive" with the effort parameter instead.

Older models (Sonnet 4.5, Opus 4.5, etc.) do not support adaptive thinking and require thinking.type: "enabled" with budget_tokens.
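The split between Opus 4.6 and older models can be captured in a small helper. This is a minimal sketch, not part of any SDK: `thinking_config` and `ADAPTIVE_MODELS` are hypothetical names, the second model ID is a placeholder, and the default budget is an arbitrary illustration value.

```python
# Hypothetical helper: choose a `thinking` configuration based on the model.
# ADAPTIVE_MODELS holds the one model this page lists as supported.
ADAPTIVE_MODELS = {"claude-opus-4-6"}


def thinking_config(model: str, budget_tokens: int = 10_000) -> dict:
    """Return the `thinking` parameter to send for `model`."""
    if model in ADAPTIVE_MODELS:
        # Opus 4.6: adaptive mode, no budget_tokens needed.
        return {"type": "adaptive"}
    # Older models: manual mode with an explicit thinking budget.
    return {"type": "enabled", "budget_tokens": budget_tokens}


print(thinking_config("claude-opus-4-6"))
print(thinking_config("some-older-model", budget_tokens=8_000))
```

Keeping the decision in one place means only the set of model IDs needs to change if more models gain adaptive thinking.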

How adaptive thinking works

In adaptive mode, thinking is optional for the model. Claude evaluates the complexity of each request and decides whether and how much to think. At the default effort level (high), Claude will almost always think. At lower effort levels, Claude may skip thinking for simpler problems.

Adaptive thinking also automatically enables interleaved thinking. This means Claude can think between tool calls, which makes it especially effective for agentic workflows.

How to use adaptive thinking

Set thinking.type to "adaptive" in your API request.
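As a minimal sketch of such a request in Python, the helper below builds the request parameters as a plain dictionary. `build_adaptive_request` is a hypothetical name and the `max_tokens` value is only an example. The actual call through the `anthropic` SDK is left as a comment so the snippet runs without an API key.

```python
def build_adaptive_request(prompt: str, max_tokens: int = 16000) -> dict:
    """Build Messages API parameters with adaptive thinking turned on."""
    return {
        "model": "claude-opus-4-6",
        "max_tokens": max_tokens,
        # Let Claude decide whether and how much to think.
        "thinking": {"type": "adaptive"},
        "messages": [{"role": "user", "content": prompt}],
    }


params = build_adaptive_request(
    "Explain why the sum of two even numbers is always even."
)
print(params["thinking"])

# To send it (requires the `anthropic` package and an API key):
#   import anthropic
#   response = anthropic.Anthropic().messages.create(**params)
```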

Adaptive thinking with the effort parameter

You can combine adaptive thinking with the effort parameter to control how much Claude thinks. The effort level acts as soft guidance for Claude's thinking allocation:

Effort level	Thinking behavior
`max`	Claude always thinks, with no constraints on thinking depth. Opus 4.6 only: requests using `max` on other models return an error.
`high` (default)	Claude always thinks. Provides deep reasoning on complex tasks.
`medium`	Claude uses moderate thinking. May skip thinking for very simple queries.
`low`	Claude minimizes thinking. Skips thinking for simple tasks where speed matters most.
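The table can be turned into a small client-side check. The sketch below is illustrative only: `with_effort` and `EFFORT_LEVELS` are hypothetical names, and the effort level is assumed to travel in the `output_config` field, as in this page's Python example. It rejects `max` for models other than Opus 4.6 before the request is sent, mirroring the error the table describes.

```python
EFFORT_LEVELS = ("low", "medium", "high", "max")


def with_effort(params: dict, effort: str = "high") -> dict:
    """Return a copy of `params` with an effort level set."""
    if effort not in EFFORT_LEVELS:
        raise ValueError(f"unknown effort level: {effort!r}")
    if effort == "max" and params.get("model") != "claude-opus-4-6":
        # Per the table, `max` is only accepted on Opus 4.6.
        raise ValueError("effort 'max' is only available on claude-opus-4-6")
    return {**params, "output_config": {"effort": effort}}


base = {
    "model": "claude-opus-4-6",
    "max_tokens": 16000,
    "thinking": {"type": "adaptive"},
    "messages": [{"role": "user", "content": "What is the capital of France?"}],
}
print(with_effort(base, "medium")["output_config"])
```

Returning a copy rather than mutating `base` makes it easy to try the same request at several effort levels.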

Streaming with adaptive thinking

Adaptive thinking works seamlessly with streaming. Thinking blocks are streamed via thinking_delta events, just as in manual thinking mode.
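To illustrate how a client might consume such a stream, the sketch below separates thinking from answer text. It assumes events are plain dictionaries shaped like the JSON payloads of `content_block_delta` events. `split_stream` is a hypothetical helper, and the sample events are made up for illustration rather than captured from a real response.

```python
def split_stream(events) -> tuple[str, str]:
    """Accumulate thinking and text deltas from an iterable of event dicts."""
    thinking_parts, text_parts = [], []
    for event in events:
        if event.get("type") != "content_block_delta":
            continue  # ignore start/stop and other event types
        delta = event["delta"]
        if delta["type"] == "thinking_delta":
            thinking_parts.append(delta["thinking"])
        elif delta["type"] == "text_delta":
            text_parts.append(delta["text"])
    return "".join(thinking_parts), "".join(text_parts)


# Made-up events for illustration only.
sample = [
    {"type": "content_block_start", "index": 0, "content_block": {"type": "thinking"}},
    {"type": "content_block_delta", "index": 0,
     "delta": {"type": "thinking_delta", "thinking": "gcd(1071, 462): "}},
    {"type": "content_block_delta", "index": 0,
     "delta": {"type": "thinking_delta", "thinking": "apply Euclid."}},
    {"type": "content_block_stop", "index": 0},
    {"type": "content_block_delta", "index": 1,
     "delta": {"type": "text_delta", "text": "The GCD is 21."}},
]
thinking, text = split_stream(sample)
print(text)
```

Because thinking is optional in adaptive mode, the first returned string may be empty; the helper handles that case without special treatment.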

Adaptive vs. manual vs. disabled thinking

Mode	Configuration	Availability	When to use
Adaptive	`thinking: {type: "adaptive"}`	Opus 4.6	Claude decides when and how much to think. Use `effort` to guide it.
Manual	`thinking: {type: "enabled", budget_tokens: N}`	All models. Deprecated on Opus 4.6; use adaptive mode instead.	When you need precise control over thinking token spend.
Disabled	Omit the `thinking` parameter	All models	When you don't need extended thinking and want the lowest latency.

Adaptive thinking is currently available on Opus 4.6. Older models only support type: "enabled" with budget_tokens. On Opus 4.6, type: "enabled" with budget_tokens is still accepted but deprecated; we recommend using adaptive thinking with the effort parameter instead.

Important considerations

Validation changes

When using adaptive thinking, previous assistant turns don't need to start with thinking blocks. This is more flexible than manual mode, where the API enforces that thinking-enabled turns begin with a thinking block.

Prompt caching

Consecutive requests using adaptive thinking preserve prompt cache breakpoints. However, switching between adaptive and enabled/disabled thinking modes breaks cache breakpoints for messages. System prompts and tool definitions remain cached regardless of mode changes.
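The caching rule can be expressed as a simple comparison between consecutive requests. The sketch below is a hypothetical client-side check (`thinking_mode` and `message_cache_preserved` are illustrative names, not API features). It treats a missing `thinking` parameter as disabled and only compares modes; other factors that can affect caching are out of scope.

```python
def thinking_mode(params: dict) -> str:
    """Classify a request as 'adaptive', 'enabled', or 'disabled'."""
    thinking = params.get("thinking")
    if thinking is None:
        return "disabled"  # omitting the parameter disables thinking
    return thinking["type"]


def message_cache_preserved(previous: dict, current: dict) -> bool:
    """True if both requests use the same thinking mode."""
    return thinking_mode(previous) == thinking_mode(current)


first = {"thinking": {"type": "adaptive"}}
second = {"thinking": {"type": "adaptive"}}
third = {"thinking": {"type": "enabled", "budget_tokens": 4096}}
print(message_cache_preserved(first, second))  # True
print(message_cache_preserved(second, third))  # False
```

A check like this could log a warning when a code path is about to switch modes mid-conversation and lose cached message prefixes.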

Tuning thinking behavior

Adaptive thinking's triggering behavior is promptable. If Claude thinks more or less often than you'd like, you can add guidance to your system prompt:

Extended thinking adds latency and should only be used when it
will meaningfully improve answer quality — typically for problems
that require multi-step reasoning. When in doubt, respond directly.

Steering Claude to think less often may reduce quality on tasks that benefit from thinking. Measure the impact on your specific workloads before deploying prompt-based tuning to production. Consider testing with lower effort levels first.
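As a rough sketch of how such guidance might be attached to a request, the snippet below appends it to the `system` field of the request parameters. `with_thinking_guidance` is a hypothetical helper, the system prompt is assumed to be a plain string, and the guidance text is a shortened version of the prompt above. Nothing is sent to the API here.

```python
def with_thinking_guidance(params: dict, guidance: str) -> dict:
    """Return a copy of `params` with `guidance` appended to the system prompt."""
    existing = params.get("system", "")
    system = f"{existing}\n\n{guidance}" if existing else guidance
    return {**params, "system": system}


# Shortened version of the guidance shown above.
guidance = (
    "Extended thinking adds latency and should only be used when it "
    "will meaningfully improve answer quality. "
    "When in doubt, respond directly."
)
params = with_thinking_guidance(
    {"model": "claude-opus-4-6", "system": "You are a support assistant."},
    guidance,
)
print(params["system"])
```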

Cost control

Use max_tokens as a hard limit on total output (thinking + response text). The effort parameter provides additional soft guidance on how much thinking Claude allocates. Together, these give you effective control over cost.

At high and max effort levels, Claude may think more extensively and is more likely to exhaust the max_tokens budget. If you observe stop_reason: "max_tokens" in responses, consider increasing max_tokens to give the model more room, or lower the effort level.
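One way to act on a truncated response is sketched below. `adjust_after_truncation` is a hypothetical helper, and both the doubling step and the `token_ceiling` default are arbitrary choices for illustration. It raises `max_tokens` while there is headroom, and otherwise steps the effort level down by one.

```python
EFFORT_ORDER = ["low", "medium", "high", "max"]


def adjust_after_truncation(params: dict, stop_reason: str,
                            token_ceiling: int = 64000) -> dict:
    """Suggest parameters for a retry after a response hit max_tokens."""
    if stop_reason != "max_tokens":
        return params  # nothing to adjust
    new_params = dict(params)
    if params["max_tokens"] * 2 <= token_ceiling:
        # Give the model more room for thinking plus response text.
        new_params["max_tokens"] = params["max_tokens"] * 2
        return new_params
    # No headroom left: lower the effort level by one step, if possible.
    output_config = params.get("output_config", {})
    effort = output_config.get("effort", "high")  # high is the default level
    lower = EFFORT_ORDER[max(EFFORT_ORDER.index(effort) - 1, 0)]
    new_params["output_config"] = {**output_config, "effort": lower}
    return new_params


retry = adjust_after_truncation({"max_tokens": 16000}, "max_tokens")
print(retry["max_tokens"])  # 32000
```

Which lever to pull first is a product decision: raising `max_tokens` protects quality at higher cost, while lowering effort does the opposite.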

Working with thinking blocks

The following concepts apply to all models that support extended thinking, regardless of whether you use adaptive or manual mode.

Summarized thinking

With extended thinking enabled, the Messages API for Claude 4 models returns a summary of Claude's full thinking process. Summarized thinking provides the full intelligence benefits of extended thinking, while preventing misuse.

Here are some important considerations for summarized thinking:

You're charged for the full thinking tokens generated by the original request, not the summary tokens.
The billed output token count will not match the count of tokens you see in the response.
The first few lines of thinking output are more verbose, providing detailed reasoning that's particularly helpful for prompt engineering purposes.
As Anthropic seeks to improve the extended thinking feature, summarization behavior is subject to change.
Summarization preserves the key ideas of Claude's thinking process with minimal added latency, enabling a streamable user experience and easy migration from Claude Sonnet 3.7 to Claude 4 and later models.
Summarization is processed by a different model than the one you target in your requests. The thinking model does not see the summarized output.

Claude Sonnet 3.7 continues to return full thinking output.

In rare cases where you need access to full thinking output for Claude 4 models, contact our sales team.

Thinking encryption

Full thinking content is encrypted and returned in the signature field. This field is used to verify that thinking blocks were generated by Claude when passed back to the API.

It is only strictly necessary to send back thinking blocks when using tools with extended thinking. Otherwise you can omit thinking blocks from previous turns, or let the API strip them for you if you pass them back.

If sending back thinking blocks, we recommend passing everything back as you received it for consistency and to avoid potential issues.

Here are some important considerations on thinking encryption:

When streaming responses, the signature is added via a signature_delta inside a content_block_delta event just before the content_block_stop event.
signature values are significantly longer in Claude 4 models than in previous models.
The signature field is an opaque field and should not be interpreted or parsed - it exists solely for verification purposes.
signature values are compatible across platforms (Claude APIs, Amazon Bedrock, and Vertex AI). Values generated on one platform will be compatible with another.
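The streaming detail in the first point can be sketched as follows. The helper assumes events are plain dictionaries shaped like the streamed JSON payloads. `assemble_thinking_block` is a hypothetical name and the sample signature is a placeholder, not a real value. The signature is stored verbatim and never parsed.

```python
def assemble_thinking_block(events) -> dict:
    """Rebuild one thinking block from its streamed deltas."""
    block = {"type": "thinking", "thinking": "", "signature": ""}
    for event in events:
        if event["type"] != "content_block_delta":
            continue
        delta = event["delta"]
        if delta["type"] == "thinking_delta":
            block["thinking"] += delta["thinking"]
        elif delta["type"] == "signature_delta":
            # Opaque value: keep it exactly as received.
            block["signature"] += delta["signature"]
    return block


# Made-up events; the signature arrives just before content_block_stop.
events = [
    {"type": "content_block_start", "index": 0, "content_block": {"type": "thinking"}},
    {"type": "content_block_delta", "index": 0,
     "delta": {"type": "thinking_delta", "thinking": "Let me analyze this."}},
    {"type": "content_block_delta", "index": 0,
     "delta": {"type": "signature_delta", "signature": "PLACEHOLDER_SIGNATURE"}},
    {"type": "content_block_stop", "index": 0},
]
block = assemble_thinking_block(events)
print(block["thinking"])
```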

Thinking redaction

Occasionally Claude's internal reasoning will be flagged by our safety systems. When this occurs, we encrypt some or all of the thinking block and return it to you as a redacted_thinking block. redacted_thinking blocks are decrypted when passed back to the API, allowing Claude to continue its response without losing context.

When building customer-facing applications that use extended thinking:

Be aware that redacted thinking blocks contain encrypted content that isn't human-readable
Consider providing a simple explanation like: "Some of Claude's internal reasoning has been automatically encrypted for safety reasons. This doesn't affect the quality of responses."
If showing thinking blocks to users, you can filter out redacted blocks while preserving normal thinking blocks
Be transparent that using extended thinking features may occasionally result in some reasoning being encrypted
Implement appropriate error handling to gracefully manage redacted thinking without breaking your UI

Here's an example showing both normal and redacted thinking blocks:

{
  "content": [
    {
      "type": "thinking",
      "thinking": "Let me analyze this step by step...",
      "signature": "WaUjzkypQ2mUEVM36O2TxuC06KN8xyfbJwyem2dw3URve/op91XWHOEBLLqIOMfFG/UvLEczmEsUjavL...."
    },
    {
      "type": "redacted_thinking",
      "data": "EmwKAhgBEgy3va3pzix/LafPsn4aDFIT2Xlxh0L5L8rLVyIwxtE3rAFBa8cr3qpPkNRj2YfWXGmKDxH4mPnZ5sQ7vB9URj2pLmN3kF8/dW5hR7xJ0aP1oLs9yTcMnKVf2wRpEGjH9XZaBt4UvDcPrQ..."
    },
    {
      "type": "text",
      "text": "Based on my analysis..."
    }
  ]
}

Seeing redacted thinking blocks in your output is expected behavior. The model can still use this redacted reasoning to inform its responses while maintaining safety guardrails.
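For display purposes, filtering might look like the sketch below. It works on content blocks represented as dictionaries like those in the JSON example. `blocks_for_display` is a hypothetical helper and the notice is a shortened form of the suggested explanation. The filtered list is meant for the UI only; the original `content` list is left untouched so it can still be passed back to the API.

```python
REDACTION_NOTICE = (
    "Some of Claude's internal reasoning has been automatically "
    "encrypted for safety reasons."
)


def blocks_for_display(content: list[dict]) -> tuple[list[dict], bool]:
    """Drop redacted_thinking blocks; report whether any were removed."""
    visible = [b for b in content if b["type"] != "redacted_thinking"]
    return visible, len(visible) != len(content)


# Placeholder values stand in for the real signature and encrypted data.
content = [
    {"type": "thinking", "thinking": "Let me analyze this step by step...",
     "signature": "..."},
    {"type": "redacted_thinking", "data": "..."},
    {"type": "text", "text": "Based on my analysis..."},
]
visible, had_redactions = blocks_for_display(content)
if had_redactions:
    print(REDACTION_NOTICE)
for block in visible:
    print(block["type"])
```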

If you need to test redacted thinking handling in your application, you can use this special test string as your prompt: ANTHROPIC_MAGIC_STRING_TRIGGER_REDACTED_THINKING_46C9A13E193C177646C7398A98432ECCCE4C1253D5E2D82641AC0E52CC2876CB

When passing thinking and redacted_thinking blocks back to the API in a multi-turn conversation, you must pass back the complete, unmodified blocks for the last assistant turn. This is critical for maintaining the model's reasoning flow. We suggest always passing back all thinking blocks to the API. For more details, see the Preserving thinking blocks section.
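A minimal sketch of carrying a turn forward, assuming the response content is available as a list of block dictionaries (`extend_conversation` is a hypothetical helper): the assistant's blocks, including any `thinking` and `redacted_thinking` blocks, are copied into the history unchanged, followed by the next user message.

```python
import copy


def extend_conversation(messages: list[dict], assistant_content: list[dict],
                        next_user_text: str) -> list[dict]:
    """Return a new history with the assistant turn passed back unmodified."""
    return messages + [
        # deepcopy keeps every block (thinking, redacted_thinking, text) intact
        # and protects the history from later mutation of the response.
        {"role": "assistant", "content": copy.deepcopy(assistant_content)},
        {"role": "user", "content": next_user_text},
    ]


history = [{"role": "user", "content": "Analyze this dataset."}]
assistant_content = [
    {"type": "thinking", "thinking": "Let me analyze this step by step...",
     "signature": "..."},  # placeholder signature
    {"type": "redacted_thinking", "data": "..."},  # placeholder data
    {"type": "text", "text": "Based on my analysis..."},
]
history = extend_conversation(history, assistant_content, "Can you go deeper?")
print([m["role"] for m in history])
```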

Pricing

For complete pricing information including base rates, cache writes, cache hits, and output tokens, see the pricing page.

The thinking process incurs charges for:

Tokens used during thinking (output tokens)
Thinking blocks from the last assistant turn included in subsequent requests (input tokens)
Standard text output tokens

When extended thinking is enabled, a specialized system prompt is automatically included to support this feature.

When using summarized thinking:

Input tokens: Tokens in your original request (excludes thinking tokens from previous turns)
Output tokens (billed): The original thinking tokens that Claude generated internally
Output tokens (visible): The summarized thinking tokens you see in the response
No charge: Tokens used to generate the summary

The billed output token count will not match the visible token count in the response. You are billed for the full thinking process, not the summary you see.
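The billing rule can be made concrete with a small calculation. In this sketch, `request_cost` is a hypothetical helper and the per-million-token rates are placeholder arguments, not actual prices (see the pricing page for those). The output token count is assumed to come from the response's usage data (`usage.output_tokens`), which reflects the full thinking rather than the summary you can see.

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_rate_per_mtok: float,
                 output_rate_per_mtok: float) -> float:
    """Cost of one request given token counts and per-million-token rates."""
    return (input_tokens * input_rate_per_mtok
            + output_tokens * output_rate_per_mtok) / 1_000_000


# Placeholder numbers: 2,000 input tokens and 6,000 billed output tokens
# (full thinking plus response text), at made-up rates of 1.0 and 2.0.
cost = request_cost(2_000, 6_000,
                    input_rate_per_mtok=1.0, output_rate_per_mtok=2.0)
print(f"{cost:.4f}")
```

Counting the tokens in the visible summary would underestimate the bill, so cost tracking should rely on the reported usage figures.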

Additional topics

The extended thinking page covers several topics in more detail with model-specific code examples:

Tool use with thinking: The same rules apply to adaptive thinking. Preserve thinking blocks between tool calls, and be aware of tool_choice limitations when thinking is active.
Prompt caching: With adaptive thinking, consecutive requests using the same thinking mode preserve cache breakpoints. Switching between adaptive and enabled/disabled modes breaks cache breakpoints for messages (system prompts and tool definitions remain cached).
Context windows: How thinking tokens interact with max_tokens and context window limits.

Next steps

# Basic request with adaptive thinking enabled
curl https://api.anthropic.com/v1/messages \
     --header "x-api-key: $ANTHROPIC_API_KEY" \
     --header "anthropic-version: 2023-06-01" \
     --header "content-type: application/json" \
     --data \
'{
    "model": "claude-opus-4-6",
    "max_tokens": 16000,
    "thinking": {
        "type": "adaptive"
    },
    "messages": [
        {
            "role": "user",
            "content": "Explain why the sum of two even numbers is always even."
        }
    ]
}'

# Adaptive thinking combined with the effort parameter
import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-opus-4-6",
    max_tokens=16000,
    thinking={
        "type": "adaptive"
    },
    output_config={
        "effort": "medium"
    },
    messages=[{
        "role": "user",
        "content": "What is the capital of France?"
    }]
)

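# The first content block may be a thinking block, so print only text blocks.
for block in response.content:
    if block.type == "text":
        print(block.text)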

# Streaming with adaptive thinking
import anthropic

client = anthropic.Anthropic()

with client.messages.stream(
    model="claude-opus-4-6",
    max_tokens=16000,
    thinking={"type": "adaptive"},
    messages=[{"role": "user", "content": "What is the greatest common divisor of 1071 and 462?"}],
) as stream:
    for event in stream:
        if event.type == "content_block_start":
            print(f"\nStarting {event.content_block.type} block...")
        elif event.type == "content_block_delta":
            if event.delta.type == "thinking_delta":
                print(event.delta.thinking, end="", flush=True)
            elif event.delta.type == "text_delta":
                print(event.delta.text, end="", flush=True)