Adaptive thinking

Let Claude dynamically decide when and how much to think with adaptive thinking mode.

Adaptive thinking is the recommended way to use extended thinking with Claude Opus 4.6. Instead of manually setting a thinking token budget, adaptive thinking lets Claude decide dynamically when and how much to think based on the complexity of each request.

Adaptive thinking reliably delivers better performance than extended thinking with a fixed budget_tokens, and we recommend switching to adaptive thinking to get the most intelligent responses from Opus 4.6. No beta header is required.

Supported models

Adaptive thinking is supported on the following models:

Claude Opus 4.6 (claude-opus-4-6)

thinking.type: "enabled" and budget_tokens are deprecated on Opus 4.6 and will be removed in a future model release. Use thinking.type: "adaptive" with the effort parameter instead.

Older models (Sonnet 4.5, Opus 4.5, etc.) do not support adaptive thinking and require thinking.type: "enabled" with budget_tokens.
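
The model split above can be captured in a small helper that picks the thinking configuration per model. This is a minimal sketch: the function name, the set constant, the default budget of 8000 tokens, and the older model ID used in the demo are all illustrative assumptions, not part of the API.

```python
# Hypothetical helper: names and the default budget are illustrative only.
ADAPTIVE_MODELS = {"claude-opus-4-6"}  # the only model listed as supporting adaptive thinking


def thinking_config(model: str, budget_tokens: int = 8000) -> dict:
    """Return a `thinking` request parameter suited to the given model."""
    if model in ADAPTIVE_MODELS:
        # Adaptive mode takes no budget; Claude decides how much to think.
        return {"type": "adaptive"}
    # Older models need manual mode with an explicit token budget.
    return {"type": "enabled", "budget_tokens": budget_tokens}


print(thinking_config("claude-opus-4-6"))
print(thinking_config("claude-sonnet-4-5"))  # model ID assumed for illustration
```

Keeping the list of adaptive-capable models in one place makes it easier to update if more models gain support.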

How adaptive thinking works

In adaptive mode, thinking is optional for the model. Claude evaluates the complexity of each request and decides whether and how much to think. At the default effort level (high), Claude will almost always think. At lower effort levels, Claude may skip thinking for simpler problems.

Adaptive thinking also automatically enables interleaved thinking. This means Claude can think between tool calls, which makes it especially effective for agentic workflows.
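
Because thinking is optional in adaptive mode, client code should not assume that a response contains a thinking block. The sketch below separates thinking from answer text and works whether or not Claude chose to think. It assumes content blocks represented as plain dicts shaped like the JSON response format; the function name is hypothetical.

```python
def split_blocks(content: list[dict]) -> tuple[str, str]:
    """Return (thinking_text, answer_text) from a list of content blocks.

    Either string may be empty: in adaptive mode Claude may skip thinking.
    """
    thinking = "".join(b["thinking"] for b in content if b.get("type") == "thinking")
    text = "".join(b["text"] for b in content if b.get("type") == "text")
    return thinking, text


# Hand-written sample blocks, not real API output.
with_thinking = [
    {"type": "thinking", "thinking": "Let me check...", "signature": "..."},
    {"type": "text", "text": "Paris."},
]
without_thinking = [{"type": "text", "text": "Paris."}]

print(split_blocks(with_thinking))
print(split_blocks(without_thinking))
```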

How to use adaptive thinking

Set thinking.type to "adaptive" in your API request:

curl https://api.anthropic.com/v1/messages \
     --header "x-api-key: $ANTHROPIC_API_KEY" \
     --header "anthropic-version: 2023-06-01" \
     --header "content-type: application/json" \
     --data \
'{
    "model": "claude-opus-4-6",
    "max_tokens": 16000,
    "thinking": {
        "type": "adaptive"
    },
    "messages": [
        {
            "role": "user",
            "content": "Explain why the sum of two even numbers is always even."
        }
    ]
}'

Adaptive thinking with the effort parameter

You can combine adaptive thinking with the effort parameter to guide how much Claude thinks. The effort level acts as soft guidance for Claude's thinking allocation:

Effort level	Thinking behavior
`max`	Claude always thinks, with no constraints on thinking depth. Opus 4.6 only; requests that use `max` on other models return an error.
`high` (default)	Claude always thinks. Provides deep reasoning on complex tasks.
`medium`	Claude uses moderate thinking. May skip thinking for very simple queries.
`low`	Claude minimizes thinking. Skips thinking for simple tasks where speed matters most.

import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-opus-4-6",
    max_tokens=16000,
    thinking={
        "type": "adaptive"
    },
    output_config={
        "effort": "medium"
    },
    messages=[{
        "role": "user",
        "content": "What is the capital of France?"
    }]
)

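# With thinking active, the first content block may be a thinking block,
# which has no .text attribute, so print only the text blocks.
for block in response.content:
    if block.type == "text":
        print(block.text)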
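
When the effort level comes from configuration or user input, it can help to validate it before sending the request. The following is a rough sketch, assuming the four levels from the table above; the function name and defaults are hypothetical, and the returned dict simply mirrors the keyword arguments used in the example above.

```python
EFFORT_LEVELS = ("low", "medium", "high", "max")  # levels from the table above


def build_adaptive_request(prompt: str, effort: str = "high",
                           model: str = "claude-opus-4-6",
                           max_tokens: int = 16000) -> dict:
    """Build keyword arguments for an adaptive-thinking request."""
    if effort not in EFFORT_LEVELS:
        raise ValueError(f"unknown effort level: {effort!r}")
    return {
        "model": model,
        "max_tokens": max_tokens,
        "thinking": {"type": "adaptive"},
        "output_config": {"effort": effort},
        "messages": [{"role": "user", "content": prompt}],
    }


request = build_adaptive_request("What is the capital of France?", effort="low")
print(request["output_config"])
```

The result can then be passed to the SDK as client.messages.create(**request).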

Streaming with adaptive thinking

Adaptive thinking works seamlessly with streaming. Thinking blocks are streamed via thinking_delta events, just as in manual thinking mode:

import anthropic

client = anthropic.Anthropic()

with client.messages.stream(
    model="claude-opus-4-6",
    max_tokens=16000,
    thinking={"type": "adaptive"},
    messages=[{"role": "user", "content": "What is the greatest common divisor of 1071 and 462?"}],
) as stream:
    for event in stream:
        if event.type == "content_block_start":
            print(f"\nStarting {event.content_block.type} block...")
        elif event.type == "content_block_delta":
            if event.delta.type == "thinking_delta":
                print(event.delta.thinking, end="", flush=True)
            elif event.delta.type == "text_delta":
                print(event.delta.text, end="", flush=True)

Adaptive vs manual vs disabled thinking

Mode	Configuration	Availability	When to use
Adaptive	`thinking: {type: "adaptive"}`	Opus 4.6	Claude decides when and how much to think. Use `effort` to guide it.
Manual	`thinking: {type: "enabled", budget_tokens: N}`	All models. Deprecated on Opus 4.6; use adaptive mode instead.	When you need precise control over thinking token spend.
Disabled	Omit the `thinking` parameter	All models	When you don't need extended thinking and want the lowest latency.

Adaptive thinking is currently available on Opus 4.6. Older models only support type: "enabled" with budget_tokens. On Opus 4.6, type: "enabled" with budget_tokens is still accepted but deprecated. We recommend using adaptive thinking with the effort parameter instead.

Important considerations

Validation changes

When using adaptive thinking, previous assistant turns don't need to start with thinking blocks. This is more flexible than manual mode, where the API requires thinking-enabled turns to begin with a thinking block.

Prompt caching

Consecutive requests that use adaptive thinking preserve prompt cache breakpoints. However, switching between adaptive and enabled/disabled thinking modes breaks cache breakpoints for messages. System prompts and tool definitions remain cached regardless of mode changes.
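
To see where a conversation would lose its message cache, you can scan the sequence of thinking modes used across requests. This is an illustrative sketch only: the function name and the mode labels are assumptions, and it flags just the transitions described above (into or out of adaptive mode).

```python
def message_cache_breaks(modes: list[str]) -> list[int]:
    """Return indices of requests whose mode switch breaks message cache breakpoints.

    `modes` holds one label per request: "adaptive", "enabled", or "disabled".
    Only switches that involve adaptive mode are flagged.
    """
    breaks = []
    for i in range(1, len(modes)):
        prev, cur = modes[i - 1], modes[i]
        if prev != cur and "adaptive" in (prev, cur):
            breaks.append(i)
    return breaks


print(message_cache_breaks(["adaptive", "adaptive", "enabled", "adaptive", "adaptive"]))
# → [2, 3]
```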

Tuning thinking behavior

The triggering behavior of adaptive thinking is promptable. If Claude is thinking more or less often than you would like, you can add guidance to your system prompt:

Extended thinking adds latency and should only be used when it
will meaningfully improve answer quality — typically for problems
that require multi-step reasoning. When in doubt, respond directly.

Steering Claude to think less often may reduce quality on tasks that benefit from reasoning. Measure the impact on your specific workloads before deploying prompt-based tuning to production. Consider testing with lower effort levels first.
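
One way to apply such guidance without overwriting an existing system prompt is to append it. The helper below is a sketch with a hypothetical name; it assumes the request is a dict of keyword arguments and that `system` is a plain string rather than a list of content blocks.

```python
def with_thinking_guidance(request: dict, guidance: str) -> dict:
    """Return a copy of `request` with `guidance` appended to its system prompt."""
    updated = dict(request)  # shallow copy; the caller's dict is left untouched
    existing = request.get("system")
    updated["system"] = f"{existing}\n\n{guidance}" if existing else guidance
    return updated


base = {
    "model": "claude-opus-4-6",
    "max_tokens": 16000,
    "thinking": {"type": "adaptive"},
    "system": "You are a concise assistant.",
    "messages": [{"role": "user", "content": "What is 2 + 2?"}],
}
tuned = with_thinking_guidance(base, "When in doubt, respond directly.")
print(tuned["system"])
```

Running the same evaluation set against `base` and `tuned` is a simple way to measure the quality impact before rollout.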

Cost control

Use max_tokens as a hard limit on total output (thinking plus response text). The effort parameter provides additional soft guidance on how much thinking Claude allocates. Together, these give you effective control over cost.

At the high and max effort levels, Claude may think more extensively and is more likely to exhaust the max_tokens budget. If you observe stop_reason: "max_tokens" in responses, consider increasing max_tokens to give the model more room, or lower the effort level.
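
The two remedies above can be combined into a simple retry policy: raise max_tokens first, then step the effort level down. This is one possible policy, not a recommended algorithm; the function name, the doubling strategy, and the `token_ceiling` value are hypothetical choices for illustration.

```python
EFFORT_ORDER = ["low", "medium", "high", "max"]


def adjust_after_truncation(stop_reason: str, max_tokens: int, effort: str,
                            token_ceiling: int = 64000) -> tuple[int, str]:
    """Return (max_tokens, effort) to use on a retry.

    Inputs are returned unchanged when the response was not truncated.
    `token_ceiling` is an arbitrary cap chosen for this sketch.
    """
    if stop_reason != "max_tokens":
        return max_tokens, effort
    doubled = max_tokens * 2
    if doubled <= token_ceiling:
        return doubled, effort  # first remedy: give the model more room
    # Second remedy: drop one effort level (never below "low").
    idx = EFFORT_ORDER.index(effort)
    return max_tokens, EFFORT_ORDER[max(idx - 1, 0)]


print(adjust_after_truncation("max_tokens", 16000, "high"))  # → (32000, 'high')
print(adjust_after_truncation("max_tokens", 48000, "high"))  # → (48000, 'medium')
```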

Working with thinking blocks

The following concepts apply to all models that support extended thinking, regardless of whether you use adaptive or manual mode.

Summarized thinking

With extended thinking enabled, the Messages API for Claude 4 models returns a summary of Claude's full thinking process. Summarized thinking provides the full intelligence benefits of extended thinking, while preventing misuse.

Here are some important considerations for summarized thinking:

You're charged for the full thinking tokens generated by the original request, not the summary tokens.
The billed output token count will not match the count of tokens you see in the response.
The first few lines of thinking output are more verbose, providing detailed reasoning that's particularly helpful for prompt engineering purposes.
As Anthropic seeks to improve the extended thinking feature, summarization behavior is subject to change.
Summarization preserves the key ideas of Claude's thinking process with minimal added latency, enabling a streamable user experience and easy migration from Claude Sonnet 3.7 to Claude 4 and later models.
Summarization is processed by a different model than the one you target in your requests. The thinking model does not see the summarized output.

Claude Sonnet 3.7 continues to return full thinking output.

In rare cases where you need access to full thinking output for Claude 4 models, contact our sales team.

Thinking encryption

Full thinking content is encrypted and returned in the signature field. This field is used to verify that thinking blocks were generated by Claude when passed back to the API.

It is only strictly necessary to send back thinking blocks when using tools with extended thinking. Otherwise you can omit thinking blocks from previous turns, or let the API strip them for you if you pass them back.

If sending back thinking blocks, we recommend passing everything back as you received it for consistency and to avoid potential issues.

Here are some important considerations on thinking encryption:

When streaming responses, the signature is added via a signature_delta inside a content_block_delta event just before the content_block_stop event.
signature values are significantly longer in Claude 4 models than in previous models.
The signature field is an opaque field and should not be interpreted or parsed - it exists solely for verification purposes.
signature values are compatible across platforms (Claude APIs, Amazon Bedrock, and Vertex AI). Values generated on one platform will be compatible with another.
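
When you consume the raw event stream yourself, the signature has to be captured alongside the thinking text so the block can be passed back intact. The sketch below reassembles thinking blocks from events represented as plain dicts. The sample events are hand-written, the function name is hypothetical, and the code assumes one signature_delta per block, arriving before content_block_stop as described above.

```python
def collect_thinking_blocks(events: list[dict]) -> list[dict]:
    """Rebuild complete thinking blocks (text plus signature) from stream events."""
    blocks: dict[int, dict] = {}
    for event in events:
        if event["type"] == "content_block_start":
            if event["content_block"]["type"] == "thinking":
                blocks[event["index"]] = {"type": "thinking", "thinking": "", "signature": ""}
        elif event["type"] == "content_block_delta" and event["index"] in blocks:
            delta = event["delta"]
            if delta["type"] == "thinking_delta":
                blocks[event["index"]]["thinking"] += delta["thinking"]
            elif delta["type"] == "signature_delta":
                # Treat the signature as opaque: store it, never parse it.
                blocks[event["index"]]["signature"] = delta["signature"]
    return [blocks[i] for i in sorted(blocks)]


sample_events = [
    {"type": "content_block_start", "index": 0, "content_block": {"type": "thinking", "thinking": ""}},
    {"type": "content_block_delta", "index": 0, "delta": {"type": "thinking_delta", "thinking": "Step 1. "}},
    {"type": "content_block_delta", "index": 0, "delta": {"type": "thinking_delta", "thinking": "Step 2."}},
    {"type": "content_block_delta", "index": 0, "delta": {"type": "signature_delta", "signature": "abc123"}},
    {"type": "content_block_stop", "index": 0},
]
print(collect_thinking_blocks(sample_events))
```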

Thinking redaction

Occasionally Claude's internal reasoning will be flagged by our safety systems. When this occurs, we encrypt some or all of the thinking block and return it to you as a redacted_thinking block. redacted_thinking blocks are decrypted when passed back to the API, allowing Claude to continue its response without losing context.

When building customer-facing applications that use extended thinking:

Be aware that redacted thinking blocks contain encrypted content that isn't human-readable
Consider providing a simple explanation like: "Some of Claude's internal reasoning has been automatically encrypted for safety reasons. This doesn't affect the quality of responses."
If showing thinking blocks to users, you can filter out redacted blocks while preserving normal thinking blocks
Be transparent that using extended thinking features may occasionally result in some reasoning being encrypted
Implement appropriate error handling to gracefully manage redacted thinking without breaking your UI

Here's an example showing both normal and redacted thinking blocks:

{
  "content": [
    {
      "type": "thinking",
      "thinking": "Let me analyze this step by step...",
      "signature": "WaUjzkypQ2mUEVM36O2TxuC06KN8xyfbJwyem2dw3URve/op91XWHOEBLLqIOMfFG/UvLEczmEsUjavL...."
    },
    {
      "type": "redacted_thinking",
      "data": "EmwKAhgBEgy3va3pzix/LafPsn4aDFIT2Xlxh0L5L8rLVyIwxtE3rAFBa8cr3qpPkNRj2YfWXGmKDxH4mPnZ5sQ7vB9URj2pLmN3kF8/dW5hR7xJ0aP1oLs9yTcMnKVf2wRpEGjH9XZaBt4UvDcPrQ..."
    },
    {
      "type": "text",
      "text": "Based on my analysis..."
    }
  ]
}

Seeing redacted thinking blocks in your output is expected behavior. The model can still use this redacted reasoning to inform its responses while maintaining safety guardrails.
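
For display purposes, redacted blocks can be swapped for a short notice while normal thinking blocks are shown as-is. This is a minimal sketch over dict-shaped blocks like those in the example above; the function name and the notice text are illustrative. It only affects what the user sees, and the original blocks should still be kept unmodified for passing back to the API.

```python
REDACTION_NOTICE = "[Some reasoning was encrypted for safety reasons.]"  # illustrative wording


def displayable_thinking(content: list[dict], show_notice: bool = True) -> list[str]:
    """Return user-facing thinking lines, hiding the encrypted payloads."""
    lines = []
    for block in content:
        if block["type"] == "thinking":
            lines.append(block["thinking"])
        elif block["type"] == "redacted_thinking" and show_notice:
            lines.append(REDACTION_NOTICE)
        # Text blocks and unknown block types are ignored here.
    return lines


content = [
    {"type": "thinking", "thinking": "Let me analyze this step by step...", "signature": "..."},
    {"type": "redacted_thinking", "data": "..."},
    {"type": "text", "text": "Based on my analysis..."},
]
print(displayable_thinking(content))
```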

If you need to test redacted thinking handling in your application, you can use this special test string as your prompt: ANTHROPIC_MAGIC_STRING_TRIGGER_REDACTED_THINKING_46C9A13E193C177646C7398A98432ECCCE4C1253D5E2D82641AC0E52CC2876CB

When passing thinking and redacted_thinking blocks back to the API in a multi-turn conversation, you must pass the complete, unmodified blocks for the last assistant turn. This is critical for maintaining the model's reasoning flow. We suggest always passing back all thinking blocks to the API. For more details, see the Preserving thinking blocks section.
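
In practice this means appending the assistant's content list to the conversation exactly as received, without filtering or editing any block. A small sketch, assuming dict-shaped messages; the function name is hypothetical.

```python
import copy


def next_turn_messages(messages: list[dict], assistant_content: list[dict],
                       user_content: str) -> list[dict]:
    """Return a new message list with the assistant turn appended unmodified."""
    return messages + [
        # Deep copy so later edits elsewhere cannot alter the stored blocks.
        {"role": "assistant", "content": copy.deepcopy(assistant_content)},
        {"role": "user", "content": user_content},
    ]


history = [{"role": "user", "content": "First question"}]
assistant_content = [
    {"type": "thinking", "thinking": "Let me analyze...", "signature": "..."},
    {"type": "redacted_thinking", "data": "..."},
    {"type": "text", "text": "Based on my analysis..."},
]
updated = next_turn_messages(history, assistant_content, "Follow-up question")
print(len(updated))
```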

Pricing

For complete pricing information including base rates, cache writes, cache hits, and output tokens, see the pricing page.

The thinking process incurs charges for:

Tokens used during thinking (output tokens)
Thinking blocks from the last assistant turn included in subsequent requests (input tokens)
Standard text output tokens

When extended thinking is enabled, a specialized system prompt is automatically included to support this feature.

When using summarized thinking:

Input tokens: Tokens in your original request (excludes thinking tokens from previous turns)
Output tokens (billed): The original thinking tokens that Claude generated internally
Output tokens (visible): The summarized thinking tokens you see in the response
No charge: Tokens used to generate the summary

The billed output token count will not match the visible token count in the response. You are billed for the full thinking process, not the summary you see.
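
The gap between billed and visible output can be illustrated with simple arithmetic. The token counts below are made-up numbers for illustration, and the function name is hypothetical; in a real response only the billed figure is reported in the usage data.

```python
def output_token_breakdown(full_thinking_tokens: int, summary_tokens: int,
                           text_tokens: int) -> dict:
    """Compare billed output tokens with the tokens visible in the response."""
    return {
        "billed": full_thinking_tokens + text_tokens,  # full thinking is charged
        "visible": summary_tokens + text_tokens,       # only the summary is shown
    }


# Hypothetical counts: 2000 thinking tokens summarized to 400, plus a 300-token answer.
print(output_token_breakdown(2000, 400, 300))
# → {'billed': 2300, 'visible': 700}
```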

Additional topics

The extended thinking page covers several topics in more detail, with mode-specific code examples:

Tool use with thinking: The same rules apply to adaptive thinking. Preserve thinking blocks between tool calls and be aware of tool_choice limitations when thinking is active.
Prompt caching: With adaptive thinking, consecutive requests that use the same thinking mode preserve cache breakpoints. Switching between adaptive and enabled/disabled modes breaks cache breakpoints for messages (system prompts and tool definitions remain cached).
Context windows: How thinking tokens interact with max_tokens and context window limits.

Next steps

Extended thinking

Learn more about extended thinking, including manual mode, tool use, and prompt caching.

Effort parameter

Control how thoroughly Claude responds with the effort parameter.
