Claude 4.6 represents the next generation of Claude models, bringing significant new capabilities and API improvements. This page summarizes all new features available at launch.
| Model | API model ID | Description |
|---|---|---|
| Claude Opus 4.6 | claude-opus-4-6 | Our most intelligent model for building agents and coding |
Claude Opus 4.6 supports a 200K context window (with 1M token context window available in beta), 128K max output tokens, extended thinking, and all existing Claude API features.
For complete pricing and specs, see the models overview.
Adaptive thinking (`thinking: {"type": "adaptive"}`) is the recommended thinking mode for Opus 4.6. Claude dynamically decides when and how much to think. At the default effort level (`high`), Claude will almost always think. At lower effort levels, it may skip thinking for simpler problems.

`thinking: {"type": "enabled"}` and `budget_tokens` are deprecated on Opus 4.6. They remain functional but will be removed in a future model release. Use adaptive thinking and the `effort` parameter to control thinking depth instead. Adaptive thinking also automatically enables interleaved thinking.
```python
response = client.messages.create(
    model="claude-opus-4-6",
    max_tokens=16000,
    thinking={"type": "adaptive"},
    messages=[{"role": "user", "content": "Solve this complex problem..."}]
)
```

The `effort` parameter is now generally available (no beta header required). A new `max` effort level provides the absolute highest capability on Opus 4.6. Combine `effort` with adaptive thinking for optimal cost-quality tradeoffs.
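As a sketch, a request combining adaptive thinking with a reduced effort level might look like the following. The placement of `effort` as a top-level request field is an assumption here; check the API reference for your SDK version:

```python
# Hypothetical request payload combining adaptive thinking with a lower
# effort level. The top-level "effort" placement is an assumption, not
# confirmed API shape.
params = {
    "model": "claude-opus-4-6",
    "max_tokens": 16000,
    "effort": "medium",                # e.g. "low" | "medium" | "high" | "max"
    "thinking": {"type": "adaptive"},  # let Claude decide when and how much to think
    "messages": [{"role": "user", "content": "Summarize this report..."}],
}
# response = client.messages.create(**params)
```

At `"medium"` effort, adaptive thinking is more likely to skip the thinking phase on simple inputs, trading some capability for lower cost and latency.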
Compaction provides automatic, server-side context summarization, enabling effectively infinite conversations. When context approaches the window limit, the API automatically summarizes earlier parts of the conversation.
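Compaction happens server-side, but the idea can be illustrated client-side: once a conversation nears a token budget, older turns are replaced with a summary while recent turns are kept verbatim. This is a conceptual sketch only, not the API mechanism; `summarize` stands in for the model-generated summary:

```python
def compact(messages, token_count, limit, summarize):
    """Illustration of compaction: when a conversation nears the context
    limit, replace older turns with a single summary message."""
    if token_count(messages) < 0.8 * limit:  # still comfortably under the limit
        return messages
    head, tail = messages[:-2], messages[-2:]  # keep the most recent turns verbatim
    summary = {"role": "user",
               "content": f"[Summary of earlier turns] {summarize(head)}"}
    return [summary] + tail

# Toy usage with a crude word-count "tokenizer" and a stub summarizer
msgs = [{"role": "user", "content": "word " * 50} for _ in range(10)]
count = lambda ms: sum(len(m["content"].split()) for m in ms)
compacted = compact(msgs, count, 300, lambda head: f"{len(head)} earlier messages")
```

The real API does this transparently, so callers never have to manage the summarization step themselves.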
Fine-grained tool streaming is now generally available on all models and platforms. No beta header is required.
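With fine-grained tool streaming, tool input arrives as incremental `input_json_delta` events whose `partial_json` fragments are concatenated by the client; the accumulated string is generally only valid JSON once the block finishes. A sketch of the accumulation logic over simulated events:

```python
import json

def accumulate_tool_input(events):
    """Concatenate partial_json fragments from input_json_delta events.
    The buffer is typically parseable only after the final fragment."""
    buf = ""
    for event in events:
        if event.get("type") == "input_json_delta":
            buf += event["partial_json"]
    return json.loads(buf)

# Simulated deltas as they might arrive over the stream
events = [
    {"type": "input_json_delta", "partial_json": '{"query": "clim'},
    {"type": "input_json_delta", "partial_json": 'ate data", "limit'},
    {"type": "input_json_delta", "partial_json": '": 10}'},
]
tool_input = accumulate_tool_input(events)
```

Note that individual fragments can split mid-string or mid-key, so avoid parsing the buffer until the content block is complete.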
Opus 4.6 supports up to 128K output tokens, doubling the previous 64K limit. This enables longer thinking budgets and more comprehensive responses. The SDKs require streaming for requests with large max_tokens values to avoid HTTP timeouts. If you don't need to process events incrementally, use .stream() with .get_final_message() to get the complete response — see Streaming Messages for details.
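A long-output request using that pattern might look like this in the Python SDK (a sketch: `client` is assumed to be an `anthropic.Anthropic()` instance, and the call requires an API key at runtime):

```python
# Sketch of a long-output request; the SDK requires streaming at this size.
params = {
    "model": "claude-opus-4-6",
    "max_tokens": 128000,  # new Opus 4.6 maximum
    "messages": [{"role": "user", "content": "Write a detailed design doc..."}],
}
# with client.messages.stream(**params) as stream:
#     message = stream.get_final_message()  # complete response, no manual event handling
```

`.get_final_message()` consumes the stream internally, so you get the same object a non-streaming call would return without risking an HTTP timeout on long generations.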
Data residency controls allow you to specify where model inference runs using the inference_geo parameter. You can choose "global" (default) or "us" routing per request. US-only inference is priced at 1.1x on Claude Opus 4.6 and newer models.
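A small sketch of the request parameter and the pricing implication described above (the multiplier helper is illustrative, not part of any SDK):

```python
def geo_price_multiplier(inference_geo: str) -> float:
    """US-only inference is billed at 1.1x on Opus 4.6 and newer models;
    global routing carries no surcharge. Illustrative helper only."""
    return 1.1 if inference_geo == "us" else 1.0

params = {
    "model": "claude-opus-4-6",
    "max_tokens": 1024,
    "inference_geo": "us",  # or "global" (the default)
    "messages": [{"role": "user", "content": "Hello"}],
}
multiplier = geo_price_multiplier(params["inference_geo"])
```

Because `inference_geo` is a per-request parameter, you can restrict only the workloads that need US residency and keep everything else on default global routing.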
### `type: "enabled"` and `budget_tokens`

`thinking: {"type": "enabled", "budget_tokens": N}` is deprecated on Opus 4.6. It remains functional but will be removed in a future model release. Migrate to `thinking: {"type": "adaptive"}` with the `effort` parameter.
### `interleaved-thinking-2025-05-14` beta header

The `interleaved-thinking-2025-05-14` beta header is deprecated on Opus 4.6. It is safely ignored if included, but is no longer required, because adaptive thinking automatically enables interleaved thinking. Remove `betas=["interleaved-thinking-2025-05-14"]` from your requests when using Opus 4.6.
### `output_format`

The `output_format` parameter for structured outputs has moved to `output_config.format`. The old parameter remains functional but is deprecated and will be removed in a future model release.
```python
# Before
response = client.messages.create(
    output_format={"type": "json_schema", "schema": {...}},
    ...
)

# After
response = client.messages.create(
    output_config={"format": {"type": "json_schema", "schema": {...}}},
    ...
)
```

Prefilling assistant messages (last-assistant-turn prefills) is not supported on Opus 4.6. Requests with prefilled assistant messages return a 400 error.
Alternatives:

- Use `output_config.format` for JSON output

Opus 4.6 may produce slightly different JSON string escaping in tool call arguments (e.g., different handling of Unicode escapes or forward slash escaping). Standard JSON parsers handle these differences automatically, but if you parse tool call input as a raw string rather than with `json.loads()` or `JSON.parse()`, verify that your parsing logic still works.
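These escaping differences disappear once the string is parsed. For instance, a forward slash may arrive escaped or unescaped, and a non-ASCII character may arrive as a `\uXXXX` escape or as a literal character; a standard parser decodes both forms to identical values:

```python
import json

# "\/" and "/" are equivalent JSON escapes, as are "\u00e9" and a literal é.
escaped = json.loads('{"path": "a\\/b", "name": "caf\\u00e9"}')
literal = json.loads('{"path": "a/b", "name": "café"}')
assert escaped == literal  # parsers normalize both to the same values
```

This is why only hand-rolled string matching on the raw tool-call arguments needs review; anything going through a real JSON parser is unaffected.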
For step-by-step migration instructions, see Migrating to Claude 4.6.
- Learn how to use adaptive thinking mode.
- Compare all Claude models.
- Explore server-side context compaction.
- Step-by-step migration instructions.