Claude 4.6 represents the next generation of Claude models, bringing significant new capabilities and API improvements. This page summarizes all new features available at launch.
| Model | API model ID | Description |
|---|---|---|
| Claude Opus 4.6 | claude-opus-4-6 | Our most intelligent model for building agents and coding |
| Claude Sonnet 4.6 | claude-sonnet-4-6 | Our best combination of speed and intelligence |
Claude Opus 4.6 supports a 200K context window (with 1M token context window available in beta), 128K max output tokens, extended thinking, and all existing Claude API features.
Claude Sonnet 4.6 supports a 200K context window (with 1M token context window available in beta), 64K max output tokens, extended thinking, and adaptive thinking.
For complete pricing and specs, see the models overview.
Adaptive thinking (thinking: {type: "adaptive"}) is the recommended thinking mode for Opus 4.6 and Sonnet 4.6. Claude dynamically decides when and how much to think. At the default effort level (high), Claude will almost always think. At lower effort levels, it may skip thinking for simpler problems.
thinking: {type: "enabled"} and budget_tokens are deprecated on Opus 4.6 and Sonnet 4.6. They remain functional but will be removed in a future model release. Use adaptive thinking and the effort parameter to control thinking depth instead. Adaptive thinking also automatically enables interleaved thinking.
```python
import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-opus-4-6",
    max_tokens=16000,
    thinking={"type": "adaptive"},
    messages=[{"role": "user", "content": "Solve this complex problem..."}],
)
```

The effort parameter is now generally available (no beta header required). A new max effort level provides the absolute highest capability on Opus 4.6. Combine effort with adaptive thinking for optimal cost-quality tradeoffs.
Sonnet 4.6 introduces the effort parameter to the Sonnet family. We recommend setting effort to medium for most Sonnet 4.6 use cases to balance speed, cost, and performance.
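As a hedged sketch of the recommendation above, the payload below sets effort to medium on Sonnet 4.6. The prompt and max_tokens values are placeholders, and effort as a top-level request field is an assumption based on how the parameter is described here.

```python
# Illustrative request payload for Sonnet 4.6 with effort set to "medium".
# Assumption: effort is a top-level request field; other values are placeholders.
request = {
    "model": "claude-sonnet-4-6",
    "max_tokens": 8192,
    "effort": "medium",  # recommended default for most Sonnet 4.6 use cases
    "thinking": {"type": "adaptive"},
    "messages": [{"role": "user", "content": "Summarize this report..."}],
}
```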
Code execution is now free when used with web search or web fetch. When either tool is included in your API request, there are no additional charges for code execution beyond standard input and output token costs. Code execution enables dynamic filtering in web search and web fetch tools, improving accuracy while reducing token consumption. See the code execution pricing for details on standalone usage.
Web search and web fetch tools now support dynamic filtering in public beta with Opus 4.6 and Sonnet 4.6. Claude can write and execute code to filter results before they reach the context window, keeping only relevant information and improving accuracy while reducing token consumption. To enable dynamic filtering, use the web_search_20260209 or web_fetch_20260209 tool versions with the code-execution-web-tools-2026-02-09 beta header.
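A minimal sketch of opting into dynamic filtering: the tool version and beta header name come from the paragraph above, while the tool's name field, the prompt, and token limits are assumptions for illustration.

```python
# Illustrative payload enabling dynamic filtering for web search.
# The tool version and beta header are from the docs text; the "name" field
# and other values are assumed placeholders.
request = {
    "model": "claude-opus-4-6",
    "max_tokens": 4096,
    "tools": [{"type": "web_search_20260209", "name": "web_search"}],
    "betas": ["code-execution-web-tools-2026-02-09"],
    "messages": [{"role": "user", "content": "Find recent coverage of..."}],
}
```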
The following tools are now generally available:
Compaction provides automatic, server-side context summarization, enabling effectively infinite conversations. When context approaches the window limit, the API automatically summarizes earlier parts of the conversation.
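The sketch below shows roughly what opting into compaction might look like. The context_management field name and its contents are hypothetical placeholders, not confirmed parameter names; consult the compaction documentation for the actual request shape.

```python
# Hypothetical sketch only: "context_management" and its "compaction" entry
# are placeholder names, not confirmed API parameters.
request = {
    "model": "claude-opus-4-6",
    "max_tokens": 4096,
    "context_management": {"compaction": {"type": "enabled"}},  # placeholder
    "messages": [{"role": "user", "content": "Continue our long session..."}],
}
```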
Fast mode (speed: "fast") delivers significantly faster output token generation for Opus models. Fast mode is up to 2.5x as fast at premium pricing ($30/$150 per MTok). This is the same model running with faster inference (no change to intelligence or capabilities).
```python
response = client.beta.messages.create(
    model="claude-opus-4-6",
    max_tokens=4096,
    speed="fast",
    betas=["fast-mode-2026-02-01"],
    messages=[{"role": "user", "content": "Refactor this module..."}],
)
```

Fine-grained tool streaming is now generally available on all models and platforms. No beta header is required.
Opus 4.6 supports up to 128K output tokens, doubling the previous 64K limit. This enables longer thinking budgets and more comprehensive responses. The SDKs require streaming for requests with large max_tokens values to avoid HTTP timeouts. If you don't need to process events incrementally, use .stream() with .get_final_message() to get the complete response. See Streaming Messages for details.
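As a sketch of the streaming pattern described above, the helper below uses the Anthropic Python SDK's messages.stream context manager and get_final_message to collect a complete large-output response without processing events incrementally. The model ID and prompt are placeholders.

```python
# Sketch: stream a large-max_tokens request and return the complete message,
# avoiding HTTP timeouts on long generations. Assumes the Anthropic Python
# SDK's client.messages.stream API.
def generate_long_output(client, prompt: str):
    with client.messages.stream(
        model="claude-opus-4-6",
        max_tokens=128000,
        thinking={"type": "adaptive"},
        messages=[{"role": "user", "content": prompt}],
    ) as stream:
        # Consumes the stream and returns the assembled final message.
        return stream.get_final_message()
```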
Data residency controls allow you to specify where model inference runs using the inference_geo parameter. You can choose "global" (default) or "us" routing per request. US-only inference is priced at 1.1x on Claude Opus 4.6 and newer models.
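A brief sketch of per-request routing with inference_geo, as described above; the prompt and max_tokens values are placeholders.

```python
# Illustrative payload pinning inference to US-only routing
# (priced at 1.1x on Opus 4.6 and newer). Other values are placeholders.
request = {
    "model": "claude-opus-4-6",
    "max_tokens": 4096,
    "inference_geo": "us",  # "global" is the default
    "messages": [{"role": "user", "content": "Process this record..."}],
}
```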
type: "enabled" and budget_tokensthinking: {type: "enabled", budget_tokens: N} is deprecated on Opus 4.6. It remains functional but will be removed in a future model release. Migrate to thinking: {type: "adaptive"} with the effort parameter.
The interleaved-thinking-2025-05-14 beta header is deprecated on Opus 4.6. It is safely ignored if included, but is no longer required: adaptive thinking automatically enables interleaved thinking. Remove betas=["interleaved-thinking-2025-05-14"] from your requests when using Opus 4.6.
Sonnet 4.6 continues to support the interleaved-thinking-2025-05-14 beta header for use with manual extended thinking (thinking: {type: "enabled"}). You can use either interleaved thinking with the beta header or adaptive thinking on Sonnet 4.6.
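For the Sonnet 4.6 path described above, a sketch combining manual extended thinking with the interleaved thinking beta header; the budget_tokens value, token limits, and prompt are illustrative placeholders.

```python
# Illustrative payload: manual extended thinking with interleaved thinking
# on Sonnet 4.6 via the beta header. Values are placeholders.
request = {
    "model": "claude-sonnet-4-6",
    "max_tokens": 16000,
    "betas": ["interleaved-thinking-2025-05-14"],
    "thinking": {"type": "enabled", "budget_tokens": 8000},
    "messages": [{"role": "user", "content": "Plan and execute the task..."}],
}
```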
The output_format parameter for structured outputs has been moved to output_config.format. The old parameter remains functional but is deprecated and will be removed in a future model release.
```python
# Before
response = client.messages.create(
    output_format={"type": "json_schema", "schema": {...}},
    # ...
)

# After
response = client.messages.create(
    output_config={"format": {"type": "json_schema", "schema": {...}}},
    # ...
)
```

Prefilling assistant messages (last-assistant-turn prefills) is not supported on Opus 4.6. Requests with prefilled assistant messages return a 400 error.
Alternatives:
- output_config.format for JSON output

Opus 4.6 may produce slightly different JSON string escaping in tool call arguments (e.g., different handling of Unicode escapes or forward slash escaping). Standard JSON parsers handle these differences automatically. If you parse tool call input as a raw string rather than using json.loads() or JSON.parse(), verify your parsing logic still works.
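To illustrate why a standard parser is the safe choice here: two JSON encodings of the same values, one using a forward slash escape and a Unicode escape and one using the literal characters, decode identically with json.loads.

```python
import json

# The same values encoded two ways: with \/ and \u00e9 escapes, and without.
# A standard JSON parser normalizes both; raw string matching would not.
escaped = json.loads('{"path": "a\\/b", "name": "caf\\u00e9"}')
plain = json.loads('{"path": "a/b", "name": "café"}')
assert escaped == plain
```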
For step-by-step migration instructions, see Migrating to Claude 4.6.
- Learn how to use adaptive thinking mode.
- Compare all Claude models.
- Explore server-side context compaction.
- Learn about fast mode's faster output token generation for Opus models.
- Follow step-by-step migration instructions.