The effort parameter allows you to control how eager Claude is about spending tokens when responding to requests. This gives you the ability to trade off between response thoroughness and token efficiency, all with a single model. The effort parameter is generally available on all supported models with no beta header required.
The effort parameter is supported by Claude Opus 4.6, Claude Sonnet 4.6, and Claude Opus 4.5.
For Claude Opus 4.6 and Sonnet 4.6, effort replaces budget_tokens as the recommended way to control thinking depth. Combine effort with adaptive thinking (thinking: {type: "adaptive"}) for the best experience. While budget_tokens is still accepted on Opus 4.6 and Sonnet 4.6, it is deprecated and will be removed in a future model release. At high (default) and max effort, Claude will almost always think. At lower effort levels, it may skip thinking for simpler problems.
By default, Claude uses high effort, spending as many tokens as needed for excellent results. You can raise the effort level to max for the absolute highest capability, or lower it to be more conservative with token usage, optimizing for speed and cost while accepting some reduction in capability.
Setting effort to "high" produces exactly the same behavior as omitting the effort parameter entirely.
The effort parameter affects all tokens in the response, including:
This approach has two major advantages:
| Level | Description | Typical use case |
|---|---|---|
max | Absolute maximum capability with no constraints on token spending. Opus 4.6 only. Requests using max on other models will return an error. | Tasks requiring the deepest possible reasoning and most thorough analysis |
high | High capability. Equivalent to not setting the parameter. | Complex reasoning, difficult coding problems, agentic tasks |
medium | Balanced approach with moderate token savings. | Agentic tasks that require a balance of speed, cost, and performance |
low | Most efficient. Significant token savings with some capability reduction. | Simpler tasks that need the best speed and lowest costs, such as subagents |
Effort is a behavioral signal, not a strict token budget. At lower effort levels, Claude will still think on sufficiently difficult problems, but it will think less than it would at higher effort levels for the same problem.
Sonnet 4.6 defaults to high effort. Explicitly set effort when using Sonnet 4.6 to avoid unexpected latency:
import anthropic
client = anthropic.Anthropic()
response = client.messages.create(
model="claude-opus-4-6",
max_tokens=4096,
messages=[
{
"role": "user",
"content": "Analyze the trade-offs between microservices and monolithic architectures",
}
],
output_config={"effort": "medium"},
)
print(response.content[0].text)max on other models will return an error.When using tools, the effort parameter affects both the explanations around tool calls and the tool calls themselves. Lower effort levels tend to:
Higher effort levels may:
The effort parameter works alongside extended thinking. Its behavior depends on the model:
thinking: {type: "adaptive"}), where effort is the recommended control for thinking depth. While budget_tokens is still accepted on Opus 4.6, it is deprecated and will be removed in a future release. At high and max effort, Claude almost always thinks deeply. At lower levels, it may skip thinking for simpler problems.thinking: {type: "enabled", budget_tokens: N}).thinking: {type: "enabled", budget_tokens: N}), where effort works alongside the thinking token budget. Set the effort level for your task, then set the thinking token budget based on task complexity.The effort parameter can be used with or without extended thinking enabled. When used without thinking, it still controls overall token spend for text responses and tool calls.
Was this page helpful?