This feature is eligible for Zero Data Retention (ZDR). When your organization has a ZDR arrangement, data sent through this feature is not stored after the API response is returned.
The effort parameter allows you to control how eager Claude is about spending tokens when responding to requests. This gives you the ability to trade off between response thoroughness and token efficiency, all with a single model. The effort parameter is available on all supported models with no beta header required.
The effort parameter is supported by Claude Mythos Preview, Claude Opus 4.7, Claude Opus 4.6, Claude Sonnet 4.6, and Claude Opus 4.5.
For Claude Opus 4.6 and Sonnet 4.6, effort replaces budget_tokens as the recommended way to control thinking depth. Combine effort with adaptive thinking (thinking: {type: "adaptive"}) for the best experience. While budget_tokens is still accepted on Opus 4.6 and Sonnet 4.6, it is deprecated and will be removed in a future model release. At high (default) and max effort, Claude will almost always think. At lower effort levels, it may skip thinking for simpler problems.
By default, Claude uses high effort, spending as many tokens as needed for excellent results. You can raise the effort level to max for the absolute highest capability, or lower it to be more conservative with token usage, optimizing for speed and cost while accepting some reduction in capability.
Setting effort to "high" produces exactly the same behavior as omitting the effort parameter entirely.
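The equivalence between omitting effort and setting it to "high" can be sketched with plain request dictionaries. This is a minimal illustration, not SDK code: `effective_effort` and `DEFAULT_EFFORT` are hypothetical helper names, and the dictionaries only mirror the keyword arguments you would pass to `client.messages.create`.

```python
DEFAULT_EFFORT = "high"  # per the text above: omitting effort behaves like "high"


def effective_effort(request: dict) -> str:
    """Return the effort level a request dictionary would run at.

    Hypothetical helper: it only inspects the dictionary and never calls the API.
    """
    output_config = request.get("output_config") or {}
    return output_config.get("effort", DEFAULT_EFFORT)


# Three otherwise identical requests that differ only in how effort is set.
implicit = {"model": "claude-opus-4-7", "max_tokens": 1024}
explicit = {**implicit, "output_config": {"effort": "high"}}
lowered = {**implicit, "output_config": {"effort": "low"}}

for name, request in [("implicit", implicit), ("explicit", explicit), ("lowered", lowered)]:
    print(name, "->", effective_effort(request))
```

The first two requests resolve to the same level, so an explicit "high" is only useful as documentation of intent.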
The effort parameter affects all tokens in the response, including:
This approach has two major advantages:
| Level | Description | Typical use case |
|---|---|---|
| max | Absolute maximum capability with no constraints on token spending. Available on Claude Mythos Preview, Claude Opus 4.7, Claude Opus 4.6, and Claude Sonnet 4.6. | Tasks requiring the deepest possible reasoning and most thorough analysis |
| xhigh | Extended capability for long-horizon work. Available on Claude Opus 4.7. | Long-running agentic and coding tasks (over 30 minutes) with token budgets in the millions |
| high | High capability. Equivalent to not setting the parameter. | Complex reasoning, difficult coding problems, agentic tasks |
| medium | Balanced approach with moderate token savings. | Agentic tasks that require a balance of speed, cost, and performance |
| low | Most efficient. Significant token savings with some capability reduction. | Simpler tasks that need the best speed and lowest costs, such as subagents |
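The availability notes in the table can be encoded as a small lookup so that an unsupported level is caught before a request is sent. This is a sketch under assumptions: the helper `is_level_available` is hypothetical, models are keyed by display name rather than API identifier, and low, medium, and high are treated as available on every supported model because the table restricts only max and xhigh.

```python
# Models listed as supporting the effort parameter, keyed by display name.
SUPPORTED_MODELS = {
    "Claude Mythos Preview",
    "Claude Opus 4.7",
    "Claude Opus 4.6",
    "Claude Sonnet 4.6",
    "Claude Opus 4.5",
}

# Assumption: these levels carry no per-model restriction in the table.
BASE_LEVELS = {"low", "medium", "high"}

# Levels the table limits to specific models.
RESTRICTED_LEVELS = {
    "max": {"Claude Mythos Preview", "Claude Opus 4.7", "Claude Opus 4.6", "Claude Sonnet 4.6"},
    "xhigh": {"Claude Opus 4.7"},
}


def is_level_available(model_name: str, level: str) -> bool:
    """Return True if the table lists `level` as usable on `model_name`."""
    if model_name not in SUPPORTED_MODELS:
        return False
    if level in BASE_LEVELS:
        return True
    return model_name in RESTRICTED_LEVELS.get(level, set())


print(is_level_available("Claude Opus 4.7", "xhigh"))
print(is_level_available("Claude Opus 4.5", "max"))
```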
Effort is a behavioral signal, not a strict token budget. At lower effort levels, Claude will still think on sufficiently difficult problems, but it will think less than it would at higher effort levels for the same problem.
Sonnet 4.6 defaults to high effort. Explicitly set effort when using Sonnet 4.6 to avoid unexpected latency:
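One way to make the effort level explicit is to build it into every request. The sketch below assembles the keyword arguments as a plain dictionary. The model identifier `claude-sonnet-4-6` is an assumption that follows the naming pattern of `claude-opus-4-7`, the helper `sonnet_request` is hypothetical, and "medium" is only an illustrative choice of level.

```python
def sonnet_request(prompt: str, effort: str = "medium", max_tokens: int = 2048) -> dict:
    """Build keyword arguments for a Sonnet 4.6 call with effort always set.

    Hypothetical helper; the model identifier below is an assumption.
    """
    return {
        "model": "claude-sonnet-4-6",  # assumed identifier, check your model list
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": prompt}],
        # Set explicitly so latency does not depend on the high default.
        "output_config": {"effort": effort},
    }


request = sonnet_request("Summarize this changelog in three bullet points.")
print(request["output_config"])

# With the SDK installed and an API key configured, the dictionary can be
# passed straight through:
#   client.messages.create(**request)
```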
On Claude Opus 4.7, start with xhigh for coding and agentic use cases, and use high as the minimum for most intelligence-sensitive workloads. Step down to medium for cost-sensitive workloads, or up to max only when your evals show measurable headroom at xhigh.
The API default is high. To use xhigh, set effort explicitly; the value you pass overrides the default.
| Effort | Guidance for Claude Opus 4.7 |
|---|---|
| low | Efficient, but best for short, scoped tasks. Pair low with explicit checklists if your task has multiple sections. |
| medium | The drop-in for the average workflow where you want good results while reducing costs. |
| high | Advanced use cases that still need a balance of intelligence and token consumption. This is often the sweet spot balancing quality and token efficiency. |
| xhigh | The recommended starting point for coding and agentic work, and for exploratory tasks such as repeated tool calling, detailed web search, and knowledge-base search. Expect meaningfully higher token usage than high. |
| max | Reserve for genuinely frontier problems. On most workloads max adds significant cost for relatively small quality gains, and on some structured-output or less intelligence-sensitive tasks it can lead to overthinking. |
Claude Opus 4.7 also respects effort levels more strictly than Claude Opus 4.6, especially at low and medium. At lower effort levels, the model scopes its work to what was asked rather than going above and beyond. If you observe shallow reasoning on complex problems with Claude Opus 4.7, raise effort rather than prompting around it. If you must keep effort low for latency, add targeted guidance like "This task involves multi-step reasoning. Think carefully before responding."
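If effort has to stay low for latency reasons, the targeted guidance can be appended programmatically. This is a minimal sketch: `with_reasoning_guidance` is a hypothetical helper, the guidance string is the one quoted above, and whether a task needs multi-step reasoning is left to the caller because the text gives no rule for detecting it.

```python
GUIDANCE = "This task involves multi-step reasoning. Think carefully before responding."

# Levels at which the text says Opus 4.7 scopes its work most strictly.
STRICT_LEVELS = {"low", "medium"}


def with_reasoning_guidance(prompt: str, effort: str, multi_step: bool) -> str:
    """Append the guidance only for multi-step tasks run at low or medium effort."""
    if multi_step and effort in STRICT_LEVELS:
        return f"{prompt}\n\n{GUIDANCE}"
    return prompt


print(with_reasoning_guidance("Plan the database migration.", "low", multi_step=True))
```

Raising effort remains the first choice; this fallback applies only when the effort level is fixed by a latency budget.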
When running Claude Opus 4.7 at xhigh or max effort, set a large max_tokens so the model has room to think and act across subagents and tool calls. Starting at 64k tokens and tuning from there is a reasonable default.
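The max_tokens advice can be folded into a request builder that picks a larger ceiling at the two highest effort levels. This is a sketch under assumptions: `opus_47_request` is a hypothetical helper, "64k" is read as 64,000 tokens, and 4,096 for the other levels is simply the value used in the example request on this page, not a recommendation.

```python
from typing import Optional

HIGH_EFFORT_LEVELS = {"xhigh", "max"}
HIGH_EFFORT_MAX_TOKENS = 64_000  # "64k" read as 64,000; a starting point to tune
DEFAULT_MAX_TOKENS = 4_096       # illustrative value for the other levels


def opus_47_request(prompt: str, effort: str = "xhigh",
                    max_tokens: Optional[int] = None) -> dict:
    """Build keyword arguments for Claude Opus 4.7 with an effort-aware ceiling."""
    if max_tokens is None:
        if effort in HIGH_EFFORT_LEVELS:
            max_tokens = HIGH_EFFORT_MAX_TOKENS
        else:
            max_tokens = DEFAULT_MAX_TOKENS
    return {
        "model": "claude-opus-4-7",
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": prompt}],
        "output_config": {"effort": effort},
    }


print(opus_47_request("Refactor the billing module.")["max_tokens"])
```

An explicit `max_tokens` argument always wins, which keeps the helper usable once you have tuned the ceiling for your own workload.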

    import anthropic

    # Reads the API key from the ANTHROPIC_API_KEY environment variable.
    client = anthropic.Anthropic()

    response = client.messages.create(
        model="claude-opus-4-7",
        max_tokens=4096,
        messages=[
            {
                "role": "user",
                "content": "Analyze the trade-offs between microservices and monolithic architectures",
            }
        ],
        # Effort is passed through output_config.
        output_config={"effort": "medium"},
    )
    print(response.content[0].text)

When using tools, the effort parameter affects both the explanations around tool calls and the tool calls themselves. Lower effort levels tend to:
Higher effort levels may:
The effort parameter works alongside extended thinking. Its behavior depends on the model:
- Claude Mythos Preview: Thinking is on by default (no thinking configuration required). thinking: {type: "disabled"} is rejected. Effort controls thinking depth the same way as on Opus 4.7 and Opus 4.6.
- Claude Opus 4.7: Uses adaptive thinking (thinking: {type: "adaptive"}), where effort is the recommended control for thinking depth. Manual extended thinking (thinking: {type: "enabled", budget_tokens: N}) is no longer supported on Opus 4.7; use adaptive thinking with effort instead. At high, xhigh, and max effort, Claude almost always thinks deeply. At lower levels, it may skip thinking for simpler problems.
- Claude Opus 4.6: Uses adaptive thinking (thinking: {type: "adaptive"}), where effort is the recommended control for thinking depth. While budget_tokens is still accepted on Opus 4.6, it is deprecated and will be removed in a future release. At high and max effort, Claude almost always thinks deeply. At lower levels, it may skip thinking for simpler problems.
- Claude Sonnet 4.6: Manual extended thinking (thinking: {type: "enabled", budget_tokens: N}) is still functional but deprecated.
- Claude Opus 4.5: Uses manual extended thinking (thinking: {type: "enabled", budget_tokens: N}), where effort works alongside the thinking token budget. Set the effort level for your task, then set the thinking token budget based on task complexity.

The effort parameter can be used with or without extended thinking enabled. When used without thinking, it still controls overall token spend for text responses and tool calls.
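The per-model behavior can be summarized as a function that returns the thinking and effort arguments for a given model. This is a sketch, not SDK code: `thinking_kwargs` is a hypothetical helper, models are keyed by display name, and Sonnet 4.6 is grouped with the adaptive-thinking models on the strength of the earlier recommendation to combine effort with adaptive thinking on Opus 4.6 and Sonnet 4.6.

```python
from typing import Optional

# Models for which the page recommends adaptive thinking driven by effort.
ADAPTIVE_MODELS = {"Claude Opus 4.7", "Claude Opus 4.6", "Claude Sonnet 4.6"}


def thinking_kwargs(model_name: str, effort: str,
                    budget_tokens: Optional[int] = None) -> dict:
    """Return the effort and thinking keyword arguments for one model."""
    kwargs = {"output_config": {"effort": effort}}
    if model_name in ADAPTIVE_MODELS:
        kwargs["thinking"] = {"type": "adaptive"}
    elif model_name == "Claude Opus 4.5":
        # Manual extended thinking: effort works alongside a token budget.
        # Without a budget, thinking is left off and effort still applies.
        if budget_tokens is not None:
            kwargs["thinking"] = {"type": "enabled", "budget_tokens": budget_tokens}
    elif model_name == "Claude Mythos Preview":
        # No thinking configuration is required, so none is sent.
        pass
    else:
        raise ValueError(f"effort is not documented for {model_name!r}")
    return kwargs


print(thinking_kwargs("Claude Opus 4.7", "xhigh"))
print(thinking_kwargs("Claude Opus 4.5", "medium", budget_tokens=8000))
```

The returned dictionary can be merged into the rest of the request arguments, for example `{**base_request, **thinking_kwargs(...)}`.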
high, but the right starting point depends on your model and workload.