Advisor tool

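Pair a faster executor model with a higher-intelligence advisor model that provides strategic guidance mid-generation.

The advisor tool lets a faster, lower-cost executor model consult a higher-intelligence advisor model mid-generation for strategic guidance. The advisor reads the full conversation, produces a plan or course correction (typically 400 to 700 text tokens, or 1,400 to 1,800 tokens in total including thinking), and the executor then continues with the task.

This pattern fits long-horizon agentic workloads (coding agents, computer use, multi-step research pipelines) where most turns are mechanical but a strong plan is crucial. You get close to advisor-only quality while most token generation happens at the executor model's rates.

The advisor tool is in beta. Include the beta header advisor-tool-2026-03-01 in your requests. To request access or share feedback, contact your Anthropic account team.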

This feature is eligible for Zero Data Retention (ZDR). When your organization has a ZDR arrangement, data sent through this feature is not stored after the API response is returned.

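When to use it

Early benchmarks show meaningful gains for these configurations:

You currently use Sonnet on complex tasks: Add Opus as an advisor for a quality lift at the same or lower cost.
You currently use Haiku and want more intelligence: Add Opus as an advisor. Expect a higher cost than Haiku alone, but a lower cost than switching the executor to a larger model.

Results are task-dependent. Evaluate on your own workloads.

The advisor is a weaker fit for single-turn Q&A (there is nothing to plan), pure pass-through model pickers (your users already choose their own cost and quality tradeoff), or workloads where every turn genuinely needs the advisor model's full capability.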

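Model compatibility

The executor model (the top-level model field) and the advisor model (the model field inside the tool definition) must form a valid pair. The advisor must be at least as capable as the executor.

Executor model	Advisor model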
Claude Haiku 4.5 (`claude-haiku-4-5-20251001`)	Claude Opus 4.7 (`claude-opus-4-7`)
Claude Sonnet 4.6 (`claude-sonnet-4-6`)	Claude Opus 4.7 (`claude-opus-4-7`)
Claude Opus 4.6 (`claude-opus-4-6`)	Claude Opus 4.7 (`claude-opus-4-7`)
Claude Opus 4.7 (`claude-opus-4-7`)	Claude Opus 4.7 (`claude-opus-4-7`)

If you request an invalid pair, the API returns a 400 invalid_request_error that names the unsupported combination.
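
A client-side check can catch an unsupported pair before the request is sent. The sketch below simply copies the table above into a dictionary; the names VALID_PAIRS and is_valid_pair are illustrative, and the API remains the authority on which pairs are accepted, since the list may change during the beta.

```python
# Executor -> allowed advisors, copied from the compatibility table above.
# Illustrative only: the API decides which pairs are actually valid.
VALID_PAIRS = {
    "claude-haiku-4-5-20251001": {"claude-opus-4-7"},
    "claude-sonnet-4-6": {"claude-opus-4-7"},
    "claude-opus-4-6": {"claude-opus-4-7"},
    "claude-opus-4-7": {"claude-opus-4-7"},
}


def is_valid_pair(executor: str, advisor: str) -> bool:
    """Return True if the table lists this executor/advisor combination."""
    return advisor in VALID_PAIRS.get(executor, set())


def check_pair(executor: str, advisor: str) -> None:
    """Raise ValueError locally instead of waiting for a 400 from the API."""
    if not is_valid_pair(executor, advisor):
        raise ValueError(f"unsupported pair: executor={executor}, advisor={advisor}")
```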

Platform availability

The advisor tool is available in beta on the Claude API (Anthropic).

Quick start

import anthropic

client = anthropic.Anthropic()

response = client.beta.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=4096,
    betas=["advisor-tool-2026-03-01"],
    tools=[
        {
            "type": "advisor_20260301",
            "name": "advisor",
            "model": "claude-opus-4-7",
        }
    ],
    messages=[
        {
            "role": "user",
            "content": "Build a concurrent worker pool in Go with graceful shutdown.",
        }
    ],
)

print(response)

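How it works

When you add the advisor tool to your tools array, the executor model decides when to call it, just like any other tool. When the executor calls the advisor:

The executor emits a server_tool_use block with name: "advisor" and an empty input. The executor signals; the server supplies the context.
Anthropic runs a separate inference pass on the advisor model server-side, passing in the executor's full transcript. The advisor sees the system prompt, all tool definitions, all prior turns, and all prior tool results.
The advisor's response returns to the executor as an advisor_tool_result block.
The executor continues generating, informed by the advice.

All of this happens inside a single /v1/messages request. There are no extra round trips on your side.

The advisor itself runs without tools and without context management. Its thinking blocks are dropped before the result returns; only the advice text reaches the executor.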

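Tool parameters

Parameter	Type	Default	Description
`type`	string	required	Must be `"advisor_20260301"`.
`name`	string	required	Must be `"advisor"`.
`model`	string	required	The advisor model ID, for example `"claude-opus-4-7"`. The sub-inference is billed at this model's rates.
`max_uses`	integer	unlimited	Maximum number of advisor calls allowed in a single request. Once the executor reaches this cap, further advisor calls return an `advisor_tool_result_error` with `error_code: "max_uses_exceeded"`, and the executor continues without further advice. This is a per-request cap, not a per-conversation cap; see Cost control for conversation-level limits.
`caching`	object \| null	`null` (off)	Enables prompt caching for the advisor's own transcript across calls within a conversation. See Advisor prompt caching.

The caching object has the shape {"type": "ephemeral", "ttl": "5m" | "1h"}. Unlike cache_control on content blocks, this is not a breakpoint marker; it is an on/off switch. The server decides where the cache boundaries go.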
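
As a rough illustration of the parameters above, the helper below assembles a tool definition and rejects a ttl outside the two documented values. The function name make_advisor_tool is hypothetical, and rejecting max_uses below 1 is a choice made for this sketch, not documented API behavior.

```python
from typing import Any, Optional

ALLOWED_TTLS = {"5m", "1h"}  # the two ttl values listed above


def make_advisor_tool(
    model: str,
    max_uses: Optional[int] = None,
    cache_ttl: Optional[str] = None,
) -> dict[str, Any]:
    """Build an advisor tool definition (hypothetical helper)."""
    tool: dict[str, Any] = {
        "type": "advisor_20260301",
        "name": "advisor",
        "model": model,
    }
    if max_uses is not None:
        # Assumption: treat values below 1 as a caller mistake.
        if max_uses < 1:
            raise ValueError("max_uses must be at least 1")
        tool["max_uses"] = max_uses
    if cache_ttl is not None:
        if cache_ttl not in ALLOWED_TTLS:
            raise ValueError(f"ttl must be one of {sorted(ALLOWED_TTLS)}")
        # caching is an on/off switch, not a breakpoint marker.
        tool["caching"] = {"type": "ephemeral", "ttl": cache_ttl}
    return tool
```

Leaving max_uses and cache_ttl unset produces the same three-field definition used in the quick start.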

Response structure

Successful advisor call

When the advisor is called, a server_tool_use block is followed by an advisor_tool_result block in the assistant content:

{
  "role": "assistant",
  "content": [
    {
      "type": "text",
      "text": "Let me consult the advisor on this."
    },
    {
      "type": "server_tool_use",
      "id": "srvtoolu_abc123",
      "name": "advisor",
      "input": {}
    },
    {
      "type": "advisor_tool_result",
      "tool_use_id": "srvtoolu_abc123",
      "content": {
        "type": "advisor_result",
        "text": "Use a channel-based coordination pattern. The tricky part is draining in-flight work during shutdown: close the input channel first, then wait on a WaitGroup..."
      }
    },
    {
      "type": "text",
      "text": "Here's the implementation. I'm using a channel-based coordination pattern to avoid writer starvation..."
    }
  ]
}

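server_tool_use.input is always empty. The server constructs the advisor's view automatically from the full transcript; nothing the executor puts in input reaches the advisor.

Result variants

The advisor_tool_result.content field is a discriminated union. Which variant you receive depends on the advisor model:

Variant	Fields	Returned when
`advisor_result`	`text`	The advisor model returns plaintext (for example, Claude Opus 4.7).
`advisor_redacted_result`	`encrypted_content`	The advisor model returns encrypted output.

With advisor_result, the text field contains human-readable advice. With advisor_redacted_result, the encrypted_content field contains an opaque blob that you cannot read; on the next turn, the server decrypts it and renders the plaintext into the executor's prompt.

In both cases, round-trip the content verbatim on subsequent turns. If you switch advisor models mid-conversation, branch on content.type to handle both shapes.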
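
Branching on content.type might look like the sketch below. It assumes content blocks are plain dicts shaped like the JSON above (SDK response objects would need converting first), and the function name advice_text is illustrative. It only reads the block; the block itself should still be sent back unchanged.

```python
from typing import Optional


def advice_text(block: dict) -> Optional[str]:
    """Return readable advice from an advisor_tool_result block, else None."""
    if block.get("type") != "advisor_tool_result":
        return None
    content = block.get("content", {})
    kind = content.get("type")
    if kind == "advisor_result":
        # Plaintext variant: the advice is directly readable.
        return content.get("text")
    if kind == "advisor_redacted_result":
        # Encrypted variant: opaque to the client, nothing to display.
        return None
    # Error result or a variant this sketch does not know about.
    return None
```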

Error results

If the advisor call fails, the result carries an error:

{
  "type": "advisor_tool_result",
  "tool_use_id": "srvtoolu_abc123",
  "content": {
    "type": "advisor_tool_result_error",
    "error_code": "overloaded"
  }
}

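The executor sees the error and continues without further advice. The request itself does not fail.

`error_code`	Meaning
`max_uses_exceeded`	The request reached the `max_uses` cap set on the tool definition. Further advisor calls in the same request return this error.
`too_many_requests`	The advisor sub-inference was rate-limited.
`overloaded`	The advisor sub-inference hit capacity limits.
`prompt_too_long`	The transcript exceeded the advisor model's context window.
`execution_time_exceeded`	The advisor sub-inference timed out.
`unavailable`	Any other advisor failure.

Advisor rate limits draw from the same per-model bucket as direct calls to the advisor model. A rate limit on the advisor appears as too_many_requests inside the tool result; a rate limit on the executor fails the whole request with HTTP 429.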
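
Because a failed advisor call does not fail the request, it is easy to miss. One way to surface these failures is to scan the assistant content for error results, as in this sketch; it assumes plain-dict blocks shaped like the JSON above, and advisor_error_codes is a hypothetical name.

```python
from collections import Counter


def advisor_error_codes(content: list) -> Counter:
    """Count advisor error codes found in one assistant content list."""
    codes: Counter = Counter()
    for block in content:
        if block.get("type") != "advisor_tool_result":
            continue
        inner = block.get("content", {})
        if inner.get("type") == "advisor_tool_result_error":
            # Fall back to "unknown" if the code is missing (defensive only).
            codes[inner.get("error_code", "unknown")] += 1
    return codes
```

The resulting counts can feed logging or metrics, for example to notice that a run quietly hit max_uses_exceeded.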

Multi-turn conversations

Pass the full assistant content, including advisor_tool_result blocks, back to the API on subsequent turns:

import anthropic

client = anthropic.Anthropic()

tools = [
    {
        "type": "advisor_20260301",
        "name": "advisor",
        "model": "claude-opus-4-7",
    }
]

messages = [
    {
        "role": "user",
        "content": "Build a concurrent worker pool in Go with graceful shutdown.",
    }
]

response = client.beta.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=4096,
    betas=["advisor-tool-2026-03-01"],
    tools=tools,
    messages=messages,
)

# Append the full response content, including any advisor_tool_result blocks
messages.append({"role": "assistant", "content": response.content})

# Continue the conversation
messages.append({"role": "user", "content": "Now add a max-in-flight limit of 10."})

response = client.beta.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=4096,
    betas=["advisor-tool-2026-03-01"],
    tools=tools,
    messages=messages,
)

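If you omit the advisor tool from tools on a follow-up turn while the message history still contains advisor_tool_result blocks, the API returns a 400 invalid_request_error.

The advisor tool has no built-in conversation-level cap. To limit advisor calls across a whole conversation, count them client-side. When you reach your cap, remove the advisor tool from the tools array and strip all advisor_tool_result blocks from the message history to avoid a 400 invalid_request_error.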
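
A minimal sketch of that clean-up step, assuming the history is stored as plain dicts. The text above only requires removing advisor_tool_result blocks; dropping the paired server_tool_use block as well is an assumption made here so that no orphaned advisor call is left behind. The name strip_advisor is illustrative.

```python
def _is_advisor_block(block: dict) -> bool:
    """True for advisor results and (by assumption) the calls that produced them."""
    if block.get("type") == "advisor_tool_result":
        return True
    return block.get("type") == "server_tool_use" and block.get("name") == "advisor"


def strip_advisor(messages: list, tools: list) -> tuple:
    """Return new (messages, tools) lists with advisor artifacts removed."""
    kept_tools = [t for t in tools if t.get("type") != "advisor_20260301"]
    cleaned = []
    for msg in messages:
        content = msg["content"]
        if isinstance(content, str):
            cleaned.append(msg)  # plain-string content has no blocks to strip
            continue
        blocks = [b for b in content if not _is_advisor_block(b)]
        if blocks:
            cleaned.append({**msg, "content": blocks})
        # A message left with no blocks is dropped, since empty content is not useful.
    return cleaned, kept_tools
```

The inputs are not mutated, so the original history can be kept for logging.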

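Streaming

The advisor sub-inference does not stream. The executor's stream pauses while the advisor runs, and then the full result arrives in a single event.

A server_tool_use block with name: "advisor" signals that an advisor call is starting. The pause begins when that block closes (content_block_stop). During the pause, the stream is quiet except for the standard SSE ping keepalives sent roughly every 30 seconds; short advisor calls may not show any pings.

When the advisor finishes, the advisor_tool_result arrives fully formed in a single content_block_start event (no deltas). Executor output then resumes streaming.

The message_delta event that follows carries an updated usage.iterations array reflecting the advisor's token counts.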
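
The event sequence described above can be tracked with a small state machine, for example to show a "consulting advisor" indicator in a UI. This sketch assumes events have already been decoded into plain dicts with the usual type, index, and content_block keys; the generator name and the two status strings are hypothetical.

```python
from typing import Iterable, Iterator


def advisor_status(events: Iterable[dict]) -> Iterator[str]:
    """Yield 'advisor_waiting' when the pause begins and 'advisor_finished' when it ends."""
    pending_index = None  # index of the open advisor server_tool_use block
    for event in events:
        etype = event.get("type")
        if etype == "content_block_start":
            block = event.get("content_block", {})
            if block.get("type") == "server_tool_use" and block.get("name") == "advisor":
                pending_index = event.get("index")
            elif block.get("type") == "advisor_tool_result":
                # The result arrives fully formed in this single event.
                yield "advisor_finished"
        elif etype == "content_block_stop":
            if pending_index is not None and event.get("index") == pending_index:
                # The advisor call block closed: the stream now goes quiet.
                pending_index = None
                yield "advisor_waiting"
        # ping and all other events are ignored here
```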

Usage and billing

Advisor calls run as separate sub-inferences billed at the advisor model's rates. Usage is reported in the usage.iterations[] array:

{
  "usage": {
    "input_tokens": 412,
    "cache_read_input_tokens": 0,
    "cache_creation_input_tokens": 0,
    "output_tokens": 531,
    "iterations": [
      {
        "type": "message",
        "input_tokens": 412,
        "cache_read_input_tokens": 0,
        "cache_creation_input_tokens": 0,
        "output_tokens": 89
      },
      {
        "type": "advisor_message",
        "model": "claude-opus-4-7",
        "input_tokens": 823,
        "cache_read_input_tokens": 0,
        "cache_creation_input_tokens": 0,
        "output_tokens": 1612
      },
      {
        "type": "message",
        "input_tokens": 1348,
        "cache_read_input_tokens": 412,
        "cache_creation_input_tokens": 0,
        "output_tokens": 442
      }
    ]
  }
}

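The top-level usage fields reflect executor tokens only. Advisor tokens are not rolled into the top-level totals because they are billed at a different rate. Iterations with type: "advisor_message" are billed at the advisor model's rates; iterations with type: "message" are billed at the executor model's rates.

Aggregation rules differ by field. The top-level output_tokens is the sum across all executor iterations. The top-level input_tokens and cache_read_input_tokens reflect only the first executor iteration; the inputs of later executor iterations are not re-summed because they include earlier output tokens. When building cost-tracking logic, use usage.iterations to get the full per-iteration breakdown.

Advisor output is typically 400 to 700 text tokens, or 1,400 to 1,800 tokens in total including thinking. The cost savings come from the advisor not generating your full final output; the executor does that at its lower rate.

The top-level max_tokens applies to executor output only. It does not bound advisor sub-inference tokens. Advisor tokens also do not draw from any task budget applied to the executor.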
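
For cost tracking, the iterations can be split by billing role before applying your own rates. The sketch below assumes usage is available as a plain dict shaped like the JSON above; tokens_by_role is an illustrative name, and no prices are included because rates are not stated here.

```python
FIELDS = (
    "input_tokens",
    "cache_read_input_tokens",
    "cache_creation_input_tokens",
    "output_tokens",
)


def tokens_by_role(iterations: list) -> dict:
    """Sum token fields separately for executor and advisor iterations."""
    totals = {
        "executor": dict.fromkeys(FIELDS, 0),
        "advisor": dict.fromkeys(FIELDS, 0),
    }
    for iteration in iterations:
        # advisor_message bills at the advisor's rates, message at the executor's.
        role = "advisor" if iteration.get("type") == "advisor_message" else "executor"
        for field in FIELDS:
            totals[role][field] += iteration.get(field, 0)
    return totals
```

Applied to the sample above, the executor output total is 89 + 442 = 531, which matches the top-level output_tokens, while the advisor's 1,612 output tokens appear only in the advisor bucket.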

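Advisor prompt caching

There are two independent caching layers.

Executor-side caching

The advisor_tool_result block is cacheable like any other content block. A cache_control breakpoint placed after it on a later turn will hit. The executor's prompt always contains the plaintext advice, whether your client received text or encrypted_content, so caching behaves the same for both result variants.

Advisor-side caching

Set caching on the tool definition to enable prompt caching for the advisor's own transcript across calls within the same conversation: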

tools = [
    {
        "type": "advisor_20260301",
        "name": "advisor",
        "model": "claude-opus-4-7",
        "caching": {"type": "ephemeral", "ttl": "5m"},
    }
]

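The advisor's prompt on the Nth call is the prompt from call N-1 with one more segment appended, so the prefix is stable across calls. With caching enabled, each advisor call writes a cache entry; the next call reads up to that point and pays only for the delta. You will see cache_read_input_tokens become nonzero on the second and later advisor_message iterations.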
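
One way to confirm that advisor-side caching is taking effect is to look at cache_read_input_tokens on the advisor iterations. This sketch assumes the usage.iterations entries from one or more responses in a conversation have been collected, in order, into a single list of plain dicts; the function names are hypothetical.

```python
def advisor_cache_reads(iterations: list) -> list:
    """Return cache_read_input_tokens for each advisor_message iteration, in order."""
    return [
        it.get("cache_read_input_tokens", 0)
        for it in iterations
        if it.get("type") == "advisor_message"
    ]


def caching_looks_active(iterations: list) -> bool:
    """True if every advisor call after the first read something from the cache."""
    reads = advisor_cache_reads(iterations)
    # With fewer than two advisor calls there is nothing to compare yet.
    return len(reads) >= 2 and all(r > 0 for r in reads[1:])
```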

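When to enable it: When the advisor is called two or fewer times per conversation, the cache write cost exceeds the read savings. Caching breaks even at about three advisor calls and improves from there. Enable it for long agent loops; leave it off for short tasks.

Keep it consistent: Set caching once and keep it for the whole conversation. Toggling it off and on mid-conversation causes cache misses.

clear_thinking with a keep value other than "all" shifts the advisor's quoted transcript on every turn, which causes advisor-side cache misses. This is a cost degradation only; advice quality is unaffected. When extended thinking is enabled without an explicit clear_thinking configuration, the API defaults to keep: {type: "thinking_turns", value: 1}, which triggers this behavior. Set keep: "all" to preserve advisor cache stability.

Combining with other tools

The advisor tool composes with other server-side and client-side tools. Add them all to the same tools array: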

tools = [
    {
        "type": "web_search_20250305",
        "name": "web_search",
        "max_uses": 5,
    },
    {
        "type": "advisor_20260301",
        "name": "advisor",
        "model": "claude-opus-4-7",
    },
    {
        "name": "run_bash",
        "description": "Run a bash command",
        "input_schema": {
            "type": "object",
            "properties": {"command": {"type": "string"}},
        },
    },
]

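The executor can search the web, call the advisor, and use your custom tools in the same turn. The advisor's plan can inform which tools the executor uses next.

Feature	Interaction
Batch processing	Supported. `usage.iterations` is reported per item.
Token counting	Returns only the executor's first-iteration input tokens. For a rough advisor estimate, call `count_tokens` with `model` set to the advisor model and the same messages.
Context editing	`clear_tool_uses` is not yet fully compatible with advisor tool blocks; full support is planned for a later release. When using `clear_thinking`, see the caching warning above.
`pause_turn`	A dangling advisor call ends the response with `stop_reason: "pause_turn"` and the `server_tool_use` block as the last content block. The advisor runs on resume. See Server tools.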
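
A resume loop for pause_turn could be sketched as follows. It assumes that resuming means appending the paused assistant content to the history and sending the request again, and it takes the API call as an injected send function returning a plain dict, so that the loop can be exercised without network access. The names run_until_settled and max_resumes are hypothetical.

```python
from typing import Callable


def run_until_settled(
    send: Callable[[list], dict],
    messages: list,
    max_resumes: int = 3,
) -> dict:
    """Call send(messages), resuming while the response stops with pause_turn."""
    response = send(messages)
    resumes = 0
    while response.get("stop_reason") == "pause_turn" and resumes < max_resumes:
        # Pass the paused assistant content back unchanged so the
        # dangling advisor call can run on the next request.
        messages.append({"role": "assistant", "content": response["content"]})
        response = send(messages)
        resumes += 1
    return response
```

In real use, send would wrap client.beta.messages.create with the same model, tools, and betas as the original request.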

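Best practices

Prompting for coding and agentic tasks

The advisor tool ships with a built-in description that nudges the executor to call it at the start of complex tasks and when it gets stuck. For research tasks, additional prompting is usually unnecessary.

On coding and agentic tasks, the advisor yields higher intelligence at similar cost when it reduces total tool calls and conversation length. Two timings drive this improvement:

An early first advisor call, after a few exploratory reads are in the transcript.
For hard tasks, a final advisor call after file writes and test output are in the transcript.

If your agent exposes other planner-like tools (for example, a to-do list tool), prompt the model to call the advisor before those tools so that the advisor's plan flows into them. The suggested system prompt below reinforces the call-early pattern; add your own funnel sentence pointing at whichever planner tools your agent exposes.

Suggested system prompt for coding tasks

For coding tasks where you want consistent advisor timing and roughly two to three calls per task, prepend the following blocks to the executor's system prompt, before any other sentence that mentions the advisor. In internal coding evaluations, this pattern produced the highest intelligence at near-Sonnet cost.

Timing guidance: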

You have access to an `advisor` tool backed by a stronger reviewer model. It takes NO parameters — when you call advisor(), your entire conversation history is automatically forwarded. They see the task, every tool call you've made, every result you've seen.

Call advisor BEFORE substantive work — before writing, before committing to an interpretation, before building on an assumption. If the task requires orientation first (finding files, fetching a source, seeing what's there), do that, then call advisor. Orientation is not substantive work. Writing, editing, and declaring an answer are.

Also call advisor:
- When you believe the task is complete. BEFORE this call, make your deliverable durable: write the file, save the result, commit the change. The advisor call takes time; if the session ends during it, a durable result persists and an unwritten one doesn't.
- When stuck — errors recurring, approach not converging, results that don't fit.
- When considering a change of approach.

On tasks longer than a few steps, call advisor at least once before committing to an approach and once before declaring done. On short reactive tasks where the next action is dictated by tool output you just read, you don't need to keep calling — the advisor adds most of its value on the first call, before the approach crystallizes.

How the executor should treat the advice (place directly after the timing block):

Give the advice serious weight. If you follow a step and it fails empirically, or you have primary-source evidence that contradicts a specific claim (the file says X, the paper states Y), adapt. A passing self-test is not evidence the advice is wrong — it's evidence your test doesn't check what the advice is checking.

If you've already retrieved data pointing one way and the advisor points another: don't silently switch. Surface the conflict in one more advisor call — "I found X, you suggest Y, which constraint breaks the tie?" The advisor saw your evidence but may have underweighted it; a reconcile call is cheaper than committing to the wrong branch.

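Trimming advisor output length

Advisor output is the advisor's largest cost driver. To reduce that cost, prepend a single conciseness instruction to the system prompt, before any other sentence that mentions the advisor. In internal testing, the following line reduced total advisor output tokens by roughly 35 to 45 percent without changing call frequency: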

The advisor should respond in under 100 words and use enumerated steps, not explanations.

Pair this with the timing block above for the strongest cost-to-quality tradeoff.
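
Assembling these pieces ahead of an existing system prompt could look like the sketch below. The ordering is an assumption: the guidance says each piece goes before any other sentence mentioning the advisor but does not say how the conciseness line and the timing block are ordered relative to each other, so this sketch puts the conciseness line first. The function and parameter names are hypothetical, and the prompt texts are passed in by the caller.

```python
from typing import Optional


def build_system_prompt(
    base_prompt: str,
    timing_block: str,
    advice_block: str,
    concise_line: Optional[str] = None,
) -> str:
    """Prepend the advisor guidance to an existing executor system prompt."""
    parts = []
    if concise_line:
        parts.append(concise_line)  # assumed position: first
    parts.append(timing_block)
    parts.append(advice_block)  # placed directly after the timing block
    if base_prompt:
        parts.append(base_prompt)  # everything else comes after the advisor guidance
    return "\n\n".join(part.strip() for part in parts)
```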

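Pairing with effort settings

For coding tasks, pairing a medium-effort Sonnet executor with an Opus advisor achieves intelligence comparable to default-effort Sonnet at lower cost. For maximum intelligence, keep the executor at default effort.

Cost control

For conversation-level budgets, count advisor calls client-side. When you reach your cap, remove the advisor tool from tools and strip all advisor_tool_result blocks from the message history to avoid a 400 invalid_request_error.
Enable caching only for conversations where you expect three or more advisor calls.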
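
A client-side counter for a conversation-level budget might be sketched like this. It counts server_tool_use blocks named advisor in each assistant response, which also counts calls that ended in an error result; whether those should count toward your budget is a choice left to you. The class name AdvisorBudget is hypothetical, and blocks are assumed to be plain dicts.

```python
class AdvisorBudget:
    """Track advisor calls across a conversation against a fixed limit."""

    def __init__(self, limit: int) -> None:
        self.limit = limit
        self.used = 0

    def record(self, content: list) -> int:
        """Count advisor calls in one assistant content list and add them to the total."""
        calls = sum(
            1
            for block in content
            if block.get("type") == "server_tool_use" and block.get("name") == "advisor"
        )
        self.used += calls
        return calls

    @property
    def exhausted(self) -> bool:
        """True once the conversation has used up its advisor calls."""
        return self.used >= self.limit
```

Once exhausted is true, the next request would be sent without the advisor tool and with the advisor blocks stripped from the history.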

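Limitations

Advisor output does not stream. Expect a pause in the stream while the sub-inference runs.
Advisor calls have no built-in conversation-level cap. Track and limit them client-side.
max_tokens applies to executor output only. It does not bound advisor tokens.
Anthropic Priority Tier is honored per model. Priority Tier on the executor model does not extend to the advisor; you need Priority Tier on the advisor model specifically.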
