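Every Messages API response includes a `stop_reason` field that tells you why Claude stopped generating. Check this field to decide whether to use the response as is, continue the conversation, retry, or fall back to another model.
For the full response schema, see the Messages API reference.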
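| Value | When it appears | What to do |
|---|---|---|
| `end_turn` | Claude finished its response naturally. | Use the response. |
| `max_tokens` | The response reached your `max_tokens` limit. | Raise `max_tokens` or continue the response. |
| `stop_sequence` | Claude emitted one of your `stop_sequences`. | Read `stop_sequence` to see which one was triggered. |
| `tool_use` | Claude is calling a tool. | Run the tool and return the result. |
| `pause_turn` | The server tool loop reached its iteration limit. | Send the assistant content back to continue. |
| `refusal` | Claude declined to respond. | Read `stop_details` and retry on a fallback model. |
| `model_context_window_exceeded` | The response filled the model's context window. | Treat the response as truncated. |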
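The table above can be encoded as a small lookup so that calling code branches on a single value. This is a minimal sketch: the action labels and the `next_action` helper are hypothetical names chosen for illustration, not part of any SDK.

```python
# Hypothetical action labels; the wording mirrors the table above.
STOP_REASON_ACTIONS = {
    "end_turn": "use_response",
    "max_tokens": "raise_limit_or_continue",
    "stop_sequence": "read_stop_sequence",
    "tool_use": "run_tool_and_return_result",
    "pause_turn": "send_assistant_content_back",
    "refusal": "read_stop_details_and_retry_on_fallback",
    "model_context_window_exceeded": "treat_as_truncated",
}


def next_action(stop_reason):
    """Return the action label for a stop reason, or a conservative default."""
    # Unknown or future values fall through to a default instead of raising,
    # so a stop reason this table does not list cannot crash the caller.
    return STOP_REASON_ACTIONS.get(stop_reason, "inspect_manually")


print(next_action("pause_turn"))
```

Falling back to a default for unrecognized values is a design choice of this sketch; it keeps the handler tolerant of values that are not listed here.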
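The `stop_reason` field is part of every successful Messages API response. Unlike errors, which indicate that the request failed to process, `stop_reason` tells you why Claude finished generating its response.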
```json
{
  "id": "msg_01234",
  "type": "message",
  "role": "assistant",
  "content": [
    {
      "type": "text",
      "text": "Here's the answer to your question..."
    }
  ],
  "stop_reason": "end_turn",
  "stop_sequence": null,
  "stop_details": null,
  "usage": {
    "input_tokens": 100,
    "output_tokens": 50
  }
}
```

`end_turn` is the most common stop reason. It indicates that Claude finished its response naturally.
```python
import anthropic

client = anthropic.Anthropic()
response = client.messages.create(
    model="claude-opus-4-8",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello!"}],
)
if response.stop_reason == "end_turn":
    # Process the complete response
    print(response.content[0].text)
```

With `max_tokens`, Claude stopped because it reached the `max_tokens` limit specified in your request.
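```python
import anthropic

client = anthropic.Anthropic()
# Request with a deliberately small token limit
response = client.messages.create(
    model="claude-opus-4-8",
    max_tokens=10,
    messages=[{"role": "user", "content": "Explain quantum physics"}],
)
if response.stop_reason == "max_tokens":
    # The response was truncated
    print("Response was cut off at token limit")
    # Consider making another request to continue
```

With `stop_sequence`, Claude encountered one of your custom stop sequences.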
```python
import anthropic

client = anthropic.Anthropic()
response = client.messages.create(
    model="claude-opus-4-8",
    max_tokens=1024,
    stop_sequences=["END", "STOP"],
    messages=[{"role": "user", "content": "Generate text until you say END"}],
)
if response.stop_reason == "stop_sequence":
    # response.stop_sequence holds the sequence that was matched
    print(f"Stopped at sequence: {response.stop_sequence}")
```

With `tool_use`, Claude is calling a tool and expects you to execute it.
For most tool use implementations, use the tool runner, which automatically handles tool execution, result formatting, and conversation management.
```python
import anthropic

client = anthropic.Anthropic()
weather_tool = {
    "name": "get_weather",
    "description": "Get the current weather in a given location",
    "input_schema": {
        "type": "object",
        "properties": {
            "location": {"type": "string", "description": "City and state"},
        },
        "required": ["location"],
    },
}


def execute_tool(name, tool_input):
    """Execute a tool and return the result."""
    return f"Weather in {tool_input.get('location', 'unknown')}: 72°F"


response = client.messages.create(
    model="claude-opus-4-8",
    max_tokens=1024,
    tools=[weather_tool],
    messages=[{"role": "user", "content": "What is the weather in San Francisco?"}],
)
if response.stop_reason == "tool_use":
    # Extract and execute the tool
    for block in response.content:
        if block.type == "tool_use":
            result = execute_tool(block.name, block.input)
            # Return the result to Claude for the final response
```

`pause_turn` is returned when the server-side sampling loop reaches its iteration limit while executing server tools such as web search or web fetch. The default limit is 10 iterations per request.
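When this happens, the response may contain a `server_tool_use` block with no corresponding `server_tool_result`. To let Claude finish processing, send the response back as is to continue the conversation.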
```python
import anthropic

client = anthropic.Anthropic()
response = client.messages.create(
    model="claude-opus-4-8",
    max_tokens=4096,
    tools=[{"type": "web_search_20250305", "name": "web_search"}],
    messages=[{"role": "user", "content": "Search for latest AI news"}],
)
if response.stop_reason == "pause_turn":
    # Continue the conversation by sending the response back
    messages = [
        {"role": "user", "content": "Search for latest AI news"},
        {"role": "assistant", "content": response.content},
    ]
    continuation = client.messages.create(
        model="claude-opus-4-8",
        max_tokens=4096,
        messages=messages,
        tools=[{"type": "web_search_20250305", "name": "web_search"}],
    )
```

Your application should handle `pause_turn` in any agent loop that uses server tools. Add the assistant's response to your messages array and make another API request to let Claude continue.
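A paused response may contain a `server_tool_use` block that has no matching result yet. The sketch below shows one way to spot such blocks, for example for logging before the response is sent back unchanged. It assumes blocks are plain dicts, that `server_tool_use` blocks carry an `id`, and that result blocks point back to it through `tool_use_id`. The helper name and the sample content are hypothetical.

```python
def find_unanswered_server_tool_uses(content):
    """Return ids of server_tool_use blocks that have no matching result block.

    Assumption: each block is a dict, server_tool_use blocks carry an "id",
    and result blocks reference that id through "tool_use_id".
    """
    answered = {
        block["tool_use_id"] for block in content if "tool_use_id" in block
    }
    return [
        block["id"]
        for block in content
        if block.get("type") == "server_tool_use" and block["id"] not in answered
    ]


# Made-up content from a paused turn: the second search has no result yet.
paused_content = [
    {"type": "server_tool_use", "id": "srvtoolu_1", "name": "web_search"},
    {"type": "web_search_tool_result", "tool_use_id": "srvtoolu_1"},
    {"type": "server_tool_use", "id": "srvtoolu_2", "name": "web_search"},
]
print(find_unanswered_server_tool_uses(paused_content))
```

A non-empty result here is expected after `pause_turn` and is not an error. The content should still be sent back as is rather than trimmed.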
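With `refusal`, Claude declined to generate a response. On Claude Fable 5, the safety classifier returns this stop reason as a normal HTTP 200 response rather than as an error.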
```python
import anthropic

client = anthropic.Anthropic()
response = client.messages.create(
    model="claude-opus-4-8",
    max_tokens=1024,
    messages=[{"role": "user", "content": "[Unsafe request]"}],
)
if response.stop_reason == "refusal":
    # Claude declined to respond
    print("Claude was unable to process this request")
    # Consider rephrasing or modifying the request
```

If you frequently encounter `refusal` stop reasons while using Claude Sonnet 4.5 or Opus 4.1 (deprecated), you can try updating your API calls to use Haiku 4.5 (`claude-haiku-4-5-20251001`), which has different usage restrictions. To learn more, see Understanding Sonnet 4.5's API safety filters.
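On a refusal, the `stop_details` object identifies the policy category that triggered it. The categories and the full refusal response structure are covered in Refusals and fallback. For every stop reason other than `refusal`, `stop_details` is `null`.
Requests refused on Claude Fable 5 can usually be handled by retrying on another Claude model. Refusals and fallback shows how to set up that retry server-side or client-side, and Fallback credits explains how to avoid paying prompt caching costs twice when you build the retry yourself.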
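As a rough illustration of the client-side variant, the sketch below walks an ordered list of models and stops at the first response that is not a refusal. The names `create_with_fallback` and `fake_create`, and the model names `primary-model` and `fallback-model`, are hypothetical. The request function is injected so the example runs offline. In real code you would pass `client.messages.create` and real model IDs.

```python
from types import SimpleNamespace


def create_with_fallback(create_message, models, **request):
    """Try each model in order and return the first response that is not a refusal.

    `create_message` is any callable shaped like client.messages.create.
    It is injected here so the sketch can run without network access.
    """
    if not models:
        raise ValueError("models must contain at least one model name")
    response = None
    for model in models:
        response = create_message(model=model, **request)
        if response.stop_reason != "refusal":
            return response
    # Every model refused: return the last refusal so the caller can
    # still inspect it (for example, its stop_details).
    return response


def fake_create(model, **request):
    """Stand-in for the API: the hypothetical primary model always refuses."""
    if model == "primary-model":
        return SimpleNamespace(model=model, stop_reason="refusal")
    return SimpleNamespace(model=model, stop_reason="end_turn")


result = create_with_fallback(
    fake_create,
    ["primary-model", "fallback-model"],
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello!"}],
)
print(result.model, result.stop_reason)
```

This sketch does not address prompt caching costs on the retry. Fallback credits covers that part.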
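With `model_context_window_exceeded`, Claude stopped because it reached the model's context window limit. This lets you request as many tokens as possible without knowing the exact input size.
This stop reason is currently typed only in the SDK's beta namespace, so the following examples call `client.beta.messages` and use Beta-prefixed types. On Sonnet 4.5 and newer models, the API returns this value without a beta header. For earlier models, add the `model-context-window-exceeded-2025-08-26` beta header to enable it.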
```python
import anthropic

client = anthropic.Anthropic()
# Request with a high max_tokens to get as much output as possible
response = client.beta.messages.create(
    model="claude-opus-4-8",
    max_tokens=20000,  # Python SDK requires streaming for max_tokens above ~21k (Opus 4.8 supports 128k with streaming)
    messages=[
        {"role": "user", "content": "Large input that uses most of context window..."}
    ],
)
if response.stop_reason == "model_context_window_exceeded":
    # The response hit the context window limit before reaching max_tokens
    print("Response reached model's context window limit")
    # The response is still valid, but was limited by the context window
```

Make it a habit to check `stop_reason` in your response handling logic:
```python
def handle_response(response):
    # The handle_* functions are your own application-specific handlers
    if response.stop_reason == "tool_use":
        return handle_tool_use(response)
    elif response.stop_reason == "max_tokens":
        return handle_truncation(response)
    elif response.stop_reason == "model_context_window_exceeded":
        return handle_context_limit(response)
    elif response.stop_reason == "pause_turn":
        return handle_pause(response)
    elif response.stop_reason == "refusal":
        return handle_refusal(response)
    else:
        # Handle end_turn and other cases
        return response.content[0].text
```

When a response is truncated by the token limit or the context window, append a note so readers know the output is incomplete. To continue generating from where the response left off, see Ensuring complete responses.
```python
def handle_truncated_response(response):
    if response.stop_reason in ["max_tokens", "model_context_window_exceeded"]:
        if response.stop_reason == "max_tokens":
            note = "[Response truncated due to max_tokens limit]"
        else:
            note = "[Response truncated due to context window limit]"
        return f"{response.content[0].text}\n\n{note}"
    return response.content[0].text
```

When using server tools, the API may return `pause_turn` if the server-side sampling loop reaches its iteration limit (10 by default). Handle this by continuing the conversation:
```python
def handle_server_tool_conversation(client, user_query, tools, max_continuations=5):
    """
    Handle server tool conversations that may require multiple continuations.

    The server runs a sampling loop when executing server tools. If the loop
    reaches its iteration limit, the API returns pause_turn. Continue the
    conversation by sending the response back to let Claude finish.
    """
    messages = [{"role": "user", "content": user_query}]
    for _ in range(max_continuations):
        response = client.messages.create(
            model="claude-opus-4-8", max_tokens=4096, messages=messages, tools=tools
        )
        if response.stop_reason != "pause_turn":
            # Claude finished processing: return the final response
            return response
        # pause_turn: replace the whole message list to keep roles alternating
        messages = [
            {"role": "user", "content": user_query},
            {"role": "assistant", "content": response.content},
        ]
    # Maximum continuations reached: return the last response
    return response
```

It is important to distinguish `stop_reason` values from actual errors:
```python
import anthropic

client = anthropic.Anthropic()
try:
    response = client.messages.create(
        model="claude-opus-4-8",
        max_tokens=1024,
        messages=[{"role": "user", "content": "Hello!"}],
    )
    # Handle a successful response, which carries a stop_reason
    if response.stop_reason == "max_tokens":
        print("Response was truncated")
except anthropic.APIStatusError as e:
    # Handle actual errors
    if e.status_code == 429:
        print("Rate limit exceeded")
    elif e.status_code == 500:
        print("Server error")
```

When streaming, `stop_reason` is `null` in the `message_start` event and is provided in the `message_delta` event.

```python
import anthropic

client = anthropic.Anthropic()
with client.messages.stream(
    model="claude-opus-4-8",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello!"}],
) as stream:
    for event in stream:
        if event.type == "message_delta":
            stop_reason = event.delta.stop_reason
            if stop_reason:
                print(f"Stream ended with: {stop_reason}")
```

Simpler with the tool runner: the following example shows manual tool handling. For most use cases, the tool runner handles tool execution automatically with less code.
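The manual loop in this section calls an `execute_tools` helper that this page does not define. One possible shape for it is sketched below, assuming content blocks expose `type`, `id`, `name`, and `input` attributes as SDK response blocks do. The `run_tool` dispatcher and the sample content are hypothetical stand-ins.

```python
from types import SimpleNamespace


def run_tool(name, tool_input):
    """Hypothetical dispatcher; replace with your real tool implementations."""
    if name == "get_weather":
        return f"Weather in {tool_input.get('location', 'unknown')}: 72°F"
    raise ValueError(f"Unknown tool: {name}")


def execute_tools(content):
    """Run every tool_use block and return matching tool_result blocks."""
    results = []
    for block in content:
        if block.type != "tool_use":
            continue  # Skip text and other block types
        try:
            output, is_error = run_tool(block.name, block.input), False
        except Exception as exc:
            # Report the failure to Claude instead of crashing the loop
            output, is_error = str(exc), True
        results.append(
            {
                "type": "tool_result",
                "tool_use_id": block.id,  # Ties the result to its tool_use block
                "content": output,
                "is_error": is_error,
            }
        )
    return results


# Stand-in for response.content from a tool_use response
content = [
    SimpleNamespace(type="text", text="Let me check."),
    SimpleNamespace(
        type="tool_use",
        id="toolu_01",
        name="get_weather",
        input={"location": "San Francisco, CA"},
    ),
]
print(execute_tools(content))
```

The returned list is what goes into the `content` of the next `user` message, one `tool_result` per `tool_use` block.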
```python
def complete_tool_workflow(client, user_query, tools):
    messages = [{"role": "user", "content": user_query}]
    while True:
        response = client.messages.create(
            model="claude-opus-4-8", max_tokens=1024, messages=messages, tools=tools
        )
        if response.stop_reason == "tool_use":
            # Execute the tools and continue (execute_tools is your own helper)
            tool_results = execute_tools(response.content)
            messages.append({"role": "assistant", "content": response.content})
            messages.append({"role": "user", "content": tool_results})
        else:
            # Final response
            return response
```

To ensure complete responses when output stops at `max_tokens`, request a continuation and concatenate the parts:

```python
def get_complete_response(client, prompt, max_attempts=3):
    messages = [{"role": "user", "content": prompt}]
    full_response = ""
    for _ in range(max_attempts):
        response = client.messages.create(
            model="claude-opus-4-8", messages=messages, max_tokens=4096
        )
        full_response += response.content[0].text
        if response.stop_reason != "max_tokens":
            break
        # Continue from where the response left off
        messages = [
            {"role": "user", "content": prompt},
            {"role": "assistant", "content": full_response},
            {"role": "user", "content": "Please continue from where you left off."},
        ]
    return full_response
```

With the `model_context_window_exceeded` stop reason, you can request as many tokens as possible without calculating the input size:
```python
def get_max_possible_tokens(client, prompt):
    """
    Get as many tokens as possible within the model's context window
    without needing to calculate input token count
    """
    response = client.beta.messages.create(
        model="claude-opus-4-8",
        messages=[{"role": "user", "content": prompt}],
        max_tokens=20000,  # Python SDK requires streaming for max_tokens above ~21k
    )
    if response.stop_reason == "model_context_window_exceeded":
        # Got the maximum possible tokens for the given input size
        print(
            f"Generated {response.usage.output_tokens} tokens (context limit reached)"
        )
    elif response.stop_reason == "max_tokens":
        # Got exactly the requested number of tokens
        print(f"Generated {response.usage.output_tokens} tokens (max_tokens reached)")
    else:
        # Natural completion
        print(f"Generated {response.usage.output_tokens} tokens (natural completion)")
    return response.content[0].text
```

Retry refused requests on a fallback model, server-side or client-side.
Let the SDK manage the `tool_use` loop, result formatting, and retries for you.
Read `stop_reason` from the `message_delta` event when streaming.
Handle 4xx and 5xx HTTP errors, which are distinct from stop reasons.