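When you send a request to the Messages API, Claude's response includes a stop_reason field that indicates why the model stopped generating. Understanding these values is essential for building robust applications that handle each response type appropriately.
For details on stop_reason in the API response, see the Messages API reference.
The stop_reason field is part of every successful Messages API response. Unlike errors, which indicate that a request failed to process, stop_reason tells you why Claude successfully finished generating its response.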
{
  "id": "msg_01234",
  "type": "message",
  "role": "assistant",
  "content": [
    {
      "type": "text",
      "text": "Here's the answer to your question..."
    }
  ],
  "stop_reason": "end_turn",
  "stop_sequence": null,
  "usage": {
    "input_tokens": 100,
    "output_tokens": 50
  }
}
end_turn is the most common stop reason. It indicates that Claude finished its response naturally.
from anthropic import Anthropic

client = Anthropic()
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello!"}],
)
if response.stop_reason == "end_turn":
    # Process the complete response
    print(response.content[0].text)
Sometimes Claude returns an empty response (exactly 2-3 tokens with no content) with stop_reason: "end_turn". This typically happens when Claude considers the assistant turn complete, particularly after tool results.
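Two common causes appear in the examples that follow: adding text blocks immediately after a tool result, and sending Claude's completed response back without adding anything.
How to prevent empty responses: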
# INCORRECT: Adding text immediately after tool_result
messages = [
    {"role": "user", "content": "Calculate the sum of 1234 and 5678"},
    {
        "role": "assistant",
        "content": [
            {
                "type": "tool_use",
                "id": "toolu_123",
                "name": "calculator",
                "input": {"operation": "add", "a": 1234, "b": 5678},
            }
        ],
    },
    {
        "role": "user",
        "content": [
            {"type": "tool_result", "tool_use_id": "toolu_123", "content": "6912"},
            {
                "type": "text",
                "text": "Here's the result",  # Don't add text after tool_result
            },
        ],
    },
]
# CORRECT: Send tool results directly without additional text
messages = [
    {"role": "user", "content": "Calculate the sum of 1234 and 5678"},
    {
        "role": "assistant",
        "content": [
            {
                "type": "tool_use",
                "id": "toolu_123",
                "name": "calculator",
                "input": {"operation": "add", "a": 1234, "b": 5678},
            }
        ],
    },
    {
        "role": "user",
        "content": [
            {"type": "tool_result", "tool_use_id": "toolu_123", "content": "6912"}
        ],
    },  # Just the tool_result, no additional text
]
# If you still get empty responses after fixing the above:
def handle_empty_response(client, messages):
    response = client.messages.create(
        model="claude-opus-4-7", max_tokens=1024, messages=messages
    )
    # Check if response is empty
    if response.stop_reason == "end_turn" and not response.content:
        # INCORRECT: Don't just retry with the empty response
        # This won't work because Claude already decided it's done
        # CORRECT: Add a continuation prompt in a NEW user message
        messages.append({"role": "user", "content": "Please continue"})
        response = client.messages.create(
            model="claude-opus-4-7", max_tokens=1024, messages=messages
        )
    return response
Best practices: never add text blocks immediately after tool results, don't retry an empty response without modifying the request, and use a continuation prompt only if empty responses persist.
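To guard against trailing text after tool results, one option is to sanitize outgoing user messages before sending them. The sketch below is only an illustration and not part of the SDK: strip_text_after_tool_results is a hypothetical helper, and it assumes content blocks are plain dicts shaped like the ones in the examples above.

```python
def strip_text_after_tool_results(content):
    """Return a copy of a user message's content list without the text
    blocks that follow a tool_result block. (Hypothetical helper.)"""
    cleaned = []
    seen_tool_result = False
    for block in content:
        if block.get("type") == "tool_result":
            seen_tool_result = True
        elif block.get("type") == "text" and seen_tool_result:
            continue  # drop text that trails a tool_result
        cleaned.append(block)
    return cleaned


# Example: the "INCORRECT" user turn from above, cleaned before sending
user_content = [
    {"type": "tool_result", "tool_use_id": "toolu_123", "content": "6912"},
    {"type": "text", "text": "Here's the result"},
]
safe_content = strip_text_after_tool_results(user_content)
```

The helper returns a new list and leaves its input untouched, so it can be applied just before the request is built without affecting any stored conversation history.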
Claude stopped because it reached the max_tokens limit specified in your request.
# Request with limited tokens
response = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=10,
    messages=[{"role": "user", "content": "Explain quantum physics"}],
)
if response.stop_reason == "max_tokens":
    # Response was truncated
    print("Response was cut off at token limit")
    # Consider making another request to continue
If Claude's response is truncated by the max_tokens limit and the truncated response contains an incomplete tool use block, retry the request with a higher max_tokens value to get the complete tool use.
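The retry described above can be wrapped in a small loop. The following is a rough sketch under stated assumptions: create_with_token_escalation and its ceiling parameter are hypothetical names, create stands for any callable with the shape of client.messages.create, and doubling the limit on each attempt is just one possible policy.

```python
def create_with_token_escalation(create, request, ceiling=8192):
    """Call create(**request), doubling max_tokens while the response is cut
    off in the middle of a tool_use block. (Hypothetical helper; `ceiling`
    is an assumed upper bound chosen by the caller.)"""
    max_tokens = request["max_tokens"]
    while True:
        response = create(**{**request, "max_tokens": max_tokens})
        truncated_tool_use = (
            response.stop_reason == "max_tokens"
            and response.content
            and response.content[-1].type == "tool_use"
        )
        # Stop when the tool_use block is complete or the ceiling is reached
        if not truncated_tool_use or max_tokens >= ceiling:
            return response
        max_tokens = min(max_tokens * 2, ceiling)
```

In an application, create would typically be client.messages.create and request the keyword arguments of the original call. Passing the callable in keeps the loop easy to exercise with a stub.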
Claude encountered one of your custom stop sequences.
response = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    stop_sequences=["END", "STOP"],
    messages=[{"role": "user", "content": "Generate text until you say END"}],
)
if response.stop_reason == "stop_sequence":
    print(f"Stopped at sequence: {response.stop_sequence}")
With the tool_use stop reason, Claude is calling a tool and expects you to execute it.
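For most tool use implementations, we recommend the tool runner, which automatically handles tool execution, result formatting, and conversation management.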
from anthropic import Anthropic

client = Anthropic()
weather_tool = {
    "name": "get_weather",
    "description": "Get the current weather in a given location",
    "input_schema": {
        "type": "object",
        "properties": {
            "location": {"type": "string", "description": "City and state"},
        },
        "required": ["location"],
    },
}


def execute_tool(name, tool_input):
    """Execute a tool and return the result."""
    return f"Weather in {tool_input.get('location', 'unknown')}: 72°F"


response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    tools=[weather_tool],
    messages=[{"role": "user", "content": "What's the weather?"}],
)
if response.stop_reason == "tool_use":
    # Extract and execute the tool
    for content in response.content:
        if content.type == "tool_use":
            result = execute_tool(content.name, content.input)
            # Return result to Claude for final response
pause_turn is returned when the server-side sampling loop reaches its iteration limit while executing server tools (such as web search or web fetch). The default limit is 10 iterations per request.
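When this happens, the response may contain a server_tool_use block without a corresponding server_tool_result. To let Claude finish processing, continue the conversation by sending the response back as-is.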
original_query = "Search for latest AI news"
response = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    tools=[{"type": "web_search_20250305", "name": "web_search"}],
    messages=[{"role": "user", "content": original_query}],
)
if response.stop_reason == "pause_turn":
    # Continue the conversation by sending the response back
    messages = [
        {"role": "user", "content": original_query},
        {"role": "assistant", "content": response.content},
    ]
    continuation = client.messages.create(
        model="claude-opus-4-7",
        max_tokens=1024,  # max_tokens is required on every request
        messages=messages,
        tools=[{"type": "web_search_20250305", "name": "web_search"}],
    )
Your application should handle pause_turn in any agentic loop that uses server tools. Simply append the assistant's response to your messages array and make another API request to let Claude continue.
Claude declined to generate a response due to safety concerns.
response = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[{"role": "user", "content": "[Unsafe request]"}],
)
if response.stop_reason == "refusal":
    # Claude declined to respond
    print("Claude was unable to process this request")
    # Consider rephrasing or modifying the request
If you frequently encounter the refusal stop reason while using Claude Sonnet 4.5 or Opus 4.1, you can try updating your API calls to use Haiku 4.5 (claude-haiku-4-5-20251001), which has different usage restrictions.
To learn more about refusals triggered by the API safety filters for Claude Sonnet 4.5, see Understanding Sonnet 4.5's API safety filters.
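Claude stopped because it reached the model's context window limit. This lets you request the maximum possible number of tokens without knowing the exact input size.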
# Request with maximum tokens to get as much as possible
response = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=64000,  # Practical non-streaming ceiling (Opus 4.7 supports 128K with streaming)
    messages=[
        {"role": "user", "content": "Large input that uses most of context window..."}
    ],
)
if response.stop_reason == "model_context_window_exceeded":
    # Response hit context window limit before max_tokens
    print("Response reached model's context window limit")
    # The response is still valid but was limited by context window
This stop reason is available by default in Sonnet 4.5 and newer models. For earlier models, use the beta header model-context-window-exceeded-2025-08-26 to enable this behavior.
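For earlier models, the beta header has to be attached to the request. The sketch below shows one way this might look, assuming the flag is sent in the anthropic-beta header through the Python SDK's per-request extra_headers argument. build_request is a hypothetical helper, and the caller decides whether a given model needs the header.

```python
# Beta flag value quoted from the text above
CONTEXT_WINDOW_BETA = "model-context-window-exceeded-2025-08-26"


def build_request(model, prompt, max_tokens, needs_beta_header):
    """Assemble keyword arguments for client.messages.create.
    (Hypothetical helper; it only builds the arguments and sends nothing.)"""
    request = {
        "model": model,
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": prompt}],
    }
    if needs_beta_header:
        # Assumption: the flag travels in the anthropic-beta request header
        request["extra_headers"] = {"anthropic-beta": CONTEXT_WINDOW_BETA}
    return request


# Example: an earlier model opts in, a newer one does not need to
older = build_request("claude-sonnet-4-20250514", "Hello!", 1024, needs_beta_header=True)
newer = build_request("claude-opus-4-7", "Hello!", 1024, needs_beta_header=False)
```

The resulting dict can then be unpacked into the call, for example client.messages.create(**older).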
Make it a habit to check stop_reason in your response-handling logic:
# The handle_* functions are placeholders for your application's own handlers
def handle_response(response):
    if response.stop_reason == "tool_use":
        return handle_tool_use(response)
    elif response.stop_reason == "max_tokens":
        return handle_truncation(response)
    elif response.stop_reason == "model_context_window_exceeded":
        return handle_context_limit(response)
    elif response.stop_reason == "pause_turn":
        return handle_pause(response)
    elif response.stop_reason == "refusal":
        return handle_refusal(response)
    else:
        # Handle end_turn and other cases
        return response.content[0].text
When a response is truncated because of a token limit or the context window:
def handle_truncated_response(client, original_prompt, response, continue_generation=False):
    # client and original_prompt are passed in so the function is self-contained;
    # continue_generation selects between the two options below
    if response.stop_reason not in ["max_tokens", "model_context_window_exceeded"]:
        return response.content[0].text
    if not continue_generation:
        # Option 1: Warn the user about the specific limit
        if response.stop_reason == "max_tokens":
            message = "[Response truncated due to max_tokens limit]"
        else:
            message = "[Response truncated due to context window limit]"
        return f"{response.content[0].text}\n\n{message}"
    # Option 2: Continue generation
    messages = [
        {"role": "user", "content": original_prompt},
        {"role": "assistant", "content": response.content[0].text},
    ]
    continuation = client.messages.create(
        model="claude-opus-4-7",
        max_tokens=1024,
        messages=messages + [{"role": "user", "content": "Please continue"}],
    )
    return response.content[0].text + continuation.content[0].text
When using server tools, the API may return pause_turn if the server-side sampling loop reaches its iteration limit (default 10). Handle this by continuing the conversation:
def handle_server_tool_conversation(client, user_query, tools, max_continuations=5):
    """
    Handle server tool conversations that may require multiple continuations.
    The server runs a sampling loop when executing server tools. If the loop
    reaches its iteration limit, the API returns pause_turn. Continue the
    conversation by sending the response back to let Claude finish.
    """
    messages = [{"role": "user", "content": user_query}]
    for _ in range(max_continuations):
        response = client.messages.create(
            model="claude-opus-4-7",
            max_tokens=1024,  # max_tokens is required on every request
            messages=messages,
            tools=tools,
        )
        if response.stop_reason != "pause_turn":
            # Claude finished processing - return the final response
            return response
        # pause_turn: replace the full message list to maintain alternating roles
        messages = [
            {"role": "user", "content": user_query},
            {"role": "assistant", "content": response.content},
        ]
    # Reached max continuations - return the last response
    return response
It's important to distinguish stop_reason values from actual errors:
import anthropic
from anthropic import Anthropic

client = Anthropic()
try:
    response = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=1024,
        messages=[{"role": "user", "content": "Hello!"}],
    )
    # Handle successful response with stop_reason
    if response.stop_reason == "max_tokens":
        print("Response was truncated")
except anthropic.APIStatusError as e:
    # Handle actual errors; APIStatusError is the exception type that carries status_code
    if e.status_code == 429:
        print("Rate limit exceeded")
    elif e.status_code == 500:
        print("Server error")
When streaming, stop_reason is null in the message_start event and is provided in the message_delta event.
from anthropic import Anthropic

client = Anthropic()
with client.messages.stream(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello!"}],
) as stream:
    for event in stream:
        if event.type == "message_delta":
            stop_reason = event.delta.stop_reason
            if stop_reason:
                print(f"Stream ended with: {stop_reason}")
Simpler with the tool runner: the example below shows manual tool handling. For most use cases, the tool runner handles tool execution automatically with much less code.
def complete_tool_workflow(client, user_query, tools):
    messages = [{"role": "user", "content": user_query}]
    while True:
        response = client.messages.create(
            model="claude-opus-4-7",
            max_tokens=1024,  # max_tokens is required on every request
            messages=messages,
            tools=tools,
        )
        if response.stop_reason == "tool_use":
            # Execute tools and continue
            # execute_tools is your own function that returns tool_result blocks
            tool_results = execute_tools(response.content)
            messages.append({"role": "assistant", "content": response.content})
            messages.append({"role": "user", "content": tool_results})
        else:
            # Final response
            return response


def get_complete_response(client, prompt, max_attempts=3):
    messages = [{"role": "user", "content": prompt}]
    full_response = ""
    for _ in range(max_attempts):
        response = client.messages.create(
            model="claude-opus-4-7", messages=messages, max_tokens=4096
        )
        full_response += response.content[0].text
        if response.stop_reason != "max_tokens":
            break
        # Continue from where it left off
        messages = [
            {"role": "user", "content": prompt},
            {"role": "assistant", "content": full_response},
            {"role": "user", "content": "Please continue from where you left off."},
        ]
    return full_response
With the model_context_window_exceeded stop reason, you can request the maximum possible number of tokens without calculating the input size:
def get_max_possible_tokens(client, prompt):
    """
    Get as many tokens as possible within the model's context window
    without needing to calculate input token count
    """
    response = client.messages.create(
        model="claude-opus-4-7",
        messages=[{"role": "user", "content": prompt}],
        max_tokens=64000,  # Practical non-streaming ceiling (Opus 4.7 supports 128K with streaming)
    )
    if response.stop_reason == "model_context_window_exceeded":
        # Got the maximum possible tokens given input size
        print(
            f"Generated {response.usage.output_tokens} tokens (context limit reached)"
        )
    elif response.stop_reason == "max_tokens":
        # Got exactly the requested tokens
        print(f"Generated {response.usage.output_tokens} tokens (max_tokens reached)")
    else:
        # Natural completion
        print(f"Generated {response.usage.output_tokens} tokens (natural completion)")
    return response.content[0].text
If a max_tokens truncation leaves an incomplete tool use block, retry with a higher limit:
# Check if response was truncated during tool use
# (messages and tools are the values from the original request)
if response.stop_reason == "max_tokens":
    # Check if the last content block is an incomplete tool_use
    last_block = response.content[-1]
    if last_block.type == "tool_use":
        # Send the request with higher max_tokens
        response = client.messages.create(
            model="claude-opus-4-7",
            max_tokens=4096,  # Increased limit
            messages=messages,
            tools=tools,
        )
By handling stop_reason values correctly, you can build more robust applications that deal gracefully with different response scenarios and provide a better user experience.