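Every Messages API response includes a stop_reason field that tells you why Claude stopped generating. Check this field to decide whether to use the response as is, continue the conversation, retry, or fall back to another model.
For the complete response schema, see the Messages API reference.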
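| Value | When it occurs | How to handle |
|---|---|---|
| end_turn | Claude finished its response naturally. | Use the response. |
| max_tokens | The response reached your max_tokens limit. | Raise max_tokens or continue the response. |
| stop_sequence | Claude output one of your stop_sequences. | Read stop_sequence to see which one triggered. |
| tool_use | Claude is calling a tool. | Execute the tool and return the result. |
| pause_turn | The server tool loop reached its iteration limit. | Send the assistant content back to continue. |
| refusal | Claude declined to respond. | Read stop_details and retry on a fallback model. |
| model_context_window_exceeded | The response filled the model's context window. | Treat the response as truncated. |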
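One way to act on the table is to map each stop_reason value to a handling step. The sketch below is illustrative only: the action labels and the next_action helper are hypothetical names, not part of the API or the SDK.

```python
# Hypothetical action labels; the mapping mirrors the table of stop_reason values.
STOP_REASON_ACTIONS = {
    "end_turn": "use_response",
    "max_tokens": "raise_limit_or_continue",
    "stop_sequence": "read_stop_sequence",
    "tool_use": "run_tool_and_return_result",
    "pause_turn": "send_assistant_content_back",
    "refusal": "read_stop_details_and_retry_on_fallback",
    "model_context_window_exceeded": "treat_as_truncated",
}


def next_action(stop_reason):
    """Return the action label for a stop_reason, or 'unknown' for unlisted values."""
    return STOP_REASON_ACTIONS.get(stop_reason, "unknown")
```

Returning "unknown" for unlisted values keeps the caller from silently treating an unrecognized stop reason as a normal completion.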
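The stop_reason field is part of every successful Messages API response. Unlike an error, which indicates that the request failed to process, stop_reason tells you why Claude finished generating its response.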
```json
{
  "id": "msg_01234",
  "type": "message",
  "role": "assistant",
  "content": [
    {
      "type": "text",
      "text": "Here's the answer to your question..."
    }
  ],
  "stop_reason": "end_turn",
  "stop_sequence": null,
  "stop_details": null,
  "usage": {
    "input_tokens": 100,
    "output_tokens": 50
  }
}
```

end_turn is the most common stop reason. It indicates that Claude finished its response naturally.
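max_tokens means Claude stopped because it reached the max_tokens limit specified in your request.
stop_sequence means Claude encountered one of your custom stop sequences.
tool_use means Claude is calling a tool and expects you to execute it.
For most tool use implementations, use the tool runner, which handles tool execution, result formatting, and conversation management automatically.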
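pause_turn is returned when the server-side sampling loop reaches its iteration limit while executing server tools such as web search or web fetch. The default limit is 10 iterations per request.
When this happens, the response may contain server_tool_use blocks without a corresponding server_tool_result. To let Claude finish processing, send the response back unchanged to continue the conversation.
Your application should handle pause_turn in any agent loop that uses server tools. Append the assistant's response to your messages array and make another API request so that Claude can continue.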
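To see which server tool calls are still pending in a paused response, you could scan the content blocks for server_tool_use entries that have no matching result. This is a rough sketch over plain dictionaries: the id and tool_use_id field names, and the rule that a result block's type ends in tool_result, are assumptions made for illustration, and unmatched_server_tool_uses is a hypothetical helper.

```python
def unmatched_server_tool_uses(content_blocks):
    """Return ids of server_tool_use blocks that have no matching result block.

    Assumes each block is a dict, that server_tool_use blocks carry an "id",
    and that result blocks carry a "tool_use_id" (illustrative assumptions).
    """
    result_ids = {
        block.get("tool_use_id")
        for block in content_blocks
        if str(block.get("type", "")).endswith("tool_result")
    }
    return [
        block["id"]
        for block in content_blocks
        if block.get("type") == "server_tool_use" and block["id"] not in result_ids
    ]
```

A non-empty result is only a diagnostic. The continuation itself still sends the assistant content back unchanged.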
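refusal means Claude declined to generate a response. On Claude Fable 5, safety classifiers return this stop reason in a normal HTTP 200 response rather than as an error.
If you frequently encounter the refusal stop reason with Claude Sonnet 4.5 or Opus 4.1 (deprecated), you can try updating your API calls to use Haiku 4.5 (claude-haiku-4-5-20251001), which has different usage restrictions. Learn more in Understanding Sonnet 4.5's API safety filters.
When a refusal occurs, the stop_details object identifies the policy category that triggered it. The categories and the full refusal response structure are described in Refusals and fallback. For every stop reason other than refusal, stop_details is null.
A request refused on Claude Fable 5 can usually be handled by retrying on another Claude model. Refusals and fallback explains how to configure that retry server-side or in your client. Fallback credits explains how to avoid paying prompt caching costs twice when you build the retry yourself.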
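A client-side retry might look like the sketch below. It is a minimal illustration, not the documented fallback mechanism: create_with_fallback is a hypothetical helper, create stands for any callable with the shape of client.messages.create, the fallback model is whatever the caller passes in, and stop_details is treated as an opaque value. It does not handle fallback credits.

```python
def create_with_fallback(create, request, fallback_model):
    """Call create(**request); if the response is a refusal, retry once on fallback_model.

    Returns a (response, model_used) tuple. `create` is any callable shaped like
    client.messages.create (an assumption made so the helper is easy to test).
    """
    response = create(**request)
    if response.stop_reason != "refusal":
        return response, request["model"]

    # stop_details is only populated for refusals; log it for diagnosis.
    print(f"Refused on {request['model']}: {response.stop_details}")

    # Retry the same request with only the model swapped.
    retry_request = {**request, "model": fallback_model}
    return create(**retry_request), fallback_model
```

Passing create as a parameter keeps the retry logic independent of the SDK client, so it can be exercised with a stub. In an application you would pass client.messages.create.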
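model_context_window_exceeded means Claude stopped because it reached the model's context window limit. This lets you request the maximum possible number of tokens without knowing the exact input size.
This stop reason currently has type definitions only in the SDK's beta namespace, so the examples for it call client.beta.messages and use Beta-prefixed types. On Sonnet 4.5 and newer models, the API returns this value without a beta header. For earlier models, add the model-context-window-exceeded-2025-08-26 beta header to enable it.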
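For earlier models, the beta header could be attached when the request is built. The sketch below assumes the Python SDK's betas parameter on client.beta.messages.create is how the header is sent; build_request and the needs_beta_header flag are hypothetical, and deciding which models need the header is left to the caller.

```python
CONTEXT_WINDOW_BETA = "model-context-window-exceeded-2025-08-26"


def build_request(model, messages, max_tokens, needs_beta_header):
    """Build keyword arguments for client.beta.messages.create.

    Adds the beta flag only when the caller says the model needs it
    (the flag's delivery through `betas` is an assumption about the SDK).
    """
    kwargs = {"model": model, "messages": messages, "max_tokens": max_tokens}
    if needs_beta_header:
        kwargs["betas"] = [CONTEXT_WINDOW_BETA]
    return kwargs
```

You would then call client.beta.messages.create(**build_request(...)) and check response.stop_reason as usual.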
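Make a habit of checking stop_reason in your response handling logic, as the handle_response example does.
When a response is truncated by the token limit or the context window, append a notice so that readers know the output is incomplete, as in handle_truncated_response. To continue generating from where the response left off instead, see Ensuring complete responses.
When using server tools, the API may return pause_turn if the server-side sampling loop reaches its iteration limit (10 by default). Handle this by continuing the conversation, as in handle_server_tool_conversation.
It is important to distinguish stop_reason values from actual errors. A stop_reason arrives in a successful response, whereas an error means the request failed and surfaces in the SDK as an exception such as anthropic.APIStatusError.
When streaming, stop_reason is:
- null in the message_start event
- provided in the message_delta event

**Using the tool runner is simpler:** The complete_tool_workflow example shows manual tool handling. For most use cases, the tool runner handles tool execution automatically with less code.
With the model_context_window_exceeded stop reason, you can request the maximum possible number of tokens without calculating the input size, as get_max_possible_tokens does.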
```python
# end_turn: the response is complete
import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-opus-4-8",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello!"}],
)

if response.stop_reason == "end_turn":
    # Process the complete response
    print(response.content[0].text)
```

```python
# max_tokens: the response was cut off at the requested limit
import anthropic

client = anthropic.Anthropic()

# Request with a small token limit
response = client.messages.create(
    model="claude-opus-4-8",
    max_tokens=10,
    messages=[{"role": "user", "content": "Explain quantum physics"}],
)

if response.stop_reason == "max_tokens":
    # The response was truncated
    print("Response was cut off at token limit")
    # Consider making another request to continue
```

```python
# stop_sequence: one of the custom stop sequences was generated
import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-opus-4-8",
    max_tokens=1024,
    stop_sequences=["END", "STOP"],
    messages=[{"role": "user", "content": "Generate text until you say END"}],
)

if response.stop_reason == "stop_sequence":
    print(f"Stopped at sequence: {response.stop_sequence}")
```

```python
# tool_use: Claude is calling a tool and expects you to run it
import anthropic

client = anthropic.Anthropic()

weather_tool = {
    "name": "get_weather",
    "description": "Get the current weather in a given location",
    "input_schema": {
        "type": "object",
        "properties": {
            "location": {"type": "string", "description": "City and state"},
        },
        "required": ["location"],
    },
}


def execute_tool(name, tool_input):
    """Execute a tool and return the result."""
    return f"Weather in {tool_input.get('location', 'unknown')}: 72°F"


response = client.messages.create(
    model="claude-opus-4-8",
    max_tokens=1024,
    tools=[weather_tool],
    messages=[{"role": "user", "content": "What is the weather in San Francisco?"}],
)

if response.stop_reason == "tool_use":
    # Extract and execute the tool
    for block in response.content:
        if block.type == "tool_use":
            result = execute_tool(block.name, block.input)
            # Return the result to Claude to get the final response
```

```python
# pause_turn: the server tool loop reached its iteration limit
import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-opus-4-8",
    max_tokens=4096,
    tools=[{"type": "web_search_20250305", "name": "web_search"}],
    messages=[{"role": "user", "content": "Search for latest AI news"}],
)

if response.stop_reason == "pause_turn":
    # Continue the conversation by sending the response back
    messages = [
        {"role": "user", "content": "Search for latest AI news"},
        {"role": "assistant", "content": response.content},
    ]
    continuation = client.messages.create(
        model="claude-opus-4-8",
        max_tokens=4096,
        messages=messages,
        tools=[{"type": "web_search_20250305", "name": "web_search"}],
    )
```

```python
# refusal: Claude declined to respond
import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-opus-4-8",
    max_tokens=1024,
    messages=[{"role": "user", "content": "[Unsafe request]"}],
)

if response.stop_reason == "refusal":
    # Claude declined to respond
    print("Claude was unable to process this request")
    # Consider rephrasing or modifying the request
```

```python
# model_context_window_exceeded: the context window filled before max_tokens
import anthropic

client = anthropic.Anthropic()

# Request the maximum number of tokens to get as much content as possible
response = client.beta.messages.create(
    model="claude-opus-4-8",
    max_tokens=20000,  # Python SDK requires streaming for max_tokens above ~21k (Opus 4.8 supports 128k with streaming)
    messages=[
        {"role": "user", "content": "Large input that uses most of context window..."}
    ],
)

if response.stop_reason == "model_context_window_exceeded":
    # The response hit the context window limit before reaching max_tokens
    print("Response reached model's context window limit")
    # The response is still valid, but limited by the context window
```

```python
# The handle_* helpers are application-defined placeholders.
def handle_response(response):
    if response.stop_reason == "tool_use":
        return handle_tool_use(response)
    elif response.stop_reason == "max_tokens":
        return handle_truncation(response)
    elif response.stop_reason == "model_context_window_exceeded":
        return handle_context_limit(response)
    elif response.stop_reason == "pause_turn":
        return handle_pause(response)
    elif response.stop_reason == "refusal":
        return handle_refusal(response)
    else:
        # Handle end_turn and other cases
        return response.content[0].text
```

```python
def handle_truncated_response(response):
    if response.stop_reason in ["max_tokens", "model_context_window_exceeded"]:
        if response.stop_reason == "max_tokens":
            note = "[Response truncated due to max_tokens limit]"
        else:
            note = "[Response truncated due to context window limit]"
        return f"{response.content[0].text}\n\n{note}"
    return response.content[0].text
```

```python
def handle_server_tool_conversation(client, user_query, tools, max_continuations=5):
    """
    Handle server tool conversations that may require multiple continuations.

    The server runs a sampling loop when executing server tools. If the loop
    reaches its iteration limit, the API returns pause_turn. Continue the
    conversation by sending the response back to let Claude finish.
    """
    messages = [{"role": "user", "content": user_query}]

    for _ in range(max_continuations):
        response = client.messages.create(
            model="claude-opus-4-8", max_tokens=4096, messages=messages, tools=tools
        )

        if response.stop_reason != "pause_turn":
            # Claude finished processing: return the final response
            return response

        # pause_turn: replace the whole message list to keep roles alternating
        messages = [
            {"role": "user", "content": user_query},
            {"role": "assistant", "content": response.content},
        ]

    # Maximum number of continuations reached: return the last response
    return response
```

```python
# Stop reasons arrive on successful responses; errors are raised as exceptions
import anthropic

client = anthropic.Anthropic()

try:
    response = client.messages.create(
        model="claude-opus-4-8",
        max_tokens=1024,
        messages=[{"role": "user", "content": "Hello!"}],
    )

    # Handle the successful response and its stop_reason
    if response.stop_reason == "max_tokens":
        print("Response was truncated")

except anthropic.APIStatusError as e:
    # Handle actual errors
    if e.status_code == 429:
        print("Rate limit exceeded")
    elif e.status_code == 500:
        print("Server error")
```

```python
# Streaming: stop_reason is delivered in the message_delta event
import anthropic

client = anthropic.Anthropic()

with client.messages.stream(
    model="claude-opus-4-8",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello!"}],
) as stream:
    for event in stream:
        if event.type == "message_delta":
            stop_reason = event.delta.stop_reason
            if stop_reason:
                print(f"Stream ended with: {stop_reason}")
```

```python
# execute_tools is an application-defined helper that runs the requested
# tools and returns the tool result blocks.
def complete_tool_workflow(client, user_query, tools):
    messages = [{"role": "user", "content": user_query}]

    while True:
        response = client.messages.create(
            model="claude-opus-4-8", max_tokens=1024, messages=messages, tools=tools
        )

        if response.stop_reason == "tool_use":
            # Execute the tools and continue
            tool_results = execute_tools(response.content)
            messages.append({"role": "assistant", "content": response.content})
            messages.append({"role": "user", "content": tool_results})
        else:
            # Final response
            return response
```

```python
def get_complete_response(client, prompt, max_attempts=3):
    messages = [{"role": "user", "content": prompt}]
    full_response = ""

    for _ in range(max_attempts):
        response = client.messages.create(
            model="claude-opus-4-8", messages=messages, max_tokens=4096
        )
        full_response += response.content[0].text

        if response.stop_reason != "max_tokens":
            break

        # Continue from where the response left off
        messages = [
            {"role": "user", "content": prompt},
            {"role": "assistant", "content": full_response},
            {"role": "user", "content": "Please continue from where you left off."},
        ]

    return full_response
```

```python
def get_max_possible_tokens(client, prompt):
    """
    Get as many tokens as possible within the model's context window
    without needing to calculate input token count
    """
    response = client.beta.messages.create(
        model="claude-opus-4-8",
        messages=[{"role": "user", "content": prompt}],
        max_tokens=20000,  # Python SDK requires streaming for max_tokens above ~21k
    )

    if response.stop_reason == "model_context_window_exceeded":
        # Got the maximum possible tokens for the given input size
        print(
            f"Generated {response.usage.output_tokens} tokens (context limit reached)"
        )
    elif response.stop_reason == "max_tokens":
        # Got exactly the number of tokens requested
        print(f"Generated {response.usage.output_tokens} tokens (max_tokens reached)")
    else:
        # Natural completion
        print(f"Generated {response.usage.output_tokens} tokens (natural completion)")

    return response.content[0].text
```