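Stop reasons and fallback

Learn what each stop_reason value means and how to handle truncation, tool use, paused turns, and refusals in your application.

Every Messages API response includes a stop_reason field that tells you why Claude stopped generating. Check this field to decide whether to use the response as-is, continue the conversation, retry, or fall back to another model.

For the full response structure, see the Messages API reference.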

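Quick reference

Value	When it occurs	What to do
`end_turn`	Claude finished its response naturally.	Use the response.
`max_tokens`	The response reached your `max_tokens` limit.	Raise `max_tokens` or continue the response.
`stop_sequence`	Claude emitted one of your `stop_sequences`.	Read `stop_sequence` to see which one fired.
`tool_use`	Claude is calling a tool.	Run the tool and return the result. A server tool call that still lacks its result block completes in a later response.
`pause_turn`	The server tool loop reached its iteration limit.	Send the assistant content back to continue.
`refusal`	Claude declined to respond.	Read `stop_details` and retry on a fallback model.
`model_context_window_exceeded`	The response filled the model's context window.	Treat the response as truncated.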
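
The table can be encoded as a lookup so that calling code branches on one value. This is a minimal sketch: the action labels and the `next_action` helper are hypothetical names chosen for illustration, not part of the API or the SDK.

```python
# Hypothetical action labels; the mapping mirrors the quick-reference table.
NEXT_ACTION = {
    "end_turn": "use_response",
    "max_tokens": "raise_limit_or_continue",
    "stop_sequence": "read_stop_sequence",
    "tool_use": "run_tools_and_return_results",
    "pause_turn": "send_assistant_content_back",
    "refusal": "read_stop_details_and_retry_on_fallback",
    "model_context_window_exceeded": "treat_as_truncated",
}


def next_action(stop_reason):
    """Return the action label for a stop_reason, or a safe default if unknown."""
    return NEXT_ACTION.get(stop_reason, "inspect_manually")


print(next_action("pause_turn"))
```

Falling back to a default label for unrecognized values keeps the caller from crashing if a stop reason appears that the lookup does not list.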

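The stop_reason field

The stop_reason field is part of every successful Messages API response. Unlike errors, which indicate that the request failed to process, stop_reason tells you why Claude finished generating its response.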

Example response

{
  "id": "msg_01234",
  "type": "message",
  "role": "assistant",
  "content": [
    {
      "type": "text",
      "text": "Here's the answer to your question..."
    }
  ],
  "stop_reason": "end_turn",
  "stop_sequence": null,
  "stop_details": null,
  "usage": {
    "input_tokens": 100,
    "output_tokens": 50
  }
}

Stop reason values

end_turn

The most common stop reason. It indicates that Claude finished its response naturally.

import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-opus-5",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello!"}],
)
if response.stop_reason == "end_turn":
    # Process the complete response
    for block in response.content:
        if block.type == "text":
            print(block.text)

max_tokens

Claude stopped because it reached the max_tokens limit specified in your request.

import anthropic

client = anthropic.Anthropic()
# Request with a limited token budget
response = client.messages.create(
    model="claude-opus-5",
    max_tokens=10,
    messages=[{"role": "user", "content": "Explain quantum physics"}],
)

if response.stop_reason == "max_tokens":
    # The response was truncated
    print("Response was cut off at token limit")
    # Consider making another request to continue

stop_sequence

Claude encountered one of your custom stop sequences.

import anthropic

client = anthropic.Anthropic()
response = client.messages.create(
    model="claude-opus-5",
    max_tokens=1024,
    stop_sequences=["END", "STOP"],
    messages=[{"role": "user", "content": "Generate text until you say END"}],
)

if response.stop_reason == "stop_sequence":
    print(f"Stopped at sequence: {response.stop_sequence}")

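tool_use

Claude is calling a tool and expects you to execute it.

For most tool use implementations, use the tool runner, which handles tool execution, result formatting, and conversation management automatically.

import anthropic

client = anthropic.Anthropic()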
weather_tool = {
    "name": "get_weather",
    "description": "Get the current weather in a given location",
    "input_schema": {
        "type": "object",
        "properties": {
            "location": {"type": "string", "description": "City and state"},
        },
        "required": ["location"],
    },
}


def execute_tool(name, tool_input):
    """Execute a tool and return the result."""
    return f"Weather in {tool_input.get('location', 'unknown')}: 72°F"


response = client.messages.create(
    model="claude-opus-5",
    max_tokens=1024,
    tools=[weather_tool],
    messages=[{"role": "user", "content": "What is the weather in San Francisco?"}],
)

if response.stop_reason == "tool_use":
    # Extract and execute the tool
    for block in response.content:
        if block.type == "tool_use":
            result = execute_tool(block.name, block.input)
            # Return the result to Claude to get the final response

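A tool_use response can also contain a server_tool_use block whose id has no matching result block. That server tool call has not completed, and this response does not include its result. In the common case, Claude called a server tool and one of your client tools in the same set of parallel tool calls: the API returns without running the server tool so that you can run the client tool first. Nothing else flags this state; detect it by checking whether the id of each server_tool_use or mcp_tool_use block has a matching result block.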
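
The id check described above can be sketched as follows. The helper name `unresolved_server_tool_ids` is hypothetical, the content blocks are assumed to be plain dicts (for example, parsed from the raw JSON body), and the sketch assumes every result block points at its call through a `tool_use_id` field.

```python
def unresolved_server_tool_ids(content):
    """Return ids of server_tool_use / mcp_tool_use blocks with no result block.

    `content` is a list of content-block dicts from one response.
    Assumption: every result block references its call through `tool_use_id`.
    """
    # Collect every id that some result block in this response answers.
    answered = {block["tool_use_id"] for block in content if "tool_use_id" in block}
    return [
        block["id"]
        for block in content
        if block.get("type") in ("server_tool_use", "mcp_tool_use")
        and block["id"] not in answered
    ]


# Shortened placeholder ids, shaped like a mixed tool_use response.
content = [
    {"type": "server_tool_use", "id": "srvtoolu_A", "name": "web_search",
     "input": {"query": "example article"}},
    {"type": "tool_use", "id": "toolu_B", "name": "run_command",
     "input": {"command": "uname -a"}},
]
pending = unresolved_server_tool_ids(content)
print(pending)
```

Client tool_use blocks are deliberately ignored here, because those are answered by your own tool_result blocks rather than by the API.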

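With programmatic tool calling, the same response shape means something different. The client tool_use block comes from code running in the code_execution tool rather than directly from Claude, and its caller field identifies the code_execution block that invoked it. That code has already started running: it is paused waiting for your tool_result blocks, and sending them resumes execution rather than starting a deferred tool. The code_execution block's own result block arrives after the code finishes, which can take more than one round of tool results. The follow-up user message itself is the same in both cases; with programmatic tool calling, you also pass back the id from the response's container field, as shown on that page.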

A mixed tool_use response

{
  "stop_reason": "tool_use",
  "content": [
    {
      "type": "server_tool_use",
      "id": "srvtoolu_01HxbWnMRmbWyMfUtJKC45rA",
      "name": "web_search",
      "input": { "query": "example article" }
    },
    {
      "type": "tool_use",
      "id": "toolu_01PjgRJLbXrXEMZwDNYLnBqk",
      "name": "run_command",
      "input": { "command": "uname -a" }
    }
  ]
}

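The continuation is a user message made up of tool_result blocks, one for each tool_use block in the response (see Handle tool calls), with two additional rules: the message must contain nothing other than tool_result blocks, and the request must keep the same tools array. A resume request that no longer defines the pending server tool fails with a 400 error whose message ends with but no `web_search` tool was provided. The API appends your results to the still-open assistant turn, runs the deferred server tool (or, for paused code execution, resumes it), and continues the turn. For a server tool that Claude called directly, the next response's content begins with the result block that answers the server_tool_use id from the previous response.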

The follow-up user message

{
  "role": "user",
  "content": [
    {
      "type": "tool_result",
      "tool_use_id": "toolu_01PjgRJLbXrXEMZwDNYLnBqk",
      "content": "Linux demo-host 6.8.0-52-generic x86_64 GNU/Linux"
    }
  ]
}

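Adding any content (for example, text) after the tool_result blocks in that user message ends the assistant turn; for a server tool that Claude called directly, the request then fails with a 400 invalid_request_error that names the unresolved server tool: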

`web_search` tool use with id `srvtoolu_01HxbWnMRmbWyMfUtJKC45rA` was found without a corresponding `web_search_tool_result` block

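Omitting a tool_result, or placing it after other content, fails earlier with the standard tool_use ids were found without tool_result blocks immediately after error. To give Claude more input, send it as a separate user message after the turn completes.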
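
Both rules can be checked locally before sending the follow-up message. This is a rough sketch, not an official validator: `check_continuation` is a hypothetical helper, and both arguments are assumed to be lists of plain content-block dicts.

```python
def check_continuation(assistant_content, user_content):
    """Return a list of problems with a follow-up user message (empty if none)."""
    problems = []
    # Only client tool_use blocks need a tool_result from you;
    # server_tool_use blocks are resolved by the API.
    expected = {b["id"] for b in assistant_content if b.get("type") == "tool_use"}
    results = [b for b in user_content if b.get("type") == "tool_result"]
    answered = {b["tool_use_id"] for b in results}

    missing = sorted(expected - answered)
    if missing:
        problems.append("missing tool_result for: " + ", ".join(missing))
    if len(results) != len(user_content):
        problems.append("message contains blocks other than tool_result")
    return problems


assistant_content = [
    {"type": "server_tool_use", "id": "srvtoolu_01HxbWnMRmbWyMfUtJKC45rA",
     "name": "web_search", "input": {"query": "example article"}},
    {"type": "tool_use", "id": "toolu_01PjgRJLbXrXEMZwDNYLnBqk",
     "name": "run_command", "input": {"command": "uname -a"}},
]
good = [{"type": "tool_result", "tool_use_id": "toolu_01PjgRJLbXrXEMZwDNYLnBqk",
         "content": "Linux demo-host 6.8.0-52-generic x86_64 GNU/Linux"}]
bad = good + [{"type": "text", "text": "Also, summarize the output."}]

print(check_continuation(assistant_content, good))
print(check_continuation(assistant_content, bad))
```

Running a check like this client-side turns a 400 response into an error you can catch before the request leaves your application.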

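pause_turn

Returned when the server-side sampling loop reaches its iteration limit while running server tools such as web search. The default limit is 10 iterations per request.

When this happens, the response may contain a server_tool_use block with no matching result block. To let Claude finish, send the response back unchanged to continue the conversation. A response that leaves a client tool_use block waiting on you never has a stop_reason of pause_turn: when Claude stops to call your tool, the stop_reason is tool_use, and you continue it by sending client tool_result blocks rather than the response itself.

import anthropic

client = anthropic.Anthropic()

response = client.messages.create(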
    model="claude-opus-5",
    max_tokens=4096,
    tools=[{"type": "web_search_20250305", "name": "web_search"}],
    messages=[{"role": "user", "content": "Search for latest AI news"}],
)

if response.stop_reason == "pause_turn":
    # Send the response back to continue the conversation
    messages = [
        {"role": "user", "content": "Search for latest AI news"},
        {"role": "assistant", "content": response.content},
    ]
    continuation = client.messages.create(
        model="claude-opus-5",
        max_tokens=4096,
        messages=messages,
        tools=[{"type": "web_search_20250305", "name": "web_search"}],
    )

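Your application should handle pause_turn in any agent loop that uses server tools. Add the assistant's response to your messages array and make another API request to let Claude continue.

refusal

Claude declined to generate a response. Safety classifiers return this stop reason in a normal HTTP 200 response rather than as an error.

import anthropic

client = anthropic.Anthropic()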
response = client.messages.create(
    model="claude-opus-5",
    max_tokens=1024,
    messages=[{"role": "user", "content": "[Unsafe request]"}],
)

if response.stop_reason == "refusal":
    # Claude declined to respond
    print("Claude was unable to process this request")
    # Consider rephrasing or modifying the request

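If you frequently encounter the refusal stop reason while using Claude Sonnet 4.5 or Opus 4.1 (deprecated; see Model deprecations), you can try updating your API calls to use Haiku 4.5 (claude-haiku-4-5-20251001), which has different usage restrictions. Learn more in Understanding Sonnet 4.5's API safety filters.

On a refusal, the stop_details object identifies the policy category that triggered it. The categories and the full refusal response structure are described in Refusals and fallback. For every stop reason other than refusal, stop_details is null.

A request refused on Claude Fable 5 or Claude Opus 5 can often be completed by retrying it on another Claude model. Refusals and fallback explains how to set up that retry server-side or in your client. Fallback credits explains how to avoid paying prompt caching costs twice when you build the retry yourself.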
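
A client-side retry could look roughly like the sketch below. It illustrates only the control flow: `create_with_fallback`, the injected `send` callable, and the model names are hypothetical placeholders, and responses are assumed to be dicts with a `stop_reason` key. It does not cover server-side fallback or fallback credits.

```python
def create_with_fallback(send, models):
    """Try each model in order until one does not refuse.

    `send` is a hypothetical callable that takes a model name and returns a
    response dict with at least a "stop_reason" key.
    Returns (model, response) for the first non-refusal, or the last refusal.
    """
    if not models:
        raise ValueError("models must not be empty")
    for model in models:
        response = send(model)
        if response["stop_reason"] != "refusal":
            break
    return model, response


def fake_send(model):
    # Stand-in for a real API call: only the first model refuses.
    if model == "primary-model":
        return {"stop_reason": "refusal", "content": []}
    return {"stop_reason": "end_turn", "content": []}


model, response = create_with_fallback(fake_send, ["primary-model", "fallback-model"])
print(model, response["stop_reason"])
```

Injecting `send` keeps the retry logic testable without network access; in an application it would wrap the real request with the same messages for each model.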

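model_context_window_exceeded

Claude stopped because it reached the model's context window limit. This lets you request the maximum possible number of tokens without knowing the exact input size.

This stop reason is currently typed only in the SDK's beta namespace, so the examples below call client.beta.messages and use Beta-prefixed types. On Sonnet 4.5 and newer models, the API returns this value without a beta header. For earlier models, add the model-context-window-exceeded-2025-08-26 beta header to enable it.

import anthropic

client = anthropic.Anthropic()

# Request the maximum number of tokens to get as much content as possible
response = client.beta.messages.create(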
    model="claude-opus-5",
    max_tokens=20000,  # Python SDK requires streaming for max_tokens above ~21k
    messages=[
        {"role": "user", "content": "Large input that uses most of context window..."}
    ],
)

if response.stop_reason == "model_context_window_exceeded":
    # The response reached the context window limit before reaching max_tokens
    print("Response reached model's context window limit")
    # The response is still valid, but it was limited by the context window

Best practices for handling stop reasons

Always check stop_reason

Make a habit of checking stop_reason in your response handling logic:

def handle_response(response):
    if response.stop_reason == "tool_use":
        return handle_tool_use(response)
    elif response.stop_reason == "max_tokens":
        return handle_truncation(response)
    elif response.stop_reason == "model_context_window_exceeded":
        return handle_context_limit(response)
    elif response.stop_reason == "pause_turn":
        return handle_pause(response)
    elif response.stop_reason == "refusal":
        return handle_refusal(response)
    else:
        # Handle end_turn and other cases
        return next(
            (block.text for block in response.content if block.type == "text"), ""
        )

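Handle truncated responses gracefully

When a response is truncated by the token limit or the context window, append a notice so the reader knows the output is incomplete. To continue generating from where the response left off instead, see Ensuring complete responses.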

def handle_truncated_response(response):
    text = next((block.text for block in response.content if block.type == "text"), "")
    if response.stop_reason in ["max_tokens", "model_context_window_exceeded"]:
        if response.stop_reason == "max_tokens":
            note = "[Response truncated due to max_tokens limit]"
        else:
            note = "[Response truncated due to context window limit]"
        return f"{text}\n\n{note}"
    return text

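Implement retry logic for pause_turn

When you use server tools, the API may return pause_turn if the server-side sampling loop reaches its iteration limit (default 10). Handle this by continuing the conversation: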

def handle_server_tool_conversation(client, user_query, tools, max_continuations=5):
    """
    Handle server tool conversations that may require multiple continuations.

    The server runs a sampling loop when executing server tools. If the loop
    reaches its iteration limit, the API returns pause_turn. Continue the
    conversation by sending the response back to let Claude finish.
    """
    messages = [{"role": "user", "content": user_query}]

    for _ in range(max_continuations):
        response = client.messages.create(
            model="claude-opus-5", max_tokens=4096, messages=messages, tools=tools
        )

        if response.stop_reason != "pause_turn":
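            # Claude finished processing: return the final response
            return response

        # pause_turn: replace the full message list to preserve role alternation
        messages = [
            {"role": "user", "content": user_query},
            {"role": "assistant", "content": response.content},
        ]

    # Maximum number of continuations reached: return the last response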
    return response

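Stop reasons vs. errors

It's important to distinguish stop_reason values from actual errors:

Stop reasons (successful responses)

Part of the response body
Indicate why generation stopped normally
The response contains valid content

Errors (failed requests)

HTTP status code 4xx or 5xx
Indicate that request processing failed
The response contains error details

import anthropic

client = anthropic.Anthropic()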

try:
    response = client.messages.create(
        model="claude-opus-5",
        max_tokens=1024,
        messages=[{"role": "user", "content": "Hello!"}],
    )

    # Handle a successful response with its stop_reason
    if response.stop_reason == "max_tokens":
        print("Response was truncated")

except anthropic.APIStatusError as e:
    # Handle actual errors
    if e.status_code == 429:
        print("Rate limit exceeded")
    elif e.status_code == 500:
        print("Server error")

Streaming considerations

When streaming, stop_reason is:

null in the initial message_start event
Provided in the message_delta event
Not provided in any other event

import anthropic

client = anthropic.Anthropic()

with client.messages.stream(
    model="claude-opus-5",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello!"}],
) as stream:
    for event in stream:
        if event.type == "message_delta":
            stop_reason = event.delta.stop_reason
            if stop_reason:
                print(f"Stream ended with: {stop_reason}")
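
If you consume the raw server-sent events yourself instead of using the SDK helper, the same rule applies. The sketch below assumes each event has already been parsed into a dict; `stop_reason_from_events` is a hypothetical helper name.

```python
def stop_reason_from_events(events):
    """Return the last non-null stop_reason seen in message_delta events."""
    stop_reason = None
    for event in events:
        # Only message_delta carries a populated stop_reason.
        if event.get("type") == "message_delta":
            stop_reason = event.get("delta", {}).get("stop_reason") or stop_reason
    return stop_reason


events = [
    {"type": "message_start", "message": {"stop_reason": None}},
    {"type": "content_block_delta", "delta": {"type": "text_delta", "text": "Hi"}},
    {"type": "message_delta", "delta": {"stop_reason": "end_turn"}},
    {"type": "message_stop"},
]
print(stop_reason_from_events(events))
```

A return value of None means the stream ended, or was cut off, before any message_delta event reported a stop reason.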

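Common patterns

Handling tool use workflows

The tool runner is simpler: The example below shows manual tool handling. For most use cases, the tool runner handles tool execution automatically with far less code.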

def complete_tool_workflow(client, user_query, tools):
    messages = [{"role": "user", "content": user_query}]

    while True:
        response = client.messages.create(
            model="claude-opus-5", max_tokens=1024, messages=messages, tools=tools
        )

        if response.stop_reason == "tool_use":
            # Execute the tools and continue
            tool_results = execute_tools(response.content)
            messages.append({"role": "assistant", "content": response.content})
            messages.append({"role": "user", "content": tool_results})
        else:
            # Final response
            return response
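
The loop above relies on an execute_tools helper that is not defined on this page. One possible shape is sketched here under stated assumptions: this variant takes plain content-block dicts and an explicit `handlers` registry (both are illustrative choices and differ from the one-argument call above), and it reports failures through the tool_result `is_error` field.

```python
def execute_tools(content, handlers):
    """Build one tool_result block for every client tool_use block in `content`.

    `handlers` is a hypothetical registry mapping tool names to callables that
    accept the tool input as keyword arguments.
    """
    results = []
    for block in content:
        if block.get("type") != "tool_use":
            continue  # Skip text and server tool blocks.
        result = {"type": "tool_result", "tool_use_id": block["id"]}
        handler = handlers.get(block["name"])
        if handler is None:
            result.update(content=f"Unknown tool: {block['name']}", is_error=True)
        else:
            try:
                result["content"] = str(handler(**block["input"]))
            except Exception as exc:  # Report the failure to Claude instead of raising.
                result.update(content=f"Tool failed: {exc}", is_error=True)
        results.append(result)
    return results


blocks = [
    {"type": "text", "text": "Let me check."},
    {"type": "tool_use", "id": "toolu_1", "name": "get_weather",
     "input": {"location": "Paris"}},
]
tool_results = execute_tools(blocks, {"get_weather": lambda location: "sunny in " + location})
print(tool_results)
```

Returning an error result rather than raising keeps the one-result-per-tool_use rule intact even when a tool fails.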

Ensuring complete responses

def get_complete_response(client, prompt, max_attempts=3):
    messages = [{"role": "user", "content": prompt}]
    full_response = ""

    for _ in range(max_attempts):
        response = client.messages.create(
            model="claude-opus-5", messages=messages, max_tokens=4096
        )

        full_response += next(
            (block.text for block in response.content if block.type == "text"), ""
        )

        if response.stop_reason != "max_tokens":
            break

        # Continue from where the response left off
        messages = [
            {"role": "user", "content": prompt},
            {"role": "assistant", "content": full_response},
            {"role": "user", "content": "Please continue from where you left off."},
        ]

    return full_response

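Getting the maximum tokens without knowing the input size

With the model_context_window_exceeded stop reason, you can request the maximum possible number of tokens without calculating the input size: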

def get_max_possible_tokens(client, prompt):
    """
    Get as many tokens as possible within the model's context window
    without needing to calculate input token count
    """
    response = client.beta.messages.create(
        model="claude-opus-5",
        messages=[{"role": "user", "content": prompt}],
        max_tokens=20000,  # Python SDK requires streaming for max_tokens above ~21k
    )

    if response.stop_reason == "model_context_window_exceeded":
        # Got the maximum number of tokens possible for this input size
        print(
            f"Generated {response.usage.output_tokens} tokens (context limit reached)"
        )
    elif response.stop_reason == "max_tokens":
        # Got exactly the number of tokens requested
        print(f"Generated {response.usage.output_tokens} tokens (max_tokens reached)")
    else:
        # Natural completion
        print(f"Generated {response.usage.output_tokens} tokens (natural completion)")

    return next((block.text for block in response.content if block.type == "text"), "")

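Next steps

Refusals and fallback

Retry refused requests on a fallback model, server-side or in your client.

Tool runner (SDK)

Let the SDK manage the tool_use loop, result formatting, and retries for you.

Streaming messages

Read stop_reason from the message_delta event when streaming.

Errors

Handle 4xx and 5xx HTTP errors, which are distinct from stop reasons.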
