    Handling stop reasons

    Learn about the stop_reason field in Claude API responses and how to handle each stop reason appropriately.

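    When you make a request to the Messages API, Claude's response includes a stop_reason field that indicates why the model stopped generating. Understanding these values is essential for building robust applications that handle each type of response appropriately.

    For details on stop_reason in the API response, see the Messages API reference.

    The stop_reason field

    The stop_reason field is part of every successful Messages API response. Unlike errors, which indicate that a request failed to process, stop_reason tells you why Claude successfully finished generating its response.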

    Example response
    {
      "id": "msg_01234",
      "type": "message",
      "role": "assistant",
      "content": [
        {
          "type": "text",
          "text": "Here's the answer to your question..."
        }
      ],
      "stop_reason": "end_turn",
      "stop_sequence": null,
      "usage": {
        "input_tokens": 100,
        "output_tokens": 50
      }
    }
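As a minimal sketch, the field can be read straight from a raw response body like the one above. The helper name summarize_stop and the RAW_RESPONSE constant are illustrative assumptions, not part of any SDK; the JSON is a trimmed copy of the example response.

```python
import json

# Trimmed copy of the example response body (hand-written sample data).
RAW_RESPONSE = """
{
  "id": "msg_01234",
  "type": "message",
  "role": "assistant",
  "content": [{"type": "text", "text": "Here's the answer to your question..."}],
  "stop_reason": "end_turn",
  "stop_sequence": null,
  "usage": {"input_tokens": 100, "output_tokens": 50}
}
"""


def summarize_stop(body: str) -> str:
    """Describe why generation stopped, based on a raw JSON response body."""
    message = json.loads(body)
    reason = message["stop_reason"]
    if reason == "stop_sequence":
        # stop_sequence is only populated when a custom sequence was matched
        return f"stopped at custom sequence {message['stop_sequence']!r}"
    return f"stopped with reason {reason!r}"


print(summarize_stop(RAW_RESPONSE))  # prints: stopped with reason 'end_turn'
```

When you use the Python SDK instead of raw JSON, the same values are available as response.stop_reason and response.stop_sequence, as the examples that follow show.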

    Stop reason values

    end_turn

    The most common stop reason. It indicates that Claude finished its response naturally.

    Python
    from anthropic import Anthropic
    
    client = Anthropic()
    response = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=1024,
        messages=[{"role": "user", "content": "Hello!"}],
    )
    if response.stop_reason == "end_turn":
        # Process the complete response
        print(response.content[0].text)

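    Empty responses with end_turn

    Sometimes Claude returns an empty response (exactly 2-3 tokens with no content) with stop_reason: "end_turn". This usually happens when Claude interprets the assistant turn as complete, particularly after tool results.

    Common causes:

    • Adding text blocks immediately after tool results (Claude learns to expect the user to always insert text after tool results, so it ends its turn to follow the pattern)
    • Sending Claude's completed response back without adding anything (Claude has already decided it is done, so it stays done)

    How to prevent empty responses: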

    # INCORRECT: Adding text immediately after tool_result
    messages = [
        {"role": "user", "content": "Calculate the sum of 1234 and 5678"},
        {
            "role": "assistant",
            "content": [
                {
                    "type": "tool_use",
                    "id": "toolu_123",
                    "name": "calculator",
                    "input": {"operation": "add", "a": 1234, "b": 5678},
                }
            ],
        },
        {
            "role": "user",
            "content": [
                {"type": "tool_result", "tool_use_id": "toolu_123", "content": "6912"},
                {
                    "type": "text",
                    "text": "Here's the result",  # Don't add text after tool_result
                },
            ],
        },
    ]
    
    # CORRECT: Send tool results directly without additional text
    messages = [
        {"role": "user", "content": "Calculate the sum of 1234 and 5678"},
        {
            "role": "assistant",
            "content": [
                {
                    "type": "tool_use",
                    "id": "toolu_123",
                    "name": "calculator",
                    "input": {"operation": "add", "a": 1234, "b": 5678},
                }
            ],
        },
        {
            "role": "user",
            "content": [
                {"type": "tool_result", "tool_use_id": "toolu_123", "content": "6912"}
            ],
        },  # Just the tool_result, no additional text
    ]
    
    
    # If you still get empty responses after fixing the above:
    def handle_empty_response(client, messages):
        response = client.messages.create(
            model="claude-opus-4-7", max_tokens=1024, messages=messages
        )
    
        # Check if response is empty
        if response.stop_reason == "end_turn" and not response.content:
            # INCORRECT: Don't just retry with the empty response
            # This won't work because Claude already decided it's done
    
            # CORRECT: Add a continuation prompt in a NEW user message
            messages.append({"role": "user", "content": "Please continue"})
    
            response = client.messages.create(
                model="claude-opus-4-7", max_tokens=1024, messages=messages
            )
    
        return response

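    Best practices:

    1. Never add text blocks immediately after tool results. This teaches Claude to expect user input after every tool use.
    2. Don't retry empty responses without modification. Simply sending the empty response back won't help.
    3. Use continuation prompts as a last resort, only when the fixes above don't resolve the issue.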
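One way to catch the first problem before a request is sent is to scan the outgoing messages for a text block that follows a tool_result block. The sketch below assumes messages are plain dictionaries shaped like the examples above; the function names are hypothetical and not part of the SDK.

```python
def text_follows_tool_result(message: dict) -> bool:
    """Return True if a user message has a text block after a tool_result block."""
    content = message.get("content")
    if message.get("role") != "user" or not isinstance(content, list):
        return False  # plain-string content or non-user messages cannot match

    seen_tool_result = False
    for block in content:
        block_type = block.get("type")
        if block_type == "tool_result":
            seen_tool_result = True
        elif block_type == "text" and seen_tool_result:
            return True
    return False


def find_offending_messages(messages: list) -> list:
    """Return the indexes of messages that add text after a tool_result."""
    return [i for i, m in enumerate(messages) if text_follows_tool_result(m)]


# Hand-written sample conversation mirroring the INCORRECT example above.
sample_messages = [
    {"role": "user", "content": "Calculate the sum of 1234 and 5678"},
    {
        "role": "user",
        "content": [
            {"type": "tool_result", "tool_use_id": "toolu_123", "content": "6912"},
            {"type": "text", "text": "Here's the result"},
        ],
    },
]

print(find_offending_messages(sample_messages))  # prints: [1]
```

A check like this fits naturally in a test suite or a debug-only assertion, since it only inspects the request payload and makes no API calls.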

    max_tokens

    Claude stopped because it reached the max_tokens limit specified in your request.

    Python
    # Request with limited tokens
    response = client.messages.create(
        model="claude-opus-4-7",
        max_tokens=10,
        messages=[{"role": "user", "content": "Explain quantum physics"}],
    )
    
    if response.stop_reason == "max_tokens":
        # Response was truncated
        print("Response was cut off at token limit")
        # Consider making another request to continue

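    Incomplete tool use blocks

    If Claude's response is cut off by the max_tokens limit and the truncated response contains an incomplete tool use block, retry the request with a higher max_tokens value to get the complete tool use.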

    stop_sequence

    Claude encountered one of your custom stop sequences.

    Python
    response = client.messages.create(
        model="claude-opus-4-7",
        max_tokens=1024,
        stop_sequences=["END", "STOP"],
        messages=[{"role": "user", "content": "Generate text until you say END"}],
    )
    
    if response.stop_reason == "stop_sequence":
        print(f"Stopped at sequence: {response.stop_sequence}")

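    tool_use

    Claude is calling a tool and expects you to execute it.

    For most tool use implementations, we recommend the tool runner, which automatically handles tool execution, result formatting, and conversation management.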

    Python
    from anthropic import Anthropic
    
    client = Anthropic()
    weather_tool = {
        "name": "get_weather",
        "description": "Get the current weather in a given location",
        "input_schema": {
            "type": "object",
            "properties": {
                "location": {"type": "string", "description": "City and state"},
            },
            "required": ["location"],
        },
    }
    
    
    def execute_tool(name, tool_input):
        """Execute a tool and return the result."""
        return f"Weather in {tool_input.get('location', 'unknown')}: 72°F"
    
    
    response = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=1024,
        tools=[weather_tool],
        messages=[{"role": "user", "content": "What's the weather?"}],
    )
    
    if response.stop_reason == "tool_use":
        # Extract and execute the tool
        for content in response.content:
            if content.type == "tool_use":
                result = execute_tool(content.name, content.input)
                # Return result to Claude for final response
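The example above stops at the point where the result has to go back to Claude. As a rough sketch of that last step, the helper below builds the follow-up messages list: the assistant turn is echoed back unchanged, followed by a user turn that contains only tool_result blocks. The name build_tool_followup is hypothetical, and the SimpleNamespace objects stand in for the content blocks an SDK response would carry.

```python
from types import SimpleNamespace


def execute_tool(name, tool_input):
    """Stand-in tool implementation, same as in the example above."""
    return f"Weather in {tool_input.get('location', 'unknown')}: 72°F"


def build_tool_followup(messages, response_content):
    """Return a new messages list that carries the tool results back to Claude."""
    tool_results = [
        {
            "type": "tool_result",
            "tool_use_id": block.id,  # must match the id of the tool_use block
            "content": execute_tool(block.name, block.input),
        }
        for block in response_content
        if block.type == "tool_use"
    ]
    return messages + [
        {"role": "assistant", "content": response_content},
        {"role": "user", "content": tool_results},  # tool results only, no extra text
    ]


# Stand-in for response.content from a response with stop_reason == "tool_use".
fake_content = [
    SimpleNamespace(
        type="tool_use",
        id="toolu_123",
        name="get_weather",
        input={"location": "San Francisco, CA"},
    )
]
follow_up = build_tool_followup(
    [{"role": "user", "content": "What's the weather?"}], fake_content
)
```

Passing follow_up as messages to another client.messages.create call, with the same tools list, lets Claude produce its final answer.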

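    pause_turn

    Returned when the server-side sampling loop reaches its iteration limit while executing server tools such as web search or web fetch. The default limit is 10 iterations per request.

    When this happens, the response may contain a server_tool_use block without a corresponding server_tool_result. To let Claude finish processing, continue the conversation by sending the response back as-is.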

    Python
    original_query = "Search for latest AI news"
    tools = [{"type": "web_search_20250305", "name": "web_search"}]
    
    response = client.messages.create(
        model="claude-opus-4-7",
        max_tokens=1024,
        tools=tools,
        messages=[{"role": "user", "content": original_query}],
    )
    
    if response.stop_reason == "pause_turn":
        # Continue the conversation by sending the response back as-is
        messages = [
            {"role": "user", "content": original_query},
            {"role": "assistant", "content": response.content},
        ]
        continuation = client.messages.create(
            model="claude-opus-4-7",
            max_tokens=1024,  # max_tokens is required on every request
            messages=messages,
            tools=tools,
        )

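    Your application should handle pause_turn in any agent loop that uses server tools. Append the assistant's response to your messages array and make another API request to let Claude continue.

    refusal

    Claude declined to generate a response due to safety concerns.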

    Python
    response = client.messages.create(
        model="claude-opus-4-7",
        max_tokens=1024,
        messages=[{"role": "user", "content": "[Unsafe request]"}],
    )
    
    if response.stop_reason == "refusal":
        # Claude declined to respond
        print("Claude was unable to process this request")
        # Consider rephrasing or modifying the request

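    If you frequently encounter the refusal stop reason while using Claude Sonnet 4.5 or Opus 4.1, you can try updating your API calls to use Haiku 4.5 (claude-haiku-4-5-20251001), which has different usage restrictions. To learn more about refusals triggered by the API safety filters for Claude Sonnet 4.5, see Understanding Sonnet 4.5's API safety filters.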

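    model_context_window_exceeded

    Claude stopped because it reached the model's context window limit. This lets you request the maximum possible number of tokens without knowing the exact input size.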

    Python
    # Request with maximum tokens to get as much as possible
    response = client.messages.create(
        model="claude-opus-4-7",
        max_tokens=64000,  # Practical non-streaming ceiling (Opus 4.7 supports 128K with streaming)
        messages=[
            {"role": "user", "content": "Large input that uses most of context window..."}
        ],
    )
    
    if response.stop_reason == "model_context_window_exceeded":
        # Response hit context window limit before max_tokens
        print("Response reached model's context window limit")
        # The response is still valid but was limited by context window

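    This stop reason is available by default in Sonnet 4.5 and newer models. For earlier models, use the beta header model-context-window-exceeded-2025-08-26 to enable this behavior.

    Best practices for handling stop reasons

    1. Always check stop_reason

    Make it a habit to check stop_reason in your response handling logic: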

    def handle_response(response):
        if response.stop_reason == "tool_use":
            return handle_tool_use(response)
        elif response.stop_reason == "max_tokens":
            return handle_truncation(response)
        elif response.stop_reason == "model_context_window_exceeded":
            return handle_context_limit(response)
        elif response.stop_reason == "pause_turn":
            return handle_pause(response)
        elif response.stop_reason == "refusal":
            return handle_refusal(response)
        else:
            # Handle end_turn and other cases
            return response.content[0].text

    2. Handle truncated responses gracefully

    When a response is truncated because of a token limit or the context window:

    def handle_truncated_response(client, original_prompt, response, continue_generation=False):
        """Warn about truncation, or ask Claude to continue when requested."""
        if response.stop_reason not in ("max_tokens", "model_context_window_exceeded"):
            # Not truncated: return the text unchanged
            return response.content[0].text
    
        if not continue_generation:
            # Option 1: Warn the user about the specific limit
            if response.stop_reason == "max_tokens":
                message = "[Response truncated due to max_tokens limit]"
            else:
                message = "[Response truncated due to context window limit]"
            return f"{response.content[0].text}\n\n{message}"
    
        # Option 2: Continue generation
        messages = [
            {"role": "user", "content": original_prompt},
            {"role": "assistant", "content": response.content[0].text},
            {"role": "user", "content": "Please continue"},
        ]
        continuation = client.messages.create(
            model="claude-opus-4-7",
            max_tokens=1024,
            messages=messages,
        )
        return response.content[0].text + continuation.content[0].text

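    3. Implement retry logic for pause_turn

    When using server tools, the API may return pause_turn if the server-side sampling loop reaches its iteration limit (default 10). Handle this by continuing the conversation: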

    def handle_server_tool_conversation(client, user_query, tools, max_continuations=5):
        """
        Handle server tool conversations that may require multiple continuations.
    
        The server runs a sampling loop when executing server tools. If the loop
        reaches its iteration limit, the API returns pause_turn. Continue the
        conversation by sending the response back to let Claude finish.
        """
        messages = [{"role": "user", "content": user_query}]
    
        for _ in range(max_continuations):
            response = client.messages.create(
                model="claude-opus-4-7",
                max_tokens=1024,  # max_tokens is required on every request
                messages=messages,
                tools=tools,
            )
    
            if response.stop_reason != "pause_turn":
                # Claude finished processing - return the final response
                return response
    
            # pause_turn: replace the full message list to maintain alternating roles
            messages = [
                {"role": "user", "content": user_query},
                {"role": "assistant", "content": response.content},
            ]
    
        # Reached max continuations - return the last response
        return response

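    Stop reasons vs. errors

    It's important to distinguish stop_reason values from actual errors:

    Stop reasons (successful responses)

    • Part of the response body
    • Indicate why generation stopped normally
    • Response contains valid content

    Errors (failed requests)

    • HTTP status codes 4xx or 5xx
    • Indicate that request processing failed
    • Response contains error details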
    Python
    import anthropic
    from anthropic import Anthropic
    
    client = Anthropic()
    
    try:
        response = client.messages.create(
            model="claude-sonnet-4-20250514",
            max_tokens=1024,
            messages=[{"role": "user", "content": "Hello!"}],
        )
    
        # Handle successful response with stop_reason
        if response.stop_reason == "max_tokens":
            print("Response was truncated")
    
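    except anthropic.APIStatusError as e:
        # Handle actual errors. APIStatusError always carries an HTTP status code;
        # the broader APIError also covers connection failures, which have none.
        if e.status_code == 429:
            print("Rate limit exceeded")
        elif e.status_code == 500:
            print("Server error")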

    Streaming considerations

    When using streaming, stop_reason is:

    • null in the initial message_start event
    • Provided in the message_delta event
    • Not provided in any other event
    Python
    from anthropic import Anthropic
    
    client = Anthropic()
    
    with client.messages.stream(
        model="claude-sonnet-4-20250514",
        max_tokens=1024,
        messages=[{"role": "user", "content": "Hello!"}],
    ) as stream:
        for event in stream:
            if event.type == "message_delta":
                stop_reason = event.delta.stop_reason
                if stop_reason:
                    print(f"Stream ended with: {stop_reason}")
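If you consume the raw event stream yourself rather than through the SDK helper, the same rule applies: only message_delta carries a non-null stop_reason. The sketch below assumes each event has already been decoded into a dictionary; stop_reason_from_events is a hypothetical helper and the sample events are hand-written to follow the event types listed above.

```python
def stop_reason_from_events(events):
    """Return the final stop_reason seen in a sequence of decoded stream events."""
    stop_reason = None
    for event in events:
        if event["type"] == "message_start":
            # Present in the initial message, but null at this point
            stop_reason = event["message"].get("stop_reason")
        elif event["type"] == "message_delta":
            # The only event type that delivers the actual value
            stop_reason = event["delta"].get("stop_reason") or stop_reason
    return stop_reason


# Hand-written sample events, not captured from a real request.
sample_events = [
    {"type": "message_start", "message": {"id": "msg_01234", "stop_reason": None}},
    {"type": "content_block_start", "index": 0, "content_block": {"type": "text", "text": ""}},
    {"type": "content_block_delta", "index": 0, "delta": {"type": "text_delta", "text": "Hello"}},
    {"type": "content_block_stop", "index": 0},
    {"type": "message_delta", "delta": {"stop_reason": "end_turn", "stop_sequence": None}},
    {"type": "message_stop"},
]

print(stop_reason_from_events(sample_events))  # prints: end_turn
```

A result of None after the stream ends suggests the stream was interrupted before message_delta arrived, which is worth treating as an error rather than as a normal completion.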

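    Common patterns

    Handling tool use workflows

    Simpler with the tool runner: the example below shows manual tool handling. For most use cases, the tool runner handles tool execution automatically with much less code.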

    def complete_tool_workflow(client, user_query, tools):
        messages = [{"role": "user", "content": user_query}]
    
        while True:
            response = client.messages.create(
                model="claude-opus-4-7",
                max_tokens=1024,  # max_tokens is required on every request
                messages=messages,
                tools=tools,
            )
    
            if response.stop_reason == "tool_use":
                # Execute tools and continue
                tool_results = execute_tools(response.content)
                messages.append({"role": "assistant", "content": response.content})
                messages.append({"role": "user", "content": tool_results})
            else:
                # Final response
                return response
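The loop above calls execute_tools without defining it. One possible shape for that helper is sketched below, assuming a simple name-to-function registry. TOOL_REGISTRY and the get_weather handler are hypothetical, and the SimpleNamespace objects stand in for SDK content blocks. Failures are reported back through the is_error field of tool_result so Claude can react instead of the loop crashing.

```python
from types import SimpleNamespace

# Hypothetical registry mapping tool names to Python callables.
TOOL_REGISTRY = {
    "get_weather": lambda tool_input: f"Weather in {tool_input['location']}: 72°F",
}


def execute_tools(content_blocks):
    """Run every tool_use block and return a list of tool_result blocks."""
    results = []
    for block in content_blocks:
        if block.type != "tool_use":
            continue  # skip text and other block types
        result = {"type": "tool_result", "tool_use_id": block.id}
        handler = TOOL_REGISTRY.get(block.name)
        if handler is None:
            result.update(content=f"Unknown tool: {block.name}", is_error=True)
        else:
            try:
                result["content"] = str(handler(block.input))
            except Exception as exc:  # report tool failures to Claude
                result.update(content=f"Tool failed: {exc!r}", is_error=True)
        results.append(result)
    return results


# Stand-in for response.content: one text block and two tool calls.
fake_content = [
    SimpleNamespace(type="text", text="Let me check."),
    SimpleNamespace(type="tool_use", id="toolu_1", name="get_weather", input={"location": "Paris"}),
    SimpleNamespace(type="tool_use", id="toolu_2", name="get_time", input={}),
]
tool_results = execute_tools(fake_content)
```

Returning one tool_result per tool_use block, in order, keeps the follow-up user message valid even when Claude requests several tools in a single turn.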

    Ensuring complete responses

    def get_complete_response(client, prompt, max_attempts=3):
        messages = [{"role": "user", "content": prompt}]
        full_response = ""
    
        for _ in range(max_attempts):
            response = client.messages.create(
                model="claude-opus-4-7", messages=messages, max_tokens=4096
            )
    
            full_response += response.content[0].text
    
            if response.stop_reason != "max_tokens":
                break
    
            # Continue from where it left off
            messages = [
                {"role": "user", "content": prompt},
                {"role": "assistant", "content": full_response},
                {"role": "user", "content": "Please continue from where you left off."},
            ]
    
        return full_response

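    Getting maximum tokens without knowing input size

    Using the model_context_window_exceeded stop reason, you can request the maximum possible number of tokens without calculating the input size: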

    def get_max_possible_tokens(client, prompt):
        """
        Get as many tokens as possible within the model's context window
        without needing to calculate input token count
        """
        response = client.messages.create(
            model="claude-opus-4-7",
            messages=[{"role": "user", "content": prompt}],
            max_tokens=64000,  # Practical non-streaming ceiling (Opus 4.7 supports 128K with streaming)
        )
    
        if response.stop_reason == "model_context_window_exceeded":
            # Got the maximum possible tokens given input size
            print(
                f"Generated {response.usage.output_tokens} tokens (context limit reached)"
            )
        elif response.stop_reason == "max_tokens":
            # Got exactly the requested tokens
            print(f"Generated {response.usage.output_tokens} tokens (max_tokens reached)")
        else:
            # Natural completion
            print(f"Generated {response.usage.output_tokens} tokens (natural completion)")
    
        return response.content[0].text

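    By handling stop_reason values properly, you can build more robust applications that gracefully handle different response scenarios and provide a better user experience.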

    Example for the "Incomplete tool use blocks" case under max_tokens:

    Python
    # Check if response was truncated during tool use
    if response.stop_reason == "max_tokens":
        # Check if the last content block is an incomplete tool_use
        last_block = response.content[-1]
        if last_block.type == "tool_use":
            # Send the request with higher max_tokens
            response = client.messages.create(
                model="claude-opus-4-7",
                max_tokens=4096,  # Increased limit
                messages=messages,
                tools=tools,
            )