    Handling stop reasons

    Learn about the stop_reason field in Claude API responses and how to handle each stop reason correctly in your application.

    When you send a request to the Messages API, Claude's response includes a stop_reason field that indicates why the model stopped generating. Understanding these values is essential for building robust applications that handle different response types appropriately.

    For details on stop_reason in the API response, see the Messages API reference.

    The stop_reason field

    The stop_reason field is part of every successful Messages API response. Unlike errors, which indicate that request processing failed, stop_reason tells you why Claude successfully finished generating its response.

    Example response
    {
      "id": "msg_01234",
      "type": "message",
      "role": "assistant",
      "content": [
        {
          "type": "text",
          "text": "Here's the answer to your question..."
        }
      ],
      "stop_reason": "end_turn",
      "stop_sequence": null,
      "usage": {
        "input_tokens": 100,
        "output_tokens": 50
      }
    }

    Stop reason values

    end_turn

    The most common stop reason. Indicates that Claude finished its response naturally.

    Python
    from anthropic import Anthropic
    
    client = Anthropic()
    response = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=1024,
        messages=[{"role": "user", "content": "Hello!"}],
    )
    if response.stop_reason == "end_turn":
        # Process the complete response
        print(response.content[0].text)

    Empty responses with end_turn

    Sometimes Claude returns an empty response (just 2-3 tokens with no content) with stop_reason: "end_turn". This usually happens when Claude judges that the assistant turn is already complete, particularly after tool results.

    Common causes:

    • Adding a text block immediately after a tool result (Claude learns to expect the user to always insert text after tool results, so it ends its turn to follow that pattern)
    • Sending Claude's completed response back without adding anything (Claude has already decided it is done, so it stays done)

    How to prevent empty responses:

    # INCORRECT: Adding text immediately after tool_result
    messages = [
        {"role": "user", "content": "Calculate the sum of 1234 and 5678"},
        {
            "role": "assistant",
            "content": [
                {
                    "type": "tool_use",
                    "id": "toolu_123",
                    "name": "calculator",
                    "input": {"operation": "add", "a": 1234, "b": 5678},
                }
            ],
        },
        {
            "role": "user",
            "content": [
                {"type": "tool_result", "tool_use_id": "toolu_123", "content": "6912"},
                {
                    "type": "text",
                    "text": "Here's the result",  # Don't add text after tool_result
                },
            ],
        },
    ]
    
    # CORRECT: Send tool results directly without additional text
    messages = [
        {"role": "user", "content": "Calculate the sum of 1234 and 5678"},
        {
            "role": "assistant",
            "content": [
                {
                    "type": "tool_use",
                    "id": "toolu_123",
                    "name": "calculator",
                    "input": {"operation": "add", "a": 1234, "b": 5678},
                }
            ],
        },
        {
            "role": "user",
            "content": [
                {"type": "tool_result", "tool_use_id": "toolu_123", "content": "6912"}
            ],
        },  # Just the tool_result, no additional text
    ]
    
    
    # If you still get empty responses after fixing the above:
    def handle_empty_response(client, messages):
        response = client.messages.create(
            model="claude-opus-4-7", max_tokens=1024, messages=messages
        )
    
        # Check if response is empty
        if response.stop_reason == "end_turn" and not response.content:
            # INCORRECT: Don't just retry with the empty response
            # This won't work because Claude already decided it's done
    
            # CORRECT: Add a continuation prompt in a NEW user message
            messages.append({"role": "user", "content": "Please continue"})
    
            response = client.messages.create(
                model="claude-opus-4-7", max_tokens=1024, messages=messages
            )
    
        return response

    Best practices:

    1. Never add a text block immediately after a tool result - this teaches Claude to expect user input after every tool use
    2. Don't retry an empty response without modification - simply sending the empty response back won't help
    3. Use a continuation prompt only as a last resort - only if the fixes above don't resolve the issue
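The guidance above can be wrapped in a small detector you run before deciding whether to send a continuation prompt. This is a minimal sketch: the helper name is ours, and treating whitespace-only text blocks as empty is an assumption beyond the "no content" description above.

```python
def is_empty_end_turn(response):
    """Detect the empty-response pattern: stop_reason is end_turn with no
    content blocks (or only whitespace-filled text blocks)."""
    if response.stop_reason != "end_turn":
        return False
    if not response.content:
        return True
    return all(
        block.type == "text" and not block.text.strip()
        for block in response.content
    )
```

Only when this returns True should you append a new "Please continue" user message and retry, per best practice 3.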

    max_tokens

    Claude stopped because it reached the max_tokens limit specified in your request.

    Python
    # Request with limited tokens
    response = client.messages.create(
        model="claude-opus-4-7",
        max_tokens=10,
        messages=[{"role": "user", "content": "Explain quantum physics"}],
    )
    
    if response.stop_reason == "max_tokens":
        # Response was truncated
        print("Response was cut off at token limit")
        # Consider making another request to continue

    Incomplete tool use blocks

    If Claude's response is cut off by the max_tokens limit and the truncated response contains an incomplete tool use block, retry the request with a higher max_tokens value to get the complete tool use.

    stop_sequence

    Claude encountered one of your custom stop sequences.

    Python
    response = client.messages.create(
        model="claude-opus-4-7",
        max_tokens=1024,
        stop_sequences=["END", "STOP"],
        messages=[{"role": "user", "content": "Generate text until you say END"}],
    )
    
    if response.stop_reason == "stop_sequence":
        print(f"Stopped at sequence: {response.stop_sequence}")

    tool_use

    Claude is calling a tool and expects you to execute it.

    For most tool use implementations, we recommend the tool runner, which automatically handles tool execution, result formatting, and conversation management.

    Python
    from anthropic import Anthropic
    
    client = Anthropic()
    weather_tool = {
        "name": "get_weather",
        "description": "Get the current weather in a given location",
        "input_schema": {
            "type": "object",
            "properties": {
                "location": {"type": "string", "description": "City and state"},
            },
            "required": ["location"],
        },
    }
    
    
    def execute_tool(name, tool_input):
        """Execute a tool and return the result."""
        return f"Weather in {tool_input.get('location', 'unknown')}: 72°F"
    
    
    response = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=1024,
        tools=[weather_tool],
        messages=[{"role": "user", "content": "What's the weather?"}],
    )
    
    if response.stop_reason == "tool_use":
        # Extract and execute the tool
        for content in response.content:
            if content.type == "tool_use":
                result = execute_tool(content.name, content.input)
                # Return result to Claude for final response
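The comment above leaves the follow-up request unwritten. One way to package the results, consistent with the empty-response guidance earlier (tool_result blocks only, no trailing text), is sketched below; the helper name and the executor callback are ours.

```python
def build_tool_result_turn(response_content, executor):
    """Run `executor` for each tool_use block in the assistant's turn and
    package the outputs as the next user message: tool_result blocks only,
    with no extra text appended after them."""
    return {
        "role": "user",
        "content": [
            {
                "type": "tool_result",
                "tool_use_id": block.id,
                "content": executor(block.name, block.input),
            }
            for block in response_content
            if block.type == "tool_use"
        ],
    }
```

Append the assistant response and this turn to your messages list, then call client.messages.create again with the same tools to get Claude's final answer.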

    pause_turn

    Returned when the server-side sampling loop reaches its iteration limit while executing server tools (such as web search or web fetch). The default limit is 10 iterations per request.

    When this happens, the response may contain a server_tool_use block without a corresponding server_tool_result. To let Claude finish processing, continue the conversation by sending the response back as-is.

    Python
    response = client.messages.create(
        model="claude-opus-4-7",
        max_tokens=1024,
        tools=[{"type": "web_search_20250305", "name": "web_search"}],
        messages=[{"role": "user", "content": "Search for latest AI news"}],
    )
    
    if response.stop_reason == "pause_turn":
        # Continue the conversation by sending the response back
        messages = [
            {"role": "user", "content": original_query},
            {"role": "assistant", "content": response.content},
        ]
        continuation = client.messages.create(
            model="claude-opus-4-7",
            max_tokens=1024,
            messages=messages,
            tools=[{"type": "web_search_20250305", "name": "web_search"}],
        )

    Your application should handle pause_turn in any agent loop that uses server tools. Simply append the assistant's response to your messages array and issue another API request to let Claude continue.

    refusal

    Claude declined to generate a response due to safety concerns.

    Python
    response = client.messages.create(
        model="claude-opus-4-7",
        max_tokens=1024,
        messages=[{"role": "user", "content": "[Unsafe request]"}],
    )
    
    if response.stop_reason == "refusal":
        # Claude declined to respond
        print("Claude was unable to process this request")
        # Consider rephrasing or modifying the request

    If you frequently encounter the refusal stop reason when using Claude Sonnet 4.5 or Opus 4.1, you can try updating your API calls to use Haiku 4.5 (claude-haiku-4-5-20251001), which has different usage limits.

    To learn more about refusals triggered by Claude Sonnet 4.5's API safety filters, see Understanding Sonnet 4.5's API safety filters.

    model_context_window_exceeded

    Claude stopped because it reached the model's context window limit. This lets you request the maximum possible number of tokens without knowing the exact input size.

    Python
    # Request with maximum tokens to get as much as possible
    response = client.messages.create(
        model="claude-opus-4-7",
        max_tokens=64000,  # Practical non-streaming ceiling (Opus 4.7 supports 128K with streaming)
        messages=[
            {"role": "user", "content": "Large input that uses most of context window..."}
        ],
    )
    
    if response.stop_reason == "model_context_window_exceeded":
        # Response hit context window limit before max_tokens
        print("Response reached model's context window limit")
        # The response is still valid but was limited by context window

    This stop reason is available by default on Sonnet 4.5 and newer models. For earlier models, enable this behavior with the beta header model-context-window-exceeded-2025-08-26.
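For those earlier models, the opt-in can be sketched as follows. Passing the header through the SDK's extra_headers request option is a standard mechanism; the wrapper function name is ours.

```python
def create_with_context_window_beta(client, model, messages, max_tokens=64000):
    """Send a Messages request with the beta header that enables the
    model_context_window_exceeded stop reason on pre-Sonnet-4.5 models."""
    return client.messages.create(
        model=model,
        max_tokens=max_tokens,
        messages=messages,
        extra_headers={
            "anthropic-beta": "model-context-window-exceeded-2025-08-26"
        },
    )
```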

    Best practices for handling stop reasons

    1. Always check stop_reason

    Make it a habit to check stop_reason in your response handling logic:

    def handle_response(response):
        if response.stop_reason == "tool_use":
            return handle_tool_use(response)
        elif response.stop_reason == "max_tokens":
            return handle_truncation(response)
        elif response.stop_reason == "model_context_window_exceeded":
            return handle_context_limit(response)
        elif response.stop_reason == "pause_turn":
            return handle_pause(response)
        elif response.stop_reason == "refusal":
            return handle_refusal(response)
        else:
            # Handle end_turn and other cases
            return response.content[0].text

    2. Handle truncated responses gracefully

    When a response is truncated by the token limit or context window:

    def handle_truncated_response(client, response, original_prompt, continue_generation=False):
        if response.stop_reason in ["max_tokens", "model_context_window_exceeded"]:
            if not continue_generation:
                # Option 1: Warn the user about the specific limit
                if response.stop_reason == "max_tokens":
                    message = "[Response truncated due to max_tokens limit]"
                else:
                    message = "[Response truncated due to context window limit]"
                return f"{response.content[0].text}\n\n{message}"

            # Option 2: Continue generation with a follow-up request
            messages = [
                {"role": "user", "content": original_prompt},
                {"role": "assistant", "content": response.content[0].text},
                {"role": "user", "content": "Please continue"},
            ]
            continuation = client.messages.create(
                model="claude-opus-4-7",
                max_tokens=1024,
                messages=messages,
            )
            return response.content[0].text + continuation.content[0].text
        return response.content[0].text

    3. Implement retry logic for pause_turn

    When using server tools, the API may return pause_turn if the server-side sampling loop reaches its iteration limit (default 10). Handle this by continuing the conversation:

    def handle_server_tool_conversation(client, user_query, tools, max_continuations=5):
        """
        Handle server tool conversations that may require multiple continuations.
    
        The server runs a sampling loop when executing server tools. If the loop
        reaches its iteration limit, the API returns pause_turn. Continue the
        conversation by sending the response back to let Claude finish.
        """
        messages = [{"role": "user", "content": user_query}]
    
        for _ in range(max_continuations):
            response = client.messages.create(
                model="claude-opus-4-7", max_tokens=1024, messages=messages, tools=tools
            )
    
            if response.stop_reason != "pause_turn":
                # Claude finished processing - return the final response
                return response
    
            # pause_turn: replace the full message list to maintain alternating roles
            messages = [
                {"role": "user", "content": user_query},
                {"role": "assistant", "content": response.content},
            ]
    
        # Reached max continuations - return the last response
        return response

    Stop reasons vs. errors

    It's important to distinguish stop_reason values from actual errors:

    Stop reasons (successful responses)

    • Part of the response body
    • Indicate why generation stopped normally
    • The response contains valid content

    Errors (failed requests)

    • HTTP status codes 4xx or 5xx
    • Indicate that request processing failed
    • The response contains error details
    Python
    import anthropic
    from anthropic import Anthropic
    
    client = Anthropic()
    
    try:
        response = client.messages.create(
            model="claude-sonnet-4-20250514",
            max_tokens=1024,
            messages=[{"role": "user", "content": "Hello!"}],
        )
    
        # Handle successful response with stop_reason
        if response.stop_reason == "max_tokens":
            print("Response was truncated")
    
    except anthropic.APIError as e:
        # Handle actual errors
        if e.status_code == 429:
            print("Rate limit exceeded")
        elif e.status_code == 500:
            print("Server error")

    Streaming considerations

    When streaming, stop_reason is:

    • null in the initial message_start event
    • Provided in the message_delta event
    • Not provided in any other event
    Python
    from anthropic import Anthropic
    
    client = Anthropic()
    
    with client.messages.stream(
        model="claude-sonnet-4-20250514",
        max_tokens=1024,
        messages=[{"role": "user", "content": "Hello!"}],
    ) as stream:
        for event in stream:
            if event.type == "message_delta":
                stop_reason = event.delta.stop_reason
                if stop_reason:
                    print(f"Stream ended with: {stop_reason}")
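The same rule can be captured in a small helper for a sequence of events you have already collected; a minimal sketch (the function name is ours) that relies only on the event shapes listed above:

```python
def final_stop_reason(events):
    """Return the stream's final stop_reason. Per the rules above, it is
    null at message_start and only arrives on message_delta events."""
    stop_reason = None
    for event in events:
        if event.type == "message_delta" and event.delta.stop_reason:
            stop_reason = event.delta.stop_reason
    return stop_reason
```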

    Common patterns

    Handling tool use workflows

    The tool runner is simpler: the example below shows manual tool handling. For most use cases, the tool runner handles tool execution automatically with far less code.

    def complete_tool_workflow(client, user_query, tools):
        messages = [{"role": "user", "content": user_query}]
    
        while True:
            response = client.messages.create(
                model="claude-opus-4-7", max_tokens=4096, messages=messages, tools=tools
            )
    
            if response.stop_reason == "tool_use":
                # Execute tools and continue
                tool_results = execute_tools(response.content)
                messages.append({"role": "assistant", "content": response.content})
                messages.append({"role": "user", "content": tool_results})
            else:
                # Final response
                return response

    Ensuring complete responses

    def get_complete_response(client, prompt, max_attempts=3):
        messages = [{"role": "user", "content": prompt}]
        full_response = ""
    
        for _ in range(max_attempts):
            response = client.messages.create(
                model="claude-opus-4-7", messages=messages, max_tokens=4096
            )
    
            full_response += response.content[0].text
    
            if response.stop_reason != "max_tokens":
                break
    
            # Continue from where it left off
            messages = [
                {"role": "user", "content": prompt},
                {"role": "assistant", "content": full_response},
                {"role": "user", "content": "Please continue from where you left off."},
            ]
    
        return full_response

    Getting maximum tokens without knowing input size

    With the model_context_window_exceeded stop reason, you can request the maximum possible number of tokens without calculating your input size:

    def get_max_possible_tokens(client, prompt):
        """
        Get as many tokens as possible within the model's context window
        without needing to calculate input token count
        """
        response = client.messages.create(
            model="claude-opus-4-7",
            messages=[{"role": "user", "content": prompt}],
            max_tokens=64000,  # Practical non-streaming ceiling (Opus 4.7 supports 128K with streaming)
        )
    
        if response.stop_reason == "model_context_window_exceeded":
            # Got the maximum possible tokens given input size
            print(
                f"Generated {response.usage.output_tokens} tokens (context limit reached)"
            )
        elif response.stop_reason == "max_tokens":
            # Got exactly the requested tokens
            print(f"Generated {response.usage.output_tokens} tokens (max_tokens reached)")
        else:
            # Natural completion
            print(f"Generated {response.usage.output_tokens} tokens (natural completion)")
    
        return response.content[0].text

    By handling stop_reason values correctly, you can build more robust applications that gracefully handle different response scenarios and deliver a better user experience.

    Checking for truncated tool use

    As described in the "Incomplete tool use blocks" section above, a response truncated at the max_tokens limit can end in a partial tool_use block. Detect that case and retry with a higher limit:

    # Check if response was truncated during tool use
    if response.stop_reason == "max_tokens":
        # Check if the last content block is an incomplete tool_use
        last_block = response.content[-1]
        if last_block.type == "tool_use":
            # Retry the request with a higher max_tokens limit
            response = client.messages.create(
                model="claude-opus-4-7",
                max_tokens=4096,  # Increased limit
                messages=messages,
                tools=tools,
            )