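Stop reasons and fallback

Learn what each stop_reason value means and how to handle truncation, tool use, paused turns, and refusals in your application.

Every Messages API response includes a stop_reason field that tells you why Claude stopped generating. Check this field to decide whether to use the response as is, continue the conversation, retry, or fall back to a different model.

For the full response schema, see the Messages API reference.

Quick reference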

Value	When it occurs	What to do
`end_turn`	Claude finished its response naturally.	Use the response.
`max_tokens`	The response reached the `max_tokens` limit.	Raise `max_tokens` or continue the response.
`stop_sequence`	Claude emitted one of your `stop_sequences`.	Read `stop_sequence` to see which one fired.
`tool_use`	Claude is calling a tool.	Run the tool and return the result. Server tool calls that do not yet have a result block complete in a later response.
`pause_turn`	The server tool loop reached its iteration limit.	Send the assistant content back to continue.
`refusal`	Claude declined to respond.	Read `stop_details` and retry on a fallback model.
`model_context_window_exceeded`	The response filled the model's context window.	Treat the response as truncated.

The stop_reason field

The stop_reason field is part of every successful Messages API response. Unlike errors, which indicate that the request failed to process, stop_reason tells you why Claude finished generating its response.

Example response

{
  "id": "msg_01234",
  "type": "message",
  "role": "assistant",
  "content": [
    {
      "type": "text",
      "text": "Here's the answer to your question..."
    }
  ],
  "stop_reason": "end_turn",
  "stop_sequence": null,
  "stop_details": null,
  "usage": {
    "input_tokens": 100,
    "output_tokens": 50
  }
}
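As a minimal illustration of reading the field, the sketch below parses a raw response body like the one above with Python's standard json module. The helper name read_stop_reason and the trimmed payload are assumptions made for this example; with the SDK you would normally read response.stop_reason directly.

```python
import json
from typing import Optional

# Trimmed copy of the example response body shown above.
RAW_BODY = """
{
  "id": "msg_01234",
  "type": "message",
  "role": "assistant",
  "content": [{"type": "text", "text": "Here's the answer to your question..."}],
  "stop_reason": "end_turn",
  "stop_sequence": null,
  "stop_details": null,
  "usage": {"input_tokens": 100, "output_tokens": 50}
}
"""


def read_stop_reason(raw_body: str) -> Optional[str]:
    """Hypothetical helper: return stop_reason from a raw JSON response body."""
    message = json.loads(raw_body)
    # .get() returns None if the field is absent or JSON null.
    return message.get("stop_reason")


print(read_stop_reason(RAW_BODY))
```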

Stop reason values

end_turn

The most common stop reason. It indicates that Claude finished its response naturally.

import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-opus-5",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello!"}],
)
if response.stop_reason == "end_turn":
    # Process the complete response
    for block in response.content:
        if block.type == "text":
            print(block.text)

max_tokens

Claude stopped because it reached the max_tokens limit specified in the request.

import anthropic

client = anthropic.Anthropic()
# Request with a deliberately small token limit
response = client.messages.create(
    model="claude-opus-5",
    max_tokens=10,
    messages=[{"role": "user", "content": "Explain quantum physics"}],
)

if response.stop_reason == "max_tokens":
    # The response was truncated
    print("Response was cut off at token limit")
    # Consider sending another request to continue

stop_sequence

Claude encountered one of your custom stop sequences.

import anthropic

client = anthropic.Anthropic()
response = client.messages.create(
    model="claude-opus-5",
    max_tokens=1024,
    stop_sequences=["END", "STOP"],
    messages=[{"role": "user", "content": "Generate text until you say END"}],
)

if response.stop_reason == "stop_sequence":
    print(f"Stopped at sequence: {response.stop_sequence}")

tool_use

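Claude is calling a tool and expects you to execute it.

For most tool use implementations, use the tool runner, which handles tool execution, result formatting, and conversation management automatically.

import anthropic

client = anthropic.Anthropic()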
weather_tool = {
    "name": "get_weather",
    "description": "Get the current weather in a given location",
    "input_schema": {
        "type": "object",
        "properties": {
            "location": {"type": "string", "description": "City and state"},
        },
        "required": ["location"],
    },
}


def execute_tool(name, tool_input):
    """Execute a tool and return the result."""
    return f"Weather in {tool_input.get('location', 'unknown')}: 72°F"


response = client.messages.create(
    model="claude-opus-5",
    max_tokens=1024,
    tools=[weather_tool],
    messages=[{"role": "user", "content": "What is the weather in San Francisco?"}],
)

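if response.stop_reason == "tool_use":
    # Extract and execute the tool
    for block in response.content:
        if block.type == "tool_use":
            result = execute_tool(block.name, block.input)
            # Return the result to Claude for the final response

A tool_use response can also contain server_tool_use blocks that have no result block matching their id. Those server tool calls did not complete, and this response does not include their results. In the typical case, Claude calls a server tool and one of your client tools in the same parallel tool-call group. The API returns the server tool unexecuted so that the client tool can run first. Nothing else signals this state; detect it by checking whether each server_tool_use or mcp_tool_use block's id has a matching result block.

In programmatic tool calling, the same response shape means something different. The client tool_use block comes from code running in the code_execution tool rather than directly from Claude, and its caller field names the code_execution block that invoked it. That code has already started: it is paused waiting for a tool_result block, and sending one resumes execution rather than starting a deferred tool. The result block for the code_execution block itself arrives once the code finishes, which can take more than one round of tool results. The follow-up user message itself is the same in both cases. For programmatic tool calling, also pass the id from the response's container field, as that page shows.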

A mixed tool_use response

{
  "stop_reason": "tool_use",
  "content": [
    {
      "type": "server_tool_use",
      "id": "srvtoolu_01HxbWnMRmbWyMfUtJKC45rA",
      "name": "web_search",
      "input": { "query": "example article" }
    },
    {
      "type": "tool_use",
      "id": "toolu_01PjgRJLbXrXEMZwDNYLnBqk",
      "name": "run_command",
      "input": { "command": "uname -a" }
    }
  ]
}
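The detection rule described above (each server_tool_use or mcp_tool_use id should have a matching result block) could be sketched as follows. This is an illustration rather than an SDK feature: the helper name unresolved_server_tool_ids is hypothetical, it works on plain dicts shaped like the mixed response above, and it assumes that result blocks point back to their call through a tool_use_id field.

```python
from typing import Any, Dict, List

# Block types that represent server-executed tool calls, per the text above.
SERVER_CALL_TYPES = {"server_tool_use", "mcp_tool_use"}


def unresolved_server_tool_ids(content: List[Dict[str, Any]]) -> List[str]:
    """Return ids of server tool calls that have no result block in `content`.

    Assumption: every result block (for example web_search_tool_result)
    refers back to its call through a `tool_use_id` field.
    """
    resolved = {block["tool_use_id"] for block in content if "tool_use_id" in block}
    return [
        block["id"]
        for block in content
        if block.get("type") in SERVER_CALL_TYPES and block["id"] not in resolved
    ]


# The content array from the mixed tool_use response above.
mixed_content = [
    {
        "type": "server_tool_use",
        "id": "srvtoolu_01HxbWnMRmbWyMfUtJKC45rA",
        "name": "web_search",
        "input": {"query": "example article"},
    },
    {
        "type": "tool_use",
        "id": "toolu_01PjgRJLbXrXEMZwDNYLnBqk",
        "name": "run_command",
        "input": {"command": "uname -a"},
    },
]

# The web_search call has no result block yet, so its id is reported.
print(unresolved_server_tool_ids(mixed_content))
```

A non-empty result means the response is waiting on a deferred server tool, so the follow-up request should keep the same tools array.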

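The continuation is a user message made up of one tool_result block for every tool_use block in the response (see Handle tool calls). Two extra rules apply here: the message must contain nothing besides tool_result blocks, and the request must keep the same tools array. A resume request that no longer defines the pending server tool fails with a 400 error whose message ends with but no `web_search` tool was provided. The API attaches the results to the still-open assistant turn, runs the deferred server tool (or resumes the paused code execution), and continues the turn. For server tools that Claude called directly, the content of the next response begins with the result block matching the server_tool_use id from the previous response.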

The follow-up user message

{
  "role": "user",
  "content": [
    {
      "type": "tool_result",
      "tool_use_id": "toolu_01PjgRJLbXrXEMZwDNYLnBqk",
      "content": "Linux demo-host 6.8.0-52-generic x86_64 GNU/Linux"
    }
  ]
}
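Before sending a follow-up like the one above, a client could check it locally against the two rules: one tool_result per client tool_use block, and nothing else in the message. The sketch below is one way to do that on plain dicts; check_continuation is a hypothetical helper name, and the check mirrors only the rules stated in the text, not the full server-side validation.

```python
from typing import Any, Dict, List


def check_continuation(
    assistant_content: List[Dict[str, Any]], user_content: List[Dict[str, Any]]
) -> List[str]:
    """Return a list of problems; an empty list means the follow-up looks well formed."""
    problems = []

    # Rule 1: the user message may contain tool_result blocks only.
    if any(block.get("type") != "tool_result" for block in user_content):
        problems.append("user message contains blocks other than tool_result")

    # Rule 2: every client tool_use block needs a tool_result.
    # server_tool_use blocks are skipped because the API resolves those itself.
    expected = [b["id"] for b in assistant_content if b.get("type") == "tool_use"]
    answered = {b["tool_use_id"] for b in user_content if b.get("type") == "tool_result"}
    missing = [tool_id for tool_id in expected if tool_id not in answered]
    if missing:
        problems.append(f"missing tool_result for: {missing}")

    return problems


assistant_content = [
    {"type": "server_tool_use", "id": "srvtoolu_01HxbWnMRmbWyMfUtJKC45rA",
     "name": "web_search", "input": {"query": "example article"}},
    {"type": "tool_use", "id": "toolu_01PjgRJLbXrXEMZwDNYLnBqk",
     "name": "run_command", "input": {"command": "uname -a"}},
]
follow_up = [
    {"type": "tool_result", "tool_use_id": "toolu_01PjgRJLbXrXEMZwDNYLnBqk",
     "content": "Linux demo-host 6.8.0-52-generic x86_64 GNU/Linux"},
]

print(check_continuation(assistant_content, follow_up))
```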

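Adding anything after the tool_result blocks in that user message, such as text, ends the assistant turn. For server tools that Claude called directly, the request then fails with a 400 invalid_request_error that names the unresolved server tool: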

`web_search` tool use with id `srvtoolu_01HxbWnMRmbWyMfUtJKC45rA` was found without a corresponding `web_search_tool_result` block

Omitting a tool_result, or placing it after other content, fails earlier instead, with the standard tool_use ids were found without tool_result blocks immediately after error. To give Claude additional input, send it as a separate user message after the turn completes.

pause_turn

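Returned when the server-side sampling loop reaches its iteration limit while running server tools such as web search. The default limit is 10 iterations per request.

In this case, the response can contain server_tool_use blocks with no matching result block. To let Claude finish processing, continue the conversation by sending the response back as is. A response in which a client tool_use block is waiting on you never has a stop_reason of pause_turn. When Claude stops to call a tool, stop_reason is tool_use, and you continue by sending client tool_result blocks instead of the response itself.

import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-opus-5",
    max_tokens=4096,
    tools=[{"type": "web_search_20250305", "name": "web_search"}],
    messages=[{"role": "user", "content": "Search for latest AI news"}],
)

if response.stop_reason == "pause_turn":
    # Send the response back to continue the conversation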
    messages = [
        {"role": "user", "content": "Search for latest AI news"},
        {"role": "assistant", "content": response.content},
    ]
    continuation = client.messages.create(
        model="claude-opus-5",
        max_tokens=4096,
        messages=messages,
        tools=[{"type": "web_search_20250305", "name": "web_search"}],
    )

Your application should handle pause_turn in any agent loop that uses server tools. Append the assistant's response to the messages array and send another API request so Claude can continue.
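The difference between the two continuation styles (resend the assistant content for pause_turn, send tool_result blocks for tool_use) can be made explicit in a small helper. The sketch below is an assumption-laden illustration: extend_messages is a hypothetical name, it works on plain message dicts, and it makes no API calls.

```python
from typing import Any, Dict, List, Optional

Message = Dict[str, Any]


def extend_messages(
    messages: List[Message],
    stop_reason: str,
    assistant_content: Any,
    tool_results: Optional[List[Dict[str, Any]]] = None,
) -> Optional[List[Message]]:
    """Return the messages list for the follow-up request, or None if the turn is done."""
    assistant_turn = {"role": "assistant", "content": assistant_content}

    if stop_reason == "pause_turn":
        # Paused server tool loop: send the assistant content back unchanged.
        return messages + [assistant_turn]

    if stop_reason == "tool_use":
        # Client tools are waiting: answer with tool_result blocks only.
        if not tool_results:
            raise ValueError("tool_use requires at least one tool_result block")
        return messages + [assistant_turn, {"role": "user", "content": tool_results}]

    # Any other stop reason ends the turn; there is nothing to resend.
    return None


history = [{"role": "user", "content": "Search for latest AI news"}]
paused = extend_messages(history, "pause_turn", [{"type": "text", "text": "Searching..."}])
print([m["role"] for m in paused])
```

The helper returns a new list instead of mutating its input, so the caller decides when to commit the extended history.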

refusal

Claude declined to generate a response. Safety classifiers return this stop reason as a normal HTTP 200 response, not as an error.

import anthropic

client = anthropic.Anthropic()
response = client.messages.create(
    model="claude-opus-5",
    max_tokens=1024,
    messages=[{"role": "user", "content": "[Unsafe request]"}],
)

if response.stop_reason == "refusal":
    # Claude declined to respond
    print("Claude was unable to process this request")
    # Consider rephrasing or modifying the request

If you see the refusal stop reason often while using Claude Sonnet 4.5 or Opus 4.1 (deprecated; see Model deprecations), you can try updating your API calls to use Haiku 4.5 (claude-haiku-4-5-20251001), which has different usage restrictions. Learn more in Understanding Sonnet 4.5's API safety filters.

On a refusal, the stop_details object identifies the policy category that triggered it. Refusals and fallback covers the categories and the full refusal response shape. stop_details is null for every stop reason other than refusal.

A request refused by Claude Fable 5 or Claude Opus 5 can usually be served by retrying on a different Claude model, and Refusals and fallback shows how to set up that retry server-side or on the client. Fallback credits covers how to avoid paying prompt cache costs twice when you build the retry yourself.
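As a rough picture of what a client-side retry might look like, the sketch below tries a list of models in order and stops at the first response that is not a refusal. It is not the mechanism documented in Refusals and fallback: create_with_fallback is a hypothetical helper, the send argument stands in for client.messages.create, and the model names in the demo are placeholders.

```python
from types import SimpleNamespace
from typing import Any, Callable, Sequence


def create_with_fallback(
    send: Callable[..., Any], models: Sequence[str], **params: Any
) -> Any:
    """Try each model in order; return the first response that is not a refusal.

    `send` is any callable that accepts model=... plus the usual request
    parameters, such as client.messages.create. If every model refuses,
    the last refusal is returned so the caller can inspect stop_details.
    """
    if not models:
        raise ValueError("at least one model is required")
    response = None
    for model in models:
        response = send(model=model, **params)
        if response.stop_reason != "refusal":
            break
    return response


def fake_send(model: str, **params: Any) -> SimpleNamespace:
    # Stand-in for the API call: the placeholder primary model always refuses.
    reason = "refusal" if model == "primary-model" else "end_turn"
    return SimpleNamespace(model=model, stop_reason=reason)


result = create_with_fallback(
    fake_send,
    ["primary-model", "fallback-model"],
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello!"}],
)
print(result.model, result.stop_reason)
```

Passing the sender in as a parameter keeps the retry logic testable without network access; in real use you would pass client.messages.create and real model ids.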

model_context_window_exceeded

Claude stopped because it reached the model's context window limit. This lets you request the maximum possible tokens without knowing the exact input size.

This stop reason is currently typed only in the SDK's beta namespace, so the following examples call client.beta.messages and use Beta-prefixed types. On Sonnet 4.5 and newer models, the API returns this value without a beta header. For earlier models, enable it by adding the model-context-window-exceeded-2025-08-26 beta header.

import anthropic

client = anthropic.Anthropic()

# Request with a high max_tokens to get as much output as possible
response = client.beta.messages.create(
    model="claude-opus-5",
    max_tokens=20000,  # Python SDK requires streaming for max_tokens above ~21k
    messages=[
        {"role": "user", "content": "Large input that uses most of context window..."}
    ],
)

if response.stop_reason == "model_context_window_exceeded":
    # The response hit the context window limit before reaching max_tokens
    print("Response reached model's context window limit")
    # The response is still valid, but limited by the context window

Best practices for handling stop reasons

Always check stop_reason

Make a habit of checking stop_reason in your response handling logic:

def handle_response(response):
    if response.stop_reason == "tool_use":
        return handle_tool_use(response)
    elif response.stop_reason == "max_tokens":
        return handle_truncation(response)
    elif response.stop_reason == "model_context_window_exceeded":
        return handle_context_limit(response)
    elif response.stop_reason == "pause_turn":
        return handle_pause(response)
    elif response.stop_reason == "refusal":
        return handle_refusal(response)
    else:
        # Handle end_turn and any other cases
        return next(
            (block.text for block in response.content if block.type == "text"), ""
        )

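Handle truncated responses gracefully

When a response is truncated by the token limit or the context window, add a notice so readers know the output is incomplete. To continue generating from where the response left off instead, see Ensuring complete responses.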

def handle_truncated_response(response):
    text = next((block.text for block in response.content if block.type == "text"), "")
    if response.stop_reason in ["max_tokens", "model_context_window_exceeded"]:
        if response.stop_reason == "max_tokens":
            note = "[Response truncated due to max_tokens limit]"
        else:
            note = "[Response truncated due to context window limit]"
        return f"{text}\n\n{note}"
    return text

Implement retry logic for pause_turn

When you use server tools, the API can return pause_turn if the server-side sampling loop reaches its iteration limit (default 10). Handle it by continuing the conversation:

def handle_server_tool_conversation(client, user_query, tools, max_continuations=5):
    """
    Handle server tool conversations that may require multiple continuations.

    The server runs a sampling loop when executing server tools. If the loop
    reaches its iteration limit, the API returns pause_turn. Continue the
    conversation by sending the response back to let Claude finish.
    """
    messages = [{"role": "user", "content": user_query}]

    for _ in range(max_continuations):
        response = client.messages.create(
            model="claude-opus-5", max_tokens=4096, messages=messages, tools=tools
        )

        if response.stop_reason != "pause_turn":
            # Claude finished processing - return the final response
            return response

        # pause_turn: replace the full message list to keep roles alternating
        messages = [
            {"role": "user", "content": user_query},
            {"role": "assistant", "content": response.content},
        ]

    # Reached the maximum number of continuations - return the last response
    return response

Stop reasons vs. errors

It is important to distinguish stop_reason values from actual errors:

Stop reasons (successful responses)

Part of the response body
Indicate why generation stopped normally
The response contains valid content

Errors (failed requests)

HTTP status codes 4xx or 5xx
Indicate that request processing failed
The response contains error details

import anthropic

client = anthropic.Anthropic()

try:
    response = client.messages.create(
        model="claude-opus-5",
        max_tokens=1024,
        messages=[{"role": "user", "content": "Hello!"}],
    )

    # Handle a successful response that carries a stop_reason
    if response.stop_reason == "max_tokens":
        print("Response was truncated")

except anthropic.APIStatusError as e:
    # Handle actual errors
    if e.status_code == 429:
        print("Rate limit exceeded")
    elif e.status_code == 500:
        print("Server error")

Streaming considerations

When streaming, stop_reason is:

null in the initial message_start event
Provided in the message_delta event
Not provided in any other event

import anthropic

client = anthropic.Anthropic()

with client.messages.stream(
    model="claude-opus-5",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello!"}],
) as stream:
    for event in stream:
        if event.type == "message_delta":
            stop_reason = event.delta.stop_reason
            if stop_reason:
                print(f"Stream ended with: {stop_reason}")
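To see the same rule without a live stream, the sketch below walks a hand-written list of raw event payloads and picks up stop_reason from message_delta only. The helper name stop_reason_from_events and the abbreviated event dicts are assumptions for illustration; real events carry more fields.

```python
from typing import Any, Dict, Iterable, Optional


def stop_reason_from_events(events: Iterable[Dict[str, Any]]) -> Optional[str]:
    """Return the last non-null stop_reason found in message_delta events."""
    stop_reason = None
    for event in events:
        # Only message_delta carries a populated stop_reason.
        if event.get("type") == "message_delta":
            stop_reason = event.get("delta", {}).get("stop_reason") or stop_reason
    return stop_reason


# Abbreviated event sequence for a short streamed reply.
events = [
    {"type": "message_start", "message": {"role": "assistant", "stop_reason": None}},
    {"type": "content_block_delta", "index": 0,
     "delta": {"type": "text_delta", "text": "Hi"}},
    {"type": "message_delta", "delta": {"stop_reason": "end_turn", "stop_sequence": None}},
    {"type": "message_stop"},
]

print(stop_reason_from_events(events))
```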

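Common patterns

Handling tool use workflows

Simpler with the tool runner: the following example shows manual tool handling. For most use cases, the tool runner handles tool execution automatically with far less code.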

def complete_tool_workflow(client, user_query, tools):
    messages = [{"role": "user", "content": user_query}]

    while True:
        response = client.messages.create(
            model="claude-opus-5", max_tokens=1024, messages=messages, tools=tools
        )

        if response.stop_reason == "tool_use":
            # Execute the tools and continue
            tool_results = execute_tools(response.content)
            messages.append({"role": "assistant", "content": response.content})
            messages.append({"role": "user", "content": tool_results})
        else:
            # Final response
            return response
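The workflow above calls execute_tools without defining it. One possible shape for that helper is sketched below, assuming a registry that maps tool names to plain Python functions; the registry, the get_weather stub, and the error text are all illustrative choices. It reads block.type, block.id, block.name, and block.input as the earlier examples do, and marks failures with the tool_result is_error flag.

```python
from types import SimpleNamespace
from typing import Any, Callable, Dict, Iterable, List


def get_weather(location: str) -> str:
    # Stub tool for the example; a real tool would call a weather service.
    return f"Weather in {location}: 72°F"


# Hypothetical registry mapping tool names to handler functions.
TOOL_HANDLERS: Dict[str, Callable[..., Any]] = {"get_weather": get_weather}


def execute_tools(
    content: Iterable[Any], handlers: Dict[str, Callable[..., Any]] = TOOL_HANDLERS
) -> List[Dict[str, Any]]:
    """Run every client tool_use block and return one tool_result per block."""
    results = []
    for block in content:
        if block.type != "tool_use":
            continue  # Skip text and server tool blocks.
        try:
            output = handlers[block.name](**block.input)
            results.append(
                {"type": "tool_result", "tool_use_id": block.id, "content": str(output)}
            )
        except Exception as exc:
            # Report the failure to Claude instead of aborting the loop.
            results.append(
                {
                    "type": "tool_result",
                    "tool_use_id": block.id,
                    "content": f"Tool error: {exc!r}",
                    "is_error": True,
                }
            )
    return results


demo_content = [
    SimpleNamespace(type="text", text="Let me check."),
    SimpleNamespace(type="tool_use", id="toolu_demo", name="get_weather",
                    input={"location": "San Francisco"}),
]
print(execute_tools(demo_content))
```

Returning only tool_result blocks keeps the follow-up user message in the form the tool_use section requires.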

Ensuring complete responses

def get_complete_response(client, prompt, max_attempts=3):
    messages = [{"role": "user", "content": prompt}]
    full_response = ""

    for _ in range(max_attempts):
        response = client.messages.create(
            model="claude-opus-5", messages=messages, max_tokens=4096
        )

        full_response += next(
            (block.text for block in response.content if block.type == "text"), ""
        )

        if response.stop_reason != "max_tokens":
            break

        # Continue from where the response left off
        messages = [
            {"role": "user", "content": prompt},
            {"role": "assistant", "content": full_response},
            {"role": "user", "content": "Please continue from where you left off."},
        ]

    return full_response

Getting maximum tokens without knowing input size

With the model_context_window_exceeded stop reason, you can request the maximum possible tokens without calculating the input size:

def get_max_possible_tokens(client, prompt):
    """
    Get as many tokens as possible within the model's context window
    without needing to calculate input token count
    """
    response = client.beta.messages.create(
        model="claude-opus-5",
        messages=[{"role": "user", "content": prompt}],
        max_tokens=20000,  # Python SDK requires streaming for max_tokens above ~21k
    )

    if response.stop_reason == "model_context_window_exceeded":
        # Received the most tokens possible given the input size
        print(
            f"Generated {response.usage.output_tokens} tokens (context limit reached)"
        )
    elif response.stop_reason == "max_tokens":
        # Received exactly the number of tokens requested
        print(f"Generated {response.usage.output_tokens} tokens (max_tokens reached)")
    else:
        # Natural completion
        print(f"Generated {response.usage.output_tokens} tokens (natural completion)")

    return next((block.text for block in response.content if block.type == "text"), "")

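Next steps

Refusals and fallback

Retry refused requests on a fallback model, server-side or on the client.

Tool runner (SDK)

Let the SDK manage the tool_use loop, result formatting, and retries for you.

Streaming messages

When streaming, read stop_reason from the message_delta event.

Errors

Handle 4xx and 5xx HTTP errors, which are distinct from stop reasons.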
