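Stop reasons and fallback

Learn what each stop_reason value means and how to handle truncation, tool use, paused turns, and refusals in your application.

Every Messages API response includes a stop_reason field that indicates why Claude stopped generating. Check this field to decide whether to use the response as-is, continue the conversation, retry, or fall back to another model.

For the complete response schema, see the Messages API reference.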

Quick reference

Value	When it occurs	What to do
`end_turn`	Claude finished its response naturally.	Use the response.
`max_tokens`	The response reached the `max_tokens` limit.	Increase `max_tokens` or continue the response.
`stop_sequence`	Claude emitted one of your `stop_sequences`.	Read `stop_sequence` to see which one fired.
`tool_use`	Claude is calling a tool.	Run the tool and return the result. Server tool calls that have no result block yet complete in a later response.
`pause_turn`	The server tool loop reached its iteration limit.	Send the assistant content back to continue.
`refusal`	Claude declined to respond.	Read `stop_details` and retry with a fallback model.
`model_context_window_exceeded`	The response filled the model's context window.	Treat the response as truncated.

The stop_reason field

The stop_reason field is part of every successful Messages API response. Unlike errors, which indicate a failure to process the request, stop_reason tells you why Claude finished generating its response.

Example response

{
  "id": "msg_01234",
  "type": "message",
  "role": "assistant",
  "content": [
    {
      "type": "text",
      "text": "Here's the answer to your question..."
    }
  ],
  "stop_reason": "end_turn",
  "stop_sequence": null,
  "stop_details": null,
  "usage": {
    "input_tokens": 100,
    "output_tokens": 50
  }
}

Stop reason values

end_turn

The most common stop reason. It indicates that Claude finished its response naturally.

import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-opus-5",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello!"}],
)
if response.stop_reason == "end_turn":
    # Process the complete response
    for block in response.content:
        if block.type == "text":
            print(block.text)

max_tokens

Claude stopped because it reached the max_tokens limit specified in the request.

import anthropic

client = anthropic.Anthropic()
# Request with a deliberately small token limit
response = client.messages.create(
    model="claude-opus-5",
    max_tokens=10,
    messages=[{"role": "user", "content": "Explain quantum physics"}],
)

if response.stop_reason == "max_tokens":
    # The response was cut off partway through
    print("Response was cut off at token limit")
    # Consider sending another request to get the rest

stop_sequence

Claude encountered one of your custom stop sequences.

import anthropic

client = anthropic.Anthropic()
response = client.messages.create(
    model="claude-opus-5",
    max_tokens=1024,
    stop_sequences=["END", "STOP"],
    messages=[{"role": "user", "content": "Generate text until you say END"}],
)

if response.stop_reason == "stop_sequence":
    print(f"Stopped at sequence: {response.stop_sequence}")

tool_use

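Claude is calling a tool and expects you to execute it.

For most tool use implementations, use the tool runner, which automatically handles tool execution, result formatting, and conversation management.

import anthropic

client = anthropic.Anthropic()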
weather_tool = {
    "name": "get_weather",
    "description": "Get the current weather in a given location",
    "input_schema": {
        "type": "object",
        "properties": {
            "location": {"type": "string", "description": "City and state"},
        },
        "required": ["location"],
    },
}


def execute_tool(name, tool_input):
    """Execute a tool and return the result."""
    return f"Weather in {tool_input.get('location', 'unknown')}: 72°F"


response = client.messages.create(
    model="claude-opus-5",
    max_tokens=1024,
    tools=[weather_tool],
    messages=[{"role": "user", "content": "What is the weather in San Francisco?"}],
)

if response.stop_reason == "tool_use":
    # Extract and execute each tool call
    for block in response.content:
        if block.type == "tool_use":
            result = execute_tool(block.name, block.input)
            # Return the result to Claude for the final response

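A tool_use response can also contain a server_tool_use block whose id has no matching result block. That server tool call has not completed, and this response does not include its result. In the common case, Claude calls a server tool and a client tool in the same group of parallel tool calls. The API returns without running the server tool so that you can run the client tool first. No other marker signals this state. Detect it by checking whether the id of each server_tool_use or mcp_tool_use block has a matching result block.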
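The detection step described above can be sketched as a small helper. This is an illustration under stated assumptions, not part of the SDK: `pending_server_tool_ids` is a hypothetical name, the response content is assumed to be a list of plain dicts (for example, the parsed JSON body), and every result block is assumed to carry a `tool_use_id` that points back at the call it answers, as `web_search_tool_result` does.

```python
# Block types that represent a server-side tool call, as named in the text.
SERVER_CALL_TYPES = {"server_tool_use", "mcp_tool_use"}


def pending_server_tool_ids(content):
    """Return ids of server tool calls that have no result block in `content`.

    Assumption: a result block references its call through `tool_use_id`.
    """
    call_ids = [b["id"] for b in content if b.get("type") in SERVER_CALL_TYPES]
    answered = {b["tool_use_id"] for b in content if "tool_use_id" in b}
    return [call_id for call_id in call_ids if call_id not in answered]
```

A non-empty return value means the turn is still open on the server side, so the next request should contain only your client tool results.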

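With programmatic tool calling, the same response shape means something different. The client tool_use blocks come from code running inside the code_execution tool rather than directly from Claude, and their caller field identifies the code_execution block that invoked them. That code has already started. It is paused waiting for your tool_result blocks, and sending them resumes execution rather than starting a deferred tool. The result block for the code_execution block itself arrives once the code finishes, which may take several rounds of tool results. The follow-up user message itself is the same in both cases. With programmatic tool calling, also send back the id from the response's container field, as shown on that page.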

A mixed tool_use response

{
  "stop_reason": "tool_use",
  "content": [
    {
      "type": "server_tool_use",
      "id": "srvtoolu_01HxbWnMRmbWyMfUtJKC45rA",
      "name": "web_search",
      "input": { "query": "example article" }
    },
    {
      "type": "tool_use",
      "id": "toolu_01PjgRJLbXrXEMZwDNYLnBqk",
      "name": "run_command",
      "input": { "command": "uname -a" }
    }
  ]
}

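The continuation is a user message with one tool_result block for every tool_use block in the response (see Handling tool calls), with two additional rules: the message must contain nothing but tool_result blocks, and the request must keep the same tools array. A resume request that no longer defines the pending server tool fails with a 400 error whose message ends with but no `web_search` tool was provided. The API attaches your results to the still-open assistant turn, runs the deferred server tool (or resumes the paused code execution), and continues the turn. For a server tool that Claude called directly, the content of the next response begins with the result block that answers the server_tool_use id from the previous response.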
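Because a resume request fails when the pending server tool is missing from the tools array, a pre-flight check can catch the problem before the request is sent. The sketch below is one possible way to do that. The function name `missing_pending_tools` is hypothetical, and it assumes that content and tools are lists of plain dicts and that a tool definition is matched to a server_tool_use block by its name field.

```python
def missing_pending_tools(content, tools):
    """Names of unanswered server tool calls that `tools` no longer defines."""
    # Result blocks are assumed to reference their call through `tool_use_id`.
    answered = {b["tool_use_id"] for b in content if "tool_use_id" in b}
    pending_names = {
        b["name"]
        for b in content
        if b.get("type") == "server_tool_use" and b["id"] not in answered
    }
    defined = {t.get("name") for t in tools}
    return sorted(pending_names - defined)
```

An empty list means every pending server tool is still defined. Any name that is returned should be added back to the tools array before resuming.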

The follow-up user message

{
  "role": "user",
  "content": [
    {
      "type": "tool_result",
      "tool_use_id": "toolu_01PjgRJLbXrXEMZwDNYLnBqk",
      "content": "Linux demo-host 6.8.0-52-generic x86_64 GNU/Linux"
    }
  ]
}

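Adding anything, such as text, after the tool_result blocks in that user message ends the assistant turn. For a server tool that Claude called directly, the request then fails with a 400 invalid_request_error that names the unresolved server tool: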

`web_search` tool use with id `srvtoolu_01HxbWnMRmbWyMfUtJKC45rA` was found without a corresponding `web_search_tool_result` block

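If you omit a tool_result or place it after other content, the request fails earlier with the standard tool_use ids were found without tool_result blocks immediately after error instead. To give Claude more input, send it as a separate user message after the turn has completed.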
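The rules above (one tool_result per client tool_use block, and nothing else in the message) can be sketched as follows. This is a minimal illustration, not an SDK function: `build_tool_result_message` and the `run_tool` callback are hypothetical names, and the response content is assumed to be a list of plain dicts.

```python
def build_tool_result_message(content, run_tool):
    """Build the follow-up user message for a tool_use response.

    `run_tool` is an application-supplied callable: (name, input) -> str.
    """
    results = [
        {
            "type": "tool_result",
            "tool_use_id": block["id"],
            "content": run_tool(block["name"], block["input"]),
        }
        for block in content
        if block.get("type") == "tool_use"  # server_tool_use blocks are skipped
    ]
    # Only tool_result blocks: any extra content would end the open turn.
    return {"role": "user", "content": results}
```

Server tool calls are deliberately left unanswered here, since the API runs them itself once the client results arrive.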

pause_turn

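Returned when the server-side sampling loop reaches its iteration limit while running server tools such as web search. The default limit is 10 iterations per request.

When this happens, the response may contain a server_tool_use block with no matching result block. To let Claude finish, continue the conversation by sending the response back as-is. A response in which a client tool_use block is waiting on you never has a stop_reason of pause_turn. If Claude stopped to call your tool, stop_reason is tool_use, and you continue by sending client tool_result blocks rather than the response itself.

import anthropic

client = anthropic.Anthropic()

response = client.messages.create(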
    model="claude-opus-5",
    max_tokens=4096,
    tools=[{"type": "web_search_20250305", "name": "web_search"}],
    messages=[{"role": "user", "content": "Search for latest AI news"}],
)

if response.stop_reason == "pause_turn":
    # Send the response back to continue the conversation
    messages = [
        {"role": "user", "content": "Search for latest AI news"},
        {"role": "assistant", "content": response.content},
    ]
    continuation = client.messages.create(
        model="claude-opus-5",
        max_tokens=4096,
        messages=messages,
        tools=[{"type": "web_search_20250305", "name": "web_search"}],
    )

Your application should handle pause_turn in every agent loop that uses server tools. Append the assistant response to the messages array and make another API request to let Claude continue.

refusal

Claude declined to generate a response. Safety classifiers return this stop reason as a normal HTTP 200 response, not as an error.

import anthropic

client = anthropic.Anthropic()
response = client.messages.create(
    model="claude-opus-5",
    max_tokens=1024,
    messages=[{"role": "user", "content": "[Unsafe request]"}],
)

if response.stop_reason == "refusal":
    # Claude declined to respond
    print("Claude was unable to process this request")
    # Consider rephrasing or modifying the request

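If you frequently encounter the refusal stop reason while using Claude Sonnet 4.5 or Opus 4.1 (deprecated; see Model deprecations), try updating your API calls to use Haiku 4.5 (claude-haiku-4-5-20251001), which has different usage restrictions. For details, see Understanding Sonnet 4.5's API safety filters.

For a refusal, the stop_details object identifies the policy category that triggered it. The categories and the full refusal response shape are described in Refusals and fallback. stop_details is null for every stop reason other than refusal.

Requests refused by Claude Fable 5 or Claude Opus 5 can usually be served by retrying with a different Claude model. Refusals and fallback shows how to configure that retry server-side or client-side. Fallback credits explains how to avoid paying prompt caching costs twice when you build the retry yourself.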
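A client-side retry of the kind mentioned above might look like the sketch below. It is only an outline under stated assumptions: `create_with_fallback` and the `send` callback are hypothetical names, `send` is assumed to wrap your own Messages API call and return a dict with a `stop_reason` key, and the choice and order of models is left to the caller.

```python
def create_with_fallback(send, models):
    """Try each model in order and stop at the first non-refusal response.

    `send` is an application-supplied callable: (model_name) -> response dict.
    Returns (model_used, response). If every model refuses, the last
    refusal is returned so the caller can inspect its stop_details.
    """
    if not models:
        raise ValueError("models must contain at least one model name")
    response = None
    for model in models:
        response = send(model)
        if response["stop_reason"] != "refusal":
            break
    return model, response
```

Passing the request function in as a parameter keeps the retry logic independent of the SDK and makes it easy to exercise without network access.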

model_context_window_exceeded

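Claude stopped because it reached the model's context window limit. This lets you request the maximum possible number of tokens without knowing the exact input size.

This stop reason is currently typed only in the SDK's beta namespace, so the examples below call client.beta.messages and use Beta-prefixed types. On Sonnet 4.5 and later models, the API returns this value without a beta header. On earlier models, add the model-context-window-exceeded-2025-08-26 beta header to enable it.

import anthropic

client = anthropic.Anthropic()

# Request the maximum tokens to get as much output as possible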
response = client.beta.messages.create(
    model="claude-opus-5",
    max_tokens=20000,  # Python SDK requires streaming for max_tokens above ~21k
    messages=[
        {"role": "user", "content": "Large input that uses most of context window..."}
    ],
)

if response.stop_reason == "model_context_window_exceeded":
    # The response hit the context window limit before reaching max_tokens
    print("Response reached model's context window limit")
    # The response is valid but was limited by the context window
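For models that need the beta header, one way to keep the opt-in in a single place is to build the request arguments separately. The sketch below assumes the Python SDK's `betas` parameter on `client.beta.messages.create` as the way to send the header. `build_request_kwargs` and the `needs_beta_header` flag are hypothetical names, and deciding which models need the flag is left to the caller.

```python
BETA_FLAG = "model-context-window-exceeded-2025-08-26"


def build_request_kwargs(model, prompt, max_tokens, needs_beta_header):
    """Assemble keyword arguments for client.beta.messages.create."""
    kwargs = {
        "model": model,
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": prompt}],
    }
    if needs_beta_header:
        # Opt in explicitly on models that do not return the value by default.
        kwargs["betas"] = [BETA_FLAG]
    return kwargs
```

The result can then be passed along as `client.beta.messages.create(**kwargs)`.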

Best practices for handling stop reasons

Always check stop_reason

Make it a habit to check stop_reason in your response handling logic:

def handle_response(response):
    if response.stop_reason == "tool_use":
        return handle_tool_use(response)
    elif response.stop_reason == "max_tokens":
        return handle_truncation(response)
    elif response.stop_reason == "model_context_window_exceeded":
        return handle_context_limit(response)
    elif response.stop_reason == "pause_turn":
        return handle_pause(response)
    elif response.stop_reason == "refusal":
        return handle_refusal(response)
    else:
        # Handle end_turn and any other cases
        return next(
            (block.text for block in response.content if block.type == "text"), ""
        )

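Handle truncated responses gracefully

When a response is truncated because of the token limit or the context window, add a notice so that readers know the output is incomplete. To continue generation from where the response left off instead, see Ensuring complete responses.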

def handle_truncated_response(response):
    text = next((block.text for block in response.content if block.type == "text"), "")
    if response.stop_reason in ["max_tokens", "model_context_window_exceeded"]:
        if response.stop_reason == "max_tokens":
            note = "[Response truncated due to max_tokens limit]"
        else:
            note = "[Response truncated due to context window limit]"
        return f"{text}\n\n{note}"
    return text

Implement retry logic for pause_turn

When you use server tools, the API may return pause_turn if the server-side sampling loop reaches its iteration limit (10 by default). Handle this by continuing the conversation:

def handle_server_tool_conversation(client, user_query, tools, max_continuations=5):
    """
    Handle server tool conversations that may require multiple continuations.

    The server runs a sampling loop when executing server tools. If the loop
    reaches its iteration limit, the API returns pause_turn. Continue the
    conversation by sending the response back to let Claude finish.
    """
    messages = [{"role": "user", "content": user_query}]

    for _ in range(max_continuations):
        response = client.messages.create(
            model="claude-opus-5", max_tokens=4096, messages=messages, tools=tools
        )

        if response.stop_reason != "pause_turn":
            # Claude finished processing: return the final response
            return response

        # pause_turn: replace the whole message list to keep roles alternating
        messages = [
            {"role": "user", "content": user_query},
            {"role": "assistant", "content": response.content},
        ]

    # Maximum number of continuations reached: return the last response
    return response

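Stop reasons vs. errors

It is important to distinguish stop_reason values from actual errors:

Stop reasons (successful responses)

Part of the response body
Indicate why generation stopped normally
The response contains valid content

Errors (failed requests)

HTTP status code 4xx or 5xx
Indicate a failure to process the request
The response contains error details

import anthropic

client = anthropic.Anthropic()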

try:
    response = client.messages.create(
        model="claude-opus-5",
        max_tokens=1024,
        messages=[{"role": "user", "content": "Hello!"}],
    )

    # Handle a successful response, which includes stop_reason
    if response.stop_reason == "max_tokens":
        print("Response was truncated")

except anthropic.APIStatusError as e:
    # Handle actual errors
    if e.status_code == 429:
        print("Rate limit exceeded")
    elif e.status_code == 500:
        print("Server error")

Streaming considerations

When you use streaming, stop_reason is:

null in the initial message_start event
Provided in the message_delta event
Not provided in any other event

import anthropic

client = anthropic.Anthropic()

with client.messages.stream(
    model="claude-opus-5",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello!"}],
) as stream:
    for event in stream:
        if event.type == "message_delta":
            stop_reason = event.delta.stop_reason
            if stop_reason:
                print(f"Stream ended with: {stop_reason}")

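Common patterns

Handling tool use workflows

Simpler with the tool runner: The example below shows manual tool handling. For most use cases, the tool runner handles tool execution automatically with far less code.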

def complete_tool_workflow(client, user_query, tools):
    messages = [{"role": "user", "content": user_query}]

    while True:
        response = client.messages.create(
            model="claude-opus-5", max_tokens=1024, messages=messages, tools=tools
        )

        if response.stop_reason == "tool_use":
            # Execute the tools and continue. execute_tools is a helper you
            # supply; it returns one tool_result block per tool_use block.
            tool_results = execute_tools(response.content)
            messages.append({"role": "assistant", "content": response.content})
            messages.append({"role": "user", "content": tool_results})
        else:
            # Final response
            return response

Ensuring complete responses

def get_complete_response(client, prompt, max_attempts=3):
    messages = [{"role": "user", "content": prompt}]
    full_response = ""

    for _ in range(max_attempts):
        response = client.messages.create(
            model="claude-opus-5", messages=messages, max_tokens=4096
        )

        full_response += next(
            (block.text for block in response.content if block.type == "text"), ""
        )

        if response.stop_reason != "max_tokens":
            break

        # Continue from where the response left off
        messages = [
            {"role": "user", "content": prompt},
            {"role": "assistant", "content": full_response},
            {"role": "user", "content": "Please continue from where you left off."},
        ]

    return full_response

Getting maximum tokens without knowing the input size

With the model_context_window_exceeded stop reason, you can request the maximum possible number of tokens without calculating the input size:

def get_max_possible_tokens(client, prompt):
    """
    Get as many tokens as possible within the model's context window
    without needing to calculate input token count
    """
    response = client.beta.messages.create(
        model="claude-opus-5",
        messages=[{"role": "user", "content": prompt}],
        max_tokens=20000,  # Python SDK requires streaming for max_tokens above ~21k
    )

    if response.stop_reason == "model_context_window_exceeded":
        # Got the maximum number of tokens possible for this input size
        print(
            f"Generated {response.usage.output_tokens} tokens (context limit reached)"
        )
    elif response.stop_reason == "max_tokens":
        # Got exactly the number of tokens requested
        print(f"Generated {response.usage.output_tokens} tokens (max_tokens reached)")
    else:
        # Natural completion
        print(f"Generated {response.usage.output_tokens} tokens (natural completion)")

    return next((block.text for block in response.content if block.type == "text"), "")

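Next steps

Refusals and fallback

Retry refused requests with a fallback model, server-side or client-side.

Tool runner (SDK)

Let the SDK manage the tool_use loop, result formatting, and retries.

Streaming messages

Read stop_reason from the message_delta event when streaming.

Errors

Handle 4xx and 5xx HTTP errors, which are distinct from stop reasons.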

