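Context management

Compaction

Server-side context compaction for managing long conversations that approach context window limits.

Server-side compaction is the recommended strategy for managing context in long-running conversations and agentic workflows. It handles context management automatically with minimal integration work.

Compaction extends the effective context length of long-running conversations and tasks by automatically summarizing older context as the conversation approaches the context window limit. It is best suited to:

Chat-based, multi-turn conversations in which users stay in a single chat for a long period
Task-oriented prompts that require a lot of follow-up work (often tool use) and may exceed the 200K context window

Compaction is currently in beta. To use it, include the beta header compact-2026-01-12 in your API requests.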

Supported models

Compaction is supported on the following models:

Claude Opus 4.6 (claude-opus-4-6)

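How compaction works

When compaction is enabled, Claude automatically summarizes the conversation as it approaches the configured token threshold. The API:

Detects that the input tokens have exceeded the specified trigger threshold.
Generates a summary of the current conversation.
Creates a compaction block containing the summary.
Continues the response with the compacted context.

On subsequent requests, append the response to your messages. The API automatically drops all message blocks that precede the compaction block and continues the conversation from the summary.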
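
The four steps above can be pictured with a small toy model. This is only an illustrative sketch of the flow, not the server's actual implementation: the function name simulate_compaction, the summarize callback, and the idea of passing a precomputed token count are all assumptions made for the example.

```python
from typing import Callable


def simulate_compaction(
    messages: list[dict],
    input_tokens: int,
    trigger_tokens: int,
    summarize: Callable[[list[dict]], str],
) -> list[dict]:
    """Toy model of the server-side flow (hypothetical helper).

    Returns the content blocks that would open the assistant response:
    a single compaction block if the threshold was exceeded, else nothing.
    """
    # Step 1: detect that the input tokens exceed the trigger threshold.
    if input_tokens <= trigger_tokens:
        return []
    # Step 2: generate a summary of the current conversation.
    summary = summarize(messages)
    # Step 3: wrap the summary in a compaction block.
    # Step 4 (continuing the response from this context) is left to the model.
    return [{"type": "compaction", "content": summary}]
```

In this sketch the summarizer is just a callback; in the real API the summary is produced by the model named in the request.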

Basic usage

Enable compaction by adding the compact_20260112 strategy to context_management.edits in your Messages API request.

curl https://api.anthropic.com/v1/messages \
     --header "x-api-key: $ANTHROPIC_API_KEY" \
     --header "anthropic-version: 2023-06-01" \
     --header "anthropic-beta: compact-2026-01-12" \
     --header "content-type: application/json" \
     --data \
'{
    "model": "claude-opus-4-6",
    "max_tokens": 4096,
    "messages": [
        {
            "role": "user",
            "content": "Help me build a website"
        }
    ],
    "context_management": {
        "edits": [
            {
                "type": "compact_20260112"
            }
        ]
    }
}'

Parameters

Parameter	Type	Default	Description
`type`	string	Required	Must be `"compact_20260112"`
`trigger`	object	150,000 tokens	When to trigger compaction. Must be at least 50,000 tokens.
`pause_after_compaction`	boolean	`false`	Whether to pause after generating the compaction summary
`instructions`	string	`null`	Custom summarization prompt. When provided, it completely replaces the default prompt.
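
As a minimal sketch of how these parameters fit together, the helper below builds one entry for context_management.edits and rejects a trigger below the documented 50,000-token minimum before the request is sent. The function name build_compaction_edit and the client-side check are assumptions for illustration; the field names and the trigger shape come from this page.

```python
from typing import Any, Optional

MIN_TRIGGER_TOKENS = 50_000  # minimum stated in the parameter table


def build_compaction_edit(
    trigger_tokens: Optional[int] = None,
    pause_after_compaction: bool = False,
    instructions: Optional[str] = None,
) -> dict[str, Any]:
    """Build one compaction entry for context_management.edits (hypothetical helper)."""
    edit: dict[str, Any] = {"type": "compact_20260112"}
    if trigger_tokens is not None:
        if trigger_tokens < MIN_TRIGGER_TOKENS:
            raise ValueError(
                f"trigger must be at least {MIN_TRIGGER_TOKENS} tokens, got {trigger_tokens}"
            )
        edit["trigger"] = {"type": "input_tokens", "value": trigger_tokens}
    if pause_after_compaction:
        edit["pause_after_compaction"] = True
    if instructions is not None:
        # Custom instructions replace the default summarization prompt entirely.
        edit["instructions"] = instructions
    return edit
```

Omitted arguments are left out of the dictionary so that the API defaults from the table apply.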

Trigger configuration

Use the trigger parameter to configure when compaction is triggered:

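import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
messages = [{"role": "user", "content": "Help me build a website"}]

response = client.beta.messages.create(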
    betas=["compact-2026-01-12"],
    model="claude-opus-4-6",
    max_tokens=4096,
    messages=messages,
    context_management={
        "edits": [
            {
                "type": "compact_20260112",
                "trigger": {
                    "type": "input_tokens",
                    "value": 150000
                }
            }
        ]
    }
)

Custom summarization instructions

By default, compaction uses the following summarization prompt:

You have written a partial transcript for the initial task above. Please write a summary of the transcript. The purpose of this summary is to provide continuity so you can continue to make progress towards solving the task in a future context, where the raw history above may not be accessible and will be replaced with this summary. Write down anything that would be helpful, including the state, next steps, learnings etc. You must wrap your summary in a <summary></summary> block.

Use the instructions parameter to supply custom instructions. They replace the default prompt entirely rather than supplementing it:

response = client.beta.messages.create(
    betas=["compact-2026-01-12"],
    model="claude-opus-4-6",
    max_tokens=4096,
    messages=messages,
    context_management={
        "edits": [
            {
                "type": "compact_20260112",
                "instructions": "Focus on preserving code snippets, variable names, and technical decisions."
            }
        ]
    }
)

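Pausing after compaction

Use pause_after_compaction to pause the API after it generates the compaction summary. This lets you add further content blocks (for example, to preserve recent messages or to add specific instruction-oriented messages) before the API continues the response.

When enabled, the API returns a message with the compaction stop reason after generating the compaction block: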

response = client.beta.messages.create(
    betas=["compact-2026-01-12"],
    model="claude-opus-4-6",
    max_tokens=4096,
    messages=messages,
    context_management={
        "edits": [
            {
                "type": "compact_20260112",
                "pause_after_compaction": True
            }
        ]
    }
)

# Check if compaction triggered a pause
if response.stop_reason == "compaction":
    # Response contains only the compaction block
    messages.append({"role": "assistant", "content": response.content})

    # Continue the request
    response = client.beta.messages.create(
        betas=["compact-2026-01-12"],
        model="claude-opus-4-6",
        max_tokens=4096,
        messages=messages,
        context_management={
            "edits": [{"type": "compact_20260112"}]
        }
    )

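Enforcing a total token budget

When a model works on a long task with many tool-use iterations, total token consumption can grow significantly. You can combine pause_after_compaction with a compaction counter to estimate cumulative usage and wrap up the task gracefully once the budget is reached: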

TRIGGER_THRESHOLD = 100_000
TOTAL_TOKEN_BUDGET = 3_000_000
n_compactions = 0

response = client.beta.messages.create(
    betas=["compact-2026-01-12"],
    model="claude-opus-4-6",
    max_tokens=4096,
    messages=messages,
    context_management={
        "edits": [
            {
                "type": "compact_20260112",
                "trigger": {"type": "input_tokens", "value": TRIGGER_THRESHOLD},
                "pause_after_compaction": True,
            }
        ]
    },
)

if response.stop_reason == "compaction":
    n_compactions += 1
    messages.append({"role": "assistant", "content": response.content})

    # Estimate total tokens consumed; prompt wrap-up if over budget
    if n_compactions * TRIGGER_THRESHOLD >= TOTAL_TOKEN_BUDGET:
        messages.append({
            "role": "user",
            "content": "Please wrap up your current work and summarize the final state.",
        })
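
The estimate used above is simply the number of compactions multiplied by the trigger threshold, so with a threshold of 100,000 and a budget of 3,000,000 the wrap-up prompt is sent at the 30th compaction (30 × 100,000 = 3,000,000). The small tracker below is a sketch of that bookkeeping; the class name CompactionBudget and its members are hypothetical, and the estimate is deliberately rough because it ignores output tokens and the tokens spent producing the summaries.

```python
from dataclasses import dataclass


@dataclass
class CompactionBudget:
    """Rough cumulative-usage estimate based on compaction count (hypothetical helper)."""

    trigger_threshold: int
    total_budget: int
    n_compactions: int = 0

    def record_compaction(self) -> None:
        # Call once each time a response stops with the compaction stop reason.
        self.n_compactions += 1

    @property
    def estimated_tokens(self) -> int:
        # Each compaction implies roughly one threshold's worth of input tokens.
        return self.n_compactions * self.trigger_threshold

    @property
    def exhausted(self) -> bool:
        return self.estimated_tokens >= self.total_budget
```

In an agent loop, you would call record_compaction() on every paused response and append the wrap-up user message once exhausted becomes true.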

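Working with compaction blocks

When compaction is triggered, the API returns a compaction block at the start of the assistant response.

A long-running conversation may be compacted more than once. The last compaction block reflects the final state of the prompt and replaces the content before it with the generated summary.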

{
  "content": [
    {
      "type": "compaction",
      "content": "Summary of the conversation: The user requested help building a web scraper..."
    },
    {
      "type": "text",
      "text": "Based on our conversation so far..."
    }
  ]
}
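
As a small sketch of reading such a response, the helper below separates the summary from the visible text. It assumes the content blocks are plain dictionaries shaped like the JSON above (SDK response objects expose attributes instead), and the name split_compaction is hypothetical.

```python
from typing import Optional


def split_compaction(content: list[dict]) -> tuple[Optional[str], str]:
    """Return (summary, text) from a list of content blocks given as plain dicts."""
    summary: Optional[str] = None
    text_parts: list[str] = []
    for block in content:
        if block.get("type") == "compaction":
            # If several compaction blocks are present, the last one wins.
            summary = block.get("content")
        elif block.get("type") == "text":
            text_parts.append(block.get("text", ""))
    return summary, "".join(text_parts)
```

A summary of None means no compaction happened in that response.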

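Passing compaction blocks back

To continue the conversation with the shortened prompt, you must pass the compaction block back to the API on subsequent requests. The simplest way is to append the entire response content to your messages: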

# After receiving a response with a compaction block
messages.append({"role": "assistant", "content": response.content})

# Continue the conversation
messages.append({"role": "user", "content": "Now add error handling"})

response = client.beta.messages.create(
    betas=["compact-2026-01-12"],
    model="claude-opus-4-6",
    max_tokens=4096,
    messages=messages,
    context_management={
        "edits": [{"type": "compact_20260112"}]
    }
)

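When the API receives a compaction block, all content blocks before it are ignored. You can either:

Keep the original messages in your list and let the API drop the compacted content
Manually drop the compacted messages and include only the compaction block and what follows it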
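
The second option can be sketched as a pure-Python helper that trims a message list down to the most recent compaction block and everything after it. It assumes messages and content blocks are plain dictionaries; the name prune_before_compaction is hypothetical and not part of the SDK.

```python
def prune_before_compaction(messages: list[dict]) -> list[dict]:
    """Return a new list that starts at the most recent compaction block."""
    # Walk backwards so that the latest compaction block is found first.
    for i in range(len(messages) - 1, -1, -1):
        content = messages[i].get("content")
        if not isinstance(content, list):
            continue  # plain-string content cannot hold a compaction block
        for j in range(len(content) - 1, -1, -1):
            block = content[j]
            if isinstance(block, dict) and block.get("type") == "compaction":
                # Keep the compaction block and any blocks that follow it.
                trimmed = dict(messages[i], content=content[j:])
                return [trimmed] + messages[i + 1:]
    # No compaction block found: return an unmodified copy.
    return list(messages)
```

The input list is left untouched, so you can keep the full history for logging while sending only the pruned list to the API.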

Streaming

When you stream a response with compaction enabled, you receive a content_block_start event when compaction begins. Compaction blocks stream differently from text blocks: after the content_block_start event, you receive a single content_block_delta containing the complete summary (with no intermediate streaming), followed by a content_block_stop event.

import anthropic

client = anthropic.Anthropic()

with client.beta.messages.stream(
    betas=["compact-2026-01-12"],
    model="claude-opus-4-6",
    max_tokens=4096,
    messages=messages,
    context_management={
        "edits": [{"type": "compact_20260112"}]
    }
) as stream:
    for event in stream:
        if event.type == "content_block_start":
            if event.content_block.type == "compaction":
                print("Compaction started...")
            elif event.content_block.type == "text":
                print("Text response started...")

        elif event.type == "content_block_delta":
            if event.delta.type == "compaction_delta":
                print(f"Compaction complete: {len(event.delta.content)} chars")
            elif event.delta.type == "text_delta":
                print(event.delta.text, end="", flush=True)

    # Get the final accumulated message
    message = stream.get_final_message()
    messages.append({"role": "assistant", "content": message.content})
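
To see the event order without making a network call, the sketch below replays a hand-built event sequence through the same branching logic. The SimpleNamespace objects merely imitate the attributes used in the streaming loop above (type, delta.type, delta.content, delta.text); the function name consume_events and the sample events are made up for illustration.

```python
from types import SimpleNamespace as NS
from typing import Iterable, Optional


def consume_events(events: Iterable[NS]) -> tuple[Optional[str], str]:
    """Collect the compaction summary and the streamed text from an event sequence."""
    summary: Optional[str] = None
    text_parts: list[str] = []
    for event in events:
        if event.type != "content_block_delta":
            continue  # start/stop events carry no content in this sketch
        if event.delta.type == "compaction_delta":
            # The whole summary arrives in one delta.
            summary = event.delta.content
        elif event.delta.type == "text_delta":
            # Text arrives incrementally across many deltas.
            text_parts.append(event.delta.text)
    return summary, "".join(text_parts)


fake_events = [
    NS(type="content_block_start", content_block=NS(type="compaction")),
    NS(type="content_block_delta", delta=NS(type="compaction_delta", content="Summary...")),
    NS(type="content_block_stop"),
    NS(type="content_block_start", content_block=NS(type="text")),
    NS(type="content_block_delta", delta=NS(type="text_delta", text="Hello")),
    NS(type="content_block_delta", delta=NS(type="text_delta", text=" world")),
    NS(type="content_block_stop"),
]
```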

Prompt caching

You can add a cache_control breakpoint to a compaction block. This caches the full system prompt together with the summarized content; the original compacted content is ignored.

{
    "role": "assistant",
    "content": [
        {
            "type": "compaction",
            "content": "[summary text]",
            "cache_control": {"type": "ephemeral"}
        },
        {
            "type": "text",
            "text": "Based on our conversation..."
        }
    ]
}
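
If you store assistant messages as plain dictionaries, adding the breakpoint can be sketched as below. The helper name cache_last_compaction is hypothetical; only the cache_control value shown above is taken from this page.

```python
import copy


def cache_last_compaction(message: dict) -> dict:
    """Return a copy of the message with cache_control set on its last compaction block."""
    updated = copy.deepcopy(message)  # leave the caller's message untouched
    content = updated.get("content")
    if isinstance(content, list):
        for block in reversed(content):
            if isinstance(block, dict) and block.get("type") == "compaction":
                block["cache_control"] = {"type": "ephemeral"}
                break
    return updated
```

Messages without a compaction block are returned unchanged (as a copy).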

Understanding usage

Compaction requires an additional sampling step, which affects rate limits and billing. The API returns detailed usage information in the response:

{
  "usage": {
    "input_tokens": 23000,
    "output_tokens": 1000,
    "iterations": [
      {
        "type": "compaction",
        "input_tokens": 180000,
        "output_tokens": 3500
      },
      {
        "type": "message",
        "input_tokens": 23000,
        "output_tokens": 1000
      }
    ]
  }
}

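The iterations array shows the usage for each sampling iteration. When compaction occurs, a compaction iteration appears, followed by the main message iteration. The token counts of the final iteration reflect the effective context size after compaction.

The top-level input_tokens and output_tokens do not include the usage of compaction iterations; they reflect the sum of all non-compaction iterations. To calculate the total tokens consumed and billed for a request, sum every entry in the usage.iterations array.

If you previously relied on usage.input_tokens and usage.output_tokens for cost tracking or auditing, update your tracking logic to aggregate across usage.iterations when compaction is enabled. The iterations array is populated only when a new compaction is triggered during the request. Re-applying an earlier compaction block incurs no additional compaction cost, and in that case the top-level usage fields remain accurate.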
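
A minimal sketch of that aggregation, assuming the usage object has been converted to a plain dictionary, is shown below. The function name total_billed_tokens is hypothetical. With the iteration values from the example response, the totals are 180,000 + 23,000 = 203,000 input tokens and 3,500 + 1,000 = 4,500 output tokens.

```python
def total_billed_tokens(usage: dict) -> tuple[int, int]:
    """Return (input_tokens, output_tokens) billed for one request."""
    iterations = usage.get("iterations") or []
    if not iterations:
        # No new compaction was triggered: the top-level fields are accurate.
        return usage.get("input_tokens", 0), usage.get("output_tokens", 0)
    # Sum every iteration, including the compaction iterations.
    total_in = sum(it["input_tokens"] for it in iterations)
    total_out = sum(it["output_tokens"] for it in iterations)
    return total_in, total_out
```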

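Combining with other features

Server tools

When you use server tools (such as web search), the compaction trigger is checked at the start of each sampling iteration. Depending on the trigger threshold and the amount of output generated, compaction may occur multiple times within a single request.

Token counting

The token counting endpoint (/v1/messages/count_tokens) applies any existing compaction blocks in the prompt but does not trigger new compactions. Use it to check the effective token count after earlier compactions: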

count_response = client.beta.messages.count_tokens(
    betas=["compact-2026-01-12"],
    model="claude-opus-4-6",
    messages=messages,
    context_management={
        "edits": [{"type": "compact_20260112"}]
    }
)

print(f"Current tokens: {count_response.input_tokens}")
print(f"Original tokens: {count_response.context_management.original_input_tokens}")
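
The two numbers printed above can be turned into a simple savings figure. The helper below is a sketch with a hypothetical name; it assumes original_input_tokens is the count before existing compaction blocks are applied and input_tokens is the count after.

```python
def compaction_savings(current_tokens: int, original_tokens: int) -> tuple[int, float]:
    """Return (tokens saved, fraction of the original prompt saved)."""
    saved = original_tokens - current_tokens
    # Guard against division by zero for an empty prompt.
    fraction = saved / original_tokens if original_tokens else 0.0
    return saved, fraction
```

For example, a prompt that shrinks from 180,000 to 23,000 tokens saves 157,000 tokens.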

Examples

The following is a complete example of a long-running conversation that uses compaction:

import anthropic

client = anthropic.Anthropic()

messages: list[dict] = []

def chat(user_message: str) -> str:
    messages.append({"role": "user", "content": user_message})

    response = client.beta.messages.create(
        betas=["compact-2026-01-12"],
        model="claude-opus-4-6",
        max_tokens=4096,
        messages=messages,
        context_management={
            "edits": [
                {
                    "type": "compact_20260112",
                    "trigger": {"type": "input_tokens", "value": 100000}
                }
            ]
        }
    )

    # Append response (compaction blocks are automatically included)
    messages.append({"role": "assistant", "content": response.content})

    # Return the text content
    return next(
        (block.text for block in response.content if block.type == "text"),
        "",  # fall back to an empty string if the response has no text block
    )

# Run a long conversation
print(chat("Help me build a Python web scraper"))
print(chat("Add support for JavaScript-rendered pages"))
print(chat("Now add rate limiting and error handling"))
# ... continue as long as needed

The following example uses pause_after_compaction to keep the last two messages (one user turn and one assistant turn) verbatim instead of summarizing them:

import anthropic
from typing import Any

client = anthropic.Anthropic()

messages: list[dict[str, Any]] = []

def chat(user_message: str) -> str:
    messages.append({"role": "user", "content": user_message})

    response = client.beta.messages.create(
        betas=["compact-2026-01-12"],
        model="claude-opus-4-6",
        max_tokens=4096,
        messages=messages,
        context_management={
            "edits": [
                {
                    "type": "compact_20260112",
                    "trigger": {"type": "input_tokens", "value": 100000},
                    "pause_after_compaction": True
                }
            ]
        }
    )

    # Check if compaction occurred and paused
    if response.stop_reason == "compaction":
        # Get the compaction block from the response
        compaction_block = response.content[0]

        # Preserve the last 2 messages (1 user + 1 assistant turn)
        # by including them after the compaction block
        preserved_messages = messages[-2:] if len(messages) >= 2 else messages

        # Build new message list: compaction + preserved messages
        new_assistant_content = [compaction_block]
        messages_after_compaction = [
            {"role": "assistant", "content": new_assistant_content}
        ] + preserved_messages

        # Continue the request with the compacted context + preserved messages
        response = client.beta.messages.create(
            betas=["compact-2026-01-12"],
            model="claude-opus-4-6",
            max_tokens=4096,
            messages=messages_after_compaction,
            context_management={
                "edits": [{"type": "compact_20260112"}]
            }
        )

        # Update our message list to reflect the compaction
        messages.clear()
        messages.extend(messages_after_compaction)

    # Append the final response
    messages.append({"role": "assistant", "content": response.content})

    # Return the text content
    return next(
        (block.text for block in response.content if block.type == "text"),
        "",  # fall back to an empty string if the response has no text block
    )

# Run a long conversation
print(chat("Help me build a Python web scraper"))
print(chat("Add support for JavaScript-rendered pages"))
print(chat("Now add rate limiting and error handling"))
# ... continue as long as needed

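Current limitations

Same model for summarization: The model specified in the request is used for summarization. There is no option to use a different (for example, cheaper) model for the summary.

Next steps

Compaction cookbook

Explore practical examples and implementations in the cookbook.

Context windows

Learn about context window sizes and management strategies.

Context editing

Explore other strategies for managing conversation context, such as tool result clearing and thinking block clearing.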
