Server-side compaction is the recommended strategy for managing context in long-running conversations and agentic workflows. It handles context management automatically with minimal integration work.

Compaction extends the effective context length of long-running conversations and tasks by automatically summarizing earlier context as you approach the context window limit. This is especially useful for agentic workflows with many tool-use iterations and for multi-turn conversations that would otherwise outgrow the context window.
Compaction is currently in beta. Include the beta header compact-2026-01-12 in your API requests to use this feature.
Compaction is supported on the following model: claude-opus-4-6.

When compaction is enabled and a conversation approaches the configured token threshold, Claude automatically summarizes your conversation. The API returns the summary in a compaction block. In subsequent requests, append the response to your messages: the API automatically discards all content blocks that precede the compaction block and continues the conversation from the summary.
Enable compaction by adding the compact_20260112 strategy to context_management.edits in your Messages API request.
```bash
curl https://api.anthropic.com/v1/messages \
  --header "x-api-key: $ANTHROPIC_API_KEY" \
  --header "anthropic-version: 2023-06-01" \
  --header "anthropic-beta: compact-2026-01-12" \
  --header "content-type: application/json" \
  --data \
  '{
    "model": "claude-opus-4-6",
    "max_tokens": 4096,
    "messages": [
      {
        "role": "user",
        "content": "Help me build a website"
      }
    ],
    "context_management": {
      "edits": [
        {
          "type": "compact_20260112"
        }
      ]
    }
  }'
```

| Parameter | Type | Default | Description |
|---|---|---|---|
| type | string | required | Must be "compact_20260112" |
| trigger | object | 150,000 tokens | When compaction triggers. Must be at least 50,000 tokens. |
| pause_after_compaction | boolean | false | Whether to pause after the compaction summary is generated |
| instructions | string | null | Custom summarization prompt. When provided, it fully replaces the default prompt. |
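For illustration, the parameters above can be combined in a single edit entry. A minimal sketch (the trigger value and instruction text here are arbitrary examples, not defaults):

```python
# Hypothetical configuration combining the parameters documented above.
# The trigger value and instructions text are illustrative choices.
compact_edit = {
    "type": "compact_20260112",
    "trigger": {"type": "input_tokens", "value": 120_000},  # must be >= 50,000
    "pause_after_compaction": False,
    "instructions": "Preserve file paths, open TODOs, and key decisions.",
}

# Passed as the context_management field of a Messages API request.
context_management = {"edits": [compact_edit]}
print(context_management["edits"][0]["type"])  # compact_20260112
```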
Use the trigger parameter to configure when compaction fires:
```python
response = client.beta.messages.create(
    betas=["compact-2026-01-12"],
    model="claude-opus-4-6",
    max_tokens=4096,
    messages=messages,
    context_management={
        "edits": [
            {
                "type": "compact_20260112",
                "trigger": {
                    "type": "input_tokens",
                    "value": 150000
                }
            }
        ]
    }
)
```

By default, compaction uses the following summarization prompt:
```text
You have written a partial transcript for the initial task above. Please write a summary of the transcript. The purpose of this summary is to provide continuity so you can continue to make progress towards solving the task in a future context, where the raw history above may not be accessible and will be replaced with this summary. Write down anything that would be helpful, including the state, next steps, learnings etc. You must wrap your summary in a <summary></summary> block.
```

You can replace this prompt by providing custom instructions via the instructions parameter. Custom instructions do not supplement the default prompt; they replace it entirely:
```python
response = client.beta.messages.create(
    betas=["compact-2026-01-12"],
    model="claude-opus-4-6",
    max_tokens=4096,
    messages=messages,
    context_management={
        "edits": [
            {
                "type": "compact_20260112",
                "instructions": "Focus on preserving code snippets, variable names, and technical decisions."
            }
        ]
    }
)
```

Use pause_after_compaction to pause the API after the compaction summary is generated. This lets you add extra content blocks (for example, to preserve recent messages or inject specific instructions) before the API continues its response.
When enabled, the API returns a message with a compaction stop reason after generating the compaction block:
```python
response = client.beta.messages.create(
    betas=["compact-2026-01-12"],
    model="claude-opus-4-6",
    max_tokens=4096,
    messages=messages,
    context_management={
        "edits": [
            {
                "type": "compact_20260112",
                "pause_after_compaction": True
            }
        ]
    }
)

# Check if compaction triggered a pause
if response.stop_reason == "compaction":
    # Response contains only the compaction block
    messages.append({"role": "assistant", "content": response.content})

    # Continue the request
    response = client.beta.messages.create(
        betas=["compact-2026-01-12"],
        model="claude-opus-4-6",
        max_tokens=4096,
        messages=messages,
        context_management={
            "edits": [{"type": "compact_20260112"}]
        }
    )
```

When a model works through a long task with many tool-use iterations, total token consumption can grow substantially. You can combine pause_after_compaction with a compaction counter to estimate cumulative usage and wind down the task gracefully once a budget is reached:
```python
TRIGGER_THRESHOLD = 100_000
TOTAL_TOKEN_BUDGET = 3_000_000

n_compactions = 0

response = client.beta.messages.create(
    betas=["compact-2026-01-12"],
    model="claude-opus-4-6",
    max_tokens=4096,
    messages=messages,
    context_management={
        "edits": [
            {
                "type": "compact_20260112",
                "trigger": {"type": "input_tokens", "value": TRIGGER_THRESHOLD},
                "pause_after_compaction": True,
            }
        ]
    },
)

if response.stop_reason == "compaction":
    n_compactions += 1
    messages.append({"role": "assistant", "content": response.content})

    # Estimate total tokens consumed; prompt wrap-up if over budget
    if n_compactions * TRIGGER_THRESHOLD >= TOTAL_TOKEN_BUDGET:
        messages.append({
            "role": "user",
            "content": "Please wrap up your current work and summarize the final state.",
        })
```

When compaction triggers, the API returns a compaction block at the start of the assistant response.
Long-running conversations may be compacted multiple times. The last compaction block reflects the final state of the prompt, replacing everything before it with the generated summary.
```json
{
  "content": [
    {
      "type": "compaction",
      "content": "Summary of the conversation: The user requested help building a web scraper..."
    },
    {
      "type": "text",
      "text": "Based on our conversation so far..."
    }
  ]
}
```

You must pass the compaction block back to the API in subsequent requests to continue the conversation with the shortened prompt. The simplest approach is to append the entire response content to your messages:
```python
# After receiving a response with a compaction block
messages.append({"role": "assistant", "content": response.content})

# Continue the conversation
messages.append({"role": "user", "content": "Now add error handling"})

response = client.beta.messages.create(
    betas=["compact-2026-01-12"],
    model="claude-opus-4-6",
    max_tokens=4096,
    messages=messages,
    context_management={
        "edits": [{"type": "compact_20260112"}]
    }
)
```

When the API receives a compaction block, all content blocks before it are ignored.
When streaming responses with compaction enabled, compaction blocks stream differently from text blocks. You'll receive a content_block_start event when compaction begins, then a single content_block_delta containing the complete summary (there is no incremental streaming), and finally a content_block_stop event.
```python
import anthropic

client = anthropic.Anthropic()

with client.beta.messages.stream(
    betas=["compact-2026-01-12"],
    model="claude-opus-4-6",
    max_tokens=4096,
    messages=messages,
    context_management={
        "edits": [{"type": "compact_20260112"}]
    }
) as stream:
    for event in stream:
        if event.type == "content_block_start":
            if event.content_block.type == "compaction":
                print("Compaction started...")
            elif event.content_block.type == "text":
                print("Text response started...")
        elif event.type == "content_block_delta":
            if event.delta.type == "compaction_delta":
                print(f"Compaction complete: {len(event.delta.content)} chars")
            elif event.delta.type == "text_delta":
                print(event.delta.text, end="", flush=True)

# Get the final accumulated message
message = stream.get_final_message()
messages.append({"role": "assistant", "content": message.content})
```

You can add a cache_control breakpoint on a compaction block, which caches the full system prompt along with the summarized content. The original, compacted-away content is ignored.
```json
{
  "role": "assistant",
  "content": [
    {
      "type": "compaction",
      "content": "[summary text]",
      "cache_control": {"type": "ephemeral"}
    },
    {
      "type": "text",
      "text": "Based on our conversation..."
    }
  ]
}
```

Compaction requires an additional sampling step, which counts toward rate limits and billing. The API returns detailed usage information in the response:
```json
{
  "usage": {
    "input_tokens": 45000,
    "output_tokens": 1234,
    "iterations": [
      {
        "type": "compaction",
        "input_tokens": 180000,
        "output_tokens": 3500
      },
      {
        "type": "message",
        "input_tokens": 23000,
        "output_tokens": 1000
      }
    ]
  }
}
```

The iterations array shows usage for each sampling iteration. When compaction occurs, you'll see a compaction iteration followed by the main message iteration. The token counts of the last iteration reflect the effective context size after compaction.
The top-level input_tokens and output_tokens do not include usage from compaction iterations; they reflect the sum of all non-compaction iterations. To calculate the total tokens consumed and billed for a request, sum every entry in the usage.iterations array.

If you previously relied on usage.input_tokens and usage.output_tokens for cost tracking or auditing, you'll need to update your tracking logic to aggregate across usage.iterations when compaction is enabled. The iterations array is only populated when a new compaction is triggered during the request. Re-applying a previous compaction block incurs no additional compaction cost, and the top-level usage fields remain accurate in that case.
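As a sketch of that aggregation logic, a small helper can sum across iterations and fall back to the top-level fields when no compaction occurred (written against the response shape shown above, with usage as a plain dict for illustration):

```python
def total_usage(usage: dict) -> tuple[int, int]:
    """Return (input_tokens, output_tokens) actually consumed by a request.

    When compaction triggers, the per-iteration entries must be summed;
    otherwise the top-level fields are already accurate.
    """
    iterations = usage.get("iterations") or []
    if iterations:
        return (
            sum(it["input_tokens"] for it in iterations),
            sum(it["output_tokens"] for it in iterations),
        )
    return usage["input_tokens"], usage["output_tokens"]

# Using the example usage object above: 180,000 + 23,000 input tokens
# and 3,500 + 1,000 output tokens were actually billed.
usage = {
    "input_tokens": 45000,
    "output_tokens": 1234,
    "iterations": [
        {"type": "compaction", "input_tokens": 180000, "output_tokens": 3500},
        {"type": "message", "input_tokens": 23000, "output_tokens": 1000},
    ],
}
print(total_usage(usage))  # (203000, 4500)
```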
When using server-side tools such as web search, the compaction trigger is checked at the start of each sampling iteration. Depending on your trigger threshold and how much output is generated, compaction may occur multiple times within a single request.
The token counting endpoint (/v1/messages/count_tokens) applies existing compaction blocks in the prompt but does not trigger new compaction. Use it to check the effective token count after a previous compaction:
```python
count_response = client.beta.messages.count_tokens(
    betas=["compact-2026-01-12"],
    model="claude-opus-4-6",
    messages=messages,
    context_management={
        "edits": [{"type": "compact_20260112"}]
    }
)

print(f"Current tokens: {count_response.input_tokens}")
print(f"Original tokens: {count_response.context_management.original_input_tokens}")
```

Here is a complete example of a long-running conversation with compaction:
```python
import anthropic

client = anthropic.Anthropic()
messages: list[dict] = []

def chat(user_message: str) -> str:
    messages.append({"role": "user", "content": user_message})

    response = client.beta.messages.create(
        betas=["compact-2026-01-12"],
        model="claude-opus-4-6",
        max_tokens=4096,
        messages=messages,
        context_management={
            "edits": [
                {
                    "type": "compact_20260112",
                    "trigger": {"type": "input_tokens", "value": 100000}
                }
            ]
        }
    )

    # Append response (compaction blocks are automatically included)
    messages.append({"role": "assistant", "content": response.content})

    # Return the text content
    return next(
        block.text for block in response.content if block.type == "text"
    )

# Run a long conversation
print(chat("Help me build a Python web scraper"))
print(chat("Add support for JavaScript-rendered pages"))
print(chat("Now add rate limiting and error handling"))
# ... continue as long as needed
```

Here is an example that uses pause_after_compaction to keep the last two messages (one user + one assistant turn) verbatim instead of summarizing them:
```python
import anthropic
from typing import Any

client = anthropic.Anthropic()
messages: list[dict[str, Any]] = []

def chat(user_message: str) -> str:
    messages.append({"role": "user", "content": user_message})

    response = client.beta.messages.create(
        betas=["compact-2026-01-12"],
        model="claude-opus-4-6",
        max_tokens=4096,
        messages=messages,
        context_management={
            "edits": [
                {
                    "type": "compact_20260112",
                    "trigger": {"type": "input_tokens", "value": 100000},
                    "pause_after_compaction": True
                }
            ]
        }
    )

    # Check if compaction occurred and paused
    if response.stop_reason == "compaction":
        # Get the compaction block from the response
        compaction_block = response.content[0]

        # Preserve the last 2 messages (1 user + 1 assistant turn)
        # by including them after the compaction block
        preserved_messages = messages[-2:] if len(messages) >= 2 else messages

        # Build new message list: compaction + preserved messages
        new_assistant_content = [compaction_block]
        messages_after_compaction = [
            {"role": "assistant", "content": new_assistant_content}
        ] + preserved_messages

        # Continue the request with the compacted context + preserved messages
        response = client.beta.messages.create(
            betas=["compact-2026-01-12"],
            model="claude-opus-4-6",
            max_tokens=4096,
            messages=messages_after_compaction,
            context_management={
                "edits": [{"type": "compact_20260112"}]
            }
        )

        # Update our message list to reflect the compaction
        messages.clear()
        messages.extend(messages_after_compaction)

    # Append the final response
    messages.append({"role": "assistant", "content": response.content})

    # Return the text content
    return next(
        block.text for block in response.content if block.type == "text"
    )

# Run a long conversation
print(chat("Help me build a Python web scraper"))
print(chat("Add support for JavaScript-rendered pages"))
print(chat("Now add rate limiting and error handling"))
# ... continue as long as needed
```