Computer use tool

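Claude can interact with computer environments through the computer use tool, which provides screenshot capabilities and mouse/keyboard control for autonomous desktop interaction.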

Computer use is currently in beta and requires a beta header:

"computer-use-2025-11-24" for Claude Opus 4.6 and Claude Opus 4.5
"computer-use-2025-01-24" for Claude Sonnet 4.5, Haiku 4.5, Opus 4.1, Sonnet 4, Opus 4, and Sonnet 3.7 (deprecated)

Share your feedback on this feature through our feedback form.

Overview

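Computer use is a beta feature that enables Claude to interact with desktop environments. The tool provides:

Screenshot capture: see what is currently displayed on the screen
Mouse control: click, drag, and move the cursor
Keyboard input: type text and use keyboard shortcuts
Desktop automation: interact with any application or interface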

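Although computer use can be combined with other tools, such as bash and the text editor, to build more comprehensive automation workflows, "computer use" refers specifically to the computer use tool's ability to see and control a desktop environment.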

Model compatibility

Computer use is available for the following Claude models:

Model	Tool version	Beta flag
Claude Opus 4.6, Claude Opus 4.5	`computer_20251124`	`computer-use-2025-11-24`
All other supported models	`computer_20250124`	`computer-use-2025-01-24`

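Claude Opus 4.6 and Claude Opus 4.5 introduce the computer_20251124 tool version, which adds new capabilities, including the zoom action for inspecting screen regions in detail. All other models (Sonnet 4.5, Haiku 4.5, Sonnet 4, Opus 4, Opus 4.1, and Sonnet 3.7) use the computer_20250124 tool version.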

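Older tool versions are not guaranteed to be backward compatible with newer models. Always use the tool version that corresponds to your model version.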

Security considerations

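Computer use is a beta feature with distinct risks that differ from standard API features. These risks are heightened when interacting with the internet.

To minimize risks, consider taking precautions such as:

Use a dedicated virtual machine or container with minimal privileges to prevent direct system attacks or accidents.
Avoid giving the model access to sensitive data, such as account login information, to prevent information theft.
Limit internet access to an allowlist of domains to reduce exposure to malicious content.
Ask a human to confirm decisions that may result in meaningful real-world consequences, as well as any tasks requiring affirmative consent, such as accepting cookies, executing financial transactions, or agreeing to terms of service.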
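One way to apply the domain allowlist precaution is to check every URL before your harness lets the browser navigate to it. The sketch below is a minimal illustration under stated assumptions: `ALLOWED_DOMAINS` and `is_allowed_url` are hypothetical names, subdomains of an allowed domain are treated as allowed, and non-HTTP schemes are refused. An egress proxy or firewall rule is the stronger control; this check only complements it.

```python
from urllib.parse import urlparse

# Hypothetical allowlist; replace with the domains your task actually needs.
ALLOWED_DOMAINS = {"example.com", "docs.python.org"}


def is_allowed_url(url: str, allowed: set[str] = ALLOWED_DOMAINS) -> bool:
    """Return True if the URL's host is an allowed domain or a subdomain of one."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https"):
        return False  # refuse file://, javascript:, and other schemes
    host = (parsed.hostname or "").lower()
    # Exact match, or a true subdomain (".example.com" suffix), never a lookalike
    return any(host == domain or host.endswith("." + domain) for domain in allowed)
```

Matching on the parsed hostname rather than on the raw string is what keeps lookalikes such as `example.com.evil.net` out.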

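In some circumstances, Claude will follow commands found in content even if they conflict with the user's instructions. For example, Claude instructions on webpages or contained in images may override instructions or cause Claude to make mistakes. We suggest taking precautions to isolate Claude from sensitive data and actions to avoid risks related to prompt injection.

We've trained the model to resist these prompt injections and have added an extra layer of defense. If you use our computer use tools, we'll automatically run classifiers on your prompts to flag potential instances of prompt injection. When these classifiers identify potential prompt injections in screenshots, they will automatically steer the model to ask for user confirmation before proceeding with the next action. We recognize that this extra protection won't be ideal for every use case (for example, use cases without a human in the loop), so if you'd like to opt out and turn it off, please contact us.

We still suggest taking precautions to isolate Claude from sensitive data and actions to avoid risks related to prompt injection.

Finally, please inform end users of relevant risks and obtain their consent before enabling computer use in your own products.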

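Computer use reference implementation

Get started quickly with our computer use reference implementation, which includes a web interface, a Docker container, example tool implementations, and an agent loop.

Note: The implementation has been updated to include new tools for Claude 4 models and Claude Sonnet 3.7. Be sure to pull the latest version of the repository to access these new features.

Please use this form to provide feedback on the quality of the model responses, the API itself, or the quality of the documentation. We can't wait to hear from you!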

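Quick start

Here's how to get started with computer use:

The beta header is only required when you use the computer use tool.

The Python quick-start example uses three tools together (computer, text editor, and bash) and requires the beta header because it includes the computer use tool.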

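How computer use works

1. Provide Claude with the computer use tool and a user prompt
- Add the computer use tool (and optionally other tools) to your API request.
- Include a user prompt that requires desktop interaction, for example "Save a picture of a cat to my desktop."
2. Claude decides to use the computer use tool
- Claude assesses whether the computer use tool can help with the user's query.
- If so, Claude constructs a properly formatted tool use request.
- The API response has a stop_reason of tool_use, signaling Claude's intent.
3. Extract the tool input, execute the tool on a computer, and return the results
- On your end, extract the tool name and input from Claude's request.
- Use the tool on a container or virtual machine.
- Continue the conversation with a new user message containing a tool_result content block.
4. Claude continues calling computer use tools until it has completed the task
- Claude analyzes the tool results to determine whether more tool use is needed or the task is complete.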

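We refer to the repetition of steps 3 and 4 without user input as the "agent loop": Claude responds with a tool use request, and your application responds to Claude with the results of evaluating that request.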

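Computing environment

Computer use requires a sandboxed computing environment where Claude can safely interact with applications and the web. This environment includes:

Virtual display: a virtual X11 display server (using Xvfb) that renders the desktop interface Claude sees through screenshots and controls with mouse/keyboard actions.
Desktop environment: a lightweight UI with a window manager (Mutter) and panel (Tint2) running on Linux, giving Claude a consistent graphical interface to interact with.
Applications: preinstalled Linux applications such as Firefox, LibreOffice, text editors, and file managers that Claude can use to complete tasks.
Tool implementations: integration code that translates Claude's abstract tool requests (such as "move mouse" or "take screenshot") into actual operations in the virtual environment.
Agent loop: a program that handles communication between Claude and the environment, sending Claude's actions to the environment and returning the results (screenshots, command outputs) to Claude.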

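When you use computer use, Claude does not connect to this environment directly. Instead, your application:

Receives Claude's tool use requests
Translates them into actions in your computing environment
Captures the results (screenshots, command outputs, etc.)
Returns these results to Claude

For security and isolation, the reference implementation runs all of this inside a Docker container with appropriate port mappings for viewing and interacting with the environment.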

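How to implement computer use

Start with our reference implementation

We have built a reference implementation that includes everything you need to get started with computer use quickly:

A containerized environment suitable for computer use with Claude
Implementations of the computer use tools
An agent loop that interacts with the Claude API and executes the computer use tools
A web interface to interact with the container, agent loop, and tools

Understanding the multi-agent loop

The core of computer use is the "agent loop": a cycle in which Claude requests tool actions, your application executes them, and the results are returned to Claude. Here's a simplified example: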

from anthropic import AsyncAnthropic


async def sampling_loop(
    *,
    model: str,
    messages: list[dict],
    api_key: str,
    max_tokens: int = 4096,
    tool_version: str,
    thinking_budget: int | None = None,
    max_iterations: int = 10,  # Add iteration limit to prevent infinite loops
):
    """
    A simple agent loop for Claude computer use interactions.

    This function handles the back-and-forth between:
    1. Sending user messages to Claude
    2. Claude requesting to use tools
    3. Your app executing those tools
    4. Sending tool results back to Claude

    This simplified version builds every tool from one tool_version string,
    so it only fits versions shared by the computer, text editor, and bash
    tools (for example "20250124").
    """
    # Use the async client so API calls don't block the event loop
    client = AsyncAnthropic(api_key=api_key)
    beta_flag = "computer-use-2025-01-24" if "20250124" in tool_version else "computer-use-2024-10-22"

    # Configure tools - you should already have these initialized elsewhere
    tools = [
        {"type": f"computer_{tool_version}", "name": "computer", "display_width_px": 1024, "display_height_px": 768},
        {"type": f"text_editor_{tool_version}", "name": "str_replace_editor"},
        {"type": f"bash_{tool_version}", "name": "bash"}
    ]

    # Only send the optional thinking parameter when a budget is set
    # (Claude 4 models and Claude Sonnet 3.7)
    extra_params = {}
    if thinking_budget:
        extra_params["thinking"] = {"type": "enabled", "budget_tokens": thinking_budget}

    # Main agent loop (with iteration limit to prevent runaway API costs)
    for _ in range(max_iterations):
        # Call the Claude API
        response = await client.beta.messages.create(
            model=model,
            max_tokens=max_tokens,
            messages=messages,
            tools=tools,
            betas=[beta_flag],
            **extra_params,
        )

        # Add Claude's response to the conversation history
        response_content = response.content
        messages.append({"role": "assistant", "content": response_content})

        # Check if Claude used any tools
        tool_results = []
        for block in response_content:
            if block.type == "tool_use":
                # In a real app, you would execute the tool here
                # For example: result = run_tool(block.name, block.input)
                result = "Tool executed successfully"

                # Format the result for Claude (content is a string or a list of content blocks)
                tool_results.append({
                    "type": "tool_result",
                    "tool_use_id": block.id,
                    "content": result
                })

        # If no tools were used, Claude is done - return the final messages
        if not tool_results:
            return messages

        # Add tool results to messages for the next iteration with Claude
        messages.append({"role": "user", "content": tool_results})

    # Iteration limit reached: return the conversation so far
    return messages

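The loop continues until Claude responds without requesting any tools (task complete) or the maximum iteration limit is reached. This safeguard prevents potential infinite loops that could result in unexpected API costs.

We recommend trying out the reference implementation before reading the rest of this documentation.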

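Optimize model performance with prompting

Here are some tips for getting the best-quality output:

Specify simple, well-defined tasks and provide explicit instructions for each step.
Claude sometimes assumes the outcomes of its actions without explicitly checking their results. To prevent this, you can prompt Claude with: After each step, take a screenshot and carefully evaluate if you have achieved the right outcome. Explicitly show your thinking: "I have evaluated step X..." If not correct, try again. Only when you confirm a step was executed correctly should you move on to the next one.
Some UI elements (such as dropdowns and scrollbars) can be tricky for Claude to manipulate using mouse movements. If you experience this, try prompting the model to use keyboard shortcuts.
For repeatable tasks or UI interactions, include example screenshots and tool calls of successful outcomes in your prompt.
If you need the model to log in, provide the username and password in your prompt inside XML tags such as <robot_credentials>. Using computer use within applications that require login increases the risk of bad outcomes as a result of prompt injection. Review our guide on mitigating prompt injections before providing the model with login credentials.

If you repeatedly encounter a clear set of issues, or know in advance which tasks Claude will need to complete, use the system prompt to give Claude explicit tips or instructions on how to do those tasks successfully.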

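System prompt

When one of the Anthropic-defined tools is requested via the Claude API, a computer-use-specific system prompt is generated. It is similar to the tool use system prompt but starts with: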

You have access to a set of functions you can use to answer the user's question. This includes access to a sandboxed computing environment. You do NOT currently have the ability to inspect files or interact with external resources, except by invoking the below functions.

As with regular tool use, the user-provided system_prompt field is still respected and used in constructing the combined system prompt.

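Available actions

The computer use tool supports the following actions:

Basic actions (all versions)

screenshot - capture the current display
left_click - click at coordinates [x, y]
type - type a text string
key - press a key or key combination (for example, "ctrl+s")
mouse_move - move the cursor to coordinates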
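Because Claude's tool inputs arrive as plain JSON, it can help to validate them before touching the display. The sketch below covers only the basic actions listed above. The input field names (`action`, `coordinate`, `text`) are assumptions based on the reference implementation, and treating `coordinate` as required for `left_click` follows the description in this list; check both against the tool reference for your tool version.

```python
# Hypothetical table of fields each basic action is expected to carry
REQUIRED_FIELDS = {
    "screenshot": set(),
    "left_click": {"coordinate"},
    "type": {"text"},
    "key": {"text"},
    "mouse_move": {"coordinate"},
}


def validate_basic_action(tool_input: dict) -> list[str]:
    """Return a list of problems; an empty list means the input looks usable."""
    action = tool_input.get("action")
    if action not in REQUIRED_FIELDS:
        return [f"unsupported action: {action!r}"]
    missing = REQUIRED_FIELDS[action] - set(tool_input)
    problems = [f"missing field: {field}" for field in sorted(missing)]
    coord = tool_input.get("coordinate")
    if coord is not None and not (
        isinstance(coord, list) and len(coord) == 2 and all(isinstance(v, int) for v in coord)
    ):
        problems.append("coordinate must be [x, y] integers")
    return problems
```

Returning problems as strings makes it easy to send them back to Claude as an error tool result instead of raising.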

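Enhanced actions (computer_20250124), available in Claude 4 models and Claude Sonnet 3.7:

scroll - scroll in any direction with amount control
left_click_drag - click and drag between coordinates
right_click, middle_click - additional mouse buttons
double_click, triple_click - multiple clicks
left_mouse_down, left_mouse_up - fine-grained click control
hold_key - hold down a key for a specified duration (in seconds)
wait - pause between actions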
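On an X11 display, a scroll request can be carried out by moving the pointer and clicking the wheel "buttons" repeatedly. This sketch assumes the scroll input uses the fields `coordinate`, `scroll_direction`, and `scroll_amount` (taken from the reference implementation; verify them for your tool version) and that your environment has `xdotool` installed. `scroll_command` is a hypothetical helper that only builds the argument list; you would run it with `subprocess.run`.

```python
# X11 conventionally maps wheel up/down/left/right to mouse buttons 4/5/6/7
SCROLL_BUTTONS = {"up": "4", "down": "5", "left": "6", "right": "7"}


def scroll_command(tool_input: dict) -> list[str]:
    """Build an xdotool argument list for a scroll action (assumed field names)."""
    direction = tool_input["scroll_direction"]
    amount = int(tool_input["scroll_amount"])
    if direction not in SCROLL_BUTTONS or amount < 1:
        raise ValueError(f"invalid scroll request: {tool_input}")
    x, y = tool_input["coordinate"]
    # Move to the target, then click the wheel button `amount` times
    return [
        "xdotool", "mousemove", str(x), str(y),
        "click", "--repeat", str(amount), SCROLL_BUTTONS[direction],
    ]
```

Each repeat is one wheel notch, so the scroll amount maps directly onto the number of clicks.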

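Enhanced actions (computer_20251124), available in Claude Opus 4.6 and Claude Opus 4.5:

All actions from computer_20250124
zoom - view a specific region of the screen at full resolution. Requires enable_zoom: true in the tool definition. Takes a region parameter with coordinates [x1, y1, x2, y2] defining the top-left and bottom-right corners of the area to inspect.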
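The sketch below pairs a tool definition with zoom enabled and a check that an incoming `region` is a well-formed [x1, y1, x2, y2] box inside the display. The display size and the helper name `check_zoom_region` are assumptions for illustration; only `enable_zoom` and the corner ordering come from the description above.

```python
DISPLAY_W, DISPLAY_H = 1024, 768  # assumed display size

computer_tool = {
    "type": "computer_20251124",
    "name": "computer",
    "display_width_px": DISPLAY_W,
    "display_height_px": DISPLAY_H,
    "enable_zoom": True,  # required before Claude can use the zoom action
}


def check_zoom_region(
    region: list[int], width: int = DISPLAY_W, height: int = DISPLAY_H
) -> tuple[int, int, int, int]:
    """Validate a zoom region: top-left (x1, y1) must lie above and left of bottom-right (x2, y2)."""
    if len(region) != 4:
        raise ValueError("region must be [x1, y1, x2, y2]")
    x1, y1, x2, y2 = region
    if not (0 <= x1 < x2 <= width and 0 <= y1 < y2 <= height):
        raise ValueError(f"region {region} is not a box inside {width}x{height}")
    return x1, y1, x2, y2
```

Your handler could crop the full-resolution screenshot to the returned box before sending it back.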

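Tool parameters

Parameter	Required	Description
`type`	Yes	Tool version (`computer_20251124`, `computer_20250124`, or `computer_20241022`)
`name`	Yes	Must be "computer"
`display_width_px`	Yes	Display width in pixels
`display_height_px`	Yes	Display height in pixels
`display_number`	No	Display number for X11 environments
`enable_zoom`	No	Enables the zoom action (`computer_20251124` only)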
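To keep the tool version, beta header, and optional parameters consistent, you could build the definition in one place. This is a sketch: `build_computer_tool` is a hypothetical helper, and the version-to-beta mapping covers only the two rows of the model compatibility table (it deliberately leaves out `computer_20241022`).

```python
# Beta flag for each tool version, as listed in the model compatibility table
BETA_FOR_VERSION = {
    "computer_20251124": "computer-use-2025-11-24",
    "computer_20250124": "computer-use-2025-01-24",
}


def build_computer_tool(version, width, height, display_number=None, enable_zoom=False):
    """Return (tool_definition, beta_flag) for the given computer tool version."""
    if version not in BETA_FOR_VERSION:
        raise ValueError(f"unknown tool version: {version}")
    if enable_zoom and version != "computer_20251124":
        raise ValueError("enable_zoom applies only to computer_20251124")
    tool = {
        "type": version,
        "name": "computer",
        "display_width_px": width,
        "display_height_px": height,
    }
    # Optional parameters are only included when set
    if display_number is not None:
        tool["display_number"] = display_number
    if enable_zoom:
        tool["enable_zoom"] = True
    return tool, BETA_FOR_VERSION[version]
```

Pass the tool in `tools=[...]` and the flag in `betas=[...]` so the two can never drift apart.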

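Important: The computer use tool must be executed explicitly by your application; Claude cannot execute it directly. You are responsible for implementing screenshot capture, mouse movements, keyboard inputs, and other actions based on Claude's requests.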

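Enable thinking capability in Claude 4 models and Claude Sonnet 3.7

Claude Sonnet 3.7 introduced a new "thinking" capability that lets you see the model's reasoning process as it works through complex tasks. This feature helps you understand how Claude approaches a problem and can be especially valuable for debugging or educational purposes.

To enable thinking, add a thinking parameter to your API request: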

"thinking": {
  "type": "enabled",
  "budget_tokens": 1024
}

The budget_tokens parameter specifies how many tokens Claude can use for thinking. These tokens count toward your overall max_tokens budget.
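Since the thinking budget comes out of max_tokens, it is worth checking the two values together before sending a request. This sketch encodes the constraints as we understand them for extended thinking without interleaved thinking: a minimum budget of 1,024 tokens and a budget strictly smaller than max_tokens. `thinking_config` is a hypothetical helper name.

```python
def thinking_config(budget_tokens: int, max_tokens: int) -> dict:
    """Build the `thinking` request parameter, checking that the budget fits."""
    if budget_tokens < 1024:
        raise ValueError("budget_tokens must be at least 1024")
    if budget_tokens >= max_tokens:
        # Thinking tokens are counted inside max_tokens, so leave room for the answer
        raise ValueError("budget_tokens must be smaller than max_tokens")
    return {"type": "enabled", "budget_tokens": budget_tokens}
```

With max_tokens set to 2000, a budget of 1024 leaves 976 tokens for the visible response and tool calls.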

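When thinking is enabled, Claude returns its reasoning process as part of the response, which can help you:

Understand the model's decision-making process
Identify potential issues or misconceptions
Learn from Claude's approach to problem-solving
Get more visibility into complex multi-step operations

Here's an example of what thinking output might look like: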

[Thinking]
I need to save a picture of a cat to the desktop. Let me break this down into steps:

1. First, I'll take a screenshot to see what's on the desktop
2. Then I'll look for a web browser to search for cat images
3. After finding a suitable image, I'll need to save it to the desktop

Let me start by taking a screenshot to see what's available...

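Enhance computer use with other tools

The computer use tool can be combined with other tools to create more powerful automation workflows. This is particularly useful when you need to:

Execute system commands (bash tool)
Edit configuration files or scripts (text editor tool)
Integrate with custom APIs or services (custom tools)

Build a custom computer use environment

The reference implementation is meant to help you get started with computer use. It includes all the components needed for Claude to use a computer. However, you can build your own computer use environment to suit your needs. You will need:

A virtualized or containerized environment suitable for computer use with Claude
An implementation of at least one of the Anthropic-defined computer use tools
An agent loop that interacts with the Claude API and executes tool_use results using your tool implementations
An API or UI that accepts user input to start the agent loop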

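Implement the computer use tool

The computer use tool is implemented as a schema-less tool. When using this tool, you don't need to provide an input schema as you would for other tools; the schema is built into Claude's model and cannot be modified.

Handle errors

Various errors can occur when implementing the computer use tool. Here's how to handle them: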
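A common pattern is to catch failures inside your tool implementation and report them to Claude as a `tool_result` with `is_error` set, so Claude can retry or change approach instead of the loop crashing. In this sketch, `tool_error` and `safe_execute` are hypothetical helpers and `execute` stands in for your own action handler.

```python
def tool_error(tool_use_id: str, message: str) -> dict:
    """Build a tool_result that tells Claude the action failed."""
    return {
        "type": "tool_result",
        "tool_use_id": tool_use_id,
        "content": message,
        "is_error": True,
    }


def safe_execute(tool_use_id, execute, tool_input):
    """Run one action and always return a tool_result, even on failure."""
    try:
        output = execute(tool_input)
    except Exception as exc:  # broad on purpose: any failure becomes feedback for Claude
        return tool_error(tool_use_id, f"Action failed: {exc}")
    return {"type": "tool_result", "tool_use_id": tool_use_id, "content": output}
```

Keep the error message short and factual (for example, which action failed and why) so Claude has something concrete to react to.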

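Handle coordinate scaling for higher resolutions

The API constrains images to a maximum of 1568 pixels on the longest edge and approximately 1.15 megapixels in total (see image resizing for details). For example, a 1512x982 screen is downsampled to approximately 1330x864. Claude analyzes this smaller image and returns coordinates in that space, but your tool executes clicks in the original screen space.

Unless you handle the coordinate transformation, Claude's clicks can miss their targets.

To fix this, resize the screenshot yourself and scale Claude's coordinates back up proportionally: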
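As a worked check of the 1512x982 example, the scale factor is the smallest of 1, 1568 divided by the long edge, and the square root of 1.15 MP divided by the total pixels; here the pixel-count limit wins. The click coordinates below are made-up values for illustration, and `api_scale` is a hypothetical helper name.

```python
import math


def api_scale(width: int, height: int) -> float:
    """Scale factor keeping the long edge <= 1568 px and the total <= ~1.15 MP."""
    return min(1.0, 1568 / max(width, height), math.sqrt(1_150_000 / (width * height)))


screen_w, screen_h = 1512, 982
scale = api_scale(screen_w, screen_h)
# Size to resize the screenshot to before sending it
scaled = (int(screen_w * scale), int(screen_h * scale))

# Claude answers in the scaled space; divide by the scale to get screen pixels
claude_x, claude_y = 665, 432  # example coordinates, not real model output
screen_click = (round(claude_x / scale), round(claude_y / scale))
```

Here `scaled` comes out to (1330, 864), matching the figure above, and the example click lands at (756, 491) on the real screen. For a 1024x768 display both limits exceed 1, so no scaling is needed.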

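Follow implementation best practices

Understand computer use limitations

The computer use functionality is in beta. While Claude's capabilities are cutting edge, developers should be aware of its limitations:

Latency: computer use latency for human-AI interactions may currently be slower than regular human-directed computer actions. We recommend focusing on use cases where speed isn't critical (for example, background information gathering, automated software testing) in trusted environments.
Computer vision accuracy and reliability: Claude may make mistakes or hallucinate when outputting specific coordinates while generating actions. Claude Sonnet 3.7 introduced thinking capabilities that can help you understand the model's reasoning and identify potential issues.
Tool selection accuracy and reliability: Claude may make mistakes or hallucinate when selecting tools while generating actions, or take unexpected actions to solve problems. Additionally, reliability may be lower when interacting with niche applications or with multiple applications at once. We recommend that users prompt the model carefully when requesting complex tasks.
Scrolling reliability: Claude Sonnet 3.7 introduced dedicated scroll actions with direction control that improve reliability. The model can now explicitly scroll in any direction (up/down/left/right) by a specified amount.
Spreadsheet interaction: Claude Sonnet 3.7 improves mouse clicks for spreadsheet interaction by adding more precise mouse control actions (such as left_mouse_down and left_mouse_up) and new modifier key support. Cell selection can be more reliable when you use these fine-grained controls and combine modifier keys with clicks.
Account creation and content generation on social and communications platforms: While Claude will visit websites, we limit its ability to create accounts or to generate and share content, or otherwise engage in human impersonation, across social media websites and platforms. We may update this capability in the future.
Vulnerabilities: vulnerabilities like jailbreaking or prompt injection may persist across frontier AI systems, including the beta computer use API. In some circumstances, Claude will follow commands found in content, sometimes even in conflict with the user's instructions. For example, Claude instructions on webpages or contained in images may override instructions or cause Claude to make mistakes. We recommend: a. limiting computer use to trusted environments such as virtual machines or containers with minimal privileges; b. avoiding giving computer use access to sensitive accounts or data without strict oversight; c. informing end users of relevant risks and obtaining their consent before enabling or requesting permissions necessary for computer use features in your applications.
Inappropriate or illegal actions: per Anthropic's terms of service, you must not use computer use to violate any laws or our Acceptable Use Policy.

Always carefully review and verify Claude's computer use actions and logs. Do not use Claude for tasks requiring perfect precision or sensitive user information without human oversight.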

Pricing

Computer use follows the standard tool use pricing. When using the computer use tool:

System prompt overhead: The computer use beta adds 466-499 tokens to the system prompt

Computer use tool token usage:

Model	Input tokens per tool definition
Claude 4.x models	735 tokens
Claude Sonnet 3.7 (deprecated)	735 tokens

Additional token consumption:

Screenshot images (see Vision pricing)
Tool execution results returned to Claude

If you're also using bash or text editor tools alongside computer use, those tools have their own token costs as documented in their respective pages.
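Because every iteration of the agent loop is a separate API request that re-sends the system prompt and tool definitions, the fixed overhead above recurs on each call. This sketch multiplies out only the two fixed figures from this section (466-499 system prompt tokens and 735 tokens for the computer tool definition); screenshots, tool results, and any bash or text editor tools come on top. `fixed_overhead` is a hypothetical helper name.

```python
# Fixed per-request figures from the pricing section
SYSTEM_PROMPT_OVERHEAD = (466, 499)  # min/max tokens added by the computer use beta
COMPUTER_TOOL_DEFINITION = 735       # input tokens for the computer tool definition


def fixed_overhead(requests: int) -> tuple[int, int]:
    """Return the (low, high) fixed input-token overhead across `requests` API calls."""
    low = (SYSTEM_PROMPT_OVERHEAD[0] + COMPUTER_TOOL_DEFINITION) * requests
    high = (SYSTEM_PROMPT_OVERHEAD[1] + COMPUTER_TOOL_DEFINITION) * requests
    return low, high
```

A single request carries 1,201 to 1,234 fixed input tokens, so a 10-iteration loop adds roughly 12,000 before any screenshots are counted.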

Next steps

Reference implementation

Get started quickly with our complete Docker-based implementation

import anthropic

client = anthropic.Anthropic()

response = client.beta.messages.create(
    model="claude-opus-4-6",  # or another compatible model
    max_tokens=1024,
    tools=[
        {
          "type": "computer_20251124",
          "name": "computer",
          "display_width_px": 1024,
          "display_height_px": 768,
          "display_number": 1,
        },
        {
          "type": "text_editor_20250728",
          "name": "str_replace_based_edit_tool"
        },
        {
          "type": "bash_20250124",
          "name": "bash"
        }
    ],
    messages=[{"role": "user", "content": "Save a picture of a cat to my desktop."}],
    betas=["computer-use-2025-11-24"]
)
print(response)

curl https://api.anthropic.com/v1/messages \
  -H "content-type: application/json" \
  -H "x-api-key: $ANTHROPIC_API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -H "anthropic-beta: computer-use-2025-11-24" \
  -d '{
    "model": "claude-opus-4-6",
    "max_tokens": 2000,
    "tools": [
      {
        "type": "computer_20251124",
        "name": "computer",
        "display_width_px": 1024,
        "display_height_px": 768,
        "display_number": 1
      },
      {
        "type": "text_editor_20250728",
        "name": "str_replace_based_edit_tool"
      },
      {
        "type": "bash_20250124",
        "name": "bash"
      },
      {
        "name": "get_weather",
        "description": "Get the current weather in a given location",
        "input_schema": {
          "type": "object",
          "properties": {
            "location": {
              "type": "string",
              "description": "The city and state, e.g. San Francisco, CA"
            },
            "unit": {
              "type": "string",
              "enum": ["celsius", "fahrenheit"],
              "description": "The unit of temperature, either 'celsius' or 'fahrenheit'"
            }
          },
          "required": ["location"]
        }
      }
    ],
    "messages": [
      {
        "role": "user",
        "content": "Find flights from San Francisco to a place with warmer weather."
      }
    ],
    "thinking": {
      "type": "enabled",
      "budget_tokens": 1024
    }
  }'

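Set up your computing environment
Create a virtual display or connect to an existing display that Claude will interact with. This typically involves setting up Xvfb (X Virtual Framebuffer) or a similar technology.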
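A minimal way to start a virtual display from Python is to launch Xvfb as a background process and point child processes at it through the `DISPLAY` environment variable. This sketch assumes Xvfb is installed in the container; `xvfb_command` and `start_display` are hypothetical helpers, and the display number and resolution should match `display_number`, `display_width_px`, and `display_height_px` in your tool definition.

```python
import os
import subprocess


def xvfb_command(display: int = 1, width: int = 1024, height: int = 768, depth: int = 24) -> list[str]:
    """Arguments for starting Xvfb on :<display> with a single screen of the given size."""
    return ["Xvfb", f":{display}", "-screen", "0", f"{width}x{height}x{depth}"]


def start_display(display: int = 1, width: int = 1024, height: int = 768) -> subprocess.Popen:
    """Launch Xvfb in the background and make later child processes use it."""
    proc = subprocess.Popen(xvfb_command(display, width, height))
    os.environ["DISPLAY"] = f":{display}"  # inherited by apps and xdotool started afterwards
    return proc
```

In practice you would also wait until the server accepts connections before starting the window manager and applications.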

Implement action handlers

Create functions to handle each type of action Claude might request:

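def handle_computer_action(action_type, params):
    # capture_screenshot, click_at and type_text are placeholders for
    # your own environment-specific implementations
    if action_type == "screenshot":
        return capture_screenshot()
    elif action_type == "left_click":
        x, y = params["coordinate"]
        return click_at(x, y)
    elif action_type == "type":
        return type_text(params["text"])
    # ... handle other actions
    else:
        raise ValueError(f"Unsupported action: {action_type}")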
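On a Linux X11 desktop, one option for the placeholder functions is to shell out to `xdotool`. The mapping below is a sketch under that assumption: `xdotool_args` is a hypothetical helper that only builds argument lists for `subprocess.run`, it covers four basic actions, and it assumes Claude's key names (such as "ctrl+s") are valid xdotool key sequences.

```python
def xdotool_args(action_type: str, params: dict) -> list[str]:
    """Map a basic computer-use action to an xdotool argument list."""
    if action_type == "mouse_move":
        x, y = params["coordinate"]
        # --sync waits until the pointer has actually moved
        return ["xdotool", "mousemove", "--sync", str(x), str(y)]
    if action_type == "left_click":
        x, y = params["coordinate"]
        return ["xdotool", "mousemove", "--sync", str(x), str(y), "click", "1"]
    if action_type == "type":
        # "--" stops option parsing so text starting with "-" is typed literally
        return ["xdotool", "type", "--delay", "12", "--", params["text"]]
    if action_type == "key":
        return ["xdotool", "key", "--", params["text"]]
    raise ValueError(f"no xdotool mapping for action: {action_type}")
```

Running the result requires xdotool in the container and `DISPLAY` set to your virtual display; building the list separately keeps the mapping easy to unit test.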

Handle Claude's tool calls

Extract and execute tool calls from Claude's response:

tool_results = []
for content in response.content:
    if content.type == "tool_use":
        action = content.input["action"]
        result = handle_computer_action(action, content.input)

        # Collect the result so it can be returned to Claude
        tool_results.append({
            "type": "tool_result",
            "tool_use_id": content.id,
            "content": result
        })

Implement the agent loop

Create a loop that keeps running until Claude completes the task:

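while True:
    response = client.beta.messages.create(...)

    # Keep Claude's reply in the conversation history
    messages.append({"role": "assistant", "content": response.content})

    # Check if Claude used any tools (process_tool_calls is a placeholder
    # for the tool-call handling shown in the previous step)
    tool_results = process_tool_calls(response)

    if not tool_results:
        # No more tool use, task complete
        break

    # Continue conversation with tool results
    messages.append({"role": "user", "content": tool_results})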

import math

def get_scale_factor(width, height):
    """Calculate scale factor to meet API constraints."""
    long_edge = max(width, height)
    total_pixels = width * height

    long_edge_scale = 1568 / long_edge
    total_pixels_scale = math.sqrt(1_150_000 / total_pixels)

    return min(1.0, long_edge_scale, total_pixels_scale)

# When capturing screenshot
scale = get_scale_factor(screen_width, screen_height)
scaled_width = int(screen_width * scale)
scaled_height = int(screen_height * scale)

# Resize image to scaled dimensions before sending to Claude
screenshot = capture_and_resize(scaled_width, scaled_height)

# When handling Claude's coordinates, scale them back up to screen pixels
def execute_click(x, y):
    screen_x = round(x / scale)
    screen_y = round(y / scale)
    perform_click(screen_x, screen_y)
