Messages圖片與視覺

座標與邊界框

Claude 如何調整圖片大小，以及如何處理它為邊界框、點和 UI 元素所回傳的像素座標。

Claude 可以定位並標記圖片中的區域（例如，回傳表格、表單欄位、圖表元素或 UI 元件的邊界框）。本指南說明 Claude 在處理圖片之前如何調整其大小，以及如何處理它回傳的像素座標，以便邊界框和點能與您的原始圖片對齊。

您在以下情境會需要這些資訊：OCR 流程、表單擷取、圖表解析、UI 元素定位，以及任何需要對圖片特定區域進行操作的任務。關於傳送圖片、支援的格式以及各模型的解析度限制，請參閱 Vision。

Claude 在使用絕對像素座標時效果最佳。 請在提示中明確要求使用此格式。例如：「以像素座標回傳每個表格的邊界框，格式為 [x1, y1, x2, y2]。」 當您要求正規化座標時，Claude 的表現不佳，例如：「回傳介於 0 到 1000 之間的邊界框座標。」 請一律要求像素座標，如有需要再於您自己的程式碼中進行正規化。

座標遵循標準的圖片慣例：原點 (0, 0) 位於圖片的左上角，x 向右遞增，y 向下遞增。Claude 回傳的座標是 Claude 所看到圖片中的像素位置：也就是 Claude 將您的圖片調整為符合模型原生解析度後的圖片（請參閱 Claude 如何調整圖片大小與填充）。若要取得可直接使用的座標，您可以預先調整圖片大小，使座標與您手上的圖片一對一對應（請參閱上傳前先調整圖片大小），或重新縮放 Claude 回傳的座標（請參閱無法預先調整大小時重新縮放座標）。

Claude 的空間推理能力有其限制（請參閱限制）。當您在提示中明確說明預期的座標格式，並在大規模處理前以視覺方式抽查結果時，座標準確度最佳。對於 PDF 支援，頁面會在伺服器端以您無法控制的尺寸光柵化為圖片，因此回傳的座標無法可靠地對應回頁面。若要在 PDF 內容上使用座標，請自行將頁面光柵化為圖片，並使用預先調整大小的方法。

Claude 如何調整圖片大小與填充

Claude 會找出同時滿足模型兩項圖片限制的最大保持長寬比尺寸：

邊長限制： 任一邊都不超過最大邊長（標準層級為 1568 px，高解析度層級為 2576 px）。
視覺 token 限制： 圖片的 token 成本 ⌈width / 28⌉ × ⌈height / 28⌉ 不超過模型的視覺 token 預算（標準層級為 1568 個 token，高解析度層級為 4784 個）。

請參閱解析度與 token 成本以了解哪些模型屬於哪個層級。

對於大多數照片和螢幕截圖，觸發調整大小的是邊長限制。對於直向文件，通常是視覺 token 限制先觸發，而忽略這一點是座標未對齊最常見的原因。例如，以 130 DPI 掃描的 A4 頁面為 1075×1520 像素：兩邊都小於 1568 px，但其成本為 39 × 55 = 2145 個視覺 token，因此 Claude 會將其調整為 924×1307。

接著，無論是否經過調整大小，Claude 都會在每張圖片的底部和右側邊緣填充至下一個 28 像素的倍數（在此範例中，924×1307 會變成 924×1316）。填充區域不包含任何內容：Claude 感知的是填充後的圖片，但頁面內容只會佔據未填充的調整後區域。請一律以調整後的尺寸進行正規化或重新縮放，而非填充後的尺寸；若除以填充後的尺寸，會使每個座標產生微小的縮放偏差。

上傳前先調整圖片大小

最可靠的方法是在上傳前自行調整圖片大小，如此一來您手上的圖片就與 Claude 看到的圖片完全相同，Claude 回傳的座標也無需轉換。

以下參考實作會計算 Claude 將圖片調整後的確切尺寸：

import math


def count_image_tokens(width: int, height: int) -> int:
    """Visual tokens consumed by an image: one token per 28x28 pixel patch."""
    return math.ceil(width / 28) * math.ceil(height / 28)


def resized_size(
    width: int,
    height: int,
    max_edge: int = 1568,
    max_tokens: int = 1568,
) -> tuple[int, int]:
    """The size Claude resizes an image to before padding.

    Defaults are for the standard resolution tier. For high-resolution-tier
    models, use max_edge=2576 and max_tokens=4784. Returns (width, height).
    Images that already fit within the limits are returned unchanged.
    """

    def fits(w: int, h: int) -> bool:
        return (
            math.ceil(w / 28) * 28 <= max_edge
            and math.ceil(h / 28) * 28 <= max_edge
            and count_image_tokens(w, h) <= max_tokens
        )

    if fits(width, height):
        return (width, height)
    if height > width:
        resized_h, resized_w = resized_size(height, width, max_edge, max_tokens)
        return (resized_w, resized_h)

    # 沿長邊進行二分搜尋，找出保持長寬比且能容納的最大尺寸。
    # 
    aspect_ratio = width / height
    lo, hi = 1, width  # lo always fits; hi never fits
    while lo + 1 < hi:
        mid = (lo + hi) // 2
        if fits(mid, max(round(mid / aspect_ratio), 1)):
            lo = mid
        else:
            hi = mid
    return (lo, max(round(lo / aspect_ratio), 1))


# 來自「Claude 如何調整圖片大小與填充」的 A4 範例：
print(resized_size(1075, 1520))  # (924, 1307)

將圖片調整為 resized_size 回傳的尺寸。如果圖片已符合模型的限制，resized_size 會原封不動地回傳其尺寸，無需調整大小。
將調整後的圖片傳送至 API。請勿自行填充；Claude 會處理填充，且填充不會移動座標原點。
在您的提示中，明確要求像素座標。例如：「以像素座標回傳每個表格的邊界框，格式為 [x1, y1, x2, y2]。」
直接將回傳的座標套用於您傳送的圖片。如果您需要正規化座標，請除以您傳送圖片的尺寸，而非原始圖片的尺寸，也非填充後的尺寸。

無法預先調整大小時重新縮放座標

如果您無法預先調整大小（例如，圖片來自您無法修改的上游系統），請使用上傳前先調整圖片大小中的 resized_size 來還原 Claude 所看到的尺寸，然後將 Claude 回傳的座標對應為正規化座標或對應回您的原始圖片。此方法需要知道您上傳圖片的像素尺寸，因此不適用於 PDF 上傳。

def to_relative_coordinates(
    x: float,
    y: float,
    original_width: int,
    original_height: int,
    max_edge: int = 1568,
    max_tokens: int = 1568,
) -> tuple[float, float]:
    """Map a pixel coordinate returned by Claude to relative coordinates in [0, 1].

    Pass the dimensions of the image you uploaded. For high-resolution-tier
    models, use max_edge=2576 and max_tokens=4784.
    """
    resized_w, resized_h = resized_size(
        original_width, original_height, max_edge, max_tokens
    )
    return (x / resized_w, y / resized_h)


# 若要將座標轉換為原始圖片的像素空間，請將
# 相對座標乘以原始尺寸：
# (rel_x * original_width, rel_y * original_height)

填充僅套用於底部和右側邊緣，因此原點不會移動，按軸線性重新縮放即已足夠。

座標與邊界框

Claude 如何調整圖片大小，以及如何處理它為邊界框、點和 UI 元素所回傳的像素座標。

Claude 如何調整圖片大小與填充

Claude 會找出同時滿足模型兩項圖片限制的最大保持長寬比尺寸：

邊長限制： 任一邊都不超過最大邊長（標準層級為 1568 px，高解析度層級為 2576 px）。
視覺 token 限制： 圖片的 token 成本 ⌈width / 28⌉ × ⌈height / 28⌉ 不超過模型的視覺 token 預算（標準層級為 1568 個 token，高解析度層級為 4784 個）。

請參閱解析度與 token 成本以了解哪些模型屬於哪個層級。

上傳前先調整圖片大小

最可靠的方法是在上傳前自行調整圖片大小，如此一來您手上的圖片就與 Claude 看到的圖片完全相同，Claude 回傳的座標也無需轉換。

以下參考實作會計算 Claude 將圖片調整後的確切尺寸：

import math


def count_image_tokens(width: int, height: int) -> int:
    """Visual tokens consumed by an image: one token per 28x28 pixel patch."""
    return math.ceil(width / 28) * math.ceil(height / 28)


def resized_size(
    width: int,
    height: int,
    max_edge: int = 1568,
    max_tokens: int = 1568,
) -> tuple[int, int]:
    """The size Claude resizes an image to before padding.

    Defaults are for the standard resolution tier. For high-resolution-tier
    models, use max_edge=2576 and max_tokens=4784. Returns (width, height).
    Images that already fit within the limits are returned unchanged.
    """

    def fits(w: int, h: int) -> bool:
        return (
            math.ceil(w / 28) * 28 <= max_edge
            and math.ceil(h / 28) * 28 <= max_edge
            and count_image_tokens(w, h) <= max_tokens
        )

    if fits(width, height):
        return (width, height)
    if height > width:
        resized_h, resized_w = resized_size(height, width, max_edge, max_tokens)
        return (resized_w, resized_h)

    # 沿長邊進行二分搜尋，找出保持長寬比且能容納的最大尺寸。
    # 
    aspect_ratio = width / height
    lo, hi = 1, width  # lo always fits; hi never fits
    while lo + 1 < hi:
        mid = (lo + hi) // 2
        if fits(mid, max(round(mid / aspect_ratio), 1)):
            lo = mid
        else:
            hi = mid
    return (lo, max(round(lo / aspect_ratio), 1))


# 來自「Claude 如何調整圖片大小與填充」的 A4 範例：
print(resized_size(1075, 1520))  # (924, 1307)

將圖片調整為 resized_size 回傳的尺寸。如果圖片已符合模型的限制，resized_size 會原封不動地回傳其尺寸，無需調整大小。
將調整後的圖片傳送至 API。請勿自行填充；Claude 會處理填充，且填充不會移動座標原點。
在您的提示中，明確要求像素座標。例如：「以像素座標回傳每個表格的邊界框，格式為 [x1, y1, x2, y2]。」
直接將回傳的座標套用於您傳送的圖片。如果您需要正規化座標，請除以您傳送圖片的尺寸，而非原始圖片的尺寸，也非填充後的尺寸。

無法預先調整大小時重新縮放座標

def to_relative_coordinates(
    x: float,
    y: float,
    original_width: int,
    original_height: int,
    max_edge: int = 1568,
    max_tokens: int = 1568,
) -> tuple[float, float]:
    """Map a pixel coordinate returned by Claude to relative coordinates in [0, 1].

    Pass the dimensions of the image you uploaded. For high-resolution-tier
    models, use max_edge=2576 and max_tokens=4784.
    """
    resized_w, resized_h = resized_size(
        original_width, original_height, max_edge, max_tokens
    )
    return (x / resized_w, y / resized_h)


# 若要將座標轉換為原始圖片的像素空間，請將
# 相對座標乘以原始尺寸：
# (rel_x * original_width, rel_y * original_height)

填充僅套用於底部和右側邊緣，因此原點不會移動，按軸線性重新縮放即已足夠。

座標與邊界框

Claude 如何調整圖片大小與填充

上傳前先調整圖片大小

無法預先調整大小時重新縮放座標

相關資訊

座標與邊界框

Claude 如何調整圖片大小與填充

上傳前先調整圖片大小

無法預先調整大小時重新縮放座標

相關資訊

Claude 如何調整圖片大小與填充

上傳前先調整圖片大小

無法預先調整大小時重新縮放座標

相關資訊

Claude 如何調整圖片大小與填充

上傳前先調整圖片大小

無法預先調整大小時重新縮放座標

相關資訊

Claude 如何調整圖片大小與填充

上傳前先調整圖片大小

無法預先調整大小時重新縮放座標

相關資訊

Claude 如何調整圖片大小與填充

上傳前先調整圖片大小

無法預先調整大小時重新縮放座標

相關資訊