Content moderation

Content moderation is a critical aspect of maintaining a safe, respectful, and productive environment in digital applications. In this guide, we'll discuss how Claude can be used to moderate content within your digital application.

Visit our content moderation cookbook to see an example content moderation implementation using Claude.

This guide is focused on moderating user-generated content within your application. If you're looking for guidance on moderating interactions with Claude, refer to our guardrails guide.

Before building with Claude

Decide whether to use Claude for content moderation

Here are some key indicators that you should use an LLM like Claude instead of a traditional ML or rules-based approach for content moderation:

Anthropic has trained all Claude models to be honest, helpful, and harmless. This may result in Claude moderating content deemed particularly dangerous (in line with our Acceptable Use Policy), regardless of the prompt used. For example, an adult website that wants to allow its users to post explicit sexual content may find that Claude still flags explicit content as requiring moderation, even if they specify in their prompt not to moderate explicit sexual content. We recommend reviewing our AUP before building a moderation solution.

Generate examples of content to moderate

Before developing a content moderation solution, first create examples of content that should be flagged and content that should not be flagged. Ensure that you include edge cases and challenging scenarios that may be difficult for a content moderation system to handle effectively. Afterwards, review your examples to create a well-defined list of moderation categories. For instance, the examples generated by a social media platform might include the following:

    allowed_user_comments = [
        'This movie was great, I really enjoyed it. The main actor really killed it!',
        'I hate Mondays.',
        'It is a great time to invest in gold!'
    ]
    
    disallowed_user_comments = [
        'Delete this post now or you better hide. I am coming after you and your family.',
        'Stay away from the 5G cellphones!! They are using 5G to control you.',
        'Congratulations! You have won a $1,000 gift card. Click here to claim your prize!'
    ]
    
    # Sample user comments to test the content moderation
    user_comments = allowed_user_comments + disallowed_user_comments
    
    # List of categories considered unsafe for content moderation
    unsafe_categories = [
        'Child Exploitation',
        'Conspiracy Theories',
        'Hate',
        'Indiscriminate Weapons', 
        'Intellectual Property',
        'Non-Violent Crimes', 
        'Privacy',
        'Self-Harm',
        'Sex Crimes',
        'Sexual Content',
        'Specialized Advice',
        'Violent Crimes'
    ]

Effectively moderating these examples requires a nuanced understanding of language. In the comment This movie was great, I really enjoyed it. The main actor really killed it!, the content moderation system needs to recognize that "killed it" is a metaphor, not an indication of actual violence. Conversely, despite the lack of explicit mentions of violence, the comment Delete this post now or you better hide. I am coming after you and your family. should be flagged by the content moderation system.

The unsafe_categories list can be customized to fit your specific needs. For example, if you wish to prevent minors from creating content on your website, you could append "Underage Posting" to the list, as in the sketch below.
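
A minimal sketch, assuming the unsafe_categories list defined above (the 'Underage Posting' category name and the variable name are illustrative, not part of the cookbook):

    # Hypothetical example: extend the baseline list with a site-specific category,
    # then pass it to the moderation prompt exactly like the original list.
    custom_unsafe_categories = unsafe_categories + ['Underage Posting']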


How to moderate content using Claude

Select the right Claude model

When selecting a model, it's important to consider the size of your data. If costs are a concern, a smaller model like Claude Haiku 3 is an excellent choice due to its cost-effectiveness. Below is an estimate of the cost to moderate text for a social media platform that receives one billion posts per month:

• Content size

  • Posts per month: 1 billion
  • Characters per post: 100
  • Total characters: 100 billion
• Estimated tokens

  • Input tokens: 28.6 billion (assuming 1 token per 3.5 characters)
  • Percentage of messages flagged: 3%
  • Output tokens per flagged message: 50
  • Total output tokens: 1.5 billion
• Claude Haiku 3 estimated costs

  • Input token cost: 28,600 MTok * $0.25/MTok = $7,150
  • Output token cost: 1,500 MTok * $1.25/MTok = $1,875
  • Monthly cost: $7,150 + $1,875 = $9,025
• Claude Opus 4.6 estimated costs

  • Input token cost: 28,600 MTok * $5.00/MTok = $143,000
  • Output token cost: 1,500 MTok * $25.00/MTok = $37,500
  • Monthly cost: $143,000 + $37,500 = $180,500
Actual costs may differ from these estimates. The estimates are based on the prompt highlighted in the batch processing section, and output tokens can be reduced further by removing the explanation field from the response. A minimal sketch of this arithmetic follows below.
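
The following sketch reproduces the arithmetic behind the estimate, using the same assumptions listed above; the prices are the per-MTok rates quoted in the estimate, and small rounding differences from the figures above are expected:

    # Back-of-the-envelope moderation cost estimate; assumptions mirror the list above.
    POSTS_PER_MONTH = 1_000_000_000
    CHARS_PER_POST = 100
    CHARS_PER_TOKEN = 3.5
    FLAGGED_RATE = 0.03
    OUTPUT_TOKENS_PER_FLAGGED = 50
    
    input_mtok = POSTS_PER_MONTH * CHARS_PER_POST / CHARS_PER_TOKEN / 1_000_000
    output_mtok = POSTS_PER_MONTH * FLAGGED_RATE * OUTPUT_TOKENS_PER_FLAGGED / 1_000_000
    
    def monthly_cost(input_price_per_mtok, output_price_per_mtok):
        # Prices are USD per million tokens (MTok)
        return input_mtok * input_price_per_mtok + output_mtok * output_price_per_mtok
    
    print(f"Claude Haiku 3:  ${monthly_cost(0.25, 1.25):,.0f} per month")
    print(f"Claude Opus 4.6: ${monthly_cost(5.00, 25.00):,.0f} per month")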

Build a strong prompt

In order to use Claude for content moderation, Claude must understand the moderation requirements of your application. Let's start by writing a prompt that allows you to define your moderation needs:

    import anthropic
    import json
    
    # Initialize the Anthropic client
    client = anthropic.Anthropic()
    
    def moderate_message(message, unsafe_categories):
        # Convert the list of unsafe categories into a string, with each category on a new line
        unsafe_category_str = '\n'.join(unsafe_categories)
        
        # Construct the prompt for Claude, including the message and unsafe categories
        assessment_prompt = f"""
        Determine whether the following message warrants moderation, 
        based on the unsafe categories outlined below.
        
        Message:
        <message>{message}</message>
    
        Unsafe Categories:
        <categories>
        {unsafe_category_str}
        </categories>
    
        Respond with ONLY a JSON object, using the format below:
        {{
        "violation": <Boolean field denoting whether the message should be moderated>,
        "categories": [Comma-separated list of violated categories],
        "explanation": [Optional. Only include if there is a violation.]
        }}"""
    
        # Send the request to Claude for content moderation
        response = client.messages.create(
            model="claude-3-haiku-20240307",  # Using the Haiku model for lower costs
            max_tokens=200,
            temperature=0,   # Use 0 temperature for increased consistency
            messages=[
                {"role": "user", "content": assessment_prompt}
            ]
        )
        
        # Parse the JSON response from Claude
        assessment = json.loads(response.content[0].text)
        
        # Extract the violation status from the assessment
        contains_violation = assessment['violation']
        
        # If there's a violation, get the categories and explanation; otherwise, use empty defaults
        violated_categories = assessment.get('categories', []) if contains_violation else []
        explanation = assessment.get('explanation') if contains_violation else None
        
        return contains_violation, violated_categories, explanation
    
    # Process each comment and print the results
    for comment in user_comments:
        print(f"\nComment: {comment}")
        violation, violated_categories, explanation = moderate_message(comment, unsafe_categories)
        
        if violation:
            print(f"Violated Categories: {', '.join(violated_categories)}")
            print(f"Explanation: {explanation}")
        else:
            print("No issues detected.")

In this example, the moderate_message function contains an assessment prompt that includes the unsafe content categories and the message we wish to evaluate. The prompt asks Claude to assess whether the message should be moderated, based on the unsafe categories we defined.

The model's assessment is then parsed to determine whether a violation is present. If there is a violation, Claude also returns a list of violated categories, as well as an explanation of why the message is unsafe.

Evaluate your prompt

Content moderation is a classification problem. Thus, you can use the same techniques outlined in our classification cookbook to determine the accuracy of your content moderation system.

One additional consideration is that instead of treating content moderation as a binary classification problem, you may instead create multiple categories to represent various risk levels. Creating multiple risk levels allows you to adjust the aggressiveness of your moderation. For example, you might want to automatically block user queries that are deemed high risk, while users who make many medium-risk queries are flagged for human review.

    import anthropic
    import json
    
    # Initialize the Anthropic client
    client = anthropic.Anthropic()
    
    def assess_risk_level(message, unsafe_categories):
        # Convert the list of unsafe categories into a string, with each category on a new line
        unsafe_category_str = '\n'.join(unsafe_categories)
        
        # Construct the prompt for Claude, including the message, unsafe categories, and risk level definitions
        assessment_prompt = f"""
    Assess the risk level of the following message, 
    based on the unsafe categories listed below.
        
    Message:
    <message>{message}</message>
    
    Unsafe Categories:
    <categories>
    {unsafe_category_str}
    </categories>
    
    Assign a risk level based on your confidence that the user's message should be moderated 
    based on the defined unsafe categories:
    
    0 - No risk
    1 - Low risk
    2 - Medium risk
    3 - High risk
    
    Respond with ONLY a JSON object, using the format below:
    {{
      "risk_level": <Numerical field denoting the risk level>,
      "categories": [Comma-separated list of violated categories],
      "explanation": <Optional. Only include if risk level is greater than 0>
    }}"""
    
        # Send the request to Claude for risk assessment
        response = client.messages.create(
            model="claude-3-haiku-20240307",  # Using the Haiku model for lower costs
            max_tokens=200,
            temperature=0,   # Use 0 temperature for increased consistency
            messages=[
                {"role": "user", "content": assessment_prompt}
            ]
        )
        
        # Parse the JSON response from Claude
        assessment = json.loads(response.content[0].text)
        
        # Extract the risk level, violated categories, and explanation from the assessment
        risk_level = assessment["risk_level"]
        violated_categories = assessment["categories"]
        explanation = assessment.get("explanation")
        
        return risk_level, violated_categories, explanation
    
    # Process each comment and print the results
    for comment in user_comments:
        print(f"\nComment: {comment}")
        risk_level, violated_categories, explanation = assess_risk_level(comment, unsafe_categories)
        
        print(f"Risk Level: {risk_level}")
        if violated_categories:
            print(f"Violated Categories: {', '.join(violated_categories)}")
        if explanation:
            print(f"Explanation: {explanation}")

This code implements an assess_risk_level function that uses Claude to evaluate the risk level of a message. The function accepts a message and a list of unsafe categories as inputs.

Within the function, a prompt is generated for Claude that includes the message to be assessed, the unsafe categories, and specific instructions for evaluating the risk level. The prompt instructs Claude to respond with a JSON object containing the risk level, the violated categories, and an optional explanation.

This approach enables flexible content moderation by assigning risk levels. It can be seamlessly integrated into a larger system to automate content filtering or to flag comments for human review based on their assessed risk level; one possible integration is sketched below. For instance, when executing this code, the comment Delete this post now or you better hide. I am coming after you and your family. is identified as high risk due to its dangerous threat. Conversely, the comment Stay away from the 5G cellphones!! They are using 5G to control you. is categorized as medium risk.
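
As a minimal sketch of such an integration (the route_comment helper and its thresholds are illustrative assumptions, not part of the cookbook), the returned risk level can drive an automated decision:

    # Hypothetical routing helper built on top of assess_risk_level above.
    # The thresholds are illustrative; tune them to your application's risk tolerance.
    def route_comment(comment, unsafe_categories):
        risk_level, categories, explanation = assess_risk_level(comment, unsafe_categories)
    
        if risk_level >= 3:
            action = "block"         # High risk: block automatically
        elif risk_level == 2:
            action = "human_review"  # Medium risk: queue for a human moderator
        else:
            action = "allow"         # Low or no risk: publish as-is
    
        return {"action": action, "categories": categories, "explanation": explanation}
    
    decision = route_comment(user_comments[3], unsafe_categories)
    print(decision["action"])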

Deploy your prompt

Once you are confident in the quality of your solution, it's time to deploy it to production. Here are some best practices to follow when using content moderation in production:

1. Provide clear feedback to users: When user input is blocked or a response is flagged due to content moderation, provide informative and constructive feedback to help users understand why their message was flagged and how they can rephrase it appropriately. In the code examples above, this is done through the explanation field in Claude's response.

2. Analyze moderated content: Keep track of the types of content being flagged by your moderation system to identify trends and potential areas for improvement.

3. Continuously evaluate and improve: Regularly assess the performance of your content moderation system using metrics such as precision and recall tracking, and use this data to iteratively refine your moderation prompts, keywords, and assessment criteria. A simple evaluation sketch follows this list.
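
One possible way to track precision and recall (a sketch that reuses the moderate_message function and the labeled example lists defined earlier; it is not part of the original guide) is to score the system against a small hand-labeled golden set:

    # Evaluate the moderation prompt against a hand-labeled golden set.
    # A label of True means the comment should be flagged; False means it should not.
    golden_set = [(c, False) for c in allowed_user_comments] + \
                 [(c, True) for c in disallowed_user_comments]
    
    true_positives = false_positives = false_negatives = 0
    for comment, should_flag in golden_set:
        flagged, _, _ = moderate_message(comment, unsafe_categories)
        if flagged and should_flag:
            true_positives += 1
        elif flagged and not should_flag:
            false_positives += 1
        elif not flagged and should_flag:
            false_negatives += 1
    
    precision = true_positives / max(true_positives + false_positives, 1)
    recall = true_positives / max(true_positives + false_negatives, 1)
    print(f"Precision: {precision:.2f}  Recall: {recall:.2f}")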


Improve performance

In complex scenarios, it may be helpful to consider additional strategies to improve performance beyond standard prompt engineering techniques. Here are some advanced strategies:

Define topics and provide examples

In addition to listing the unsafe categories in the prompt, further improvements can be made by providing definitions and phrases related to each category.

    import anthropic
    import json
    
    # Initialize the Anthropic client
    client = anthropic.Anthropic()
    
    # Dictionary of categories considered unsafe for content moderation, with their definitions
    unsafe_category_definitions = {
        'Child Exploitation': 'Content that depicts child nudity or that enables, encourages, excuses, or depicts the sexual abuse of children.',
        'Conspiracy Theories': """Content that promotes or endorses unfounded, false, or misleading theories about events, situations, or individuals, 
            which can incite harm or undermine public trust in institutions or facts.""",
        'Hate': """Content that is hateful toward people on the basis of their protected characteristics 
            (race, color, ethnicity, national origin, disability, religious affiliation, caste, sexual orientation, sex, gender identity and serious disease), 
            or content that perpetuates negative stereotypes.""",
        'Indiscriminate Weapons': 'Content that enables the creation of weapons of mass destruction.', 
        'Intellectual Property': 'Content that violates the intellectual property rights of third parties.',
        'Non-Violent Crimes': 'Content that enables, encourages, or excuses the commission of non-violent crimes.', 
        'Privacy': 'Content that contains sensitive, personal information about private individuals.',
        'Self-Harm': 'Content that enables, encourages, or excuses acts of intentional self-harm.',
        'Sex Crimes': 'Content that enables, encourages, or excuses the commission of sex crimes.',
        'Sexual Content': 'Content that contains erotic descriptions or explicit depictions of sex acts.',
        'Specialized Advice': """Content that contains financial, medical, or legal advice. 
            Financial advice includes guidance on investments, stocks, bonds, or any financial planning.""",
        'Violent Crimes': 'Content that enables, encourages, or excuses the commission of violent crimes.', 
    }
    
    
    def moderate_message_with_definitions(message, unsafe_category_definitions):
        # Format unsafe categories string, with each category and its definition on a new line
        unsafe_category_str = '\n'.join(f'{category}: {definition}' 
            for category, definition in unsafe_category_definitions.items()
        )
        
        # Construct the prompt for Claude, including the message and unsafe categories
        assessment_prompt = f"""Determine whether the following message warrants moderation, based on the unsafe categories outlined below.
    
    Message:
    <message>{message}</message>
    
    Unsafe Categories and Their Definitions:
    <categories>
    {unsafe_category_str}
    </categories>
    
    It's important that you remember all unsafe categories and their definitions.
    
    Respond with ONLY a JSON object, using the format below:
    {{
      "violation": <Boolean field denoting whether the message should be moderated>,
      "categories": [Comma-separated list of violated categories],
      "explanation": [Optional. Only include if there is a violation.]
    }}"""
    
        # Send the request to Claude for content moderation
        response = client.messages.create(
            model="claude-3-haiku-20240307",  # Using the Haiku model for lower costs
            max_tokens=200,
            temperature=0,   # Use 0 temperature for increased consistency
            messages=[
                {"role": "user", "content": assessment_prompt}
            ]
        )
        
        # Parse the JSON response from Claude
        assessment = json.loads(response.content[0].text)
        
        # Extract the violation status from the assessment
        contains_violation = assessment['violation']
        
        # If there's a violation, get the categories and explanation; otherwise, use empty defaults
        violated_categories = assessment.get('categories', []) if contains_violation else []
        explanation = assessment.get('explanation') if contains_violation else None
        
        return contains_violation, violated_categories, explanation
    
    
    # Process each comment and print the results
    for comment in user_comments:
        print(f"\nComment: {comment}")
        violation, violated_categories, explanation = moderate_message_with_definitions(comment, unsafe_category_definitions)
        
        if violation:
            print(f"Violated Categories: {', '.join(violated_categories)}")
            print(f"Explanation: {explanation}")
        else:
            print("No issues detected.")

The moderate_message_with_definitions function expands upon the earlier moderate_message function by allowing each unsafe category to be paired with a detailed definition. This is done by replacing the unsafe_categories list from the original function with an unsafe_category_definitions dictionary, which maps each unsafe category to its corresponding definition. Both the category names and their definitions are included in the prompt.

Notably, the definition of the Specialized Advice category now specifies the types of financial advice that should be prohibited. As a result, the comment It's a great time to invest in gold!, which previously passed the moderate_message assessment, now triggers a violation.

Consider batch processing

To reduce costs in situations where real-time moderation isn't necessary, consider moderating messages in batches. Include multiple messages within the prompt's context, and ask Claude to assess which messages should be moderated.

    import anthropic
    import json
    
    # Initialize the Anthropic client
    client = anthropic.Anthropic()
    
    def batch_moderate_messages(messages, unsafe_categories):
        # Convert the list of unsafe categories into a string, with each category on a new line
        unsafe_category_str = '\n'.join(unsafe_categories)
        
        # Format messages string, with each message wrapped in XML-like tags and given an ID
        messages_str = '\n'.join([f'<message id={idx}>{msg}</message>' for idx, msg in enumerate(messages)])
        
        # Construct the prompt for Claude, including the messages and unsafe categories
        assessment_prompt = f"""Determine the messages to moderate, based on the unsafe categories outlined below.
    
    Messages:
    <messages>
    {messages_str}
    </messages>
    
    Unsafe categories and their definitions:
    <categories>
    {unsafe_category_str}
    </categories>
    
    Respond with ONLY a JSON object, using the format below:
    {{
      "violations": [
        {{
          "id": <message id>,
          "categories": [list of violated categories],
          "explanation": <Explanation of why there's a violation>
        }},
        ...
      ]
    }}
    
    Important Notes:
    - Remember to analyze every message for a violation.
    - Select any number of violations that reasonably apply."""
    
        # Send the request to Claude for content moderation
        response = client.messages.create(
            model="claude-3-haiku-20240307",  # Using the Haiku model for lower costs
            max_tokens=2048,  # Increased max token count to handle batches
            temperature=0,    # Use 0 temperature for increased consistency
            messages=[
                {"role": "user", "content": assessment_prompt}
            ]
        )
        
        # Parse the JSON response from Claude
        assessment = json.loads(response.content[0].text)
        return assessment
    
    
    # Process the batch of comments and get the response
    response_obj = batch_moderate_messages(user_comments, unsafe_categories)
    
    # Print the results for each detected violation
    for violation in response_obj['violations']:
        print(f"""Comment: {user_comments[violation['id']]}
    Violated Categories: {', '.join(violation['categories'])}
    Explanation: {violation['explanation']}
    """)

In this example, the batch_moderate_messages function handles the moderation of an entire batch of messages with a single Claude API call.

Inside the function, a prompt is created that includes the list of messages to evaluate, the defined unsafe content categories, and their descriptions. The prompt directs Claude to return a JSON object listing all messages that contain violations. Each message in the response is identified by its id, which corresponds to the message's position in the input list.

Keep in mind that finding the optimal batch size for your specific needs may require some experimentation. While larger batch sizes can lower costs, they may also lead to a slight decrease in quality. Additionally, you may need to increase the max_tokens parameter in the Claude API call to accommodate longer responses. For details on the maximum number of tokens your chosen model can output, refer to the model comparison page. A simple batching sketch is shown below.
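
To illustrate (the moderate_in_batches helper and the batch size of 25 are assumptions for this sketch, not part of the cookbook), a longer backlog can be split into fixed-size chunks, with each chunk sent through batch_moderate_messages and the batch-local ids mapped back to positions in the full list:

    # Hypothetical wrapper: moderate a long list of messages in fixed-size batches.
    def moderate_in_batches(messages, unsafe_categories, batch_size=25):
        all_violations = []
        for start in range(0, len(messages), batch_size):
            batch = messages[start:start + batch_size]
            result = batch_moderate_messages(batch, unsafe_categories)
            for violation in result['violations']:
                # Shift the batch-local id so it indexes into the full message list
                violation['id'] = start + violation['id']
                all_violations.append(violation)
        return all_violations
    
    violations = moderate_in_batches(user_comments, unsafe_categories)
    for v in violations:
        print(f"Flagged: {user_comments[v['id']]} -> {', '.join(v['categories'])}")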

Content moderation cookbook

View a fully implemented code-based example of how to use Claude for content moderation.

Guardrails guide

Explore our guardrails guide for techniques to moderate interactions with Claude.
