Loading...
    • Developer Guide
    • API Reference
    • MCP
    • Resources
    • Release Notes
    Search...
    ⌘K

    First steps

    Intro to ClaudeQuickstart

    Models & pricing

    Models overviewChoosing a modelWhat's new in Claude 4.5Migrating to Claude 4.5Model deprecationsPricing

    Build with Claude

    Features overviewUsing the Messages APIContext windowsPrompting best practices

    Capabilities

    Prompt cachingContext editingExtended thinkingStreaming MessagesBatch processingCitationsMultilingual supportToken countingEmbeddingsVisionPDF supportFiles APISearch resultsGoogle Sheets add-on

    Tools

    OverviewHow to implement tool useToken-efficient tool useFine-grained tool streamingBash toolCode execution toolComputer use toolText editor toolWeb fetch toolWeb search toolMemory tool

    Agent Skills

    OverviewQuickstartBest practicesUsing Skills with the API

    Agent SDK

    OverviewTypeScript SDKPython SDK

    Guides

    Streaming InputHandling PermissionsSession ManagementHosting the Agent SDKModifying system promptsMCP in the SDKCustom ToolsSubagents in the SDKSlash Commands in the SDKAgent Skills in the SDKTracking Costs and UsageTodo ListsPlugins in the SDK

    MCP in the API

    MCP connectorRemote MCP servers

    Claude on 3rd-party platforms

    Amazon BedrockVertex AI

    Prompt engineering

    OverviewPrompt generatorUse prompt templatesPrompt improverBe clear and directUse examples (multishot prompting)Let Claude think (CoT)Use XML tagsGive Claude a role (system prompts)Prefill Claude's responseChain complex promptsLong context tipsExtended thinking tips

    Test & evaluate

    Define success criteriaDevelop test casesUsing the Evaluation ToolReducing latency

    Strengthen guardrails

    Reduce hallucinationsIncrease output consistencyMitigate jailbreaksStreaming refusalsReduce prompt leakKeep Claude in character

    Administration and monitoring

    Admin API overviewUsage and Cost APIClaude Code Analytics API
    Console
    Test & evaluate

    Define your success criteria

    Building a successful LLM-based application starts with clearly defining your success criteria. How will you know when your application is good enough to publish?

    Having clear success criteria ensures that your prompt engineering & optimization efforts are focused on achieving specific, measurable goals.


    Building strong criteria

    Good success criteria are:

    • Specific: Clearly define what you want to achieve. Instead of "good performance," specify "accurate sentiment classification."

    • Measurable: Use quantitative metrics or well-defined qualitative scales. Numbers provide clarity and scalability, but qualitative measures can be valuable if consistently applied along with quantitative measures.

      • Even "hazy" topics such as ethics and safety can be quantified:
        Safety criteria
        BadSafe outputs
        GoodLess than 0.1% of outputs out of 10,000 trials flagged for toxicity by our content filter.

    • Achievable: Base your targets on industry benchmarks, prior experiments, AI research, or expert knowledge. Your success metrics should not be unrealistic to current frontier model capabilities.

    • Relevant: Align your criteria with your application's purpose and user needs. Strong citation accuracy might be critical for medical apps but less so for casual chatbots.


    Common success criteria to consider

    Here are some criteria that might be important for your use case. This list is non-exhaustive.

    Most use cases will need multidimensional evaluation along several success criteria.


    Next steps

    Brainstorm criteria

    Brainstorm success criteria for your use case with Claude on claude.ai.

    Tip: Drop this page into the chat as guidance for Claude!

    Design evaluations

    Learn to build strong test sets to gauge Claude's performance against your criteria.

    • Building strong criteria
    • Common success criteria to consider
    • Next steps
    © 2025 ANTHROPIC PBC

    Products

    • Claude
    • Claude Code
    • Max plan
    • Team plan
    • Enterprise plan
    • Download app
    • Pricing
    • Log in

    Features

    • Claude and Slack
    • Claude in Excel

    Models

    • Opus
    • Sonnet
    • Haiku

    Solutions

    • AI agents
    • Code modernization
    • Coding
    • Customer support
    • Education
    • Financial services
    • Government
    • Life sciences

    Claude Developer Platform

    • Overview
    • Developer docs
    • Pricing
    • Amazon Bedrock
    • Google Cloud’s Vertex AI
    • Console login

    Learn

    • Blog
    • Catalog
    • Courses
    • Connectors
    • Customer stories
    • Engineering at Anthropic
    • Events
    • Powered by Claude
    • Service partners
    • Startups program

    Company

    • Anthropic
    • Careers
    • Economic Futures
    • Research
    • News
    • Responsible Scaling Policy
    • Security and compliance
    • Transparency

    Help and security

    • Availability
    • Status
    • Support center

    Terms and policies

    • Privacy policy
    • Responsible disclosure policy
    • Terms of service: Commercial
    • Terms of service: Consumer
    • Usage policy

    Products

    • Claude
    • Claude Code
    • Max plan
    • Team plan
    • Enterprise plan
    • Download app
    • Pricing
    • Log in

    Features

    • Claude and Slack
    • Claude in Excel

    Models

    • Opus
    • Sonnet
    • Haiku

    Solutions

    • AI agents
    • Code modernization
    • Coding
    • Customer support
    • Education
    • Financial services
    • Government
    • Life sciences

    Claude Developer Platform

    • Overview
    • Developer docs
    • Pricing
    • Amazon Bedrock
    • Google Cloud’s Vertex AI
    • Console login

    Learn

    • Blog
    • Catalog
    • Courses
    • Connectors
    • Customer stories
    • Engineering at Anthropic
    • Events
    • Powered by Claude
    • Service partners
    • Startups program

    Company

    • Anthropic
    • Careers
    • Economic Futures
    • Research
    • News
    • Responsible Scaling Policy
    • Security and compliance
    • Transparency

    Help and security

    • Availability
    • Status
    • Support center

    Terms and policies

    • Privacy policy
    • Responsible disclosure policy
    • Terms of service: Commercial
    • Terms of service: Consumer
    • Usage policy
    © 2025 ANTHROPIC PBC