
    Skills for enterprise

    Governance, security review, evaluation, and organizational guidance for deploying Agent Skills at enterprise scale.

    This guide is for enterprise admins and architects who need to govern Agent Skills across an organization. It covers how to vet, evaluate, deploy, and manage Skills at scale. For authoring guidance, see best practices. For architecture details, see the Skills overview.

    Security review and vetting

    Deploying Skills in an enterprise requires answering two distinct questions:

    1. Are Skills safe in general? See the security considerations section in the overview for platform-level security details.
    2. How do I vet a specific Skill? Use the risk assessment and review checklist below.

    Risk tier assessment

    Evaluate each Skill against these risk indicators before approving deployment:

    | Risk indicator | What to look for | Concern level |
    | --- | --- | --- |
    | Code execution | Scripts in the Skill directory (*.py, *.sh, *.js) | High: scripts run with full environment access |
    | Instruction manipulation | Directives to ignore safety rules, hide actions from users, or alter Claude's behavior conditionally | High: can bypass security controls |
    | MCP server references | Instructions referencing MCP tools (ServerName:tool_name) | High: extends access beyond the Skill itself |
    | Network access patterns | URLs, API endpoints, fetch, curl, or requests calls | High: potential data exfiltration vector |
    | Hardcoded credentials | API keys, tokens, or passwords in Skill files or scripts | High: secrets exposed in Git history and context window |
    | File system access scope | Paths outside the Skill directory, broad glob patterns, path traversal (../) | Medium: may access unintended data |
    | Tool invocations | Instructions directing Claude to use bash, file operations, or other tools | Medium: review what operations are performed |
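
    Most of these indicators lend themselves to an automated first pass before human review; instruction manipulation is the exception and still requires reading the Skill's content. The following is a minimal sketch of such a triage scan, not an official tool: it walks a Skill directory and flags the patterns from the table. Every regex and the skills/formatting-sales-reports path are illustrative assumptions to tune for your environment.

    ```python
    import re
    from pathlib import Path

    # Illustrative patterns only; tune for your environment before relying on them.
    SCRIPT_FILE = re.compile(r"\.(py|sh|js)$")
    CONTENT_INDICATORS = {
        "High: network access pattern": re.compile(r"https?://|requests\.|urllib|\bcurl\b|fetch\("),
        "High: possible MCP tool reference": re.compile(r"\b\w+:\w+_\w+\b"),
        "High: possible hardcoded credential": re.compile(r"(api[_-]?key|token|password)\s*[:=]", re.I),
        "Medium: path traversal": re.compile(r"\.\./"),
    }

    def scan_skill(skill_dir: str) -> list[tuple[str, str]]:
        """Walk a Skill directory and return (finding, file) pairs for human triage."""
        findings = []
        for path in Path(skill_dir).rglob("*"):
            if not path.is_file():
                continue
            if SCRIPT_FILE.search(path.name):
                findings.append(("High: code execution (bundled script)", str(path)))
            text = path.read_text(errors="ignore")
            for finding, pattern in CONTENT_INDICATORS.items():
                if pattern.search(text):
                    findings.append((finding, str(path)))
        return findings

    # Example: triage a Skill before working through the manual checklist below.
    for finding, location in scan_skill("skills/formatting-sales-reports"):
        print(f"{finding}: {location}")
    ```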

    Review checklist

    Before deploying any Skill from a third party or internal contributor, complete these steps:

    1. Read all Skill directory content. Review SKILL.md, all referenced markdown files, and any bundled scripts or resources.
    2. Verify script behavior matches stated purpose. Run scripts in a sandboxed environment and confirm outputs align with the Skill's description.
    3. Check for adversarial instructions. Look for directives that tell Claude to ignore safety rules, hide actions from users, exfiltrate data through responses, or alter behavior based on specific inputs.
    4. Check for external URL fetches or network calls. Search scripts and instructions for network access patterns (http, requests.get, urllib, curl, fetch).
    5. Verify no hardcoded credentials. Check for API keys, tokens, or passwords in Skill files. Credentials should use environment variables or secure credential stores, never appear in Skill content.
    6. Identify tools and commands the Skill instructs Claude to invoke. List all bash commands, file operations, and tool references. Consider the combined risk when a Skill uses both file-read and network tools together.
    7. Confirm redirect destinations. If the Skill references external URLs, verify they point to expected domains.
    8. Verify no data exfiltration patterns. Look for instructions that read sensitive data and then write, send, or encode it for external transmission, including through Claude's conversational responses.
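
    For step 2, bundled scripts should never run in a reviewer's normal environment. The sketch below is a minimal dry-run harness, assuming Python scripts and a hypothetical generate_report.py: it stages a throwaway copy of the Skill and strips the environment. This is isolation-lite, not a real sandbox; vet untrusted Skills in a container or VM with networking disabled.

    ```python
    import shutil
    import subprocess
    import tempfile
    from pathlib import Path

    def dry_run_script(skill_dir: str, script: str, timeout: int = 30):
        """Run a bundled Skill script against a throwaway copy of its directory.

        Isolation-lite: the environment is stripped and output is captured, but
        the process still has host filesystem and network access.
        """
        with tempfile.TemporaryDirectory() as workdir:
            staged = Path(workdir) / "skill"
            shutil.copytree(skill_dir, staged)  # script cannot modify the reviewed source
            return subprocess.run(
                ["python", script],
                cwd=staged,
                env={"PATH": "/usr/bin:/bin"},  # no inherited tokens or secrets
                capture_output=True,
                text=True,
                timeout=timeout,
            )

    # 'generate_report.py' is a hypothetical bundled script name.
    result = dry_run_script("skills/formatting-sales-reports", "generate_report.py")
    print(result.stdout or result.stderr)  # compare against the Skill's stated purpose
    ```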

    Never deploy Skills from untrusted sources without a full audit. A malicious Skill can direct Claude to execute arbitrary code, access sensitive files, or transmit data externally. Treat Skill installation with the same rigor as installing software on production systems.

    Evaluating Skills before deployment

    Skills can degrade agent performance if they trigger incorrectly, conflict with other Skills, or provide poor instructions. Require evaluation before any production deployment.

    What to evaluate

    Establish approval gates for these dimensions before deploying any Skill:

    | Dimension | What it measures | Example failure |
    | --- | --- | --- |
    | Triggering accuracy | Does the Skill activate for the right queries and stay inactive for unrelated ones? | Skill triggers on every spreadsheet mention, even when the user just wants to discuss data |
    | Isolation behavior | Does the Skill work correctly on its own? | Skill references files that don't exist in its directory |
    | Coexistence | Does adding this Skill degrade other Skills? | New Skill's description is too broad, stealing triggers from existing Skills |
    | Instruction following | Does Claude follow the Skill's instructions accurately? | Claude skips validation steps or uses wrong libraries |
    | Output quality | Does the Skill produce correct, useful results? | Generated reports have formatting errors or missing data |

    Evaluation requirements

    Require Skill authors to submit evaluation suites with 3-5 representative queries per Skill, covering cases where the Skill should trigger, should not trigger, and ambiguous edge cases. Require testing across the models your organization uses (Haiku, Sonnet, Opus), since Skill effectiveness varies by model.
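
    A suite meeting these requirements can be as small as a structured list of queries with expected trigger behavior. The sketch below is illustrative: the queries, the suite format, and the skill_triggered callback are all assumptions standing in for your own harness.

    ```python
    # Hypothetical evaluation suite for one Skill, structured as the
    # should-trigger / should-not-trigger / ambiguous cases described above.
    EVAL_SUITE = {
        "skill": "formatting-sales-reports",
        "cases": [
            {"query": "Format last quarter's numbers into our standard sales report", "expect_trigger": True},
            {"query": "Produce the monthly sales report for the EMEA region", "expect_trigger": True},
            {"query": "What were our top three deals last quarter?", "expect_trigger": False},
            # Ambiguous edge case: record it, but score it by manual review.
            {"query": "Summarize this spreadsheet", "expect_trigger": None},
        ],
    }

    def trigger_accuracy(suite: dict, model: str, skill_triggered) -> float:
        """Score one model; skill_triggered(query, model) is your harness's
        activation check (a stand-in here, not a real API call)."""
        scored = [c for c in suite["cases"] if c["expect_trigger"] is not None]
        hits = sum(skill_triggered(c["query"], model) == c["expect_trigger"] for c in scored)
        return hits / len(scored)

    # Run against every model your organization uses, since effectiveness varies:
    # for model in ("haiku", "sonnet", "opus"):  # substitute your deployed model IDs
    #     print(model, trigger_accuracy(EVAL_SUITE, model, skill_triggered))
    ```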

    For detailed guidance on building evaluations, see evaluation and iteration in best practices. For general evaluation methodology, see develop test cases.

    Using evaluations for lifecycle decisions

    Evaluation results signal when to act:

    • Declining trigger accuracy: Update the Skill's description or instructions
    • Coexistence conflicts: Consolidate overlapping Skills or narrow descriptions
    • Consistently low output quality: Rewrite instructions or add validation steps
    • Persistent failures across updates: Deprecate the Skill
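
    Teams sometimes encode these signals as explicit review rules so lifecycle decisions stay consistent across Skills. A sketch, with every threshold a pure assumption to calibrate against the baseline scores each Skill achieved at approval:

    ```python
    def lifecycle_action(trigger_accuracy: float, output_quality: float,
                         coexistence_regression: bool, failed_releases: int) -> str:
        """Map evaluation results to the lifecycle actions listed above."""
        if failed_releases >= 3:
            return "deprecate"  # persistent failures across updates
        if coexistence_regression:
            return "consolidate overlapping Skills or narrow descriptions"
        if trigger_accuracy < 0.80:
            return "update the Skill's description or instructions"
        if output_quality < 0.70:
            return "rewrite instructions or add validation steps"
        return "keep"
    ```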

    Skill lifecycle management

    1. Plan

      Identify workflows that are repetitive, error-prone, or require specialized knowledge. Map these to organizational roles and determine which are candidates for Skills.

    2. Create and review

      Ensure the Skill author follows best practices. Require a security review using the review checklist above. Require an evaluation suite before approval. Establish separation of duties: Skill authors should not be their own reviewers.

    3. Test

      Require evaluations in isolation (Skill alone) and alongside existing Skills (coexistence testing). Verify triggering accuracy, output quality, and absence of regressions across your active Skill set before approving for production.

    4. Deploy

      Upload via the Skills API for workspace-wide access. See Using Skills with the API for upload and version management. Document the Skill in your internal registry with purpose, owner, and version.

    5. Monitor

      Track usage patterns and collect feedback from users. Re-run evaluations periodically to detect drift or regressions as workflows and models evolve. Usage analytics are not currently available via the Skills API, so implement application-level logging to track which Skills are included in requests (see the sketch after these steps).

    6. Iterate or deprecate

      Require the full evaluation suite to pass before promoting new versions. Update Skills when workflows change or evaluation scores decline. Deprecate Skills when evaluations consistently fail or the workflow is retired.
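
    Because the API does not report Skill usage, the logging in step 5 has to happen wherever your application assembles requests. A minimal sketch of one structured log record per request; the field names and sample values are illustrative:

    ```python
    import json
    import logging
    import time

    logger = logging.getLogger("skill_usage")
    logging.basicConfig(level=logging.INFO)

    def log_skill_usage(request_id: str, model: str, skill_ids: list[str]) -> None:
        """Emit one structured record per request; since the API does not report
        which Skills were attached, these logs are the source of truth for usage."""
        logger.info(json.dumps({
            "event": "skills_attached",
            "request_id": request_id,
            "model": model,
            "skill_ids": skill_ids,
            "timestamp": time.time(),
        }))

    # Call wherever your application assembles a Claude request:
    log_skill_usage("req-001", "claude-sonnet-4-5",
                    ["formatting-sales-reports", "querying-pipeline-data"])
    ```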

    Organizing Skills at scale

    Recall limits

    As a general guideline, limit the number of Skills loaded simultaneously to maintain reliable recall accuracy. Each Skill's metadata (name and description) competes for attention in the system prompt. With too many Skills active, Claude may fail to select the right Skill or miss relevant ones entirely. Use your evaluation suite to measure recall accuracy as you add Skills, and stop adding when performance degrades.

    Note that the API supports a maximum of 8 Skills per request (see Using Skills with the API). If a role requires more Skills than a single request allows, consider consolidating narrow Skills into broader ones or routing requests to different Skill sets based on task type, as sketched below.
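
    A sketch of such routing, where the bundle map and task types are assumptions (they anticipate the role-based bundles described later in this section):

    ```python
    MAX_SKILLS_PER_REQUEST = 8  # API limit per request (see Using Skills with the API)

    # Hypothetical task-type bundles; names follow the examples in this guide.
    SKILL_BUNDLES = {
        "sales": ["formatting-sales-reports", "querying-pipeline-data", "updating-crm-records"],
        "engineering": ["code-review", "deployment-workflows", "incident-response"],
        "finance": ["report-generation", "data-validation", "audit-preparation"],
    }

    def skills_for_request(task_type: str) -> list[str]:
        """Pick the Skill set for one request instead of attaching everything."""
        bundle = SKILL_BUNDLES.get(task_type, [])
        if len(bundle) > MAX_SKILLS_PER_REQUEST:
            raise ValueError(f"'{task_type}' bundle exceeds {MAX_SKILLS_PER_REQUEST} Skills")
        return bundle
    ```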

    Start specific, consolidate later

    Encourage teams to start with narrow, workflow-specific Skills rather than broad, multi-purpose ones. As patterns emerge across your organization, consolidate related Skills into role-based bundles.

    Use evaluations to decide when to consolidate. Merge narrow Skills into a broader one only when the consolidated Skill's evaluations confirm equivalent performance to the individual Skills it replaces.

    Example progression:

    • Start: formatting-sales-reports, querying-pipeline-data, updating-crm-records
    • Consolidate: sales-operations (when evals confirm equivalent performance)

    Naming and cataloging

    Use consistent naming conventions across your organization. The naming conventions section in best practices provides formatting guidance.

    Maintain an internal registry for each Skill with:

    • Purpose: What workflow the Skill supports
    • Owner: Team or individual responsible for maintenance
    • Version: Current deployed version
    • Dependencies: MCP servers, packages, or external services required
    • Evaluation status: Last evaluation date and results
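
    The registry itself can be a small version-controlled data structure. A sketch of one entry, with the field names and sample values purely illustrative:

    ```python
    from dataclasses import dataclass, field

    @dataclass
    class SkillRegistryEntry:
        """One registry record per deployed Skill; fields mirror the list above."""
        name: str
        purpose: str                  # what workflow the Skill supports
        owner: str                    # team or individual responsible for maintenance
        version: str                  # current deployed version
        dependencies: list[str] = field(default_factory=list)  # MCP servers, packages, services
        last_evaluated: str = ""      # date of the last evaluation run
        eval_passed: bool = False     # result of that run

    entry = SkillRegistryEntry(
        name="formatting-sales-reports",
        purpose="Format quarterly sales data into the standard report template",
        owner="sales-ops",
        version="1.2.0",
        dependencies=["openpyxl"],
        last_evaluated="2025-06-01",
        eval_passed=True,
    )
    ```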

    Role-based bundles

    Group Skills by organizational role to keep each user's active Skill set focused:

    • Sales team: CRM operations, pipeline reporting, proposal generation
    • Engineering: Code review, deployment workflows, incident response
    • Finance: Report generation, data validation, audit preparation

    Each role-based bundle should contain only the Skills relevant to that role's daily workflows.

    Distribution and version control

    Source control

    Store Skill directories in Git for history tracking, code review via pull requests, and rollback capability. Each Skill directory (containing SKILL.md and any bundled files) maps naturally to a Git-tracked folder.

    API-based distribution

    The Skills API provides workspace-scoped distribution. Skills uploaded via the API are available to all workspace members. See Using Skills with the API for upload, versioning, and management endpoints.

    Versioning strategy

    • Production: Pin Skills to specific versions. Run the full evaluation suite before promoting a new version. Treat every update as a new deployment requiring full security review.
    • Development and testing: Use latest versions to validate changes before production promotion.
    • Rollback plan: Maintain the previous version as a fallback. If a new version fails evaluations in production, revert to the last known-good version immediately.
    • Integrity verification: Compute checksums of reviewed Skills and verify them at deployment time. Use signed commits in your Skill repository to ensure provenance.
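
    The integrity check in the last point is straightforward to implement: hash every file in the reviewed Skill directory in a stable order, record the digest at approval time, and compare at deployment. A minimal sketch:

    ```python
    import hashlib
    from pathlib import Path

    def skill_checksum(skill_dir: str) -> str:
        """Hash every file's relative path and contents in a stable order."""
        digest = hashlib.sha256()
        root = Path(skill_dir)
        for path in sorted(root.rglob("*")):
            if path.is_file():
                digest.update(str(path.relative_to(root)).encode())
                digest.update(path.read_bytes())
        return digest.hexdigest()

    # At review time, record the digest alongside the approval.
    approved = skill_checksum("skills/formatting-sales-reports")

    # At deployment time, refuse to ship anything that changed since review.
    if skill_checksum("skills/formatting-sales-reports") != approved:
        raise RuntimeError("Skill changed since security review; re-review required")
    ```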

    Cross-surface considerations

    Custom Skills do not sync across surfaces. Skills uploaded to the API are not available on claude.ai or in Claude Code, and vice versa. Each surface requires separate uploads and management.

    Maintain Skill source files in Git as the single source of truth. If your organization deploys Skills across multiple surfaces, implement your own synchronization process to keep them consistent. For full details, see cross-surface availability.

    Next steps

    • Agent Skills overview: Architecture and platform details
    • Best practices: Authoring guidance for Skill creators
    • Using Skills with the API: Upload and manage Skills programmatically
    • Securely deploying AI agents: Security patterns for agent deployment
