
    Manage tool context

    Choose between tool search, programmatic tool calling, prompt caching, and context editing to manage context bloat.

    Tool definitions and accumulated tool_result blocks consume your context window. Long-running agents with many tools or many turns can exhaust available context before the task is finished. Four approaches address this at different points in the pipeline.

    The four approaches

    Each approach targets a different source of context pressure. Pick the one that matches where your tokens are going.

    Approach | What it reduces | When it fits | Learn more
    Tool search | Tool definitions loaded upfront | Large toolsets (20+ tools) where most tools aren't needed every turn | Tool search tool
    Programmatic tool calling | tool_result roundtrips | Chains of tool calls that can execute as a single script | Programmatic tool calling
    Prompt caching | Token cost of repeated tool definitions | Stable toolsets across many requests | Tool use with prompt caching
    Context editing | Old tool_result blocks in history | Long conversations where early results are no longer relevant | Context editing

    Tool search

    Tool search keeps tool definitions out of the context window until Claude asks for them. Instead of sending 50 tool schemas upfront, you send a single tool_search tool and let Claude discover the rest on demand. This trades a small amount of latency (one extra turn to look up a tool) for a large reduction in baseline context usage.
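    A rough back-of-envelope sketch of that trade. Every number here is an illustrative assumption (schema sizes vary widely in practice), but the shape of the saving is the point:

    ```python
    # Illustrative arithmetic only: compares loading 50 schemas upfront
    # against one search tool plus on-demand lookups. Token figures are
    # assumed, not measured.

    TOKENS_PER_SCHEMA = 350      # assumed average size of one tool definition
    NUM_TOOLS = 50
    SEARCH_TOOL_OVERHEAD = 500   # assumed cost of the tool_search definition
    TOOLS_USED_PER_TASK = 3      # tools Claude actually looks up this task

    upfront = NUM_TOOLS * TOKENS_PER_SCHEMA
    with_search = SEARCH_TOOL_OVERHEAD + TOOLS_USED_PER_TASK * TOKENS_PER_SCHEMA

    print(upfront, with_search)  # 17500 vs 1550 under these assumptions
    ```

    Under these assumptions the baseline drops by an order of magnitude, at the cost of one extra lookup turn per tool Claude actually needs.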

    Programmatic tool calling

    Programmatic tool calling collapses a sequence of tool calls into a single code block that Claude writes and Anthropic's code execution sandbox runs. Rather than five roundtrips of tool_use and tool_result, Claude emits one script that calls all five functions from within the sandbox. The intermediate results never enter the conversation history.
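    A minimal simulation of why this helps, using two hypothetical tools (`get_orders`, `total`) rather than real API calls: the classic path appends a tool_use and tool_result pair per call, while the scripted path appends one pair total.

    ```python
    # Hypothetical tools standing in for real ones.
    def get_orders(user_id):
        return [12.0, 30.0, 7.5]

    def total(amounts):
        return sum(amounts)

    # Classic path: one roundtrip per tool call, two history blocks each.
    classic_history = []
    for name in ["get_orders", "total"]:
        classic_history += [{"type": "tool_use", "name": name},
                            {"type": "tool_result"}]

    # Programmatic path: Claude emits one script; intermediate values stay
    # inside the sandbox and only the final output enters the history.
    script_output = total(get_orders("u_123"))
    scripted_history = [{"type": "tool_use", "name": "code_execution"},
                        {"type": "tool_result", "content": script_output}]

    print(len(classic_history), len(scripted_history))  # 4 vs 2
    ```

    With only two calls the saving is small; over a chain of dozens of calls, the history stays constant-size instead of growing with every roundtrip.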

    Prompt caching

    Prompt caching doesn't reduce the number of tokens in context, but it reduces what you pay for them on subsequent requests. If your tool definitions are stable, cache them once and reuse the cached prefix across thousands of requests. This is the right choice when the toolset is large but fixed.
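    A sketch of the request shape, shown as a plain dict rather than an SDK call. The tool schemas and model id are illustrative; the `cache_control` placement follows the prompt caching docs: a breakpoint on the last tool caches every definition before it.

    ```python
    # Hypothetical tool definitions; only the cache_control field is the
    # point of this sketch.
    tools = [
        {"name": "get_weather",
         "description": "Get current weather for a city.",
         "input_schema": {"type": "object",
                          "properties": {"city": {"type": "string"}},
                          "required": ["city"]}},
        {"name": "get_time",
         "description": "Get local time for a city.",
         "input_schema": {"type": "object",
                          "properties": {"city": {"type": "string"}},
                          "required": ["city"]}},
    ]
    # Cache breakpoint on the final tool covers the whole tool prefix.
    tools[-1]["cache_control"] = {"type": "ephemeral"}

    request = {
        "model": "claude-sonnet-4-5",   # illustrative model id
        "max_tokens": 1024,
        "tools": tools,
        "messages": [{"role": "user",
                      "content": "What's the weather in Oslo?"}],
    }
    ```

    Subsequent requests that send the identical tool prefix read it from cache at a discount instead of paying full input price again.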

    Context editing

    Context editing removes old tool_result blocks from the conversation history once they've served their purpose. A long agent loop might produce hundreds of intermediate results that were useful at the time but are now dead weight. Context editing lets you trim them without restarting the conversation.
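    A request-body sketch of a clearing policy: fire once the input grows past a token threshold, but preserve the newest few tool results. Context editing is a beta feature, so treat the exact type strings, field names, and threshold values below as assumptions to check against the current docs.

    ```python
    # Sketch of a context-management config (beta; field shapes assumed).
    request = {
        "model": "claude-sonnet-4-5",   # illustrative model id
        "max_tokens": 1024,
        "context_management": {
            "edits": [{
                "type": "clear_tool_uses_20250919",
                # Trigger clearing once input passes ~100k tokens...
                "trigger": {"type": "input_tokens", "value": 100000},
                # ...but always keep the 3 most recent tool results.
                "keep": {"type": "tool_uses", "value": 3},
            }]
        },
        "messages": [{"role": "user", "content": "Continue the task."}],
    }
    ```

    The `keep` clause matters: the newest results are usually the ones Claude is still reasoning about, so a policy that clears everything tends to hurt more than it helps.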

    Combining approaches

    These approaches compose. A long-running agent might use tool search to keep the toolset lean, prompt caching to amortize the cost of the remaining definitions, and context editing to trim stale results as the conversation grows. Each solves a different part of the problem, so there's no conflict in using them together.

    A reasonable starting point for a high-volume agent:

    1. Enable prompt caching on your tool definitions from day one. Cache writes carry a 25% markup over base input pricing, which pays back on the second request that hits the cache.
    2. Add tool search once your toolset grows past roughly 20 tools or your baseline context usage becomes noticeable.
    3. Add context editing once individual conversations start running long enough that early results become irrelevant.
    4. Consider programmatic tool calling if you notice repetitive chains of small tool calls that could run as a single batch.
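    Steps 1 and 3 of that checklist can coexist in a single request body. A sketch, with the usual caveats: tool schema, model id, and the beta context-management field shapes are all assumptions, not confirmed API surface.

    ```python
    # One hypothetical stable tool, cached; plus a clearing policy for
    # long loops (context editing is beta; field shapes assumed).
    tools = [
        {"name": "search_docs",
         "description": "Search internal documentation.",
         "input_schema": {"type": "object",
                          "properties": {"query": {"type": "string"}},
                          "required": ["query"]},
         # Day-one cache breakpoint on the stable toolset.
         "cache_control": {"type": "ephemeral"}},
    ]

    request = {
        "model": "claude-sonnet-4-5",   # illustrative model id
        "max_tokens": 2048,
        "tools": tools,
        "context_management": {
            "edits": [{"type": "clear_tool_uses_20250919",
                       "trigger": {"type": "input_tokens",
                                   "value": 100000}}]
        },
        "messages": [{"role": "user",
                      "content": "Summarize the open tickets."}],
    }
    ```

    The two knobs don't interact: caching discounts the stable prefix on every request, while context editing trims the growing suffix of stale results, which is why layering them is safe.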

    Next steps

    Tool search tool

    Load tool definitions on demand instead of upfront.

    Programmatic tool calling

    Collapse tool-call chains into a single executable script.

    Tool use with prompt caching

    Cache tool definitions across requests to cut token costs.

    Context editing

    Trim stale tool results from long-running conversations.
