Tool use with prompt caching

Cache tool definitions across turns and understand what invalidates your cache.

This page covers prompt caching for tool definitions: where to place cache_control breakpoints, how defer_loading preserves your cache, and what invalidates it. For general prompt caching, see Prompt caching.

cache_control on tool definitions

Place cache_control: {"type": "ephemeral"} on the last tool in your tools array. This caches the entire tool-definitions prefix, from the first tool through the marked breakpoint:

{
  "tools": [
    {
      "name": "get_weather",
      "description": "Get the current weather in a given location",
      "input_schema": {
        "type": "object",
        "properties": {
          "location": { "type": "string" }
        },
        "required": ["location"]
      }
    },
    {
      "name": "get_time",
      "description": "Get the current time in a given time zone",
      "input_schema": {
        "type": "object",
        "properties": {
          "timezone": { "type": "string" }
        },
        "required": ["timezone"]
      },
      "cache_control": { "type": "ephemeral" }
    }
  ]
}
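
The placement rule above can be applied programmatically when the tool list is assembled at runtime. The following is a minimal sketch, not an official helper: the function name `with_tool_cache_breakpoint` is hypothetical, and it only builds the `tools` payload without sending a request.

```python
import copy
import json


def with_tool_cache_breakpoint(tools):
    """Return a copy of `tools` with cache_control set on the last entry.

    Hypothetical helper: it leaves the caller's list untouched and marks
    only the final tool, which is where the breakpoint belongs.
    """
    if not tools:
        return []
    marked = copy.deepcopy(tools)
    marked[-1]["cache_control"] = {"type": "ephemeral"}
    return marked


tools = [
    {
        "name": "get_weather",
        "description": "Get the current weather in a given location",
        "input_schema": {
            "type": "object",
            "properties": {"location": {"type": "string"}},
            "required": ["location"],
        },
    },
    {
        "name": "get_time",
        "description": "Get the current time in a given time zone",
        "input_schema": {
            "type": "object",
            "properties": {"timezone": {"type": "string"}},
            "required": ["timezone"],
        },
    },
]

request_tools = with_tool_cache_breakpoint(tools)
print(json.dumps(request_tools[-1]["cache_control"]))
```

Copying the list rather than mutating it keeps the base tool definitions reusable if the breakpoint needs to move when the list changes.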

For mcp_toolset, the cache_control breakpoint lands on the last tool in the set. You don't control tool order within an MCP toolset, so place the breakpoint on the mcp_toolset entry itself and the API applies it to the final expanded tool.
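
As a rough illustration of marking the toolset entry itself, the sketch below attaches the breakpoint to an `mcp_toolset` dict. The helper name `mark_mcp_toolset`, the `mcp_server_name` field, and the server name `example-server` are assumptions made for this example; check the MCP connector documentation for the exact entry shape.

```python
def mark_mcp_toolset(entry):
    """Return a copy of an mcp_toolset entry with a cache breakpoint on it.

    Hypothetical helper. The breakpoint goes on the toolset entry, not on
    individual tools, because tool order inside the set is not under the
    caller's control.
    """
    if entry.get("type") != "mcp_toolset":
        raise ValueError("expected an entry with type 'mcp_toolset'")
    return {**entry, "cache_control": {"type": "ephemeral"}}


# "mcp_server_name" and its value are assumptions for illustration only.
toolset = {"type": "mcp_toolset", "mcp_server_name": "example-server"}
marked_toolset = mark_mcp_toolset(toolset)
print(marked_toolset["cache_control"])
```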

defer_loading and cache preservation

Deferred tools are not included in the system-prompt prefix. When the model discovers a deferred tool through tool search, the definition is appended inline as a tool_reference block in the conversation history. The prefix is untouched, so prompt caching is preserved.

This means adding tools dynamically through tool search does not break your cache. You can start a conversation with a small set of always-loaded tools (cached), let the model discover additional tools as needed, and keep hitting the same cached prefix on every turn.

defer_loading also acts independently of grammar construction for strict mode. The grammar builds from the full toolset regardless of which tools are deferred, so prompt caching and grammar caching are both preserved when tools load dynamically.
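
To make the reasoning concrete, here is a toy local model of the idea, not the API's actual cache key computation. It treats the cached prefix as the serialized list of tools that load up front and shows that adding a deferred tool leaves it unchanged. The function `cached_tool_prefix` and the `get_stock_price` tool are hypothetical, and the tool search tool that a real request would also need is omitted.

```python
import json


def cached_tool_prefix(tools):
    """Serialize only the tools that load up front.

    Toy stand-in for the cached prefix: tools marked defer_loading are
    skipped, mirroring the description that they are not part of it.
    """
    upfront = [t for t in tools if not t.get("defer_loading", False)]
    return json.dumps(upfront, sort_keys=True)


always_loaded = [
    {
        "name": "get_weather",
        "description": "Get the current weather in a given location",
        "input_schema": {"type": "object", "properties": {}},
        "cache_control": {"type": "ephemeral"},
    }
]

# Hypothetical deferred tool, discoverable later through tool search.
deferred = [
    {
        "name": "get_stock_price",
        "description": "Look up a stock price",
        "input_schema": {"type": "object", "properties": {}},
        "defer_loading": True,
    }
]

before = cached_tool_prefix(always_loaded)
after = cached_tool_prefix(always_loaded + deferred)
prefix_unchanged = before == after

# Editing an always-loaded tool, by contrast, changes the modeled prefix.
edited = [{**always_loaded[0], "description": "Get the weather"}]
prefix_changed_by_edit = cached_tool_prefix(edited) != before
print(prefix_unchanged, prefix_changed_by_edit)
```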

What invalidates your cache

The cache follows a prefix hierarchy (tools → system → messages), so a change at one level invalidates that level and everything after it:

| Change | Invalidates |
| --- | --- |
| Modifying tool definitions | Entire cache (tools, system, messages) |
| Toggling web search or citations | System and messages caches |
| Changing tool_choice | Messages cache |
| Changing disable_parallel_tool_use | Messages cache |
| Toggling images present/absent | Messages cache |
| Changing thinking parameters | Messages cache |


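If you need to vary tool_choice mid-conversation, consider placing a cache breakpoint before the point where it varies. Because a tool_choice change invalidates only the messages cache, the tools and system levels of the prefix can still be read from cache.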
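
The invalidation rules above can be encoded as a small lookup for use in tests or logging. This is a sketch under stated assumptions: the setting keys (`tool_definitions`, `web_search`, `images_present`, and so on) are hypothetical labels for the rows of the table, not API parameters, and the function only restates the table rather than querying the API.

```python
# Cache levels in prefix order: a change at one level invalidates that
# level and everything after it.
LEVELS = ["tools", "system", "messages"]

# First level invalidated by each kind of change. Keys are hypothetical
# labels for the rows of the table above.
FIRST_INVALIDATED = {
    "tool_definitions": "tools",
    "web_search": "system",
    "citations": "system",
    "tool_choice": "messages",
    "disable_parallel_tool_use": "messages",
    "images_present": "messages",
    "thinking": "messages",
}


def invalidated_levels(old, new):
    """Return the cache levels invalidated when moving from `old` to `new`."""
    changed = [key for key in FIRST_INVALIDATED if old.get(key) != new.get(key)]
    if not changed:
        return []
    earliest = min(LEVELS.index(FIRST_INVALIDATED[key]) for key in changed)
    return LEVELS[earliest:]


base = {"tool_definitions": "v1", "web_search": False, "tool_choice": "auto"}

only_tool_choice = invalidated_levels(base, {**base, "tool_choice": "any"})
web_search_on = invalidated_levels(base, {**base, "web_search": True})
tools_edited = invalidated_levels(base, {**base, "tool_definitions": "v2"})
print(only_tool_choice, web_search_on, tools_edited)
```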

Server tool results are cached automatically

When your request has prompt caching enabled and Claude uses a server tool such as web search, web fetch, or code execution, the API automatically places a cache breakpoint on the server tool result before running the next iteration of the agentic loop. This lets later iterations within the same request read the growing prefix from cache instead of reprocessing it.

This automatic breakpoint always uses the default 5-minute TTL, independent of any TTL you set on your own cache_control markers. In the response usage, these writes appear under cache_creation.ephemeral_5m_input_tokens, so you may see 5-minute cache writes even when every cache_control you set uses a 1-hour TTL.

This behavior only applies when your request already has at least one cache_control marker. Requests without prompt caching do not receive the automatic breakpoint.
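
When reconciling cache writes against the TTLs you configured, it can help to split them by TTL bucket. The sketch below assumes the response usage has been parsed into a plain dict; the helper name `cache_write_breakdown` is hypothetical and the token counts are made-up sample values, not real output.

```python
def cache_write_breakdown(usage):
    """Split cache-write tokens by TTL bucket from a usage dict.

    Hypothetical helper. Missing fields are treated as zero so the
    function also works on responses that wrote nothing to the cache.
    """
    creation = usage.get("cache_creation") or {}
    return {
        "5m": creation.get("ephemeral_5m_input_tokens", 0),
        "1h": creation.get("ephemeral_1h_input_tokens", 0),
    }


# Made-up numbers for illustration: every user-set marker used a 1-hour
# TTL, yet 5-minute writes still appear from the automatic breakpoint.
sample_usage = {
    "input_tokens": 120,
    "cache_creation": {
        "ephemeral_5m_input_tokens": 850,
        "ephemeral_1h_input_tokens": 2300,
    },
}

writes = cache_write_breakdown(sample_usage)
print(writes)
```

A nonzero 5-minute figure in this situation is expected when server tools ran during the request and does not indicate a misconfigured TTL.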

Per-tool interaction table

| Tool | Caching considerations |
| --- | --- |
| Web search | Enabling or disabling invalidates the system and messages caches |
| Web fetch | Enabling or disabling invalidates the system and messages caches |
| Code execution | Container state is independent of prompt cache |
| Tool search | Discovered tools load as tool_reference blocks, preserving prefix cache |
| Computer use | Screenshot presence affects messages cache |
| Text editor | Standard client tool, no special caching interaction |
| Bash | Standard client tool, no special caching interaction |
| Memory | Standard client tool, no special caching interaction |

Next steps

Prompt caching

Learn the full prompt caching model, including TTLs and pricing.

Tool search

Load tools on demand without breaking your cache.


Tool reference

Browse all available tools and their parameters.
