Cache diagnostics

Diagnose unexpected prompt cache misses by comparing consecutive requests and identifying exactly where the prompt prefix diverged.

This feature qualifies for Zero Data Retention (ZDR) with limited technical retention. See the Data retention section for details on what is retained and why.

Prompt caching cuts latency and cost significantly, but only when the beginning of your prompt is byte-for-byte identical to a recent request. A reordered tool, a timestamp interpolated into your system prompt, or an edit to an earlier message can silently invalidate the cache. Without cache diagnostics, the only signal is usage.cache_read_input_tokens dropping to zero, with no indication of what changed.

Cache diagnostics closes that gap. Pass the id of your previous response, and the API compares the two requests and tells you where they diverged (the model, the system prompt, the tools, or the message history) so you can fix the root cause instead of guessing.

Cache diagnostics is in beta. Include the beta header cache-diagnosis-2026-04-07 in your API requests to use this feature.

Cache diagnostics is currently available on the Claude API only. It is not supported on Amazon Bedrock or Vertex AI.

How cache diagnostics works

When the beta header is present, the API stores a lightweight fingerprint of each request, keyed by the response id. On your next request, include that id as diagnostics.previous_message_id. The API rebuilds the fingerprint for the new request, compares it against the stored one, and attaches a diagnostics object to the response describing the first point of divergence.

The comparison is about request structure, independent of whether the cache actually hit. See Reading diagnostics alongside usage for how to combine the diagnostics result with usage.cache_read_input_tokens.

Fingerprints contain only hashes and token-count estimates (never raw prompt content), are retained for a limited time, are scoped to your organization and workspace, and are not used for any other purpose.
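To build intuition for the comparison, here is a minimal client-side sketch. It is a toy model, not Anthropic's implementation: the `fingerprint` and `first_divergence` helpers, the use of SHA-256 over JSON, and the check order (model, system, tools, messages) are all assumptions chosen to mirror the divergence categories described above. It may still be useful as a local pre-flight check on your own request payloads.

```python
import hashlib
import json


def _digest(value):
    """Hash a JSON-serializable value, preserving key order on purpose.

    Keys are NOT sorted, so a re-ordered dict yields a different hash,
    which mimics byte-level sensitivity of the prompt prefix.
    """
    encoded = json.dumps(value, separators=(",", ":"))
    return hashlib.sha256(encoded.encode("utf-8")).hexdigest()


def fingerprint(request):
    """Hypothetical fingerprint: hashes only, never raw prompt content."""
    return {
        "model": _digest(request.get("model")),
        "system": _digest(request.get("system")),
        "tools": _digest(request.get("tools", [])),
        "messages": [_digest(m) for m in request.get("messages", [])],
    }


def first_divergence(prev, curr):
    """Return the first section where two fingerprints differ, or None."""
    for section in ("model", "system", "tools"):
        if prev[section] != curr[section]:
            return f"{section}_changed"
    # The new request should extend the old history, so compare the shared prefix
    shared = curr["messages"][: len(prev["messages"])]
    if shared != prev["messages"]:
        return "messages_changed"
    return None


turn1 = {
    "model": "claude-opus-4-7",
    "system": "Static prompt",
    "messages": [{"role": "user", "content": "Summarize section 1."}],
}
# A timestamp leaks into the system prompt on the second turn
turn2 = {
    **turn1,
    "system": "Static prompt 2026-01-01T00:00:00Z",
    "messages": turn1["messages"] + [{"role": "user", "content": "Now section 2."}],
}
print(first_divergence(fingerprint(turn1), fingerprint(turn2)))  # system_changed
```

Only the earliest difference is reported, which is why the function returns as soon as one section differs.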

Basic usage

Send the beta header on every turn. On the first turn, pass "previous_message_id": null to opt in without a prior message to compare against. On subsequent turns, pass the id from the previous response.

import anthropic

client = anthropic.Anthropic()

SYSTEM = "You are an AI assistant analyzing a large document. <document>...</document>"

# Turn 1: opt in with previous_message_id=None
r1 = client.beta.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    cache_control={"type": "ephemeral"},
    system=SYSTEM,
    messages=[{"role": "user", "content": "Summarize section 1."}],
    diagnostics={"previous_message_id": None},
    betas=["cache-diagnosis-2026-04-07"],
)

# Turn 2: reference the previous response id
r2 = client.beta.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    cache_control={"type": "ephemeral"},
    system=SYSTEM,
    messages=[
        {"role": "user", "content": "Summarize section 1."},
        {"role": "assistant", "content": r1.content},
        {"role": "user", "content": "Now summarize section 2."},
    ],
    diagnostics={"previous_message_id": r1.id},
    betas=["cache-diagnosis-2026-04-07"],
)

diagnostics = r2.diagnostics
if diagnostics is None:
    print("No divergence detected.")
elif diagnostics.cache_miss_reason is None:
    print("Comparison still pending.")
else:
    print(f"cache_miss_reason: {diagnostics.cache_miss_reason.type}")

Streaming

In streaming responses, diagnostics appears on the message_start event.

# Assumes client, SYSTEM, and r1 from the Basic usage example above.
# Turn 2: stream, referencing the previous response id
with client.beta.messages.stream(
    model="claude-opus-4-7",
    max_tokens=1024,
    cache_control={"type": "ephemeral"},
    system=SYSTEM,
    messages=[
        {"role": "user", "content": "Summarize section 1."},
        {"role": "assistant", "content": r1.content},
        {"role": "user", "content": "Now summarize section 2."},
    ],
    diagnostics={"previous_message_id": r1.id},
    betas=["cache-diagnosis-2026-04-07"],
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)
    print()
    r2 = stream.get_final_message()

diagnostics = r2.diagnostics
if diagnostics is None:
    print("No divergence detected.")
elif diagnostics.cache_miss_reason is None:
    print("Comparison still pending.")
else:
    print(f"cache_miss_reason: {diagnostics.cache_miss_reason.type}")

The message_start event carries the full diagnostics field; see Response format for the possible values.
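If you consume raw stream events rather than the SDK's final message, a sketch like the following can pull the field out of message_start. It assumes events have already been decoded from JSON into dicts; `diagnostics_from_events` is a hypothetical helper, and the sample payload values are illustrative only. SDK event objects expose attributes rather than dict keys, so adapt the access accordingly.

```python
def diagnostics_from_events(events):
    """Return (present, diagnostics) taken from the first message_start event.

    `present` distinguishes an absent field from an explicit null.
    """
    for event in events:
        if event.get("type") == "message_start":
            message = event.get("message", {})
            return ("diagnostics" in message), message.get("diagnostics")
    # No message_start seen at all
    return False, None


# Illustrative decoded events; values are made up for the example
events = [
    {
        "type": "message_start",
        "message": {
            "id": "msg_01Xyz...",
            "diagnostics": {
                "cache_miss_reason": {
                    "type": "tools_changed",
                    "cache_missed_input_tokens": 1200,
                }
            },
        },
    },
    {
        "type": "content_block_delta",
        "index": 0,
        "delta": {"type": "text_delta", "text": "..."},
    },
]

present, diagnostics = diagnostics_from_events(events)
if present and diagnostics and diagnostics["cache_miss_reason"]:
    print(diagnostics["cache_miss_reason"]["type"])  # tools_changed
```

Reading the field at message_start lets you log a divergence before the rest of the stream arrives.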

Threading diagnostics through a conversation loop

In a multi-turn conversation, carry the latest response id forward as previous_message_id on every turn. The first iteration passes null to opt in; each subsequent iteration passes the id from the previous response.
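A minimal sketch of such a loop follows. The request parameters mirror the Basic usage example; the `run_conversation` function, its arguments, and the `BETA` constant are hypothetical names introduced for illustration. The client is passed in so that the same function works with `anthropic.Anthropic()` or with a stub.

```python
BETA = "cache-diagnosis-2026-04-07"


def run_conversation(client, system, user_turns, model="claude-opus-4-7"):
    """Send each user turn, threading the latest response id forward."""
    messages = []
    previous_id = None  # first turn passes null to opt in
    responses = []
    for text in user_turns:
        messages.append({"role": "user", "content": text})
        response = client.beta.messages.create(
            model=model,
            max_tokens=1024,
            cache_control={"type": "ephemeral"},
            system=system,
            messages=messages,
            diagnostics={"previous_message_id": previous_id},
            betas=[BETA],
        )
        # Echo assistant content back verbatim so the history stays append-only
        messages.append({"role": "assistant", "content": response.content})
        # Carry this response id into the next iteration
        previous_id = response.id
        responses.append(response)
    return responses
```

A call might look like `run_conversation(anthropic.Anthropic(), SYSTEM, ["Summarize section 1.", "Now summarize section 2."])`, after which each returned response can be inspected for its diagnostics field.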

Response format

The diagnostics field on the response Message has four possible states:

| Value | Meaning |
| --- | --- |
| field absent | The request did not include diagnostics, or the beta header was missing. |
| null | Either previous_message_id was null (first turn, nothing to compare), or a comparison ran and found no divergence. |
| {"cache_miss_reason": null} | The comparison was still running when the response was serialized. This can happen when prefill is very fast. Treat it as inconclusive and check the next turn. |
| {"cache_miss_reason": {...}} | A cache_miss_reason is attached. For *_changed types this identifies the first divergence point; previous_message_not_found and unavailable are cases where no comparison was produced. |

When cache_miss_reason is non-null, it looks like this:

{
  "id": "msg_01Xyz...",
  "type": "message",
  "role": "assistant",
  "content": [{ "type": "text", "text": "..." }],
  "usage": {
    "input_tokens": 42,
    "cache_read_input_tokens": 0,
    "cache_creation_input_tokens": 41850,
    "output_tokens": 210
  },
  "diagnostics": {
    "cache_miss_reason": {
      "type": "system_changed",
      "cache_missed_input_tokens": 41850
    }
  }
}
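The four states can be told apart with a few checks on the decoded response body. The sketch below assumes the response is a plain dict (for example, parsed from the raw JSON); `classify_diagnostics` and its return labels are hypothetical names, not part of the API.

```python
def classify_diagnostics(response):
    """Map a decoded response body (dict) to one of the four documented states."""
    if "diagnostics" not in response:
        # Request lacked the diagnostics parameter or the beta header
        return "absent"
    diagnostics = response["diagnostics"]
    if diagnostics is None:
        # First turn, or a comparison ran and found no divergence
        return "no_divergence_or_first_turn"
    reason = diagnostics.get("cache_miss_reason")
    if reason is None:
        # Comparison was still running; inconclusive
        return "pending"
    # A reason is attached; its type is the discriminator
    return reason["type"]


example = {
    "id": "msg_01Xyz...",
    "diagnostics": {
        "cache_miss_reason": {
            "type": "system_changed",
            "cache_missed_input_tokens": 41850,
        }
    },
}
print(classify_diagnostics(example))  # system_changed
```

Checking for key presence first matters because an absent field and an explicit null mean different things.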

Cache miss reason types

cache_miss_reason is a discriminated union on type. The response reports the earliest divergence only, so fix it first; later ones may be hidden behind it.

| Type | What it means | What to change |
| --- | --- | --- |
| model_changed | The model differs from the previous request (for example, a router, A/B test, or fallback selected a different model). The cache is per-model. | Hold the model constant within a cached conversation. |
| system_changed | The system parameter differs. Typically a timestamp, request ID, or other per-request value was interpolated into the system prompt. | Make the system prompt a byte-stable constant and move dynamic data into the first user message after your cache breakpoint. |
| tools_changed | The tools array differs: tools were added, removed, or reordered between turns, or tool input_schema JSON was serialized non-deterministically. | Send the same tool list on every turn in a fixed order with deterministically serialized schemas (for example, sort keys). |
| messages_changed | The model, system, and tools all match, but an earlier entry in messages was altered, reordered, or removed rather than appended to. Typically conversation history was truncated or edited, or assistant turns and tool_result blocks were re-serialized differently on resend. | Treat the history as append-only; echo assistant content and tool results back verbatim. |
| previous_message_not_found | No stored fingerprint exists for the supplied previous_message_id. This is not evidence that your request changed. Typically the previous request did not carry the beta header, it came from a different workspace, or too much time has passed since it was sent. | Send the beta header on every turn and keep consecutive turns close together in time. |
| unavailable | Diagnostic information was not available for this request. This includes the case where model, system, and tools match but another prompt-affecting request parameter (tool_choice, thinking, context_management, output_config, output_format, or the set of active anthropic-beta headers) differs, and very long conversations where the divergence is beyond the comparison horizon. Your request was processed normally. | Keep the prompt-affecting request parameters constant for the lifetime of a cached conversation. If persistent, apply the manual checks under Troubleshooting common issues on the prompt caching page. |

The four *_changed types also carry a cache_missed_input_tokens integer: an estimate of how many input tokens fell after the divergence point, giving you a sense of how much cacheable prefix was lost. It is derived from byte lengths before tokenization, so treat it as a magnitude indicator rather than a billing number. It can differ from (and occasionally exceed) usage.input_tokens.
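As one way to address tools_changed, the sketch below normalizes a tool list before every request so that ordering and key order cannot drift between turns. `canonical_tools` is a hypothetical helper, the two sample tools are made up, and the sketch assumes every tool definition is JSON-serializable and has a unique name. Sorting by name is only one possible fixed order.

```python
import json


def canonical_tools(tools):
    """Return the tools in a fixed order with keys sorted at every nesting level."""
    # Round-tripping through sort_keys=True rebuilds each dict in sorted key order
    normalized = [json.loads(json.dumps(tool, sort_keys=True)) for tool in tools]
    # Any fixed order works; sorting by name is just one stable choice
    return sorted(normalized, key=lambda tool: tool["name"])


# Two logically identical tool lists built in different orders (hypothetical tools)
turn_a = [
    {"name": "get_weather", "description": "Look up weather.",
     "input_schema": {"type": "object", "properties": {"city": {"type": "string"}}}},
    {"name": "get_time", "description": "Look up local time.",
     "input_schema": {"type": "object", "properties": {"tz": {"type": "string"}}}},
]
turn_b = [
    {"input_schema": {"properties": {"tz": {"type": "string"}}, "type": "object"},
     "description": "Look up local time.", "name": "get_time"},
    {"description": "Look up weather.", "name": "get_weather",
     "input_schema": {"properties": {"city": {"type": "string"}}, "type": "object"}},
]

# Without normalization the serialized forms differ; with it they match
print(json.dumps(turn_a) == json.dumps(turn_b))  # False
print(json.dumps(canonical_tools(turn_a)) == json.dumps(canonical_tools(turn_b)))  # True
```

Apply the same normalization on every turn, including the first, so the cached prefix is written in the canonical form.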

Reading diagnostics alongside usage

diagnostics answers "did my request change?" while usage.cache_read_input_tokens answers "did the cache hit?". Combining them tells you where to look.

This matrix applies to turns where you passed a real previous_message_id. On the first turn (previous_message_id: null), diagnostics is always null and cache_read_input_tokens is normally zero because the cache is being written, not read; no troubleshooting is needed. The matrix also does not apply when cache_miss_reason is null (the comparison is still pending; check the next turn) or when its type is previous_message_not_found or unavailable (no comparison was produced).

| Diagnostics result | Cache read tokens | Interpretation |
| --- | --- | --- |
| null | high | Working as expected. Your prefix is stable and the cache hit. |
| null | low or zero | Your requests match but the cache entry was no longer available. Consider shortening gaps between turns or using the 1-hour cache TTL. |
| cache_miss_reason is a *_changed type | low or zero | Your bug. The request changed; fix the cause indicated by type. |
| cache_miss_reason is a *_changed type | high | Rare. A change occurred late in the prompt but an earlier cache_control breakpoint still hit. Worth fixing, but low impact. |
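The matrix can be encoded as a small triage function. This is a sketch under stated assumptions: `interpret`, its return labels, and the `high_threshold` cutoff are hypothetical, and the page does not define a numeric boundary between "high" and "low", so pick one that suits your prompt sizes. The `diagnostics` argument is the decoded field from a turn where a real previous_message_id was passed.

```python
CHANGED_TYPES = {"model_changed", "system_changed", "tools_changed", "messages_changed"}


def interpret(diagnostics, cache_read_input_tokens, high_threshold=1024):
    """Apply the diagnostics/usage matrix and return a short triage label."""
    # Assumption: "high" means at least high_threshold cached tokens were read
    cache_hit = cache_read_input_tokens >= high_threshold
    if diagnostics is None:
        # No divergence found; the outcome depends only on the cache
        return "working_as_expected" if cache_hit else "cache_entry_expired"
    reason = diagnostics.get("cache_miss_reason")
    if reason is None:
        # Comparison still pending; the matrix does not apply
        return "pending_check_next_turn"
    if reason["type"] not in CHANGED_TYPES:
        # previous_message_not_found or unavailable: no comparison was produced
        return "no_comparison"
    # The request changed; impact depends on whether an earlier breakpoint hit
    return "late_change_low_impact" if cache_hit else "request_changed_fix_" + reason["type"]


miss = {"cache_miss_reason": {"type": "system_changed", "cache_missed_input_tokens": 41850}}
print(interpret(None, 41850))  # working_as_expected
print(interpret(None, 0))      # cache_entry_expired
print(interpret(miss, 0))      # request_changed_fix_system_changed
```

Logging the label alongside the response id on every turn makes it easier to spot which turn first broke the prefix.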

Limitations

  • Beta: Field names and semantics may change before general availability.
  • Claude API only: Not available on Amazon Bedrock or Vertex AI.
  • Limited retention: Fingerprints for previous_message_id lookup expire after a short period. Run diagnostic comparisons between closely spaced requests.
  • Same workspace: The previous request must have been made with an API key from the same organization and workspace.
  • Comparison horizon: For very long conversations where the only change is deep in the message list, the response may report unavailable rather than a precise divergence point.
  • Best-effort: Diagnostics never blocks or fails your request. If diagnostic information is not available, the response returns unavailable, or cache_miss_reason: null when the comparison was still running.

Data retention

Cache diagnostics is ZDR eligible with limited technical retention. Anthropic does not store the raw text of your prompts or Claude's outputs for this feature.

The fingerprint stored for each request consists only of cryptographic hashes and token-count estimates, keyed by the response id and scoped to your organization and workspace. Fingerprints expire after a short period and are not used for any other purpose.

For ZDR eligibility across all features, see API and data retention.

See also

  • Prompt caching
  • Token counting
  • Beta headers
