Fallback credit

Avoid paying the prompt-cache cost twice when you retry a refused Claude Fable 5 request on another model.

Prompt caches are per-model. When Claude Fable 5 declines a request and you retry on another model, the conversation prefix that was already cached for Claude Fable 5 must be written into the new model's cache from scratch, and cache writes cost more than cache reads. Fallback credit removes that extra cost: the refusal carries a credit token, you echo the token on the retry, and the retry is billed as though the conversation had been on the new model all along.
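As a rough illustration of the saving, the sketch below compares what the cached prefix costs on the retry model with and without the credit. It assumes the multipliers listed on the Prompt caching page at the time of writing (a five-minute cache write at 1.25 times the base input price, a cache read at 0.1 times). The prefix size and base price are hypothetical placeholders, and prefix_cost is an illustrative helper, not part of any SDK.

```python
# Assumed multipliers relative to the base input-token price; check the
# Prompt caching page for current values before relying on them.
CACHE_WRITE_MULTIPLIER = 1.25  # five-minute cache write (assumption)
CACHE_READ_MULTIPLIER = 0.10   # cache read (assumption)


def prefix_cost(prefix_tokens: int, base_price_per_mtok: float, *, credited: bool) -> float:
    """Dollar cost of the cached prefix on the retry model (illustrative helper).

    Without credit, the prefix is written into the retry model's cache.
    With credit, it is billed as if it were already cached there (a read).
    """
    multiplier = CACHE_READ_MULTIPLIER if credited else CACHE_WRITE_MULTIPLIER
    return prefix_tokens / 1_000_000 * base_price_per_mtok * multiplier


if __name__ == "__main__":
    # Hypothetical numbers: a 200,000-token prefix at a $5.00/MTok base price.
    without_credit = prefix_cost(200_000, 5.00, credited=False)
    with_credit = prefix_cost(200_000, 5.00, credited=True)
    print(f"without credit: ${without_credit:.2f}, with credit: ${with_credit:.2f}")
```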

You need this page only when you build the retry yourself: on the Ruby or PHP SDK, over raw HTTP, or with custom retry logic. Server-side fallback and the SDK middleware apply fallback credit automatically. If you use either, skip this page.

Refusals and fallback covers detecting refusals and choosing a fallback approach. Prompt caching explains cache reads and cache writes if those terms are new.

The basic flow

  1. Opt in with the beta header

    Send the request that may be refused with the anthropic-beta: fallback-credit-2026-06-01 header. The server-side-fallback-2026-06-01 beta header also returns the fallback credit fields described in the next step.

  2. Read two fields from the refusal

    On a refusal, stop_details includes fallback_credit_token, an opaque string that represents the credit, and fallback_has_prefill_claim, a Boolean that tells you which retry body shape to use. Both are null when no credit is available for the refusal.

  3. Build the retry

    Start from the refused request body. Set model to the fallback model and add the token as the top-level fallback_credit_token parameter. Pick the body shape from the table below.

  4. Send the retry with the same header

    Send the retry with the same fallback-credit-2026-06-01 beta header. The retry needs the header to redeem the token.

The fallback_has_prefill_claim field tells you whether the retry can continue the refused model's partial output instead of starting over:

| fallback_has_prefill_claim | Retry body |
| --- | --- |
| true | The refused request body, unchanged, plus one appended assistant message whose content echoes the refused response's content. The retry model continues the response from where the refused model stopped, and completed server tool calls are not re-executed. |
| false | The refused request body, unchanged. |
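The table maps onto a small helper. This sketch uses a hypothetical build_retry_body function over plain JSON-style dicts; it appends the echoed assistant message only when the claim is true, and trims trailing whitespace from a final text block, as the Python example on this page does, because the prefill validator rejects it. It assumes a request without tools and does not omit unpaired tool_use blocks.

```python
import copy


def build_retry_body(
    refused_body: dict,
    refused_content: list[dict],
    token: str,
    has_prefill_claim: bool,
    fallback_model: str = "claude-opus-4-8",
) -> dict:
    """Build the retry body from the refused request (hypothetical helper).

    refused_body is the JSON body of the refused request; refused_content is
    the "content" array of the refusal response.
    """
    body = copy.deepcopy(refused_body)  # never mutate the original request
    body["model"] = fallback_model
    body["fallback_credit_token"] = token
    if has_prefill_claim:
        echoed = copy.deepcopy(refused_content)
        # Trim trailing whitespace from a final text block; the prefill
        # validator rejects it.
        if echoed and echoed[-1].get("type") == "text":
            echoed[-1]["text"] = echoed[-1]["text"].rstrip()
        body["messages"] = [*body["messages"], {"role": "assistant", "content": echoed}]
    return body
```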

Example

The example below makes a request that may be refused, redeems the credit token on a retry against Claude Opus 4.8, and degrades through the rejection ladder covered in When a retry is rejected.

Where it works

Fallback credit is in beta on the Claude API, Claude Platform on AWS, Amazon Bedrock, Vertex AI, and Microsoft Foundry. Credit tokens returned in Message Batches results cannot be redeemed; redemption applies only to direct Messages API requests.

The retry model must be one of the refused model's permitted fallback targets. At launch, Claude Fable 5's permitted target is Claude Opus 4.8 (claude-opus-4-8).
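Before attaching a token, a client can check these two conditions locally. The sketch below is a hypothetical guard (can_redeem and PERMITTED_FALLBACK_TARGETS are illustrative names). It hard-codes the launch mapping stated above, so it can go stale; the API remains the authority and may reject retries this check allows.

```python
# Permitted fallback targets at launch, per the documentation. This table is
# maintained by hand (an assumption of this sketch), not fetched from the API.
PERMITTED_FALLBACK_TARGETS = {
    "claude-fable-5": {"claude-opus-4-8"},
}


def can_redeem(refused_model: str, retry_model: str, *, from_batch: bool) -> bool:
    """Return True if a credit token from refused_model may be redeemed on retry_model.

    Tokens returned in Message Batches results cannot be redeemed, and the
    retry model must be a permitted fallback target of the refused model.
    """
    if from_batch:
        return False
    return retry_model in PERMITTED_FALLBACK_TARGETS.get(refused_model, set())
```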

Checking that the credit applied

The refund is visible in the retry's usage. Compared with the same request sent without the token, cache_creation_input_tokens is lower and cache_read_input_tokens is higher by the same amount. A shift of zero means the token was honored but there was nothing to reprice, for example because the retry model's cache was already warm.
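If you have the usage of a comparable tokenless request, for example from a test run, the shift can be computed directly. This is a minimal sketch; the Usage dataclass and credit_shift function are illustrative stand-ins that carry only the two counters named above, not SDK types.

```python
from dataclasses import dataclass


@dataclass
class Usage:
    """The two usage counters that fallback credit reprices (illustrative type)."""

    cache_creation_input_tokens: int
    cache_read_input_tokens: int


def credit_shift(baseline: Usage, credited: Usage) -> int:
    """Return how many prefix tokens moved from cache writes to cache reads.

    baseline is the usage of the retry sent without the token; credited is
    the usage of the same retry sent with it. Zero means the token was
    honored but there was nothing to reprice.
    """
    moved = baseline.cache_creation_input_tokens - credited.cache_creation_input_tokens
    gained = credited.cache_read_input_tokens - baseline.cache_read_input_tokens
    if moved != gained:
        raise ValueError("cache writes and cache reads did not shift by the same amount")
    return moved
```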

When a retry is rejected

Most retries redeem on the first attempt. When one does not, the API returns a 400 error that tells you what to try next.

  1. Continuation rejected: resend the unchanged body

    If the retry that appends the assistant message is rejected with a 400 error, resend the refused request body unchanged, still with the token.

  2. Token rejected: drop the token

    If the unchanged body is also rejected with a 400 error whose message names fallback_credit_token, retry without the token. The credit is forfeited, but the retry itself goes through.

If the refused request executed server tools, a tokenless retry re-runs and re-bills those tools. In that case, surface the 400 error to your caller instead of falling through to a tokenless retry.
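The rejection ladder can be written as a pure decision function, which keeps the retry policy testable apart from the network code. This is a sketch under assumptions: next_step and its stage and action names are hypothetical, and it matches on error-message text, which is brittle. The "redemption temporarily unavailable" string is taken from the Python example on this page.

```python
def next_step(stage: str, error_message: str, ran_server_tools: bool) -> str:
    """Decide what to do after a 400 error on a fallback-credit retry (hypothetical).

    stage is "continuation" (body with the appended assistant message) or
    "unchanged" (refused body, still with the token). Returns one of
    "resend_unchanged", "drop_token", or "raise".
    """
    if "redemption temporarily unavailable" in error_message:
        # Transient: surface it and retry with the token later, within its window.
        return "raise"
    if stage == "continuation":
        return "resend_unchanged"
    if stage == "unchanged" and "fallback_credit_token" in error_message:
        if ran_server_tools:
            # A tokenless retry would re-run and re-bill the server tools.
            return "raise"
        return "drop_token"
    return "raise"
```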

Reference

The sections below cover edge cases and the complete redemption rules. Most integrations do not need them.

Python code for the Example section:

```python
import json

from anthropic import Anthropic, BadRequestError

client = Anthropic()

request = {
    "max_tokens": 1024,
    "messages": [{"role": "user", "content": "Hello, Claude"}],
}


def send(model, body):
    # Every request, including each retry, carries the beta header
    return client.beta.messages.create(
        model=model, betas=["fallback-credit-2026-06-01"], **body
    )


response = send("claude-fable-5", request)

if (
    response.stop_reason == "refusal"
    and (details := response.stop_details)
    and (token := details.fallback_credit_token)
):
    exact_body = request | {"fallback_credit_token": token}
    # Prefer the continuation shape unless the claim is False
    if details.fallback_has_prefill_claim is not False:
        # Echo the refusal's content, stripping trailing whitespace from a
        # final text block (the prefill validator rejects it; the server-side
        # match tolerates the edit). Tool-using requests must also omit
        # unpaired tool_use blocks, then re-strip whitespace after any
        # omissions; this request sends no tools, so that step is not shown.
        echoed = [block.model_dump() for block in response.content]
        match echoed:
            case [*_, {"type": "text"} as final_block]:
                final_block["text"] = final_block["text"].rstrip()
        attempt = exact_body | {
            "messages": [
                *request["messages"],
                {"role": "assistant", "content": echoed},
            ]
        }
        ladder = [attempt, exact_body]
    else:
        ladder = [exact_body]

    # Try each body that carries the token in turn: the continuation (if
    # any), then the unchanged body. The unchanged body is sent only once.
    for body in ladder:
        try:
            response = send("claude-opus-4-8", body)
            break
        except BadRequestError as error:
            if "redemption temporarily unavailable" in str(error):
                raise  # Transient: retry with the token within its five-minute window
    else:
        # The token itself was rejected: forfeit it and retry without.
        # Safe here because this request ran no server tools.
        response = send("claude-opus-4-8", request)

print(json.dumps({"stop_reason": response.stop_reason, "model": response.model}))
```

Next steps

Refusals and fallback

Detect refusals and choose between server-side fallback, the SDK middleware, and a manual retry.

Prompt caching

How cache reads and cache writes are billed.

Stop reasons and fallback

Every stop_reason value and how to handle it.

SDK middleware

The SDK helper that applies fallback credit automatically.