    Track cost and usage

    Learn how to track token usage, deduplicate parallel tool calls, and calculate costs with the Claude Agent SDK.

    The Claude Agent SDK provides detailed token usage information for each interaction with Claude. This guide explains how to properly track costs and understand usage reporting, especially when dealing with parallel tool uses and multi-step conversations.

    For complete API documentation, see the TypeScript SDK reference and Python SDK reference.

    Understand token usage

    The TypeScript and Python SDKs expose usage data at different levels of detail:

    • TypeScript provides per-step token breakdowns on each assistant message (message.message.id, message.message.usage), per-model cost via modelUsage, and a cumulative total on the result message.
    • Python provides the accumulated total on the result message (total_cost_usd and usage dict). Per-step breakdowns are not available on individual assistant messages.

    Both SDKs use the same underlying cost model. The difference is how much granularity each SDK exposes.

    Cost tracking depends on understanding how the SDK scopes usage data:

    • query() call: one invocation of the SDK's query() function. A single call can involve multiple steps (Claude responds, uses tools, gets results, responds again). Each call produces one result message at the end.
    • Step: a single request/response cycle within a query() call. In TypeScript, each step produces assistant messages with token usage.
    • Session: a series of query() calls linked by a session ID (using the resume option). Each query() call within a session reports its own cost independently.

    The following diagram shows the message stream from a single query() call, with token usage reported at each step and the authoritative total at the end:

[Diagram: a query() call produces two steps of messages. Step 1 has four assistant messages sharing the same ID and usage (count once), step 2 has one assistant message with a new ID, and the final result message carries total_cost_usd for billing.]
1. Each step produces assistant messages

   When Claude responds, it sends one or more assistant messages. In TypeScript, each assistant message contains a nested BetaMessage (accessed via message.message) with an id and a usage object with token counts (input_tokens, output_tokens). When Claude uses multiple tools in one turn, all messages in that turn share the same id, so deduplicate by ID to avoid double-counting. In Python, per-step usage is not available on individual messages.

2. The result message provides the authoritative total

   When the query() call completes, the SDK emits a result message with total_cost_usd and cumulative usage. This is available in both TypeScript (SDKResultMessage) and Python (ResultMessage). If you make multiple query() calls (for example, in a multi-turn session), each result only reflects the cost of that individual call. If you only need the total cost, you can ignore the per-step usage and read this single value.
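The two steps can be sketched against a simplified message stream. The dicts below are illustrative stand-ins for SDK message objects, not the SDK's actual types:

```python
# Illustrative stand-ins for the SDK's message stream: assistant
# messages carry an id and usage; the final result carries the total.
stream = [
    {"type": "assistant", "id": "msg_1", "usage": {"input_tokens": 500, "output_tokens": 60}},
    # Parallel tool call: same id and usage as above, so it must not be counted again.
    {"type": "assistant", "id": "msg_1", "usage": {"input_tokens": 500, "output_tokens": 60}},
    {"type": "assistant", "id": "msg_2", "usage": {"input_tokens": 900, "output_tokens": 120}},
    {"type": "result", "total_cost_usd": 0.0123},
]

seen_ids = set()
output_tokens = 0
total_cost = None

for message in stream:
    if message["type"] == "assistant" and message["id"] not in seen_ids:
        seen_ids.add(message["id"])                 # count each step once
        output_tokens += message["usage"]["output_tokens"]
    elif message["type"] == "result":
        total_cost = message["total_cost_usd"]      # authoritative total

print(output_tokens)  # 180: msg_1 counted once, plus msg_2
print(total_cost)     # 0.0123
```

Note that msg_1 contributes its usage only once even though it appears twice in the stream, while the result message is read as-is.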

    Get the total cost of a query

    The result message (TypeScript, Python) is the last message in every query() call. It includes total_cost_usd, the cumulative cost across all steps in that call. This works for both success and error results. If you use sessions to make multiple query() calls, each result only reflects the cost of that individual call.

The following example iterates over the message stream from a query() call and prints the total cost when the result message arrives:

    import { query } from "@anthropic-ai/claude-agent-sdk";
    
    for await (const message of query({ prompt: "Summarize this project" })) {
      if (message.type === "result") {
        console.log(`Total cost: $${message.total_cost_usd}`);
      }
    }

    Track detailed usage in TypeScript

    The TypeScript SDK exposes additional usage granularity that is not available in Python. The Python SDK's AssistantMessage does not expose per-step token usage or per-model breakdowns. Use ResultMessage.usage for cumulative totals instead.

    Track per-step usage

    Each assistant message contains a nested BetaMessage (accessed via message.message) with an id and usage object with token counts. When Claude uses tools in parallel, multiple messages share the same id with identical usage data. Track which IDs you've already counted and skip duplicates to avoid inflated totals.

    Parallel tool calls produce multiple assistant messages whose nested BetaMessage shares the same id and identical usage. Always deduplicate by ID to get accurate per-step token counts.

    The following example accumulates input and output tokens across all steps, counting each unique message ID only once:

    import { query } from "@anthropic-ai/claude-agent-sdk";
    
    const seenIds = new Set<string>();
    let totalInputTokens = 0;
    let totalOutputTokens = 0;
    
    for await (const message of query({ prompt: "Summarize this project" })) {
      if (message.type === "assistant") {
        const msgId = message.message.id;
    
        // Parallel tool calls share the same ID, only count once
        if (!seenIds.has(msgId)) {
          seenIds.add(msgId);
          totalInputTokens += message.message.usage.input_tokens;
          totalOutputTokens += message.message.usage.output_tokens;
        }
      }
    }
    
    console.log(`Steps: ${seenIds.size}`);
    console.log(`Input tokens: ${totalInputTokens}`);
    console.log(`Output tokens: ${totalOutputTokens}`);

    Break down usage per model

    The result message includes modelUsage, a map of model name to per-model token counts and cost. This is useful when you run multiple models (for example, Haiku for subagents and Opus for the main agent) and want to see where tokens are going.

    The following example runs a query and prints the cost and token breakdown for each model used:

    import { query } from "@anthropic-ai/claude-agent-sdk";
    
    for await (const message of query({ prompt: "Summarize this project" })) {
      if (message.type !== "result") continue;
    
      for (const [modelName, usage] of Object.entries(message.modelUsage)) {
        console.log(`${modelName}: $${usage.costUSD.toFixed(4)}`);
        console.log(`  Input tokens: ${usage.inputTokens}`);
        console.log(`  Output tokens: ${usage.outputTokens}`);
        console.log(`  Cache read: ${usage.cacheReadInputTokens}`);
        console.log(`  Cache creation: ${usage.cacheCreationInputTokens}`);
      }
    }

    Accumulate costs across multiple calls

    Each query() call returns its own total_cost_usd. The SDK does not provide a session-level total, so if your application makes multiple query() calls (for example, in a multi-turn session or across different users), accumulate the totals yourself.

The following example runs two query() calls sequentially, adds each call's total_cost_usd to a running total, and prints both the per-call and combined cost:

    import { query } from "@anthropic-ai/claude-agent-sdk";
    
    // Track cumulative cost across multiple query() calls
    let totalSpend = 0;
    
    const prompts = [
      "Read the files in src/ and summarize the architecture",
      "List all exported functions in src/auth.ts"
    ];
    
    for (const prompt of prompts) {
      for await (const message of query({ prompt })) {
        if (message.type === "result") {
          totalSpend += message.total_cost_usd ?? 0;
          console.log(`This call: $${message.total_cost_usd}`);
        }
      }
    }
    
    console.log(`Total spend: $${totalSpend.toFixed(4)}`);
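The same pattern generalizes to per-user tracking. A minimal Python sketch of a spend ledger, using illustrative cost values in place of live result messages (the record_result helper is hypothetical, not part of the SDK):

```python
from collections import defaultdict

# Running spend per user across query() calls. The costs passed in
# below are illustrative stand-ins for result.total_cost_usd values.
spend = defaultdict(float)

def record_result(user_id, total_cost_usd):
    # Treat a missing cost (None) as zero, mirroring `?? 0` in the
    # TypeScript example above.
    spend[user_id] += total_cost_usd or 0.0

record_result("alice", 0.0210)
record_result("alice", 0.0145)
record_result("bob", None)  # a call that reported no cost

print(f"alice: ${spend['alice']:.4f}")  # alice: $0.0355
print(f"bob: ${spend['bob']:.4f}")      # bob: $0.0000
```

Keying the ledger by user (or session ID) lets you attribute spend however your application needs, since the SDK itself only reports per-call totals.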

    Handle errors, caching, and token discrepancies

    For accurate cost tracking, account for failed conversations, cache token pricing, and occasional reporting inconsistencies.

    Resolve output token discrepancies

    In rare cases, you might observe different output_tokens values for messages with the same ID. When this occurs:

    1. Use the highest value: the final message in a group typically contains the accurate total.
    2. Verify against total cost: the total_cost_usd in the result message is authoritative.
    3. Report inconsistencies: file issues at the Claude Code GitHub repository.
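Guideline 1 can be implemented as a small reconciliation pass over the usage reports you collected. A Python sketch; the reconcile_output_tokens helper and the (id, tokens) tuples are illustrative, not SDK types:

```python
def reconcile_output_tokens(reports):
    """For each message ID, keep the highest output_tokens reported."""
    best = {}
    for msg_id, output_tokens in reports:
        best[msg_id] = max(best.get(msg_id, 0), output_tokens)
    return best

# Two reports for msg_1 disagree; the higher value wins.
reports = [("msg_1", 110), ("msg_1", 128), ("msg_2", 40)]
print(reconcile_output_tokens(reports))  # {'msg_1': 128, 'msg_2': 40}
```

Whatever the reconciled per-step numbers say, still treat total_cost_usd from the result message as the billing source of truth.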

    Track costs on failed conversations

    Both success and error result messages include usage and total_cost_usd. If a conversation fails mid-way, you still consumed tokens up to the point of failure. Always read cost data from the result message regardless of its subtype.

    Track cache tokens

    The Agent SDK automatically uses prompt caching to reduce costs on repeated content. You do not need to configure caching yourself. The usage object includes two additional fields for cache tracking:

    • cache_creation_input_tokens: tokens used to create new cache entries (charged at a higher rate than standard input tokens).
    • cache_read_input_tokens: tokens read from existing cache entries (charged at a reduced rate).

    Track these separately from input_tokens to understand caching savings. In TypeScript, these fields are typed on the Usage object. In Python, they appear as keys in the ResultMessage.usage dict (for example, message.usage.get("cache_read_input_tokens", 0)).
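To quantify caching savings, compare what the cached reads would have cost at the full input rate. A Python sketch; the helper and the per-token rates are placeholders you would fill in from your own pricing, not real Claude prices:

```python
def cache_read_savings(usage, input_rate, cache_read_rate):
    """Estimate savings from cache reads versus paying the full input rate.

    Rates are per-token prices supplied by the caller; the values used
    below are placeholders, not actual Claude pricing.
    """
    cache_read = usage.get("cache_read_input_tokens", 0)
    return cache_read * (input_rate - cache_read_rate)

# Usage dict shaped like the fields described above (illustrative values).
usage = {
    "input_tokens": 1_000,
    "cache_read_input_tokens": 50_000,
    "cache_creation_input_tokens": 2_000,
}

# Placeholder rates: $3 per million input tokens, $0.30 per million cache reads.
savings = cache_read_savings(usage, 3 / 1_000_000, 0.30 / 1_000_000)
print(f"${savings:.4f}")  # $0.1350
```

The same .get(..., 0) access works on the Python ResultMessage.usage dict, since the cache fields may be absent when no caching occurred.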

    Related documentation

    • TypeScript SDK Reference - Complete API documentation
    • SDK Overview - Getting started with the SDK
    • SDK Permissions - Managing tool permissions
