    Stream responses in real-time

    Get real-time responses from the Agent SDK as text and tool calls stream in

    By default, the Agent SDK yields complete AssistantMessage objects after Claude finishes generating each response. To receive incremental updates as text and tool calls are generated, enable partial message streaming by setting include_partial_messages (Python) or includePartialMessages (TypeScript) to true in your options.

    This page covers output streaming (receiving tokens in real-time). For input modes (how you send messages), see Send messages to agents. You can also stream responses using the Agent SDK via the CLI.

    Enable streaming output

    To enable streaming, set include_partial_messages (Python) or includePartialMessages (TypeScript) to true in your options. This causes the SDK to yield StreamEvent messages containing raw API events as they arrive, in addition to the usual AssistantMessage and ResultMessage.

    Your code then needs to:

    1. Check each message's type to distinguish StreamEvent from other message types
    2. For StreamEvent, extract the event field and check its type
    3. Look for content_block_delta events where delta.type is text_delta, which contain the actual text chunks

    The example below enables streaming and prints text chunks as they arrive. Notice the nested type checks: first for StreamEvent, then for content_block_delta, then for text_delta:

    from claude_agent_sdk import query, ClaudeAgentOptions
    from claude_agent_sdk.types import StreamEvent
    import asyncio
    
    async def stream_response():
        options = ClaudeAgentOptions(
            include_partial_messages=True,
            allowed_tools=["Bash", "Read"],
        )
    
        async for message in query(prompt="List the files in my project", options=options):
            if isinstance(message, StreamEvent):
                event = message.event
                if event.get("type") == "content_block_delta":
                    delta = event.get("delta", {})
                    if delta.get("type") == "text_delta":
                        print(delta.get("text", ""), end="", flush=True)
    
    asyncio.run(stream_response())

    StreamEvent reference

    When partial messages are enabled, you receive raw Claude API streaming events wrapped in an object. The type has different names in each SDK:

    • Python: StreamEvent (import from claude_agent_sdk.types)
    • TypeScript: SDKPartialAssistantMessage with type: 'stream_event'

    Both contain raw Claude API events, not accumulated text. You need to extract and accumulate text deltas yourself. Here's the structure of the Python StreamEvent type:

    @dataclass
    class StreamEvent:
        uuid: str                      # Unique identifier for this event
        session_id: str                # Session identifier
        event: dict[str, Any]          # The raw Claude API stream event
        parent_tool_use_id: str | None # Parent tool ID if from a subagent

    The event field contains the raw streaming event from the Claude API. Common event types include:

    • message_start: Start of a new message
    • content_block_start: Start of a new content block (text or tool use)
    • content_block_delta: Incremental update to content
    • content_block_stop: End of a content block
    • message_delta: Message-level updates (stop reason, usage)
    • message_stop: End of the message
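
    Because both SDKs hand you raw events rather than accumulated text, a common pattern is to collect the text_delta chunks yourself while streaming. The sketch below is a minimal example of that pattern; the collect_text function name and the prompt are illustrative, not part of the SDK:

    from claude_agent_sdk import query, ClaudeAgentOptions
    from claude_agent_sdk.types import StreamEvent
    import asyncio
    
    async def collect_text(prompt: str) -> str:
        options = ClaudeAgentOptions(include_partial_messages=True)
        chunks: list[str] = []
    
        async for message in query(prompt=prompt, options=options):
            if isinstance(message, StreamEvent):
                event = message.event
                if event.get("type") == "content_block_delta":
                    delta = event.get("delta", {})
                    if delta.get("type") == "text_delta":
                        # Collect each chunk; join them once the query finishes
                        chunks.append(delta.get("text", ""))
    
        return "".join(chunks)
    
    print(asyncio.run(collect_text("Summarize this project in one paragraph")))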

    Message flow

    With partial messages enabled, you receive messages in this order:

    StreamEvent (message_start)
    StreamEvent (content_block_start) - text block
    StreamEvent (content_block_delta) - text chunks...
    StreamEvent (content_block_stop)
    StreamEvent (content_block_start) - tool_use block
    StreamEvent (content_block_delta) - tool input chunks...
    StreamEvent (content_block_stop)
    StreamEvent (message_delta)
    StreamEvent (message_stop)
    AssistantMessage - complete message with all content
    ... tool executes ...
    ... more streaming events for next turn ...
    ResultMessage - final result

    Without partial messages enabled (include_partial_messages in Python, includePartialMessages in TypeScript), you receive all message types except StreamEvent. Common types include SystemMessage (session initialization), AssistantMessage (complete responses), ResultMessage (final result), and CompactBoundaryMessage (indicates when conversation history was compacted).
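
    To see this flow for yourself, you can print the type of every message and event as it arrives. This is a minimal sketch using the same Python SDK calls as the examples above; the prompt and allowed tools are placeholders:

    from claude_agent_sdk import query, ClaudeAgentOptions
    from claude_agent_sdk.types import StreamEvent
    import asyncio
    
    async def trace_message_flow():
        options = ClaudeAgentOptions(
            include_partial_messages=True,
            allowed_tools=["Read"],
        )
    
        async for message in query(prompt="Read the README.md file", options=options):
            if isinstance(message, StreamEvent):
                # Raw API event: message_start, content_block_delta, message_stop, ...
                print(f"StreamEvent: {message.event.get('type')}")
            else:
                # Complete SDK messages: SystemMessage, AssistantMessage, ResultMessage, ...
                print(type(message).__name__)
    
    asyncio.run(trace_message_flow())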

    Stream text responses

    To display text as it's generated, look for content_block_delta events where delta.type is text_delta. These contain the incremental text chunks. The example below prints each chunk as it arrives:

    from claude_agent_sdk import query, ClaudeAgentOptions
    from claude_agent_sdk.types import StreamEvent
    import asyncio
    
    async def stream_text():
        options = ClaudeAgentOptions(include_partial_messages=True)
    
        async for message in query(prompt="Explain how databases work", options=options):
            if isinstance(message, StreamEvent):
                event = message.event
                if event.get("type") == "content_block_delta":
                    delta = event.get("delta", {})
                    if delta.get("type") == "text_delta":
                        # Print each text chunk as it arrives
                        print(delta.get("text", ""), end="", flush=True)
    
        print()  # Final newline
    
    asyncio.run(stream_text())

    Stream tool calls

    Tool calls also stream incrementally. You can track when tools start, receive their input as it's generated, and see when they complete. The example below tracks the current tool being called and accumulates the JSON input as it streams in. It uses three event types:

    • content_block_start: tool begins
    • content_block_delta with input_json_delta: input chunks arrive
    • content_block_stop: tool call complete

    from claude_agent_sdk import query, ClaudeAgentOptions
    from claude_agent_sdk.types import StreamEvent
    import asyncio
    
    async def stream_tool_calls():
        options = ClaudeAgentOptions(
            include_partial_messages=True,
            allowed_tools=["Read", "Bash"],
        )
    
        # Track the current tool and accumulate its input JSON
        current_tool = None
        tool_input = ""
    
        async for message in query(prompt="Read the README.md file", options=options):
            if isinstance(message, StreamEvent):
                event = message.event
                event_type = event.get("type")
    
                if event_type == "content_block_start":
                    # New tool call is starting
                    content_block = event.get("content_block", {})
                    if content_block.get("type") == "tool_use":
                        current_tool = content_block.get("name")
                        tool_input = ""
                        print(f"Starting tool: {current_tool}")
    
                elif event_type == "content_block_delta":
                    delta = event.get("delta", {})
                    if delta.get("type") == "input_json_delta":
                        # Accumulate JSON input as it streams in
                        chunk = delta.get("partial_json", "")
                        tool_input += chunk
                        print(f"  Input chunk: {chunk}")
    
                elif event_type == "content_block_stop":
                    # Tool call complete - show final input
                    if current_tool:
                        print(f"Tool {current_tool} called with: {tool_input}")
                        current_tool = None
    
    asyncio.run(stream_tool_calls())

    Build a streaming UI

    This example combines text and tool streaming into a cohesive UI. It tracks whether the agent is currently executing a tool (using an in_tool flag) to show status indicators like [Using Read...] while tools run. Text streams normally when not in a tool, and tool completion triggers a "done" message. This pattern is useful for chat interfaces that need to show progress during multi-step agent tasks.

    from claude_agent_sdk import query, ClaudeAgentOptions, ResultMessage
    from claude_agent_sdk.types import StreamEvent
    import asyncio
    import sys
    
    async def streaming_ui():
        options = ClaudeAgentOptions(
            include_partial_messages=True,
            allowed_tools=["Read", "Bash", "Grep"],
        )
    
        # Track whether we're currently in a tool call
        in_tool = False
    
        async for message in query(
            prompt="Find all TODO comments in the codebase",
            options=options
        ):
            if isinstance(message, StreamEvent):
                event = message.event
                event_type = event.get("type")
    
                if event_type == "content_block_start":
                    content_block = event.get("content_block", {})
                    if content_block.get("type") == "tool_use":
                        # Tool call is starting - show status indicator
                        tool_name = content_block.get("name")
                        print(f"\n[Using {tool_name}...]", end="", flush=True)
                        in_tool = True
    
                elif event_type == "content_block_delta":
                    delta = event.get("delta", {})
                    # Only stream text when not executing a tool
                    if delta.get("type") == "text_delta" and not in_tool:
                        sys.stdout.write(delta.get("text", ""))
                        sys.stdout.flush()
    
                elif event_type == "content_block_stop":
                    if in_tool:
                        # Tool call finished
                        print(" done", flush=True)
                        in_tool = False
    
            elif isinstance(message, ResultMessage):
                # Agent finished all work
                print("\n\n--- Complete ---")
    
    asyncio.run(streaming_ui())

    Known limitations

    Some SDK features are incompatible with streaming:

    • Extended thinking: when you explicitly set max_thinking_tokens (Python) or maxThinkingTokens (TypeScript), StreamEvent messages are not emitted and you only receive complete messages after each turn. Thinking is disabled by default in the SDK, so streaming works unless you explicitly enable extended thinking.
    • Structured output: the JSON result appears only in the final ResultMessage.structured_output, not as streaming deltas (see the sketch after this list). See structured outputs for details.
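
    The sketch below shows where a structured result surfaces when streaming is enabled: text still arrives as deltas, while the JSON payload is read from the final ResultMessage. The structured-output configuration itself is omitted here; see the structured outputs guide for how to request one:

    from claude_agent_sdk import query, ClaudeAgentOptions, ResultMessage
    from claude_agent_sdk.types import StreamEvent
    import asyncio
    
    async def stream_then_read_result():
        # Structured output options omitted for brevity; without them,
        # structured_output on the final ResultMessage will be None.
        options = ClaudeAgentOptions(include_partial_messages=True)
    
        async for message in query(prompt="Summarize README.md", options=options):
            if isinstance(message, StreamEvent):
                event = message.event
                if event.get("type") == "content_block_delta":
                    delta = event.get("delta", {})
                    if delta.get("type") == "text_delta":
                        print(delta.get("text", ""), end="", flush=True)
            elif isinstance(message, ResultMessage):
                # The JSON result appears only here, never as streaming deltas
                print("\nStructured output:", getattr(message, "structured_output", None))
    
    asyncio.run(stream_then_read_result())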

    Next steps

    Now that you can stream text and tool calls in real-time, explore these related topics:

    • Interactive vs one-shot queries: choose between input modes for your use case
    • Structured outputs: get typed JSON responses from the agent
    • Permissions: control which tools the agent can use
