This guide gives Claude the basics of using the Claude API. It explains and gives examples of model IDs, the basic Messages API, tool use, streaming, and extended thinking, and nothing else.
Smartest model: Claude Opus 4.6: claude-opus-4-6
Smart model: Claude Sonnet 4.6: claude-sonnet-4-6
For fast, cost-effective tasks: Claude Haiku 4.5: claude-haiku-4-5-20251001

import anthropic
import os
message = anthropic.Anthropic(
api_key=os.environ.get("ANTHROPIC_API_KEY")
).messages.create(
model="claude-opus-4-6",
max_tokens=1024,
messages=[{"role": "user", "content": "Hello, Claude"}],
)
print(message)

This returns a response like:

{
"id": "msg_01XFDUDYJgAACzvnptvVoYEL",
"type": "message",
"role": "assistant",
"content": [
{
"type": "text",
"text": "Hello!"
}
],
"model": "claude-opus-4-6",
"stop_reason": "end_turn",
"stop_sequence": null,
"usage": {
"input_tokens": 12,
"output_tokens": 6
}
}

The Messages API is stateless: you always send the full conversation history with each request, and you can use this pattern to build up a conversation over time. Earlier conversational turns don't need to actually originate from Claude; you can use synthetic assistant messages.
import anthropic
message = anthropic.Anthropic().messages.create(
model="claude-opus-4-6",
max_tokens=1024,
messages=[
{"role": "user", "content": "Hello, Claude"},
{"role": "assistant", "content": "Hello!"},
{"role": "user", "content": "Can you describe LLMs to me?"},
],
)
print(message)

You can pre-fill part of Claude's response by placing it in the last position of the input messages list. This can be used to shape Claude's response. The example below uses "max_tokens": 1 to get a single multiple-choice answer from Claude.
message = anthropic.Anthropic().messages.create(
model="claude-opus-4-6",
max_tokens=1,
messages=[
{
"role": "user",
"content": "What is latin for Ant? (A) Apoidea, (B) Rhopalocera, (C) Formicidae",
},
{"role": "assistant", "content": "The answer is ("},
],
)

Claude can read both text and images in requests. Both base64 and url source types are supported for images, along with the image/jpeg, image/png, image/gif, and image/webp media types.
import anthropic
import base64
import httpx
# Option 1: Base64-encoded image
image_url = "https://upload.wikimedia.org/wikipedia/commons/a/a7/Camponotus_flavomarginatus_ant.jpg"
image_media_type = "image/jpeg"
image_data = base64.standard_b64encode(httpx.get(image_url).content).decode("utf-8")
message = anthropic.Anthropic().messages.create(
model="claude-opus-4-6",
max_tokens=1024,
messages=[
{
"role": "user",
"content": [
{
"type": "image",
"source": {
"type": "base64",
"media_type": image_media_type,
"data": image_data,
},
},
{"type": "text", "text": "What is in the above image?"},
],
}
],
)
# Option 2: URL-referenced image
message_from_url = anthropic.Anthropic().messages.create(
model="claude-opus-4-6",
max_tokens=1024,
messages=[
{
"role": "user",
"content": [
{
"type": "image",
"source": {
"type": "url",
"url": "https://upload.wikimedia.org/wikipedia/commons/a/a7/Camponotus_flavomarginatus_ant.jpg",
},
},
{"type": "text", "text": "What is in the above image?"},
],
}
],
)

Extended thinking can sometimes help Claude with very hard tasks. When it's enabled, temperature must be set to 1.
Extended thinking is supported in the following models:
- Claude Opus 4.1 (claude-opus-4-1-20250805)
- Claude Opus 4 (claude-opus-4-20250514)
- Claude Sonnet 4.6 (claude-sonnet-4-6)
- Claude Sonnet 4.5 (claude-sonnet-4-5-20250929)

When extended thinking is turned on, Claude creates thinking content blocks where it outputs its internal reasoning. The API response will include thinking content blocks, followed by text content blocks.
import anthropic
client = anthropic.Anthropic()
response = client.messages.create(
model="claude-opus-4-6",
max_tokens=16000,
thinking={"type": "enabled", "budget_tokens": 10000},
messages=[
{
"role": "user",
"content": "Are there an infinite number of prime numbers such that n mod 4 == 3?",
}
],
)
# The response will contain summarized thinking blocks and text blocks
for block in response.content:
if block.type == "thinking":
print(f"\nThinking summary: {block.thinking}")
elif block.type == "text":
print(f"\nResponse: {block.text}")The budget_tokens parameter determines the maximum number of tokens Claude is allowed to use for its internal reasoning process. In Claude 4 models, this limit applies to full thinking tokens, and not to the summarized output. Larger budgets can improve response quality by enabling more thorough analysis for complex problems. One rule: the value of max_tokens must be strictly greater than the value of budget_tokens so that Claude has space to write its response after thinking is complete.
Extended thinking can be used alongside tool use, allowing Claude to reason through tool selection and results processing.
Important limitations:
tool_choice: {"type": "auto"} (default) or tool_choice: {"type": "none"}.thinking blocks back to the API for the last assistant message.import anthropic
client = anthropic.Anthropic()
weather_tool = {
"name": "get_weather",
"description": "Get the current weather for a location.",
"input_schema": {
"type": "object",
"properties": {"location": {"type": "string", "description": "The city name."}},
"required": ["location"],
},
}
weather_data = {"temperature": 72}
# First request - Claude responds with thinking and tool request
response = client.messages.create(
model="claude-opus-4-6",
max_tokens=16000,
thinking={"type": "enabled", "budget_tokens": 10000},
tools=[weather_tool],
messages=[{"role": "user", "content": "What's the weather in Paris?"}],
)
# Extract thinking block and tool use block
thinking_block = next(
(block for block in response.content if block.type == "thinking"), None
)
tool_use_block = next(
(block for block in response.content if block.type == "tool_use"), None
)
# Second request - Include thinking block and tool result
continuation = client.messages.create(
model="claude-opus-4-6",
max_tokens=16000,
thinking={"type": "enabled", "budget_tokens": 10000},
tools=[weather_tool],
messages=[
{"role": "user", "content": "What's the weather in Paris?"},
# Notice that the thinking_block is passed in as well as the tool_use_block
{"role": "assistant", "content": [thinking_block, tool_use_block]},
{
"role": "user",
"content": [
{
"type": "tool_result",
"tool_use_id": tool_use_block.id,
"content": f"Current temperature: {weather_data['temperature']}°F",
}
],
},
],
)

Extended thinking with tool use in Claude 4 models supports interleaved thinking, which enables Claude to think between tool calls. To enable it on Claude 4, 4.5, and Sonnet 4.6 models, add the beta header interleaved-thinking-2025-05-14 to your API request.
import anthropic
client = anthropic.Anthropic()
calculator_tool = {
"name": "calculator",
"description": "Perform arithmetic calculations.",
"input_schema": {
"type": "object",
"properties": {
"expression": {
"type": "string",
"description": "The math expression to evaluate.",
}
},
"required": ["expression"],
},
}
database_tool = {
"name": "database_query",
"description": "Query the product database.",
"input_schema": {
"type": "object",
"properties": {
"query": {"type": "string", "description": "The database query."}
},
"required": ["query"],
},
}
response = client.beta.messages.create(
model="claude-sonnet-4-6",
max_tokens=16000,
thinking={"type": "enabled", "budget_tokens": 10000},
tools=[calculator_tool, database_tool],
messages=[
{
"role": "user",
"content": "What's the total revenue if we sold 150 units of product A at $50 each?",
}
],
betas=["interleaved-thinking-2025-05-14"],
)

With interleaved thinking, and only with interleaved thinking (not regular extended thinking), budget_tokens can exceed the max_tokens parameter, because budget_tokens in this case represents the total thinking budget across all thinking blocks within one assistant turn.
For Claude Opus 4.6, interleaved thinking is automatically enabled when using adaptive thinking (thinking: {"type": "adaptive"}); no beta header is needed. Claude Sonnet 4.6 supports both approaches: the interleaved-thinking-2025-05-14 beta header with manual extended thinking, and adaptive thinking.
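A minimal sketch of the adaptive-thinking variant on Claude Opus 4.6, reusing the calculator_tool and database_tool definitions from the previous example (the prompt is illustrative):

import anthropic

client = anthropic.Anthropic()

# Adaptive thinking: no budget_tokens and no beta header on Opus 4.6;
# interleaved thinking between tool calls is enabled automatically.
response = client.messages.create(
    model="claude-opus-4-6",
    max_tokens=16000,
    thinking={"type": "adaptive"},
    tools=[calculator_tool, database_tool],
    messages=[
        {
            "role": "user",
            "content": "What's the total revenue if we sold 150 units of product A at $50 each?",
        }
    ],
)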
Client tools are specified in the tools top-level parameter of the API request. Each tool definition includes:
| Parameter | Description |
|---|---|
| name | The name of the tool. Must match the regex ^[a-zA-Z0-9_-]{1,64}$. |
| description | A detailed plaintext description of what the tool does, when it should be used, and how it behaves. |
| input_schema | A JSON Schema object defining the expected parameters for the tool. |

{
"name": "get_weather",
"description": "Get the current weather in a given location",
"input_schema": {
"type": "object",
"properties": {
"location": {
"type": "string",
"description": "The city and state, e.g. San Francisco, CA"
},
"unit": {
"type": "string",
"enum": ["celsius", "fahrenheit"],
"description": "The unit of temperature, either 'celsius' or 'fahrenheit'"
}
},
"required": ["location"]
}
}

Provide extremely detailed descriptions. This is by far the most important factor in tool performance. Your descriptions should explain every detail about the tool, including what it does, when it should (and shouldn't) be used, what each parameter means, and any important caveats or limitations.
Consider using input_examples for complex tools. For tools with nested objects, optional parameters, or format-sensitive inputs, you can provide concrete examples using the input_examples field (beta). This helps Claude understand expected input patterns. See Providing tool use examples for details.
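As a purely illustrative sketch (the exact shape of input_examples is an assumption here, not a confirmed contract): the idea is to attach a short list of concrete inputs that conform to the tool's input_schema.

# Hypothetical tool definition with example inputs; the input_examples format
# (a list of objects matching input_schema) is assumed for illustration.
search_orders_tool = {
    "name": "search_orders",
    "description": "Search customer orders by status and optional date range.",
    "input_schema": {
        "type": "object",
        "properties": {
            "status": {"type": "string", "enum": ["open", "shipped", "delivered"]},
            "start_date": {"type": "string", "description": "ISO 8601 date, e.g. 2025-01-01"},
            "end_date": {"type": "string", "description": "ISO 8601 date"},
        },
        "required": ["status"],
    },
    "input_examples": [
        {"status": "open"},
        {"status": "shipped", "start_date": "2025-01-01", "end_date": "2025-01-31"},
    ],
}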
Example of a good tool description:
{
"name": "get_stock_price",
"description": "Retrieves the current stock price for a given ticker symbol. The ticker symbol must be a valid symbol for a publicly traded company on a major US stock exchange like NYSE or NASDAQ. The tool will return the latest trade price in USD. It should be used when the user asks about the current or most recent price of a specific stock. It will not provide any other information about the stock or company.",
"input_schema": {
"type": "object",
"properties": {
"ticker": {
"type": "string",
"description": "The stock ticker symbol, e.g. AAPL for Apple Inc."
}
},
"required": ["ticker"]
}
}

You can force Claude to use a specific tool by specifying the tool in the tool_choice field:
tool_choice = {"type": "tool", "name": "get_weather"}When working with the tool_choice parameter, there are four possible options:
- auto allows Claude to decide whether to call any provided tools or not (default).
- any tells Claude that it must use one of the provided tools.
- tool forces Claude to always use a particular tool.
- none prevents Claude from using any tools.

Tools do not necessarily need to be client functions. You can use tools anytime you want the model to return JSON output that follows a provided schema, as in the sketch below.
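A minimal sketch of that pattern, using a hypothetical record_summary tool that exists only to capture structured output (the schema and prompt are illustrative):

import anthropic

client = anthropic.Anthropic()

# A tool used purely to capture structured JSON output; no client-side
# function is ever run for it.
record_summary_tool = {
    "name": "record_summary",
    "description": "Record a structured summary of the provided text.",
    "input_schema": {
        "type": "object",
        "properties": {
            "title": {"type": "string", "description": "A short title."},
            "sentiment": {"type": "string", "enum": ["positive", "neutral", "negative"]},
            "key_points": {"type": "array", "items": {"type": "string"}},
        },
        "required": ["title", "sentiment", "key_points"],
    },
}

response = client.messages.create(
    model="claude-opus-4-6",
    max_tokens=1024,
    tools=[record_summary_tool],
    # Force Claude to call record_summary so the reply is always structured JSON.
    tool_choice={"type": "tool", "name": "record_summary"},
    messages=[
        {
            "role": "user",
            "content": "Summarize: The new cafe opened downtown to long lines and glowing reviews.",
        }
    ],
)

summary = next(block.input for block in response.content if block.type == "tool_use")
print(summary)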
When using tools, Claude will often show its "chain of thought", i.e. the step-by-step reasoning it uses to break down the problem and decide which tools to use.
{
"role": "assistant",
"content": [
{
"type": "text",
"text": "<thinking>To answer this question, I will: 1. Use the get_weather tool to get the current weather in San Francisco. 2. Use the get_time tool to get the current time in the America/Los_Angeles timezone, which covers San Francisco, CA.</thinking>"
},
{
"type": "tool_use",
"id": "toolu_01A09q90qw90lq917835lq9",
"name": "get_weather",
"input": { "location": "San Francisco, CA" }
}
]
}

By default, Claude may use multiple tools in parallel to answer a user query. You can disable this behavior by setting disable_parallel_tool_use=true in the tool_choice object.
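For example, a minimal sketch reusing the weather_tool definition from the extended thinking example above:

import anthropic

client = anthropic.Anthropic()

# Keep automatic tool selection, but disallow parallel tool calls.
response = client.messages.create(
    model="claude-opus-4-6",
    max_tokens=1024,
    tools=[weather_tool],
    tool_choice={"type": "auto", "disable_parallel_tool_use": True},
    messages=[{"role": "user", "content": "What's the weather in Paris and in Tokyo?"}],
)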
The response will have a stop_reason of tool_use and one or more tool_use content blocks that include:
- id: A unique identifier for this particular tool use block.
- name: The name of the tool being used.
- input: An object containing the input being passed to the tool.

When you receive a tool use response, you should:

1. Extract the name, id, and input from the tool_use block.
2. Run the corresponding tool in your code, passing in that input.
3. Continue the conversation with a new user message containing a tool_result content block (a complete round-trip sketch appears after the error-handling notes below), for example:

{
"role": "user",
"content": [
{
"type": "tool_result",
"tool_use_id": "toolu_01A09q90qw90lq917835lq9",
"content": "15 degrees"
}
]
}

max_tokens stop reason: If Claude's response is cut off due to hitting the max_tokens limit during tool use, retry the request with a higher max_tokens value.
pause_turn stop reason: When using server tools like web search, the API may return a pause_turn stop reason. Continue the conversation by passing the paused response back as-is in a subsequent request.
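A minimal sketch of that continuation pattern, assuming that appending the paused assistant content unchanged and re-sending the request is sufficient (messages and tools are whatever your conversation already holds):

# Sketch: re-send while the API pauses the turn.
while True:
    response = client.messages.create(
        model="claude-opus-4-6",
        max_tokens=1024,
        tools=tools,
        messages=messages,
    )
    if response.stop_reason != "pause_turn":
        break
    # Pass the paused response back as-is and continue.
    messages.append({"role": "assistant", "content": response.content})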
If the tool itself throws an error during execution, return the error message with "is_error": true:
{
"role": "user",
"content": [
{
"type": "tool_result",
"tool_use_id": "toolu_01A09q90qw90lq917835lq9",
"content": "ConnectionError: the weather service API is not available (HTTP 500)",
"is_error": true
}
]
}

If Claude's attempted use of a tool is invalid (e.g. missing required parameters), try the request again with more detailed description values in your tool definitions.
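Putting these pieces together, here is a minimal sketch of one client-side tool-use round trip. The run_get_weather helper is hypothetical; substitute your own implementation, and reuse the weather_tool definition from earlier in this guide.

import anthropic

client = anthropic.Anthropic()

def run_get_weather(location):
    # Hypothetical local implementation of the get_weather tool.
    return "15 degrees"

messages = [{"role": "user", "content": "What is the weather in San Francisco?"}]

response = client.messages.create(
    model="claude-opus-4-6",
    max_tokens=1024,
    tools=[weather_tool],
    messages=messages,
)

if response.stop_reason == "tool_use":
    tool_use = next(block for block in response.content if block.type == "tool_use")
    result = run_get_weather(**tool_use.input)

    # Send Claude's tool request and our tool_result back in the next request.
    messages.append({"role": "assistant", "content": response.content})
    messages.append(
        {
            "role": "user",
            "content": [
                {
                    "type": "tool_result",
                    "tool_use_id": tool_use.id,
                    "content": result,
                }
            ],
        }
    )
    final = client.messages.create(
        model="claude-opus-4-6",
        max_tokens=1024,
        tools=[weather_tool],
        messages=messages,
    )
    print(final.content[-1].text)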
When creating a Message, you can set "stream": true to incrementally stream the response using server-sent events (SSE).
import anthropic
client = anthropic.Anthropic()
with client.messages.stream(
max_tokens=1024,
messages=[{"role": "user", "content": "Hello"}],
model="claude-opus-4-6",
) as stream:
for text in stream.text_stream:
print(text, end="", flush=True)

Each server-sent event includes a named event type and associated JSON data. Each stream uses the following event flow:
1. message_start: contains a Message object with empty content.
2. A series of content blocks, each of which has a content_block_start event, one or more content_block_delta events, and a content_block_stop event.
3. One or more message_delta events, indicating top-level changes to the final Message object.
4. A final message_stop event.

Warning: The token counts shown in the usage field of the message_delta event are cumulative.
{
"type": "content_block_delta",
"index": 0,
"delta": { "type": "text_delta", "text": "Hello frien" }
}

For tool_use content blocks, deltas are partial JSON strings:
{"type": "content_block_delta","index": 1,"delta": {"type": "input_json_delta","partial_json": "{\"location\": \"San Fra"}}}When using extended thinking with streaming:
{
"type": "content_block_delta",
"index": 0,
"delta": {
"type": "thinking_delta",
"thinking": "Let me solve this step by step..."
}
}

A complete event stream for a simple request looks like this:

event: message_start
data: {"type": "message_start", "message": {"id": "msg_1nZdL29xx5MUA1yADyHTEsnR8uuvGzszyY", "type": "message", "role": "assistant", "content": [], "model": "claude-opus-4-6", "stop_reason": null, "stop_sequence": null, "usage": {"input_tokens": 25, "output_tokens": 1}}}
event: content_block_start
data: {"type": "content_block_start", "index": 0, "content_block": {"type": "text", "text": ""}}
event: content_block_delta
data: {"type": "content_block_delta", "index": 0, "delta": {"type": "text_delta", "text": "Hello"}}
event: content_block_delta
data: {"type": "content_block_delta", "index": 0, "delta": {"type": "text_delta", "text": "!"}}
event: content_block_stop
data: {"type": "content_block_stop", "index": 0}
event: message_delta
data: {"type": "message_delta", "delta": {"stop_reason": "end_turn", "stop_sequence":null}, "usage": {"output_tokens": 15}}
event: message_stop
data: {"type": "message_stop"}Was this page helpful?