This feature is eligible for Zero Data Retention (ZDR). When your organization has a ZDR arrangement, data sent through this feature is not stored after the API response is returned.
System instructions normally live in the top-level system field, ahead of every message in the conversation. That position is great for prompt caching: the system prompt is part of the stable prefix, so subsequent turns hit the cache. It is a poor position for instructions you only discover you need partway through a session, because editing the top-level system field changes the very beginning of the prompt and invalidates the cache for everything that follows.
Mid-conversation system messages close that gap. You append a {"role": "system"} message at the point in the conversation where the new instruction becomes relevant, instead of editing the top-level system field. The cached prefix stays the same, so the next request still reads it from cache, and the new instruction is still applied as a system instruction rather than as ordinary user text.
Mid-conversation system messages are available on the Claude API and Claude Platform on AWS. They are not available on Amazon Bedrock, Vertex AI, or Microsoft Foundry.
This feature is available on Claude Opus 4.8 only. No beta header is required.
Prompt caching hashes the request prefix in order: tools, then system, then messages. A cache hit requires the prefix to match a recent request exactly, byte for byte, up to the cache breakpoint.
That ordering means the top-level system field sits near the very start of the hashed prefix. Any change to it, even appending a sentence, produces a different hash, and the request misses the cache for the system prompt and every cached message after it.
Mid-conversation system messages let you add the instruction at the end of the message history instead. Everything before the new instruction is unchanged, so the existing cache entry still matches, and only the new message is processed as fresh input.
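The effect of prefix ordering can be illustrated with a toy model. The sketch below is not the actual cache key computation, which is not described here; it only assumes that a cache key behaves like a hash over tools, then system, then messages. The helper name `prefix_hash` and the sample conversation are hypothetical.

```python
import hashlib
import json


def prefix_hash(tools, system, messages):
    """Toy stand-in for a cache key: hash tools, then system, then messages.

    Assumption for illustration only; the real cache key is computed
    server-side and may differ in every detail.
    """
    payload = json.dumps([tools, system, messages], sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


tools = []
system = "You are a code review assistant. Be concise."
history = [
    {"role": "user", "content": "Review process() in utils.py."},
    {"role": "assistant", "content": "Consider a generator for large inputs."},
    {"role": "user", "content": "Now review the calling code."},
]

# Key for the prefix that an earlier request would have cached.
cached = prefix_hash(tools, system, history)

# Option 1: edit the top-level system field. The prefix changes near its
# start, so the same history no longer produces the cached key.
edited_system = system + " Always include type annotations."
system_edit_hit = prefix_hash(tools, edited_system, history) == cached

# Option 2: append a system message. The earlier turns are untouched, so
# the prefix up to the old end of the history still produces the cached key.
extended = history + [
    {"role": "system", "content": "Always include type annotations."}
]
append_hit = prefix_hash(tools, system, extended[: len(history)]) == cached

print(system_edit_hit)  # False
print(append_hit)       # True
```

In this model, only the appended message falls outside the matching prefix, which mirrors the behavior described above.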
A few situations where this matters:
system field would re-process the entire history.
In all of these cases, putting the instruction in a regular user message works, but the model treats user content as data to interpret, not as an instruction with system-level priority. A mid-conversation system message preserves the instruction's authority without paying the cache-miss cost.
Add a message with "role": "system" to the messages array. Use a plain string or content blocks for content, the same as a user or assistant turn. The instruction applies from that point in the conversation onward. When instructions conflict, later system messages take precedence over earlier ones, and mid-conversation system messages take precedence over the top-level system field for the turns that follow them.
You can still set the top-level system field for instructions that should apply to the entire conversation. Reserve mid-conversation system messages for instructions that only become relevant later, or that you want to add without invalidating the cached prefix.
import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-opus-4-8",
    max_tokens=1024,
    system="You are a code review assistant. Be concise.",
    messages=[
        {
            "role": "user",
            "content": "Review process() in utils.py for performance issues.",
        },
        {
            "role": "assistant",
            "content": "The list comprehension is fine for small inputs. For large inputs, consider a generator to avoid materializing the full list.",
        },
        {
            "role": "user",
            "content": "Now review the calling code that invokes process().",
        },
        # The reviewer realizes mid-session that all suggestions must
        # also pass the team's strict typing policy. Appending the
        # instruction here keeps earlier turns byte-identical, so a
        # cached prefix (if you set one) remains valid.
        {
            "role": "system",
            "content": "From now on, every suggestion must include explicit type annotations.",
        },
    ],
)
print(response.content[0].text)
A mid-conversation system message must immediately follow a user message (or an assistant message that ends in a server tool use), and must either be the last entry in messages or be followed by an assistant turn. In practice, append it at the end of the array, after the latest user turn.
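The placement rule can be checked client-side before a request is sent. The following is a minimal sketch, not part of the SDK: `check_system_placement` and `ends_in_server_tool_use` are hypothetical helpers, and the test for "ends in a server tool use" assumes the assistant turn's final content block has type `"server_tool_use"`.

```python
def ends_in_server_tool_use(message):
    # Assumption: a server tool use appears as a final content block
    # whose "type" is "server_tool_use".
    content = message.get("content")
    if not isinstance(content, list) or not content:
        return False
    last = content[-1]
    return isinstance(last, dict) and last.get("type") == "server_tool_use"


def check_system_placement(messages):
    """Return a list of placement problems; an empty list means none found."""
    problems = []
    for i, msg in enumerate(messages):
        if msg.get("role") != "system":
            continue
        if i == 0:
            problems.append(f"messages[{i}]: cannot be the first entry")
            continue
        prev = messages[i - 1]
        after_user = prev.get("role") == "user"
        after_tool = (
            prev.get("role") == "assistant" and ends_in_server_tool_use(prev)
        )
        if not (after_user or after_tool):
            problems.append(f"messages[{i}]: must follow a user turn")
        is_last = i == len(messages) - 1
        if not is_last and messages[i + 1].get("role") != "assistant":
            problems.append(
                f"messages[{i}]: must end the array or precede an assistant turn"
            )
    return problems


valid = [
    {"role": "user", "content": "Review the calling code."},
    {"role": "system", "content": "Include explicit type annotations."},
]
system_first = [
    {"role": "system", "content": "Include explicit type annotations."},
    {"role": "user", "content": "Review the calling code."},
]
consecutive = valid + [{"role": "system", "content": "Also flag missing tests."}]

print(check_system_placement(valid))  # []
print(len(check_system_placement(system_first)))
print(len(check_system_placement(consecutive)))
```

A check like this only mirrors the rule as stated here; the API remains the authority on what it accepts.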
Mid-conversation system messages and prompt caching are designed to be used together:
Place cache_control on the last block that stays the same across requests, whether that is the end of the top-level system field, the end of your tool definitions, or a stable point in the message history.
Avoid editing or removing a mid-conversation system message that has already been sent. Like any other change to earlier messages, that invalidates the cache from that point forward. If the instruction needs to evolve, append a new system message rather than rewriting the old one. Consecutive system messages are not allowed; merge instructions into one message or wait for the next user turn before appending.
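As a sketch of how the two features might be combined, the hypothetical helper below assembles keyword arguments for `client.messages.create(**kwargs)`. It puts a `cache_control` breakpoint at the end of the top-level system field and appends any new instruction as a system message instead of rewriting earlier turns. The name `build_request` and the sample conversation are illustrative assumptions; the model ID is taken from the example above.

```python
def build_request(history, base_system, new_instruction=None):
    """Assemble keyword arguments for client.messages.create(**kwargs)."""
    messages = list(history)  # copy the list so earlier turns are never mutated
    if new_instruction is not None:
        if messages and messages[-1]["role"] == "system":
            # Consecutive system messages are not allowed.
            raise ValueError("cannot append a system message after another one")
        messages.append({"role": "system", "content": new_instruction})
    return {
        "model": "claude-opus-4-8",
        "max_tokens": 1024,
        # Breakpoint on the last block that stays the same across requests.
        "system": [
            {
                "type": "text",
                "text": base_system,
                "cache_control": {"type": "ephemeral"},
            }
        ],
        "messages": messages,
    }


BASE_SYSTEM = "You are a code review assistant. Be concise."
history = [
    {"role": "user", "content": "Review process() in utils.py."},
    {"role": "assistant", "content": "Consider a generator for large inputs."},
    {"role": "user", "content": "Now review the calling code."},
]

without_instruction = build_request(history, BASE_SYSTEM)
with_instruction = build_request(
    history, BASE_SYSTEM, "Every suggestion must include type annotations."
)

# The system field and all earlier turns are identical in both requests.
print(with_instruction["system"] == without_instruction["system"])  # True
print(with_instruction["messages"][: len(history)] == history)      # True
```

Because the helper never touches the top-level system field or earlier turns, everything up to the breakpoint is the same in both requests.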
A system message cannot be the first entry in messages. Use the top-level system field for instructions that apply from the very start.
A system message must immediately follow a user turn (or an assistant turn ending in server tool use) and must precede an assistant turn or end the array. Placing it elsewhere returns a 400 error.
How caching works, where to place breakpoints, and how to read cache usage fields.
Find out exactly where two requests diverged when a cache hit you expected does not happen.
Message structure, multi-turn conversations, and the system field.
Writing effective prompts and system instructions.