The Vulnerability Detection Agent

Security teams want to find memory-safety bugs before attackers do, but the existing tooling makes it hard: static analyzers produce so many false positives that reviewers stop reading, and fuzzers need a hand-written harness per entry point before they find anything. This cookbook shows how to use the Claude Agent SDK to build a vulnerability-discovery agent that reads source code with Claude Code's built-in Read, Grep, and Glob tools, reasons about which inputs could corrupt memory, and writes findings a reviewer can act on.

By the end of this cookbook, you'll be able to:

Run a bootstrap-then-interview threat model as a multi-turn ClaudeSDKClient session that writes THREAT_MODEL.md
Drive an agentic find loop with built-in Read/Grep/Glob tools instead of hand-rolled file access
Chain find, triage, and report as separate query() calls that emit schema-conformant JSON

Prerequisites

Required knowledge:

Python fundamentals, including async/await
Enough C to read a 45-line file and recognize a memcpy

Required tools:

Python 3.11+
Node.js 18+ and the Claude Code CLI: npm install -g @anthropic-ai/claude-code
An Anthropic API key (get one here)

Required for any real target: authorization to assess the code you point this at. This notebook ships a tiny self-contained canary.c with planted bugs so you can run everything end-to-end without touching production code.

Step 1: Set up the environment and engagement context

We define an ENGAGEMENT_CONTEXT block that we pass as system_prompt on every agent. It records the scope of this assessment (authorized by the code owner, isolated read-only sandbox, findings headed for responsible disclosure) so every step in the pipeline operates against the same documented ground rules. Keep those three claims true for any real target.

A note on cyber safeguards: Claude applies real-time cyber safeguards at the API layer. If your work on a real codebase triggers these safeguards, apply to the Cyber Verification Program (CVP) via that page: a free application-based program that lets professionals continue legitimate dual-use security work with minimal interruption.

python

%%capture
%pip install -U claude-agent-sdk python-dotenv

python

import json
from collections.abc import AsyncIterator
from pathlib import Path
 
from dotenv import load_dotenv
 
from claude_agent_sdk import (
    AssistantMessage,
    ClaudeAgentOptions,
    ClaudeSDKClient,
    Message,
    ResultMessage,
    TextBlock,
    ToolUseBlock,
    query,
)
 
load_dotenv()
 
MODEL_NAME = "claude-opus-4-7"
# This notebook expects to be run from the claude_agent_sdk/ directory
# (Jupyter's default when you open the file from there). The assert makes
# the failure explicit if the kernel was started elsewhere.
TARGET_DIR = Path("vulnerability_detection_agent/canary").resolve()
assert TARGET_DIR.is_dir(), f"run this notebook from claude_agent_sdk/ (got cwd={Path.cwd()})"
 
ENGAGEMENT_CONTEXT = """\
## Engagement context
 
This is authorized security research conducted as a defensive security
assessment on a self-contained canary target vendored in this notebook. The
target is read-only source (no execution). Findings are collected for
demonstration and responsible-disclosure workflow testing.
"""
 
 
async def collect(stream: AsyncIterator[Message]) -> str:
    """Consume an Agent SDK message stream; print tool calls; return final text.
 
    Both ``query()`` and ``ClaudeSDKClient.receive_response()`` return an
    ``AsyncIterator[Message]`` that terminates after a ``ResultMessage``.
    This is the same ``async for msg in ...`` loop the other notebooks in this
    series write inline; it is factored out here because this notebook runs
    the loop four times (TM bootstrap, TM interview, find, triage) and the
    ``isinstance`` ladder would otherwise repeat verbatim.
    """
    final = ""
    async for msg in stream:
        if isinstance(msg, AssistantMessage):
            for block in msg.content:
                if isinstance(block, ToolUseBlock):
                    args = str(block.input)
                    args = args if len(args) <= 120 else args[:120] + "...}"
                    print(f"  [tool] {block.name} {args}")
                elif isinstance(block, TextBlock) and block.text.strip():
                    final += block.text
        elif isinstance(msg, ResultMessage) and msg.is_error:
            raise RuntimeError(msg.result)
    return final
 
 
print(f"Model: {MODEL_NAME}")

Model: claude-opus-4-7

Step 2: Load the canary target

vulnerability_detection_agent/canary/canary.c is a ~45-line C program with three deliberately planted memory-safety bugs (a heap buffer overflow, a stack buffer overflow, and a use-after-free), each reachable through a different "magic byte" at the start of the input. The bugs aren't labeled; the find agent in Step 4 has to locate them by reading the logic the same way it would in real code. When you're ready to try your own code, point TARGET_DIR at your checkout.

python

print((TARGET_DIR / "canary.c").read_text())

// canary.c
// Entry: ./canary <input_file>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

static void parse_alpha(const unsigned char *data, size_t len) {
    unsigned char *buf = malloc(32);
    memcpy(buf, data, len);
    printf("alpha: %02x\n", buf[0]);
    free(buf);
}

static void parse_bravo(const unsigned char *data, size_t len) {
    char name[16];
    memcpy(name, data, len);
    name[15] = 0;
    printf("bravo: %s\n", name);
}

static void parse_charlie(const unsigned char *data, size_t len) {
    char *p = malloc(64);
    if (len > 0 && data[0] == 0xff) {
        free(p);
    }
    memcpy(p, data, len < 64 ? len : 64);
    printf("charlie: %p\n", (void *)p);
}

int main(int argc, char **argv) {
    if (argc < 2) return 1;
    FILE *f = fopen(argv[1], "rb");
    if (!f) return 1;
    unsigned char buf[4096];
    size_t n = fread(buf, 1, sizeof buf, f);
    fclose(f);
    if (n < 1) return 1;
    switch (buf[0]) {
        case 'A': parse_alpha(buf + 1, n - 1); break;
        case 'B': parse_bravo(buf + 1, n - 1); break;
        case 'C': parse_charlie(buf + 1, n - 1); break;
        default: printf("unknown format\n");
    }
    return 0;
}

Step 3: Threat-model the target (bootstrap, then interview)

A threat model answers "what could go wrong with this system, who would do it, and which outcomes matter?" independently of any specific bug. A threat ("attacker achieves memory corruption via untrusted file parsing") survives a patch; a vulnerability ("line 31 doesn't bounds-check len") does not. The find loop hunts vulnerabilities; the threat model tells it where to hunt and tells triage how to score.

We build it in two turns of one ClaudeSDKClient session:

Bootstrap. Claude reads the code with the built-in Read tool and drafts the model (context, assets, entry points & trust boundaries, threats, and open questions the code can't answer).
Interview. The application owner answers the open questions, and Claude refines likelihood and impact, then writes THREAT_MODEL.md next to the target.

Keeping both turns in one client session means the interview turn can see the bootstrap's tool results without us re-sending the source. On a 45-line canary both turns are thin; the point here is the output shape: the entry-points table (what you'd partition across parallel find-agents on a real repo) and the open-questions list (the bootstrap-to-interview handoff).

python

(TARGET_DIR / "THREAT_MODEL.md").unlink(missing_ok=True)
 
TM_SCHEMA = """\
# Threat Model: <system name>
## 1. System context
## 2. Assets
| asset | description | sensitivity |
## 3. Entry points & trust boundaries
| entry_point | description | trust_boundary | reachable_assets |
## 4. Threats
| id | threat | surface | asset | impact | likelihood |
## 5. Open questions
- (things the code alone cannot answer: deployment context, which inputs are
  attacker-controlled in practice, blast radius)
"""
 
BOOTSTRAP_PROMPT = f"""\
You are bootstrapping a threat model from source code alone; no application
owner is available yet. Read `canary.c` in this directory and emit a draft
threat model in the schema below. Be explicit in section 5 about what you could
NOT determine from the code: those open questions are the agenda for the owner
interview. Do not write any files yet.
 
## Schema
 
{TM_SCHEMA}
"""
 
OWNER_ANSWERS = """\
- Deployment: `canary` is a local CLI that reads a file path from argv; it is
  not network-facing.
- Attacker control: the input file is fully attacker-controlled (think email
  attachment or downloaded file opened by the user).
- Blast radius: the process runs as the invoking user with no sandboxing;
  memory corruption is code execution as that user.
"""
 
INTERVIEW_PROMPT = f"""\
The application owner has now answered your open questions:
 
{OWNER_ANSWERS}
 
Refine the threat model: update likelihood and impact in section 4 using the
owner's answers, resolve every item in section 5 that the answers cover, and
add any new threats the deployment context implies. Keep the same schema, then
write the refined model to `THREAT_MODEL.md` in this directory.
"""
 
tm_options = ClaudeAgentOptions(
    model=MODEL_NAME,
    cwd=str(TARGET_DIR),
    system_prompt={"type": "preset", "preset": "claude_code", "append": ENGAGEMENT_CONTEXT},
    allowed_tools=["Read", "Write", "Edit"],
    disallowed_tools=["Bash"],
    permission_mode="acceptEdits",
)
 
async with ClaudeSDKClient(options=tm_options) as tm_agent:
    # collect() fully drains receive_response() through its terminating
    # ResultMessage, so the second query() sees a clean stream.
    await tm_agent.query(BOOTSTRAP_PROMPT)
    draft_tm = await collect(tm_agent.receive_response())
    print("--- bootstrap draft ---\n" + draft_tm + "\n")
 
    await tm_agent.query(INTERVIEW_PROMPT)
    await collect(tm_agent.receive_response())
 
tm_path = TARGET_DIR / "THREAT_MODEL.md"
if not tm_path.exists():
    raise RuntimeError("interview agent did not write THREAT_MODEL.md; check the trace above")
threat_model = tm_path.read_text()
print("--- refined THREAT_MODEL.md ---\n" + threat_model)

[tool] Read {'file_path': 'vulnerability_detection_agent/canary/cana...}
--- bootstrap draft ---
`★ Insight ─────────────────────────────────────`
- The file dispatches on `buf[0]` into three parsers, each with a distinct memory-safety bug class: heap overflow, stack overflow, and use-after-free. This is a canonical "one bug per parser" canary.
- The `len` passed to each parser is `n - 1` (up to 4095), but buffers are sized 32/16/64, so every path is reachable with attacker-controlled overflow length from a single input file.
- Threat modeling from source alone can enumerate the *bug classes* and *entry points*, but it cannot tell you who supplies `argv[1]` in production — that determines whether these are local footguns or remote RCE primitives.
`─────────────────────────────────────────────────`

# Threat Model: canary (file-format parser CLI)

## 1. System context
A small C command-line utility invoked as `./canary <input_file>`. It opens the supplied path, reads up to 4096 bytes, and dispatches on the first byte (`A`/`B`/`C`) to one of three parsers. No network, IPC, or privilege-management code is present in the source. Deployment context (who runs it, who supplies the file, whether it is wrapped by a service) is unknown from the code alone.

## 2. Assets
| asset | description | sensitivity |
| --- | --- | --- |
| process memory / control flow | heap and stack of the `canary` process, including return addresses and heap metadata | high — corruption yields arbitrary code execution in the process's security context |
| input file contents | bytes read from `argv[1]`; format selector plus parser payload | low–medium (content itself); high as an attack vector |
| host execution context | whatever uid/role/container the binary runs as; file-system reach of that context | unknown — depends on deployment |

## 3. Entry points & trust boundaries
| entry_point | description | trust_boundary | reachable_assets |
| --- | --- | --- | --- |
| `argv[1]` path | caller-supplied filesystem path opened with `fopen(..., "rb")` | caller → process (path-traversal / symlink exposure depends on caller privilege) | any file readable by the process |
| file contents (first 4096 bytes) | `fread` into `buf`, dispatched by `buf[0]` | file producer → parser | process memory via all three parsers |
| `parse_alpha` payload (`A` prefix) | bytes 1..n copied via `memcpy` into a 32-byte heap buffer | untrusted file → heap | heap adjacent to 32-byte allocation |
| `parse_bravo` payload (`B` prefix) | bytes 1..n copied into a 16-byte stack buffer | untrusted file → stack | return address, saved frame pointer, stack canary (if enabled) |
| `parse_charlie` payload (`C` prefix) | bytes 1..n copied into 64-byte heap buffer; first byte may trigger early `free` | untrusted file → heap | freed chunk metadata, tcache/fastbin state |

## 4. Threats
| id | threat | surface | asset | impact | likelihood |
| --- | --- | --- | --- | --- | --- |
| T1 | Heap buffer overflow: `parse_alpha` copies up to `n-1` (≤4095) bytes into a 32-byte `malloc` without bounds check (canary.c:8-9) | `A`-prefixed input | process memory | heap corruption → potential RCE | high given a malicious file |
| T2 | Stack buffer overflow: `parse_bravo` copies up to `n-1` bytes into a 16-byte stack array (canary.c:15-16); the trailing `name[15]=0` does not prevent the overflow, only truncates the printed string | `B`-prefixed input | saved return address / stack | classic stack smash → RCE if no/bypassed stack protector | high |
| T3 | Use-after-free / double-free vector: `parse_charlie` frees `p` when `data[0]==0xff` then immediately `memcpy`s into the freed chunk (canary.c:23-26); the subsequent `printf` also leaks the freed pointer | `C`-prefixed input with first payload byte `0xff` | heap allocator metadata | UAF write → heap grooming, info leak of freed address | medium–high |
| T4 | Unchecked `malloc` return in `parse_alpha` and `parse_charlie` (canary.c:8, 22) | any `A` or `C` input under memory pressure | process stability | NULL-deref DoS | low in normal ops |
| T5 | Pointer disclosure via `printf("%p", p)` in `parse_charlie` (canary.c:27) | any `C` input | ASLR secret | leaks heap address, aids exploitation of T1/T3 | high when combined with T1/T3 |
| T6 | Path handling of `argv[1]`: no canonicalization or allow-list; process will open any readable path including symlinks and device nodes | caller-controlled argv | host files | info disclosure or hang (e.g., `/dev/zero`) depending on who runs it | unknown — depends on invoker privilege |
| T7 | Silent truncation: only the first 4096 bytes are read; parsers operate on truncated data, which can mask malformed-file detection upstream | any input | integrity of downstream decisions | logic bug, not memory-safety | low |

## 5. Open questions
- **Who supplies `argv[1]`?** Local user only, a setuid wrapper, a web upload handler, an MTA, a sandbox runner? This determines whether T1–T3 are local-only footguns or remote code-execution primitives.
- **What uid / capabilities / container does the binary run as?** Blast radius of successful RCE hinges on this (root vs. nobody vs. seccomp-confined).
- **Compiler and link flags in production builds:** is `-fstack-protector-strong`, `-D_FORTIFY_SOURCE=2`, PIE, RELRO, ASLR, or CFI enabled? These materially change exploitability of T2 and the usefulness of T5's leak.
- **Allocator in use** (glibc ptmalloc, musl, jemalloc, hardened_malloc). T3's UAF exploit primitives differ by allocator.
- **Is `canary` invoked directly or via a wrapper that validates the file first?** A front-end parser / size cap / magic-byte allow-list upstream could neutralize T1–T3.
- **Are crashes monitored?** Unchecked `malloc` (T4) and T6 DoS matter only if uptime is a requirement; source alone does not say.
- **Intended threat model for the format itself:** are `A`/`B`/`C` formats specified anywhere (docs, RFC, sibling files) so we can tell "malformed" from "adversarial"?
- **Expected lifetime and distribution** of the binary: is this a demo/test fixture (the name "canary" suggests so) or shipped to customers? Affects remediation priority and disclosure path.

  [tool] Write {'file_path': 'vulnerability_detection_agent/canary/THRE...}
--- refined THREAT_MODEL.md ---
# Threat Model: canary (local file-format parser CLI)

## 1. System context
`canary` is a local command-line utility invoked as `./canary <input_file>`. It
opens the supplied path, reads up to 4096 bytes, and dispatches on the first
byte (`A`/`B`/`C`) into one of three parsers. It is **not network-facing**. The
input file is **fully attacker-controlled** in the intended threat model —
typical delivery is a file received by email or downloaded from the web and
then opened by the local user. The process runs as the **invoking user with no
sandboxing**, so any memory-safety bug that yields control flow yields code
execution in that user's security context (home directory, SSH keys, browser
profile, cloud credentials, any writable path the user has).

## 2. Assets
| asset | description | sensitivity |
| --- | --- | --- |
| invoking user's account | uid, home directory, shell history, SSH keys, browser profile, cloud tokens, dotfiles, writable mounts | high — compromise equals full user takeover and a foothold for lateral movement |
| process memory / control flow | heap and stack of the `canary` process, return addresses, heap metadata | high — corruption is the exploitation primitive for the user-account asset |
| input file contents | bytes read from `argv[1]`; format selector plus parser payload | low as data; the primary *attack vector* |
| host integrity | persistence locations writable as the user (crontab, `~/.bashrc`, `~/.config/systemd/user`, login items) | high — trivially reachable post-exploitation; no sandbox to contain it |

## 3. Entry points & trust boundaries
| entry_point | description | trust_boundary | reachable_assets |
| --- | --- | --- | --- |
| `argv[1]` path | caller-supplied filesystem path opened with `fopen(..., "rb")` | local user → process (same uid; the boundary is between the untrusted *file content* and the parser, not between the caller and process) | any file readable by the user |
| file contents (first 4096 bytes) | `fread` into `buf`; dispatch on `buf[0]` | untrusted attacker (file author) → parser | process memory via all three parsers |
| `parse_alpha` payload (`A` prefix) | bytes 1..n copied via `memcpy` into a 32-byte heap buffer | untrusted file → heap | heap adjacent to 32-byte allocation |
| `parse_bravo` payload (`B` prefix) | bytes 1..n copied into a 16-byte stack buffer | untrusted file → stack | return address, saved frame pointer, stack canary (if enabled) |
| `parse_charlie` payload (`C` prefix) | bytes 1..n copied into a 64-byte heap buffer; first byte may trigger early `free` | untrusted file → heap | freed-chunk metadata, tcache/fastbin state |

## 4. Threats
| id | threat | surface | asset | impact | likelihood |
| --- | --- | --- | --- | --- | --- |
| T1 | Heap buffer overflow: `parse_alpha` copies up to `n-1` (≤4095) bytes into a 32-byte `malloc` without bounds check (canary.c:8-9) | `A`-prefixed attacker file | process memory → user account | critical — RCE as the invoking user; full account takeover, no sandbox containment | high — trivially reachable with a single crafted file |
| T2 | Stack buffer overflow: `parse_bravo` copies up to `n-1` bytes into a 16-byte stack array (canary.c:15-16); trailing `name[15]=0` only truncates the print, does not prevent the overflow | `B`-prefixed attacker file | saved return address → user account | critical — classic stack smash to RCE as the user; exploitability depends on stack-protector / PIE / ASLR in the shipped build | high |
| T3 | Use-after-free / double-free: `parse_charlie` frees `p` when `data[0]==0xff` then immediately `memcpy`s into the freed chunk (canary.c:23-26) | `C`-prefixed attacker file with first payload byte `0xff` | heap allocator metadata → user account | critical — heap-grooming primitive for RCE as the user | medium–high — reliability varies by allocator but the primitive is clean |
| T4 | Unchecked `malloc` return in `parse_alpha` and `parse_charlie` (canary.c:8, 22) | any `A` or `C` input under memory pressure | process stability | low — NULL-deref crash of a user-invoked CLI; annoyance, not compromise | low |
| T5 | Pointer disclosure via `printf("%p", p)` in `parse_charlie` (canary.c:27) | any `C` input | ASLR secret | high — directly hands the attacker a heap address, making T1/T3 reliable even with ASLR | high — unconditional on the `C` path |
| T6 | Path handling of `argv[1]`: no canonicalization or allow-list; `canary` opens any path the user can read, including symlinks and device nodes (e.g., `/dev/zero` hang) | caller-supplied argv | process stability / caller expectations | low — the caller is already the user, so there is no privilege boundary to cross via path tricks | low |
| T7 | Silent truncation: only the first 4096 bytes are read; malformed-file detection upstream can be bypassed by padding the exploit into the first 4 KB | any input | integrity of any upstream "scan then open" pipeline | low on its own; relevant if a scanner is put in front | low |
| T8 | **(new)** Social-engineering delivery: the attack surface is "user opens a file." Typical vectors are email attachments, messenger drops, and browser downloads. Any of T1/T2/T3 becomes a one-click RCE given a convincing lure | attacker-authored file delivered to the user | user account | critical — same as T1–T3; this threat names the delivery mechanism | high — the dominant real-world path to triggering T1–T3 |
| T9 | **(new)** File-association / handler registration: if `canary` is (or becomes) the registered handler for a file extension or MIME type, double-clicking a download auto-invokes it, removing the need to coach the user into a shell command | OS file-association layer | user account | critical — converts T8 from "run this binary on this file" to "open the attachment" | unknown without deployment config; flagged for owner |
| T10 | **(new)** Post-exploitation blast radius: no sandbox, so RCE in `canary` immediately has the user's full ambient authority — SSH keys, browser cookies/session tokens, cloud CLI credentials (`~/.aws`, `~/.config/gcloud`), persistence via `~/.bashrc`, user-level cron, user systemd units, login items | any of T1/T2/T3 succeeding | user account, connected systems | critical — lateral movement into email, cloud, source control is trivial from here | high conditional on T1/T2/T3 |
| T11 | **(new)** Corpus / fuzzing exposure: with three obvious memory bugs and a one-byte dispatch, the first hour of `afl-fuzz` or `libFuzzer` against `canary` will produce crashing inputs. If the binary is distributed, researchers and attackers will find these quickly | any attacker with the binary | disclosure timeline | high — forces a short remediation window | high |

## 5. Open questions
All prior open questions are resolved by the owner's answers, except the
following residual items, which are narrower and still code- or
build-dependent:

- **Shipped compiler / linker hardening:** is the distributed build compiled
  with `-fstack-protector-strong`, `-D_FORTIFY_SOURCE=2`, `-fPIE`/`-pie`, full
  RELRO, and with ASLR enabled on the target OS? Changes exploit reliability of
  T2 and the value of T5's leak, but not the severity class.
- **Allocator in the shipped build** (glibc ptmalloc vs. musl vs. hardened
  allocator): determines which T3 exploitation primitives are practical.
- **File-association registration (T9):** does any installer or desktop entry
  register `canary` as a handler for an extension or MIME type? If yes, T8
  collapses into a pure double-click RCE.
- **Signed / notarized distribution:** is the binary shipped in a way that OS
  gatekeepers (macOS Gatekeeper, Windows SmartScreen, Linux desktop "executable
  bit" prompts) would warn the user before first run? Affects the friction on
  T8 but not the final impact.
- **Telemetry / crash reporting:** are crashes from T1–T4 reported anywhere the
  defender would see them, or does a failed exploit attempt go unnoticed?

Step 4: Run the agentic find loop

With the raw Messages API this step would be a hand-written while stop_reason == "tool_use" loop with custom file tools. The Agent SDK handles all of that: we call query() once with allowed_tools=["Read", "Grep", "Glob"] and disallowed_tools=["Bash"], and Claude Code runs the explore-read-reason loop on its own. cwd=str(TARGET_DIR) points the agent at the canary, and system_prompt={"type": "preset", "preset": "claude_code", "append": ...} keeps Claude Code's default system prompt (which already tells the agent its working directory) while appending our engagement context, so the agent never has to guess its own location. With Bash/Write/Edit withheld the agent stays read-only.

The most important part of the prompt is still the quality-tier rubric. Without it, LLM vuln hunters report every null-pointer dereference and failed assertion they can find, which are real crashes but almost never exploitable. The rubric tells the agent which crash classes to submit (heap/stack overflow, use-after-free, controlled-address write) and which are signposts to keep reading past. This one block is most of the difference between a report a security engineer acts on and one they ignore.

A production version would add "Bash" to allowed_tools so the agent can compile with -fsanitize=address and confirm each crash; that belongs inside a locked-down container, so this notebook stays read-only.

python

FIND_PROMPT = f"""\
Find memory-safety bugs in the target source tree using the file tools
available to you.
 
## Threat model
 
Focus on the entry points and threats identified here; you do not need to
re-derive them.
 
{threat_model}
 
## Quality tiers: what to report
 
**HIGH VALUE (report these):**
- heap-buffer-overflow (especially WRITE)
- heap-use-after-free / double-free
- stack-buffer-overflow
- global-buffer-overflow
 
**LOW VALUE (note but keep looking):**
- assertion failures (clean abort, no corruption)
- stack exhaustion from recursion (DoS only)
- null-pointer deref at fixed small offsets
 
## Output
 
For each HIGH VALUE finding emit a block:
 
<finding>
<id>F-NN</id>
<file>path:line</file>
<category>heap-buffer-overflow | stack-buffer-overflow | use-after-free | ...</category>
<description>one paragraph: root cause, attacker control, trigger condition</description>
</finding>
"""
 
find_options = ClaudeAgentOptions(
    model=MODEL_NAME,
    cwd=str(TARGET_DIR),
    system_prompt={"type": "preset", "preset": "claude_code", "append": ENGAGEMENT_CONTEXT},
    allowed_tools=["Read", "Grep", "Glob"],
    disallowed_tools=["Bash"],
)
 
findings_text = await collect(query(prompt=FIND_PROMPT, options=find_options))
print("\n" + findings_text)

[tool] Glob {'pattern': '**/*'}
  [tool] Read {'file_path': 'vulnerability_detection_agent/canary/cana...}

`★ Insight ─────────────────────────────────────`
- The one-byte dispatch at line 38 makes each parser independently reachable with a trivial file prefix (`A`/`B`/`C`), so each bug is a standalone attack surface.
- `parse_bravo`'s `name[15]=0` is a common false-safety pattern: it null-terminates for the `printf`, but the `memcpy` on line 16 has already written past the 16-byte frame before that line runs.
- `parse_charlie` composes two primitives (conditional free then unconditional write) into a clean UAF whose trigger byte is attacker-chosen, which is rarer than an accidental UAF and nastier to fix.
`─────────────────────────────────────────────────`

<finding>
<id>F-01</id>
<file>canary.c:8-9</file>
<category>heap-buffer-overflow</category>
<description>`parse_alpha` allocates a fixed 32-byte heap buffer and then `memcpy`s `len` attacker-controlled bytes into it with no bounds check. `len` is `n-1` where `n` is up to 4096, so an `A`-prefixed file delivers up to 4095 bytes into the 32-byte chunk, overwriting adjacent heap metadata and neighboring allocations. Trigger: any file whose first byte is `A` and whose total size exceeds 33 bytes.</description>
</finding>

<finding>
<id>F-02</id>
<file>canary.c:15-16</file>
<category>stack-buffer-overflow</category>
<description>`parse_bravo` copies up to `n-1` attacker-controlled bytes into a 16-byte stack array `name[16]` without bounds-checking `len`. The subsequent `name[15]=0` only truncates the `printf` output; the out-of-bounds write in `memcpy` has already clobbered the saved frame pointer, return address, and any stack canary. Trigger: any file whose first byte is `B` and whose payload exceeds 16 bytes.</description>
</finding>

<finding>
<id>F-03</id>
<file>canary.c:22-26</file>
<category>use-after-free</category>
<description>`parse_charlie` allocates 64 bytes, conditionally frees `p` when the first payload byte equals `0xff`, and then unconditionally `memcpy`s into `p`. When the trigger byte is present, the `memcpy` writes into a freed chunk, corrupting allocator freelist metadata (tcache/fastbin on glibc) and yielding a standard heap-grooming primitive toward arbitrary write / control-flow hijack. The subsequent `printf("%p", p)` also leaks the heap pointer, making exploitation reliable under ASLR. Trigger: a file starting with `C` followed by byte `0xff`.</description>
</finding>

Three high-value findings (F-01 heap overflow, F-02 stack overflow, F-03 UAF with pointer disclosure) match threats T1/T2/T3+T5 in the threat model. No additional high-value memory-safety bugs in the 45-line source beyond these.

Step 5: Triage the raw findings

The find agent is tuned for recall, so its output usually contains duplicates (one root cause reached from two paths) and occasionally a finding that doesn't hold up. Triage is the filter: a fresh query() re-reads the code, verifies each finding against the actual lines, collapses duplicates by root cause, and re-derives severity from reachability across the trust boundaries in the threat model. We deliberately don't let triage inherit the find agent's severity scores; re-deriving them independently is a cheap way to catch overconfidence.

python

TRIAGE_PROMPT = f"""\
Triage these findings against the source in this directory and the threat model
below. For each: verify it is real (cite the line), derive severity from
reachability across the trust boundaries in the threat model, and collapse
duplicates by root cause.
 
## Threat model
 
{threat_model}
 
## Raw findings
 
{findings_text}
"""
 
triage_options = ClaudeAgentOptions(
    model=MODEL_NAME,
    cwd=str(TARGET_DIR),
    system_prompt={"type": "preset", "preset": "claude_code", "append": ENGAGEMENT_CONTEXT},
    allowed_tools=["Read", "Grep"],
    disallowed_tools=["Bash"],
)
 
triaged_text = await collect(query(prompt=TRIAGE_PROMPT, options=triage_options))
print("\n" + triaged_text)

[tool] Read {'file_path': 'vulnerability_detection_agent/canary/cana...}

`★ Insight ─────────────────────────────────────`
- All three bugs sit behind the same trust boundary (attacker-authored file → parser), reached by a single-byte dispatch in `main`. No auth, no sandbox, same-uid blast radius, so reachability is identical for all three and severity is driven by the memory-corruption primitive itself.
- F-03 is really two bugs fused at one call site: the UAF (T3) and the `printf("%p")` heap-pointer disclosure (T5). Collapsing them under F-03 is correct by root cause (both are in `parse_charlie`'s 6 lines), but the leak is what upgrades the UAF from "unreliable under ASLR" to "one-shot".
`─────────────────────────────────────────────────`

## Triage

| id | verdict | line(s) | threat | severity | notes |
| --- | --- | --- | --- | --- | --- |
| F-01 | real | canary.c:8-9 | T1 | **Critical** | `malloc(32)` then `memcpy(buf, data, len)` with `len` up to 4095 (from `main` `n-1`, line 39). Attacker file `A` + ≥33 bytes overflows heap; no sandbox → RCE as invoking user. |
| F-02 | real | canary.c:15-16 | T2 | **Critical** | `char name[16]` then `memcpy(name, data, len)`; `name[15]=0` on line 17 runs *after* the OOB write and only truncates the `printf`. Classic stack smash; exploit reliability depends on shipped hardening (open question in threat model) but severity class unchanged. |
| F-03 | real | canary.c:22-27 | T3 + T5 | **Critical** | `malloc(64)` on line 22, conditional `free(p)` on line 24 when `data[0]==0xff`, unconditional `memcpy` on line 26 → UAF into freelist metadata. Line 27 `printf("charlie: %p", p)` leaks the heap pointer unconditionally, collapsing ASLR for both T1 and T3. Keep as one finding (root cause = `parse_charlie`), but call out the leak explicitly. |

**Duplicates:** none to collapse. Three distinct root causes in three distinct parsers.

**Coverage check against threat model:** F-01/F-02/F-03 cover T1/T2/T3/T5. T4 (NULL-`malloc`) and T6–T7 (path handling, 4 KB truncation) are intentionally out of scope as memory-safety findings and the threat model already rates them low. T8–T11 are delivery/deployment/program concerns, not source bugs — not expected in this pass.

Step 6: Emit a structured report

Downstream systems (issue trackers, dashboards, SIEMs) need structured data. A final toolless query() converts the triaged findings into JSON that conforms to an explicit schema. We mark every field required and use null for "not applicable" so the model doesn't have to guess about optionality; in production you'd validate with jsonschema and retry on failure.

python

# Every key is in `required` so the model never silently drops a field; values
# may be null when a field is not applicable (e.g., no recommendation yet).
REPORT_SCHEMA = {
    "type": "object",
    "properties": {
        "findings": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "id": {"type": ["string", "null"]},
                    "category": {"type": ["string", "null"]},
                    "severity": {"type": "string", "enum": ["critical", "high", "medium", "low"]},
                    "file": {"type": ["string", "null"]},
                    "description": {"type": ["string", "null"]},
                    "recommendation": {"type": ["string", "null"]},
                },
                "required": ["id", "category", "severity", "file", "description", "recommendation"],
            },
        }
    },
    "required": ["findings"],
}
 
REPORT_PROMPT = f"""\
Convert the triaged findings below into strict JSON conforming to this schema.
Every field is required; use null for not-applicable. Respond with JSON only,
no surrounding prose or code fences.
 
## Schema
 
{json.dumps(REPORT_SCHEMA, indent=2)}
 
## Triaged findings
 
{triaged_text}
"""
 
report_options = ClaudeAgentOptions(
    model=MODEL_NAME,
    system_prompt={"type": "preset", "preset": "claude_code", "append": ENGAGEMENT_CONTEXT},
    allowed_tools=[],
)
 
report_json = await collect(query(prompt=REPORT_PROMPT, options=report_options))
raw = report_json.strip()
if raw.startswith("```"):
    raw = raw.split("\n", 1)[1].rsplit("```", 1)[0]
try:
    report = json.loads(raw)
except json.JSONDecodeError as e:
    print(f"[report agent did not return clean JSON: {e}]\n")
    print(raw)
else:
    print(json.dumps(report, indent=2))

{
  "findings": [
    {
      "id": "F-01",
      "category": "Heap buffer overflow",
      "severity": "critical",
      "file": "canary.c",
      "description": "parse_alpha allocates malloc(32) then memcpys attacker-controlled data of length up to 4095 bytes (from main's n-1 at line 39). An attacker-authored file beginning with 'A' followed by >=33 bytes overflows the heap allocation. With no sandbox, this yields RCE as the invoking user.",
      "recommendation": "Validate len against the allocation size before memcpy (e.g., require len <= 32) or size the allocation from len. Reject oversized inputs at the parser boundary and add bounds-checked copy helpers."
    },
    {
      "id": "F-02",
      "category": "Stack buffer overflow",
      "severity": "critical",
      "file": "canary.c",
      "description": "parse_bravo declares char name[16] then memcpys attacker-controlled data of length up to 4095 bytes into it. The name[15]=0 null-termination on line 17 executes after the out-of-bounds write and only truncates the subsequent printf; it does not prevent the stack smash. Exploit reliability depends on compile-time hardening but the vulnerability class is unchanged.",
      "recommendation": "Bound len to sizeof(name)-1 before memcpy, or use strncpy/snprintf with explicit size. Ensure stack protectors (-fstack-protector-strong), FORTIFY_SOURCE, and PIE are enabled in the build."
    },
    {
      "id": "F-03",
      "category": "Use-after-free with heap address disclosure",
      "severity": "critical",
      "file": "canary.c",
      "description": "parse_charlie mallocs 64 bytes at line 22, conditionally frees p at line 24 when data[0]==0xff, then unconditionally memcpys into p at line 26, corrupting freelist metadata. Line 27's printf(\"charlie: %p\", p) unconditionally leaks the heap pointer, defeating ASLR and upgrading the UAF (and the F-01 heap overflow) from unreliable to one-shot exploitable.",
      "recommendation": "Set p=NULL after free and guard subsequent uses; restructure so the free and the write cannot both execute on the same pointer. Remove the %p disclosure (or gate it behind a debug build) to preserve ASLR."
    }
  ]
}

Summary and next steps

You've built the full threat-model, find, triage, report pipeline with the Agent SDK: one multi-turn ClaudeSDKClient session for the threat model and three one-shot query() calls for the rest, with Claude Code's built-in file tools doing the exploration. The key patterns to take with you:

ClaudeSDKClient for conversations, query() for one-shots. The threat-model interview needs the bootstrap turn in context; find, triage, and report don't depend on each other's tool transcripts, so stateless calls are simpler.
cwd + allowed_tools replace hand-rolled tools. Read/Grep/Glob scoped to the target directory is the whole find-agent scaffold.
Triage is a separate pass. Re-verifying and re-scoring independently of the find agent catches overconfidence cheaply.

Going further

Use the hosted version. Claude Code Security runs this same find-and-triage capability as a managed product, so you point it at a repo and Anthropic handles the sandboxing and scaling.
Scale to a real repo. Point cwd at a real checkout, add "Bash" to allowed_tools inside a sandboxed container so the agent can compile with -fsanitize=address and confirm crashes, and spawn one query() per entry point from the threat model with asyncio.gather.
Wire the report into your tracker. Validate Step 6's JSON with jsonschema, map it to SARIF or your ticket schema, and POST it.