
Code review with agents
The review prompt that catches what humans miss. The escalation rules.
What review catches that humans don’t
Human review catches taste, structure, naming, the obvious correctness bugs. Agent review, with the right prompt, catches the layer underneath: sprint-contract scope violations, memory-invariant conflicts, missing timeouts, hardcoded paths, tests that assert execution instead of behavior, security regressions.
The reason isn’t that agents are smarter. It’s that agents have read the sprint contract, the memory entries, and the related files into the same context window as the diff. A human reviewer rarely loads all that — they’re usually reviewing the diff in isolation. Context-loaded review finds context-dependent bugs.
The other reason: agents don’t get tired. The 12-file, 380-line diff that a human eye glazes over is the diff where bugs hide. The agent reads every line with the same attention. That’s where the marginal catches come from.
Real example from the QC sensor pivot: the radar driver wire-up commit had two blockers (hardcoded UART path, no read timeout) and a sprint-contract violation (missing tests on calibration.py). A human reading 12 files at midnight would have missed at least one. The agent caught all three because it loaded the contract first.
The 6-axis review
Six axes. Each catches a different class of failure. Agents tend to over-index on style if you don’t name the other five.
| Axis | What it catches | What humans usually miss |
|---|---|---|
| 1. Correctness | Off-by-ones, edge cases, error handling | Edge cases that aren’t in the active diff but are in adjacent code that this diff calls. |
| 2. Scope | Files touched outside sprint-contract scope | The “while we’re in there” refactor that wasn’t in the contract. |
| 3. Security | Secrets, injection, auth gaps, privilege escalation | Tokens accidentally committed to logs or config files. |
| 4. Performance | N+1, blocking IO without timeout, unbounded memory | Missing timeouts on external calls (the QC radar bug). |
| 5. Tests | New behaviors without tests; tests that don’t assert behavior | Coverage-theater tests that don’t actually catch the bug they pretend to. |
| 6. Taste | Naming, structure, comments, future-maintainer cost | Style is what humans see. Structure is what humans miss. |
Plus two cross-cutting checks: scope vs sprint contract and memory invariant check. The memory invariant check is the differentiator — it’s what catches diffs that violate rules you previously codified but forgot.
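The memory invariant check is easy to mechanize if your memory entries mark their rules consistently. A minimal sketch, assuming a hypothetical convention where codified rules start with an `INVARIANT:` marker (the marker, function name, and directory layout are all illustrative, not from any real tool):

```python
from pathlib import Path

def collect_invariants(memory_dir: str) -> list[str]:
    """Scan memory entries for codified rules so they can be
    prepended to the review prompt. Assumed convention: any line
    beginning with 'INVARIANT:' is a rule the diff must not break."""
    rules = []
    for path in sorted(Path(memory_dir).glob("*.md")):
        for line in path.read_text().splitlines():
            stripped = line.strip()
            if stripped.startswith("INVARIANT:"):
                # Tag each rule with its source file so the review
                # can cite where the invariant was codified.
                rules.append(f"{path.name}: {stripped[len('INVARIANT:'):].strip()}")
    return rules
```

Feeding the collected rules into the review prompt is what turns "rules you previously codified but forgot" into rules the reviewer actually checks.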
Five failure patterns
- The rubber-stamp review
- The line-by-line nitpick
- The review with no escalation
- The review without context
- The review-by-the-author
Pattern #5 (review-by-the-author) is the most under-recognized. Solo operators run into it constantly because there’s no other reviewer. The fresh-session rule is the workaround: spawn a new agent that hasn’t seen the implementation conversation, give it the diff and the context, get the review.
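One way to mechanize the fresh-session rule is to assemble everything the reviewer needs into a single self-contained prompt, so the new session gets the diff and the context but none of the implementation conversation. A minimal sketch; the function name and file paths are illustrative:

```python
from pathlib import Path

def build_review_prompt(diff: str, contract_path: str,
                        memory_paths: list[str]) -> str:
    """Assemble a self-contained review prompt for a fresh agent
    session: sprint contract + memory entries + diff, no history."""
    parts = [f"Active sprint contract:\n{Path(contract_path).read_text()}"]
    for p in memory_paths:
        parts.append(f"Memory entry ({p}):\n{Path(p).read_text()}")
    # The diff goes last so the reviewer reads context before code.
    parts.append("Run the 6-axis review on this diff:\n" + diff)
    return "\n\n".join(parts)
```

Paste the result into a brand-new session. Because the prompt carries the contract and memory verbatim, the fresh reviewer can run the scope and invariant checks without ever having seen how the code was written.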
The review prompt
Six axes, structured output, escalation tiers. Save in your CLAUDE.md or as a slash command.
Review this diff. Active sprint contract: <path>.
Related memory entries: <paths>.
Run the 6-axis review:
1. CORRECTNESS — does the code do what it claims? Edge cases?
Off-by-ones?
2. SCOPE — are all changed files in the sprint contract scope?
Flag any out-of-scope file.
3. SECURITY — secrets in commits, injection vectors, auth gaps,
privilege boundaries.
4. PERFORMANCE — obvious perf issues (N+1, blocking IO without
timeout, unbounded memory).
5. TESTS — does the diff ship with tests for new behaviors? Do
the tests assert behavior, not execution?
6. TASTE — naming, structure, comments, future-maintainer cost.
Output format:
- BLOCKERS (merge-blocking): numbered list with file:line
- CONCERNS (needs response before merge): numbered list
- NITS (optional): numbered list
- Out-of-scope check: pass/fail
- Memory invariant check: any rules in memory that this diff
conflicts with?
Be specific. "Looks good" is not a review. There are always three
things worth flagging — find them.

Catches scope creep, memory-invariant violations, and the long-diff bugs human review misses.
The “there are always three things worth flagging — find them” framing matters. Without it, agents default to approval. With it, they look harder.
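Because the output format is fixed, the escalation tiers can be enforced mechanically. A hypothetical sketch: the heading names match the prompt's output format, but the parsing convention and gate labels are assumptions, not part of any real tool:

```python
import re

def parse_review(review: str) -> dict[str, list[str]]:
    """Split the structured review output into its tiers.
    Assumes headings containing BLOCKERS / CONCERNS / NITS,
    with numbered items like '1. ...' under each."""
    tiers = {"BLOCKERS": [], "CONCERNS": [], "NITS": []}
    current = None
    for line in review.splitlines():
        heading = next((t for t in tiers if t in line.upper()), None)
        if heading and not re.match(r"\s*\d+\.", line):
            current = heading
        elif current and re.match(r"\s*\d+\.", line):
            tiers[current].append(line.strip())
    return tiers

def merge_gate(tiers: dict[str, list[str]]) -> str:
    """Escalation rules: BLOCKERS block the merge, CONCERNS need a
    documented response before merging, NITS never block."""
    if tiers["BLOCKERS"]:
        return "NO-GO"
    if tiers["CONCERNS"]:
        return "GO-WITH-RESPONSE"
    return "GO"
```

A gate like this can sit in a pre-merge script: the review stays human-readable, but the decision about whether it blocks the merge no longer depends on anyone remembering the escalation rules.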
Real QC commit walkthrough
The XM125 radar driver wire-up commit on QC: 12 files, 380 lines, three real findings across BLOCKERS and CONCERNS. The review catches what a tired human would have missed.
Two blockers (hardcoded path, no timeout) + one sprint-contract violation (missing tests). All three caught by the prompt.
- operator
I just landed a commit on QC for the sensor pivot — wired the new XM125 radar driver path under src/qc/sensors/radar/. Diff is ~12 files, 380 lines. Run a review before I integrate with the bench-test rig.
- agent
Loading: sprint contract memory/sprint-contracts/active.md, sensor strategy doc 04-Quantum-Caddy/Hardware/SENSOR-STRATEGY-RESET-2026-04-22.md, and the diff. Running 6-axis review.

## Code Review — XM125 Radar Driver Wire-Up

Commit: <hash> · Files: 12 · Lines: +380 / -42

### BLOCKERS (merge-blocking)

1. **Hardcoded UART path `/dev/ttyUSB0` in `radar_driver.py:27`.** Bench rig may have multiple USB devices; ttyUSB0 is not deterministic across boots. Pull from config.
2. **No timeout on `radar.read()`.** If the XM125 fails to respond, the read blocks indefinitely. The CV pipeline at :8642 will hang. Add a 100ms timeout with a structured error.

### CONCERNS (needs response before merge)

3. **`src/qc/sensors/radar/calibration.py` lacks tests.** Sprint contract Done Means row "sensor module changes ship with passing tests" not satisfied. Two new public functions, zero new tests in the diff.
4. **`RadarFrame.merge()` silently drops out-of-order frames.** This is probably right (radar is temporally ordered), but no comment explains why. A future maintainer will assume it's a bug and add reordering logic.

### NITS (optional)

5. **Inconsistent naming:** `radar_driver.py` vs `tof_sensor.py` (existing). Pick one — `_driver` or `_sensor`.
6. **Magic number `0x4147` in the radar protocol header** — name it `XM125_FRAME_MAGIC`.

### Out-of-scope check

All 12 files are under `src/qc/sensors/radar/` or `tests/sensors/radar/`. Sprint contract scope: bench-test wire-up. ✅

### Memory invariant check

From `memory/feedback-qc-sensor-constraints.md`: "Smart board must ship as self-contained consumer product." The radar driver assumes USB. For the consumer product path, USB is fine (boards ship with the radar pre-attached). For the venue-mount path it'd be a problem. Flagging for awareness, not blocking.
- operator
Fix the two blockers. Skip the NITs. For the test gap (concern #3), draft the missing tests.
- agent
Two blockers fixed in commit <hash+1>:
- `radar_driver.py:27` now reads the UART path from `config.sensor.radar.port`. Default `/dev/ttyUSB0` preserved as a fallback with a deprecation warning.
- `radar.read()` now takes `timeout_ms: int = 100` and raises `RadarReadTimeout` on expiry. Added test `test_radar_read_times_out_cleanly`.

Drafting tests for calibration.py:

```python
# tests/sensors/radar/test_calibration.py

def test_zero_point_calibration_returns_offset():
    cal = RadarCalibration()
    cal.record_zero(distance_m=0.5)
    assert cal.offset_m == 0.5

def test_calibration_clears_on_reset():
    cal = RadarCalibration()
    cal.record_zero(distance_m=0.5)
    cal.reset()
    assert cal.offset_m == 0.0
```

Four tests total. Running: 4/4 green. Concern #3 is now closed. Concern #4 is documentation — adding the comment explaining the out-of-order drop. NITs left for the next cleanup pass.
Ready to merge per the merge-etiquette rule (agent prepares, human lands).
- operator
Quick second pass on the timeout fix. Did the agent introduce any test that asserts the timeout fires correctly under load?
- agent
Yes — `test_radar_read_times_out_cleanly`:

```python
def test_radar_read_times_out_cleanly():
    """100ms timeout fires when the XM125 doesn't respond."""
    driver = RadarDriver(port="/dev/null", timeout_ms=100)
    start = time.monotonic()
    with pytest.raises(RadarReadTimeout):
        driver.read()
    elapsed = time.monotonic() - start
    assert 0.09 < elapsed < 0.15, "Timeout fired outside expected window"
```

Two assertions:
1. The exception fires (not a hang or a different exception).
2. The timeout fires at the right time (within 50% tolerance of 100ms).

The timing assertion is what catches a class of bugs where the timeout is implemented but doesn't actually limit the wait.
Prompt kit
Three prompts: the 6-axis review, the author-bias check, the pre-merge final gate. Save in your CLAUDE.md.
Review this diff. Active sprint contract: <path>.
Related memory entries: <paths>.
Run the 6-axis review:
1. CORRECTNESS — does the code do what it claims? Edge cases?
Off-by-ones?
2. SCOPE — are all changed files in the sprint contract scope?
Flag any out-of-scope file.
3. SECURITY — secrets in commits, injection vectors, auth gaps,
privilege boundaries.
4. PERFORMANCE — obvious perf issues (N+1, blocking IO without
timeout, unbounded memory).
5. TESTS — does the diff ship with tests for new behaviors? Do
the tests assert behavior, not execution?
6. TASTE — naming, structure, comments, future-maintainer cost.
Output format:
- BLOCKERS (merge-blocking): numbered list with file:line
- CONCERNS (needs response before merge): numbered list
- NITS (optional): numbered list
- Out-of-scope check: pass/fail
- Memory invariant check: any rules in memory that this diff
conflicts with?
Be specific. "Looks good" is not a review. There are always three
things worth flagging — find them.

The agent that wrote this code also reviewed it. Run a
fresh-eyes review with these biases corrected:
- Re-read the diff WITHOUT the implementation context
- Look for "what would I be confused by if I came to this in 6
months without the conversation history"
- Look for things the original agent might have justified but
that don't actually hold up
Output: any new BLOCKERS or CONCERNS the original review missed.
If the original review was sufficient, say so explicitly with
reasoning.

Final pre-merge gate. Confirm:
1. All BLOCKERS from review are resolved (cite the fix commit)
2. All CONCERNS have a documented response (resolved or
deferred with tracking issue)
3. Test suite green on the merge target after rebase
4. No new force-push, no history rewrite, no irreversible action
5. Merge note drafted (one paragraph: what shipped, what's
deferred, known limitations)
Output: GO / NO-GO. If NO-GO, name what's missing.
Do not run the merge. Output the report. I'll run the merge.

Apply this 40-minute exercise. Run a real review. Calibrate against your historical review pattern.
Add agent review to your merge path
Each step takes 5–10 minutes.
1. Pick a recent commit (yours or an agent's) and run the 6-axis review prompt against it. Even on shipped code, the review will surface debt. The point is to calibrate what the prompt finds.
2. Compare the review output to what you reviewed for at the time. Most operators reviewed for correctness and style; the 6-axis prompt usually catches the scope and memory-invariant violations the human review missed.
3. Add a CLAUDE.md line: "All commits to src/ get a 6-axis review by a fresh agent session before merge." The fresh-session rule is what gets you a second pair of eyes when you're solo.
4. Set up a pre-merge hook that runs the review prompt automatically. The hook can be as simple as a script that opens a fresh agent session, pastes the diff, and saves the review output to a PR comment or local file.
5. Establish escalation rules: BLOCKERS block merge, CONCERNS get a documented response, NITS are optional. Write these rules in CLAUDE.md so the agent loads them on every review session; the structure is what makes the review actionable.