
Background agents — when, why, never
Long jobs offloaded right; the failure modes nobody warns about.
What background agents are
A background agent is a Claude Code task you launch and walk away from. Instead of watching the agent work turn-by-turn, you authorize a plan, the agent runs to completion (or failure), and your foreground session is free to do other work in parallel.
The lever is real. A 6-hour Parley training run, a multi-hour repo scrape, a long QC CV evaluation pass — none of these need a human in the loop minute-by-minute. Backgrounding turns sequential operator-bound time into parallel agent-runs, which is how a single operator gets the throughput of a small team.
The failure modes are also real, and undersold. Background does not mean unattended. A job without a notification path, a watchdog, and an exit hook is a job that delays bad news, not one that frees your time. The QC project burned $90.66 in wasted GPU spend before the discipline got installed.
This lesson is about installing the three-piece discipline (notify / watchdog / exit hook) and learning when not to background — because there are categories of work where backgrounding is straight-up dangerous.
When to background, when never to
The decision rule:
| Background | Foreground | Never background |
|---|---|---|
| Long deterministic work — training runs, batch scrapes, multi-file refactors with clear scope | Anything where you’ll iterate within 30 seconds of seeing output | Tasks with side effects you can’t cheaply undo (push, deploy, send-email, money-move) |
| Runs >15 min where your input isn’t the bottleneck | Plan-mode sessions, design exploration, research conversations | Tasks where the agent will hit a fork it can’t resolve without you |
| Parallel work alongside foreground — “build this while I work on that” | Anything you’d describe as “exploring” | Tasks on metered infra without a watchdog |
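The decision rule above can be encoded as a screening function. A minimal sketch, assuming a task described as a plain dict; every field name here is illustrative, not part of any real Claude Code API:

```python
def placement(task: dict) -> str:
    """Classify a task per the background / foreground / never table."""
    # "Never" conditions override everything else.
    if task.get("irreversible_side_effects"):   # push, deploy, send, money-move
        return "never background"
    if task.get("unresolvable_forks"):          # agent will need you mid-run
        return "never background"
    if task.get("metered_infra") and not task.get("has_watchdog"):
        return "never background"
    # Foreground when your input is the bottleneck.
    if task.get("iterate_within_seconds") or task.get("exploratory"):
        return "foreground"
    # Background long, bounded work where your input isn't the bottleneck.
    if task.get("est_minutes", 0) > 15:
        return "background"
    return "foreground"  # short tasks aren't worth the launch overhead
```

Note that metered infrastructure without a watchdog lands in "never background" even when everything else qualifies; that ordering is the point of the table.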
Examples from the portfolio:
- Background OK: Parley Notebook 02’s 7-architecture training run — bounded scope, deterministic, results land in a file, no side effects.
- Background OK: A multi-file rename across 30 files in the QC repo — bounded scope, the diff is the artifact, operator approves before commit.
- Foreground: The MHG Hickory→Denver site comparison memo — operator iterates on framing as the agent drafts. Background would 4x the elapsed time.
- Never: “Submit the QC provisional patent draft to the USPTO portal in the morning if no review comes back overnight” — irreversible, no human gate.
- Never: “Background a code change and auto-push if tests pass” — auto-push without human-on-diff is a category-error use of backgrounding.
Three failure modes nobody warns about
The forgotten run: launched with no notification path, the run fails early and nobody learns until hours later when someone finally checks. "Background" without notify means delayed bad news, not saved time.
The state-mutating background: the run pushes, deploys, or sends while nobody is watching. Backgrounding removes the human gate by design, so anything irreversible must stay out of scope.
The runaway-cost background: a stalled or looping phase on metered infrastructure burns spend until someone notices. This is the QC-style $90 GPU waste; only a watchdog stops it.
The pattern across all three: backgrounding shifts the human checkpoint from during-the-work to before-and-after. If you don’t install the after checkpoint (notification + status + postmortem), backgrounding doesn’t actually save you time.
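The after checkpoint can be wired mechanically: whatever the job does, completion always surfaces a notification and writes a status artifact, even on failure. A minimal sketch; `notify` here just prints, where a real path might be a Slack webhook:

```python
import json
import time


def notify(text: str) -> None:
    """Stand-in notification path; a real one would post to Slack or email."""
    print(f"[notify] {text}")


def with_after_checkpoint(job, artifact_path: str) -> dict:
    """Run a job and guarantee the after checkpoint fires on every exit."""
    start = time.time()
    try:
        outcome = {"status": "success", "result": job()}
    except Exception as e:  # failure must surface too, not vanish
        outcome = {"status": "failed", "error": str(e)}
    outcome["elapsed_s"] = round(time.time() - start, 1)
    # Exit hook: write the artifact and notify. No push, no deploy, no send.
    with open(artifact_path, "w") as f:
        json.dump(outcome, f)
    notify(f"Run finished: {outcome['status']} -> {artifact_path}")
    return outcome
```

The `try`/`except` is the whole trick: a crash takes the same exit path as a success, so the bad news arrives at the same speed as the good news.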
The pre-launch checklist
Three questions. If you can’t answer all three, don’t background yet.
Before backgrounding this work, run the three-question screen:
1. **Notification path** — How will success or failure surface?
- Slack webhook? Email? Local desktop notification?
- Without this, "background" means "delay the bad news"
2. **Watchdog** — What kills this if it stalls or loops?
- Wall-clock max time per phase
- Memory or cost ceiling
- Idle detection (no progress for N minutes)
3. **Exit hook** — What happens when the run finishes?
- Write artifact to <path>
- Notify on path
- Explicitly: NO auto-push, NO auto-deploy, NO auto-send
If any answer is "I don't know," DON'T background yet. Resolve first, then launch.
Notification + watchdog + exit hook. Skip any one and the lever turns into a foot-gun.
Most operators answer 2 of 3 reflexively and skip the watchdog. The watchdog is the one that prevents the QC-style $90 GPU waste. Don’t skip it on metered infrastructure.
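The three-question screen can be made a hard gate rather than a mental checklist. A hedged sketch, with illustrative field names (not a real Claude Code API): launch only when `screen` returns an empty list.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LaunchPlan:
    notification_path: Optional[str]  # e.g. "slack-webhook", "email"
    watchdog: Optional[dict]          # e.g. {"max_minutes_per_phase": 60}
    exit_hook: Optional[str]          # e.g. "write results.json + notify"


def screen(plan: LaunchPlan) -> list:
    """Return the unanswered questions; launch only when the list is empty."""
    missing = []
    if not plan.notification_path:
        missing.append("notification path: how will success/failure surface?")
    if not plan.watchdog:
        missing.append("watchdog: what kills this if it stalls or loops?")
    if not plan.exit_hook:
        missing.append("exit hook: what happens when the run finishes?")
    return missing
```

Run it against the typical 2-of-3 plan and the one skipped answer is exactly the watchdog:

```python
plan = LaunchPlan(notification_path="slack-webhook",
                  watchdog=None,  # the one most operators skip
                  exit_hook="write artifact + notify, NO auto-push")
print(screen(plan))  # the watchdog question comes back unanswered
```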
Annotated transcript — Parley training run
A real background launch: the Parley Notebook 02 7-architecture run. Watch the pre-launch screen, the operator-initiated /status check, and the watchdog kills surfacing as legitimate results.
6-hour run, 2 watchdog kills (raised Q1.5 in the research notebook), no operator surprises.
- operator
Need to kick off a 7-architecture × 3-seed Parley training run. ~6 hours total on the GPU pod. Background it.
- agent
Before I background, three questions:
1. Notification on completion? I can wire a Slack webhook to ping when the run exits (success or failure). Recommended.
2. Watchdog? GPU pod is metered — $X/hour. If a single architecture stalls, the whole run can drag. I propose: per-architecture wall-clock cap of 60 minutes; if exceeded, kill that architecture, log, continue to next.
3. Exit hook? What happens when the run finishes? Just write results.json and stop, or also push the notebook update?
No auto-push. Results land in 12-Parley/notebooks-published/notebook-02-architecture-ceiling/results.json and notify; you review and ship.
- operator
Yes Slack. Watchdog 60-min per arch. Exit: write results.json + Slack ping with summary. NO auto-push.
- agent
Backgrounding. Run ID: parley-nb02-2026-05-01-1147.
Launched. Watchdog active. Slack will ping on:
- Each architecture completion (7 expected)
- Final run completion (success or failure)
- Any architecture timeout-killed by watchdog
Estimated wall-clock: 5h 40m. I'll surface a status check if you ask /status. Otherwise no further chatter from me — your foreground session is free to do other work.
- operator
[~4 hours later] /status
- agent
Run parley-nb02-2026-05-01-1147 status:
- 5 of 7 architectures complete
- 2 watchdog-killed (BiGRU at 67min, GcnLite at 72min — collapsed under shared recipe, raised Q1.5)
- 3 of 5 completed converged successfully (frame_transformer, SPOTER, ConformerSmall)
- Currently running: SqueezeformerSmall (seed 2 of 3)
- ETA: ~1h 20m remaining
- GPU spend so far: $X.XX
No action needed. I'll Slack on completion.
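The per-architecture watchdog the agent proposed (kill on overrun, log, continue to the next) is a small loop. A sketch using `subprocess` wall-clock timeouts; the command dict is a stand-in, not real Parley training code:

```python
import subprocess


def run_with_watchdog(arch_cmds: dict, cap_seconds: float) -> dict:
    """Run each architecture's command under a wall-clock cap.

    On overrun: kill that architecture, record it, continue to the next,
    so one stalled phase can't drag the whole metered run.
    """
    results = {}
    for arch, cmd in arch_cmds.items():
        try:
            subprocess.run(cmd, timeout=cap_seconds, check=True,
                           capture_output=True)
            results[arch] = "complete"
        except subprocess.TimeoutExpired:
            results[arch] = "watchdog-killed"  # logged, run continues
        except subprocess.CalledProcessError as e:
            results[arch] = f"failed (exit {e.returncode})"
    return results
```

Note that a watchdog kill is recorded as a result, not swallowed: in the run above, the two kills surfaced as data (the collapses that raised Q1.5), which is exactly what you want from a watchdog on research workloads.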
Prompt kit
Three prompts: pre-launch screen, status check, post-run postmortem.
Before backgrounding this work, run the three-question screen:
1. **Notification path** — How will success or failure surface?
- Slack webhook? Email? Local desktop notification?
- Without this, "background" means "delay the bad news"
2. **Watchdog** — What kills this if it stalls or loops?
- Wall-clock max time per phase
- Memory or cost ceiling
- Idle detection (no progress for N minutes)
3. **Exit hook** — What happens when the run finishes?
- Write artifact to <path>
- Notify on path
- Explicitly: NO auto-push, NO auto-deploy, NO auto-send
If any answer is "I don't know," DON'T background yet. Resolve first, then launch.
/status
For run <id>, return:
- Phase progress (X of Y complete)
- Time elapsed and ETA
- Any watchdog kills, errors, or warnings
- Cost-so-far if metered infra
- Next event the agent is waiting on
No action; pure status.
Run <id> finished. Walk me through:
1. What was the actual outcome vs the planned outcome?
2. Did any phase get watchdog-killed? Why?
3. What did this cost (compute + time)?
4. What surprised you?
5. Should the next similar run change watchdog limits, notification cadence, or exit hooks?
Output as a 5-paragraph postmortem. Save at <path>.
Apply this — your first disciplined background launch
30-minute exercise. Plus the run time itself, which you’ll be doing other work during.
Run a background agent with discipline
Each step takes 5-10 minutes. Progress saves automatically.
- 01. Identify a task on your plate that's >15 minutes of agent work and doesn't need your foreground attention. Common candidates: long research scrapes, multi-file refactors with clear scope, training/build runs.
- 02. Run the pre-launch screen. Specifically: define the notification path, the watchdog, and the exit hook. If you can't answer all three, the task isn't ready to background. Resolve first.
- 03. Launch. Note the run ID. Walk away to other work. If you find yourself checking on it every 5 minutes, you didn't trust the notification path. Fix that, not the checking habit.
- 04. When the run completes (or fails), run the postmortem prompt. Save the lessons. First three background runs always teach you something about your watchdogs. Capture it.