
Background agents — when, why, never
Long jobs offloaded right; the failure modes nobody warns about.
What background agents are
A background agent is a Claude Code task you launch and walk away from. Instead of watching the agent work turn-by-turn, you authorize a plan, the agent runs to completion (or failure), and your foreground session is free to do other work in parallel.
The lever is real. A 6-hour Parley training run, a multi-hour repo scrape, a long QC CV evaluation pass — none of these need a human in the loop minute-by-minute. Backgrounding turns sequential operator-bound time into parallel agent-runs, which is how a single operator gets the throughput of a small team.
The failure modes are also real, and undersold. Background does not mean unattended. A job without a notification path, a watchdog, and an exit hook is a job that delays bad news, not one that frees your time. The QC project burned $90.66 in wasted GPU spend before the discipline got installed.
This lesson is about installing the three-piece discipline (notify / watchdog / exit hook) and learning when not to background — because there are categories of work where backgrounding is straight-up dangerous.
When to background, when never to
The decision rule:
| Background | Foreground | Never background |
|---|---|---|
| Long deterministic work — training runs, batch scrapes, multi-file refactors with clear scope | Anything where you’ll iterate within 30 seconds of seeing output | Tasks with side effects you can’t cheaply undo (push, deploy, send-email, money-move) |
| Runs >15 min where your input isn’t the bottleneck | Plan-mode sessions, design exploration, research conversations | Tasks where the agent will hit a fork it can’t resolve without you |
| Parallel work alongside foreground — “build this while I work on that” | Anything you’d describe as “exploring” | Tasks on metered infra without a watchdog |
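The decision rule above can be encoded as a screening function. A minimal sketch, assuming a task described as a plain dict; every field name here is illustrative, not part of any real Claude Code API:

```python
def placement(task: dict) -> str:
    """Classify a task per the background / foreground / never table."""
    # "Never" conditions override everything else.
    if task.get("irreversible_side_effects"):   # push, deploy, send, money-move
        return "never background"
    if task.get("unresolvable_forks"):          # agent will need you mid-run
        return "never background"
    if task.get("metered_infra") and not task.get("has_watchdog"):
        return "never background"
    # Foreground when your input is the bottleneck.
    if task.get("iterate_within_seconds") or task.get("exploratory"):
        return "foreground"
    # Background long, bounded work where your input isn't the bottleneck.
    if task.get("est_minutes", 0) > 15:
        return "background"
    return "foreground"  # short tasks aren't worth the launch overhead
```

Note that metered infrastructure without a watchdog lands in "never background" even when everything else qualifies; that ordering is the point of the table.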
Examples from the portfolio:
- Background OK: Parley Notebook 02’s 7-architecture training run — bounded scope, deterministic, results land in a file, no side effects.
- Background OK: A multi-file rename across 30 files in the QC repo — bounded scope, the diff is the artifact, operator approves before commit.
- Foreground: The MHG Hickory→Denver site comparison memo — operator iterates on framing as the agent drafts. Background would 4x the elapsed time.
- Never: “Submit the QC provisional patent draft to the USPTO portal in the morning if no review comes back overnight” — irreversible, no human gate.
- Never: “Background a code change and auto-push if tests pass” — auto-push without human-on-diff is a category-error use of backgrounding.
Three failure modes nobody warns about
The forgotten run: launched with no notification path, the run fails early and nobody learns until hours later when someone finally checks. "Background" without notify means delayed bad news, not saved time.
The state-mutating background: the run pushes, deploys, or sends while nobody is watching. Backgrounding removes the human gate by design, so anything irreversible must stay out of scope.
The runaway-cost background: a stalled or looping phase on metered infrastructure burns spend until someone notices. This is the QC-style $90 GPU waste; only a watchdog stops it.
The pattern across all three: backgrounding shifts the human checkpoint from during-the-work to before-and-after. If you don’t install the after checkpoint (notification + status + postmortem), backgrounding doesn’t actually save you time.
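The after checkpoint can be wired mechanically: whatever the job does, completion always surfaces a notification and writes a status artifact, even on failure. A minimal sketch; `notify` here just prints, where a real path might be a Slack webhook:

```python
import json
import time


def notify(text: str) -> None:
    """Stand-in notification path; a real one would post to Slack or email."""
    print(f"[notify] {text}")


def with_after_checkpoint(job, artifact_path: str) -> dict:
    """Run a job and guarantee the after checkpoint fires on every exit."""
    start = time.time()
    try:
        outcome = {"status": "success", "result": job()}
    except Exception as e:  # failure must surface too, not vanish
        outcome = {"status": "failed", "error": str(e)}
    outcome["elapsed_s"] = round(time.time() - start, 1)
    # Exit hook: write the artifact and notify. No push, no deploy, no send.
    with open(artifact_path, "w") as f:
        json.dump(outcome, f)
    notify(f"Run finished: {outcome['status']} -> {artifact_path}")
    return outcome
```

The `try`/`except` is the whole trick: a crash takes the same exit path as a success, so the bad news arrives at the same speed as the good news.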
The pre-launch checklist
Three questions. If you can’t answer all three, don’t background yet.
Before backgrounding this work, run the three-question screen:
1. **Notification path** — How will success or failure surface?
- Slack webhook? Email? Local desktop notification?
- Without this, "background" means "delay the bad news"
2. **Watchdog** — What kills this if it stalls or loops?
- Wall-clock max time per phase
- Memory or cost ceiling
- Idle detection (no progress for N minutes)
3. **Exit hook** — What happens when the run finishes?
- Write artifact to <path>
- Notify on path
- Explicitly: NO auto-push, NO auto-deploy, NO auto-send
If any answer is "I don't know," DON'T background yet. Resolve first, then launch.
Notification + watchdog + exit hook. Skip any one and the lever turns into a foot-gun.
Most operators answer 2 of 3 reflexively and skip the watchdog. The watchdog is the one that prevents the QC-style $90 GPU waste. Don’t skip it on metered infrastructure.
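The three-question screen can be made a hard gate rather than a mental checklist. A hedged sketch, with illustrative field names (not a real Claude Code API): launch only when `screen` returns an empty list.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LaunchPlan:
    notification_path: Optional[str]  # e.g. "slack-webhook", "email"
    watchdog: Optional[dict]          # e.g. {"max_minutes_per_phase": 60}
    exit_hook: Optional[str]          # e.g. "write results.json + notify"


def screen(plan: LaunchPlan) -> list:
    """Return the unanswered questions; launch only when the list is empty."""
    missing = []
    if not plan.notification_path:
        missing.append("notification path: how will success/failure surface?")
    if not plan.watchdog:
        missing.append("watchdog: what kills this if it stalls or loops?")
    if not plan.exit_hook:
        missing.append("exit hook: what happens when the run finishes?")
    return missing
```

Run it against the typical 2-of-3 plan and the one skipped answer is exactly the watchdog:

```python
plan = LaunchPlan(notification_path="slack-webhook",
                  watchdog=None,  # the one most operators skip
                  exit_hook="write artifact + notify, NO auto-push")
print(screen(plan))  # the watchdog question comes back unanswered
```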
Annotated transcript — Parley training run
A real background launch: the Parley Notebook 02 7-architecture run. Watch the pre-launch screen, the operator-initiated /status check, and the watchdog kills surfacing as legitimate results.
6-hour run, 2 watchdog kills (raised Q1.5 in the research notebook), no operator surprises.
- operator
Need to kick off a 7-architecture × 3-seed Parley training run. ~6 hours total on the GPU pod. Background it.
- agent
Before I background, three questions:
1. Notification on completion? I can wire a Slack webhook to ping when the run exits (success or failure). Recommended.
2. Watchdog? GPU pod is metered — $X/hour. If a single architecture stalls, the whole run can drag. I propose: per-architecture wall-clock cap of 60 minutes; if exceeded, kill that architecture, log, continue to next.
3. Exit hook? What happens when the run finishes? Just write results.json and stop, or also push the notebook update?
No auto-push. Results land in 12-Parley/notebooks-published/notebook-02-architecture-ceiling/results.json and notify; you review and ship.
- operator
Yes Slack. Watchdog 60-min per arch. Exit: write results.json + Slack ping with summary. NO auto-push.
- agent
Backgrounding. Run ID: parley-nb02-2026-05-01-1147.
Launched. Watchdog active. Slack will ping on:
- Each architecture completion (7 expected)
- Final run completion (success or failure)
- Any architecture timeout-killed by watchdog
Estimated wall-clock: 5h 40m. I'll surface a status check if you ask /status. Otherwise no further chatter from me — your foreground session is free to do other work.
- operator
[~4 hours later] /status
- agent
Run parley-nb02-2026-05-01-1147 status:
- 5 of 7 architectures complete
- 2 watchdog-killed (BiGRU at 67min, GcnLite at 72min — collapsed under shared recipe, raised Q1.5)
- 3 of 5 completed converged successfully (frame_transformer, SPOTER, ConformerSmall)
- Currently running: SqueezeformerSmall (seed 2 of 3)
- ETA: ~1h 20m remaining
- GPU spend so far: $X.XX
No action needed. I'll Slack on completion.
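The per-architecture watchdog the agent proposed (kill on overrun, log, continue to the next) is a small loop. A sketch using `subprocess` wall-clock timeouts; the command dict is a stand-in, not real Parley training code:

```python
import subprocess


def run_with_watchdog(arch_cmds: dict, cap_seconds: float) -> dict:
    """Run each architecture's command under a wall-clock cap.

    On overrun: kill that architecture, record it, continue to the next,
    so one stalled phase can't drag the whole metered run.
    """
    results = {}
    for arch, cmd in arch_cmds.items():
        try:
            subprocess.run(cmd, timeout=cap_seconds, check=True,
                           capture_output=True)
            results[arch] = "complete"
        except subprocess.TimeoutExpired:
            results[arch] = "watchdog-killed"  # logged, run continues
        except subprocess.CalledProcessError as e:
            results[arch] = f"failed (exit {e.returncode})"
    return results
```

Note that a watchdog kill is recorded as a result, not swallowed: in the run above, the two kills surfaced as data (the collapses that raised Q1.5), which is exactly what you want from a watchdog on research workloads.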
Prompt kit
Three prompts: pre-launch screen, status check, post-run postmortem.
Before backgrounding this work, run the three-question screen:
1. **Notification path** — How will success or failure surface?
- Slack webhook? Email? Local desktop notification?
- Without this, "background" means "delay the bad news"
2. **Watchdog** — What kills this if it stalls or loops?
- Wall-clock max time per phase
- Memory or cost ceiling
- Idle detection (no progress for N minutes)
3. **Exit hook** — What happens when the run finishes?
- Write artifact to <path>
- Notify on path
- Explicitly: NO auto-push, NO auto-deploy, NO auto-send
If any answer is "I don't know," DON'T background yet. Resolve first, then launch.
/status
For run <id>, return:
- Phase progress (X of Y complete)
- Time elapsed and ETA
- Any watchdog kills, errors, or warnings
- Cost-so-far if metered infra
- Next event the agent is waiting on
No action; pure status.
Run <id> finished. Walk me through:
1. What was the actual outcome vs the planned outcome?
2. Did any phase get watchdog-killed? Why?
3. What did this cost (compute + time)?
4. What surprised you?
5. Should the next similar run change watchdog limits, notification cadence, or exit hooks?
Output as a 5-paragraph postmortem. Save at <path>.
Apply this — your first disciplined background launch
30-minute exercise. Plus the run time itself, which you’ll be doing other work during.
Run a background agent with discipline
Each step takes 5-10 minutes. Progress saves automatically.
- 01. Identify a task on your plate that's >15 minutes of agent work and doesn't need your foreground attention. Common candidates: long research scrapes, multi-file refactors with clear scope, training/build runs.
- 02. Run the pre-launch screen. Specifically: define the notification path, the watchdog, and the exit hook. If you can't answer all three, the task isn't ready to background. Resolve first.
- 03. Launch. Note the run ID. Walk away to other work. If you find yourself checking on it every 5 minutes, you didn't trust the notification path. Fix that, not the checking habit.
- 04. When the run completes (or fails), run the postmortem prompt. Save the lessons. First three background runs always teach you something about your watchdogs. Capture it.