
Picking your model
Opus, Sonnet, Haiku — three models, three jobs.
Why model choice matters more than you think
Anthropic ships three models. They differ in raw capability, latency, and cost per token by roughly an order of magnitude across the lineup. Most operators ignore that and run Opus on everything because Opus is the “best” one.
That choice is wrong in two directions. Opus on routine work overpays. A two-file rename, a CSS tweak, a kanban schema patch — Sonnet handles these indistinguishably, and you’re paying roughly 5x for an answer the cheaper model would have produced byte-for-byte. Sonnet on architectural work underpays. When the task spans eight files, requires reasoning about three subsystems, and a wrong answer means a rewrite, the marginal cost of Opus is trivial relative to the cost of being wrong.
The operators I know who track this carefully save $200-400/month on routing alone, with no measurable degradation in output quality. The work that matters still gets Opus. The work that doesn’t, doesn’t.
This lesson is the routing rule. Three minutes to memorize, applies for the rest of your operator career.
The three-job ladder
Each model has a job. The job is defined by three properties: reasoning depth, file fan-out, and cost of being wrong.
| Model | Default for | Signals it’s the right pick | Don’t use it for |
|---|---|---|---|
| Opus | Architectural decisions, multi-file refactors, expensive-to-undo work | 4+ files, novel reasoning, cross-system, legal/IP, schema migrations, patent draft | Single-file edits, known patterns, batch classification |
| Sonnet | Daily code work — your default model | 1-3 files, known patterns, reversible changes, document drafting, ordinary debugging | Many-file architecture, batch tagging at scale |
| Haiku | High-volume, low-stakes classification or summarization | Hundreds of inputs, deterministic-ish outputs, JSON extraction, label-this-row work | Anything with branching logic, debugging, code that ships |
Default Sonnet. Most operators’ sessions are 1-3 files of known patterns. Sonnet does that perfectly and costs ~5x less than Opus. Reach for Opus when at least two of these are true: 4+ files, novel reasoning, expensive rollback. Reach for Haiku when you’re doing batch work and you’d be fine with 95% accuracy.
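The two-of-three rule is mechanical enough to sketch as code. A minimal illustration — the signal names and thresholds are just the ones from this lesson, not any Anthropic API:

```python
def route_model(files_touched: int, novel_reasoning: bool,
                expensive_rollback: bool, batch_low_stakes: bool = False) -> str:
    """Route a session using this lesson's rule: default Sonnet,
    Opus when at least two Opus signals fire, Haiku for batch work."""
    if batch_low_stakes:
        return "haiku"  # high-volume work where ~95% accuracy is fine
    opus_signals = sum([files_touched >= 4, novel_reasoning, expensive_rollback])
    return "opus" if opus_signals >= 2 else "sonnet"
```

A one-signal task (say, 5 files but a known, reversible pattern) stays on Sonnet; it takes two signals to justify reaching up.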
A pattern from running QC, MHG, and Parley side by side: match the model to the agent, not to the prestige of the work. CIPHER (legal/IP) work is high-stakes — Opus. SUMMIT (MHG operations) is mostly known-pattern paperwork — Sonnet. APEX inbox triage is bulk classification — Haiku. The agent’s scope tells you the model.
Sometimes the answer is “no model.” If your task is “rename foo to bar across 50 files,” the right tool isn’t Haiku. It’s sed. Don’t reach for an LLM when a deterministic tool exists.
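The deterministic route costs nothing per token. A sketch of the no-model rename in Python (the sed equivalent is `sed -i 's/foo/bar/g'` over the file list); the glob pattern is illustrative:

```python
from pathlib import Path

def bulk_replace(root: str, old: str, new: str, pattern: str = "**/*") -> int:
    """Replace every occurrence of `old` with `new` in text files under
    `root`. Returns the number of files changed. No model involved."""
    changed = 0
    for path in Path(root).glob(pattern):
        if not path.is_file():
            continue
        try:
            text = path.read_text(encoding="utf-8")
        except UnicodeDecodeError:
            continue  # skip binary files
        if old in text:
            path.write_text(text.replace(old, new), encoding="utf-8")
            changed += 1
    return changed
```

Same result every run, auditable with `git diff`, and zero spend.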
Three ways operators get model choice wrong
The patterns that produce silent overspend or quiet quality regressions:
- Opus for everything
- Haiku for production logic
- Switching models mid-thread
The fix in all three: route at session start, stay in lane, match model to stakes. Don’t reach up out of habit and don’t reach down to feel thrifty.
The 30-second routing prompt
Before you start a session, run this. It forces an explicit answer to the four questions that matter.
Before I start this work, route the model:
1. How many files will this touch?
- 1-3 files → lean Sonnet
- 4+ files OR cross-cutting → lean Opus
2. Is the wrong answer cheap or expensive to undo?
- Cheap (commit + revert) → Sonnet OK
- Expensive (filed patent, shipped firmware, db migration) → Opus
3. Is this a known pattern or novel reasoning?
- Known pattern → Sonnet
- Novel architectural call → Opus
4. Is this batch / high-volume / low-stakes classification?
- Yes → Haiku
- No → Sonnet or Opus per above
Output the recommended model + one-line reason.
If the answer is "no model, just a script," say so.

This forces explicit reasoning. After ~10 reps you stop needing the prompt; the answer becomes obvious.
The operators I’ve watched build this habit do it for two weeks, then stop. The internalized rule fires fast enough that the prompt becomes overhead. But the two weeks of explicit reps is what installs the reflex.
Annotated transcript
A real session deciding model choice across three different tasks: a Supabase RLS update, a bulk rename, and a patent draft.
Same operator, three different model choices. The pattern is the same; the answers are different.
- operator
I want to add Supabase row-level security policies to the trupath_tasks table and update the kanban HTML to handle the new auth flow. Which model?
- agent
Sonnet. Two-file, well-scoped, schema is known. RLS policy syntax is in your training data and the kanban HTML pattern is repetitive. I'll plan, write the SQL migration, update the fetch headers, and regression-test against your existing publishable key.
If the task were "redesign the auth model across QC + MHG + Parley to share a single identity layer" — that's Opus. Many files, architectural reasoning, expensive-if-wrong.
- operator
What about the boring stuff — like "rename every reference to the old kanban URL"?
- agent
Haiku territory if you want to bulk-rewrite. But honestly for a rename across <50 files I'd just use sed and skip the agent. Haiku is best for high-volume classification: "label these 4,000 support tickets by category" or "summarize each of these 200 transcripts in one line." That's where its speed/cost wins.
For a search-and-replace, the right tool isn't a smaller model — it's not a model at all.
- operator
OK. Sonnet for this session. What about the patent draft work later — same?
- agent
Different session, Opus. Patent drafting touches claim language, prior art reasoning, and structural choices that are expensive to redo after filing. The QC provisional package is 482 lines plus a 36-commit evidence ledger — you want the model that catches the structural issue on read 1, not read 3.
Rule of thumb: anything Cipher routes to me, default Opus. Anything Summit routes for routine MHG SBA paperwork, default Sonnet. Anything Apex routes for bulk inbox triage, Haiku.
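The Apex-style bulk triage above can be sketched as request construction. Everything here is illustrative — the model id is a placeholder for the current Haiku id, and the label set is hypothetical; only the role/content payload shape follows Anthropic's Messages API:

```python
def haiku_batch_requests(tickets: list[str], labels: list[str],
                         model: str = "claude-haiku") -> list[dict]:
    """Build one classification request per ticket for a high-volume
    Haiku pass. `model` is a placeholder -- substitute the current
    Haiku id from the Anthropic docs before sending."""
    system = ("Classify the support ticket into exactly one of: "
              + ", ".join(labels) + ". Reply with the label only.")
    return [
        {
            "model": model,
            "max_tokens": 10,  # a bare label, nothing more
            "system": system,
            "messages": [{"role": "user", "content": ticket}],
        }
        for ticket in tickets
    ]
```

Capping `max_tokens` this low is part of the economics: a label-only reply keeps the per-item cost near the floor.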
Prompt kit
Three prompts: pre-session routing, post-week audit, personal-rule builder.
Before I start this work, route the model:
1. How many files will this touch?
- 1-3 files → lean Sonnet
- 4+ files OR cross-cutting → lean Opus
2. Is the wrong answer cheap or expensive to undo?
- Cheap (commit + revert) → Sonnet OK
- Expensive (filed patent, shipped firmware, db migration) → Opus
3. Is this a known pattern or novel reasoning?
- Known pattern → Sonnet
- Novel architectural call → Opus
4. Is this batch / high-volume / low-stakes classification?
- Yes → Haiku
- No → Sonnet or Opus per above
Output the recommended model + one-line reason.
If the answer is "no model, just a script," say so.

Pull my Anthropic usage for the last 7 days. For each session over $2 in spend:
- What model ran it
- What the dominant task was (code edit, planning, batch classify, etc.)
- Whether a smaller model would have produced equivalent output
Flag sessions where Opus was used for what should have been Sonnet,
or Sonnet for what should have been Haiku. Estimate monthly savings
if those sessions had been routed correctly.

Walk my last 20 Claude Code sessions and group them by task type.
For each task type, recommend a default model and a one-line trigger
phrase that should auto-route there.
Output as a 5-7 row table I can paste into CLAUDE.md under
"## Model routing".

Apply this — install the routing reflex
20-minute exercise plus a 7-day re-check. The reflex installs in two weeks. The savings show up in week one.
Install the model-routing habit
Each step takes 5 minutes.
1. Open your Anthropic Console billing page and note the last 30 days of spend. If you don't know what you spent, you can't tell whether you're routing well.
2. Pick your top 3 task types this week (e.g., code edits, doc drafting, inbox triage). Assign each a default model and write the assignment in CLAUDE.md.
3. Run the routing prompt before your next 3 sessions. This trains the reflex; after 3 reps it becomes automatic.
4. Re-check spend after 7 days and look for a 20-40% reduction. If you don't see a drop, your default was already correct, which is fine. If you see one, the habit just paid for the year.
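The week-one re-check reduces to a small script once you have the data. A sketch, assuming you've exported each session as a (model_used, dollars_spent, model_that_should_have_run) tuple; the cost ratios are the rough ~5x ladder from this lesson, not published pricing:

```python
# Rough relative cost per token, normalized to Sonnet = 1.0.
# Illustrative ratios from this lesson, not official pricing.
COST_RATIO = {"opus": 5.0, "sonnet": 1.0, "haiku": 0.2}

def misrouting_cost(sessions: list[tuple[str, float, str]]) -> float:
    """Estimate the spend that correct routing would have avoided.
    Each session is (model_used, dollars_spent, model_that_should_have_run)."""
    wasted = 0.0
    for used, spent, should in sessions:
        if used != should:
            # Scale spend down to what the right-sized model would have cost;
            # under-routing (reaching down when you needed Opus) counts as zero.
            wasted += spent * max(0.0, 1 - COST_RATIO[should] / COST_RATIO[used])
    return round(wasted, 2)
```

For example, a $10 Opus session that should have been Sonnet shows up as roughly $8 of avoidable spend under these ratios.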