
Picking your model
Opus, Sonnet, Haiku — three models, three jobs.
Why model choice matters more than you think
Anthropic ships three models. They differ in raw capability, latency, and cost per token by roughly an order of magnitude across the lineup. Most operators ignore that and run Opus on everything because Opus is the “best” one.
That choice is wrong in two directions. Opus on routine work overpays. A two-file rename, a CSS tweak, a kanban schema patch — Sonnet handles these indistinguishably, and you’re paying roughly 5x for an answer the cheaper model would have produced byte-for-byte. Sonnet on architectural work underpays. When the task spans eight files, requires reasoning about three subsystems, and a wrong answer means a rewrite, the marginal cost of Opus is trivial relative to the cost of being wrong.
The operators I know who track this carefully save $200-400/month on routing alone, with no measurable degradation in output quality. The work that matters still gets Opus. The work that doesn’t, doesn’t.
This lesson is the routing rule. Three minutes to memorize, applies for the rest of your operator career.
The three-job ladder
Each model has a job. The job is defined by three properties: reasoning depth, file fan-out, and cost of being wrong.
| Model | Default for | Signals it’s the right pick | Don’t use it for |
|---|---|---|---|
| Opus | Architectural decisions, multi-file refactors, expensive-to-undo work | 4+ files, novel reasoning, cross-system, legal/IP, schema migrations, patent draft | Single-file edits, known patterns, batch classification |
| Sonnet | Daily code work — your default model | 1-3 files, known patterns, reversible changes, document drafting, ordinary debugging | Many-file architecture, batch tagging at scale |
| Haiku | High-volume, low-stakes classification or summarization | Hundreds of inputs, deterministic-ish outputs, JSON extraction, label-this-row work | Anything with branching logic, debugging, code that ships |
Default Sonnet. Most operators’ sessions are 1-3 files of known patterns. Sonnet does that perfectly and costs ~5x less than Opus. Reach for Opus when at least two of these are true: 4+ files, novel reasoning, expensive rollback. Reach for Haiku when you’re doing batch work and you’d be fine with 95% accuracy.
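The two-of-three rule is mechanical enough to sketch as code. A minimal illustration — the signal names and thresholds are just the ones from this lesson, not any Anthropic API:

```python
def route_model(files_touched: int, novel_reasoning: bool,
                expensive_rollback: bool, batch_low_stakes: bool = False) -> str:
    """Route a session using this lesson's rule: default Sonnet,
    Opus when at least two Opus signals fire, Haiku for batch work."""
    if batch_low_stakes:
        return "haiku"  # high-volume work where ~95% accuracy is fine
    opus_signals = sum([files_touched >= 4, novel_reasoning, expensive_rollback])
    return "opus" if opus_signals >= 2 else "sonnet"
```

A one-signal task (say, 5 files but a known, reversible pattern) stays on Sonnet; it takes two signals to justify reaching up.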
A pattern from running QC, MHG, and Parley side by side: match the model to the agent, not to the prestige of the work. CIPHER (legal/IP) work is high-stakes — Opus. SUMMIT (MHG operations) is mostly known-pattern paperwork — Sonnet. APEX inbox triage is bulk classification — Haiku. The agent’s scope tells you the model.
Sometimes the answer is “no model.” If your task is “rename foo to bar across 50 files,” the right tool isn’t Haiku. It’s sed. Don’t reach for an LLM when a deterministic tool exists.
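The deterministic route costs nothing per token. A sketch of the no-model rename in Python (the sed equivalent is `sed -i 's/foo/bar/g'` over the file list); the glob pattern is illustrative:

```python
from pathlib import Path

def bulk_replace(root: str, old: str, new: str, pattern: str = "**/*") -> int:
    """Replace every occurrence of `old` with `new` in text files under
    `root`. Returns the number of files changed. No model involved."""
    changed = 0
    for path in Path(root).glob(pattern):
        if not path.is_file():
            continue
        try:
            text = path.read_text(encoding="utf-8")
        except UnicodeDecodeError:
            continue  # skip binary files
        if old in text:
            path.write_text(text.replace(old, new), encoding="utf-8")
            changed += 1
    return changed
```

Same result every run, auditable with `git diff`, and zero spend.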
Three ways operators get model choice wrong
The patterns that produce silent overspend or quiet quality regressions:
- Opus for everything
- Haiku for production logic
- Switching models mid-thread
The fix in all three: route at session start, stay in lane, match model to stakes. Don’t reach up out of habit and don’t reach down to feel thrifty.
The 30-second routing prompt
Before you start a session, run this. It forces an explicit answer to the four questions that matter.
Before I start this work, route the model:
1. How many files will this touch?
- 1-3 files → lean Sonnet
- 4+ files OR cross-cutting → lean Opus
2. Is the wrong answer cheap or expensive to undo?
- Cheap (commit + revert) → Sonnet OK
- Expensive (filed patent, shipped firmware, db migration) → Opus
3. Is this a known pattern or novel reasoning?
- Known pattern → Sonnet
- Novel architectural call → Opus
4. Is this batch / high-volume / low-stakes classification?
- Yes → Haiku
- No → Sonnet or Opus per above
Output the recommended model + one-line reason.
If the answer is "no model, just a script," say so.

This forces explicit reasoning. After ~10 reps you stop needing the prompt; the answer becomes obvious.
The operators I’ve watched build this habit do it for two weeks, then stop. The internalized rule fires fast enough that the prompt becomes overhead. But the two weeks of explicit reps is what installs the reflex.
Annotated transcript
A real session deciding model choice across three different tasks: a Supabase RLS update, a bulk rename, and a patent draft.
Same operator, three different model choices. The pattern is the same; the answers are different.
- operator
I want to add Supabase row-level security policies to the trupath_tasks table and update the kanban HTML to handle the new auth flow. Which model?
- agent
Sonnet. Two-file, well-scoped, schema is known. RLS policy syntax is in your training data and the kanban HTML pattern is repetitive. I'll plan, write the SQL migration, update the fetch headers, and regression-test against your existing publishable key.
If the task were "redesign the auth model across QC + MHG + Parley to share a single identity layer" — that's Opus. Many files, architectural reasoning, expensive-if-wrong.
- operator
What about the boring stuff — like "rename every reference to the old kanban URL"?
- agent
Haiku territory if you want to bulk-rewrite. But honestly for a rename across <50 files I'd just use sed and skip the agent. Haiku is best for high-volume classification: "label these 4,000 support tickets by category" or "summarize each of these 200 transcripts in one line." That's where its speed/cost wins.
For a search-and-replace, the right tool isn't a smaller model — it's not a model at all.
- operator
OK. Sonnet for this session. What about the patent draft work later — same?
- agent
Different session, Opus. Patent drafting touches claim language, prior art reasoning, and structural choices that are expensive to redo after filing. The QC provisional package is 482 lines plus a 36-commit evidence ledger — you want the model that catches the structural issue on read 1, not read 3.
Rule of thumb: anything Cipher routes to me, default Opus. Anything Summit routes for routine MHG SBA paperwork, default Sonnet. Anything Apex routes for bulk inbox triage, Haiku.
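The Apex-style bulk triage above can be sketched as request construction. Everything here is illustrative — the model id is a placeholder for the current Haiku id, and the label set is hypothetical; only the role/content payload shape follows Anthropic's Messages API:

```python
def haiku_batch_requests(tickets: list[str], labels: list[str],
                         model: str = "claude-haiku") -> list[dict]:
    """Build one classification request per ticket for a high-volume
    Haiku pass. `model` is a placeholder -- substitute the current
    Haiku id from the Anthropic docs before sending."""
    system = ("Classify the support ticket into exactly one of: "
              + ", ".join(labels) + ". Reply with the label only.")
    return [
        {
            "model": model,
            "max_tokens": 10,  # a bare label, nothing more
            "system": system,
            "messages": [{"role": "user", "content": ticket}],
        }
        for ticket in tickets
    ]
```

Capping `max_tokens` this low is part of the economics: a label-only reply keeps the per-item cost near the floor.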
Prompt kit
Three prompts: pre-session routing, post-week audit, personal-rule builder.
Before I start this work, route the model:
1. How many files will this touch?
- 1-3 files → lean Sonnet
- 4+ files OR cross-cutting → lean Opus
2. Is the wrong answer cheap or expensive to undo?
- Cheap (commit + revert) → Sonnet OK
- Expensive (filed patent, shipped firmware, db migration) → Opus
3. Is this a known pattern or novel reasoning?
- Known pattern → Sonnet
- Novel architectural call → Opus
4. Is this batch / high-volume / low-stakes classification?
- Yes → Haiku
- No → Sonnet or Opus per above
Output the recommended model + one-line reason.
If the answer is "no model, just a script," say so.

Pull my Anthropic usage for the last 7 days. For each session over $2 in spend:
- What model ran it
- What the dominant task was (code edit, planning, batch classify, etc.)
- Whether a smaller model would have produced equivalent output
Flag sessions where Opus was used for what should have been Sonnet,
or Sonnet for what should have been Haiku. Estimate monthly savings
if those sessions had been routed correctly.

Walk my last 20 Claude Code sessions and group them by task type.
For each task type, recommend a default model and a one-line trigger
phrase that should auto-route there.
Output as a 5-7 row table I can paste into CLAUDE.md under
"## Model routing".

Apply this — install the routing reflex
20-minute exercise plus a 7-day re-check. The reflex installs in two weeks. The savings show up in week one.
Install the model-routing habit
Each step takes 5 minutes.
1. Open your Anthropic Console billing page and note the last 30 days of spend. If you don't know what you spent, you can't tell whether you're routing well.
2. Pick your top 3 task types this week (e.g., code edits, doc drafting, inbox triage). Assign each a default model and write the assignment in CLAUDE.md.
3. Run the routing prompt before your next 3 sessions. This trains the reflex; after 3 reps it becomes automatic.
4. Re-check spend after 7 days and look for a 20-40% reduction. If you don't see a drop, your default was already correct, which is fine. If you see one, the habit just paid for the year.
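The week-one re-check reduces to a small script once you have the data. A sketch, assuming you've exported each session as a (model_used, dollars_spent, model_that_should_have_run) tuple; the cost ratios are the rough ~5x ladder from this lesson, not published pricing:

```python
# Rough relative cost per token, normalized to Sonnet = 1.0.
# Illustrative ratios from this lesson, not official pricing.
COST_RATIO = {"opus": 5.0, "sonnet": 1.0, "haiku": 0.2}

def misrouting_cost(sessions: list[tuple[str, float, str]]) -> float:
    """Estimate the spend that correct routing would have avoided.
    Each session is (model_used, dollars_spent, model_that_should_have_run)."""
    wasted = 0.0
    for used, spent, should in sessions:
        if used != should:
            # Scale spend down to what the right-sized model would have cost;
            # under-routing (reaching down when you needed Opus) counts as zero.
            wasted += spent * max(0.0, 1 - COST_RATIO[should] / COST_RATIO[used])
    return round(wasted, 2)
```

For example, a $10 Opus session that should have been Sonnet shows up as roughly $8 of avoidable spend under these ratios.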