TPL-2026-013 · preprint · 2026-05-01

Cost-per-Artifact Curves Across Claude Model Tiers (Opus 4.7 / Sonnet 4.6 / Haiku 4.5)

cost-optimization · model-selection · LLM-agents · methodology · cross-venture

Abstract

Selecting the right Claude model tier for a given artifact type is the highest-leverage cost decision an operator makes in an agent-mediated workflow. We analyze cost-per-completed-artifact data across 312 artifact production runs spanning six task types (greenfield drafting, debugging, code review, research synthesis, structured extraction, and configuration authoring) and three Claude tiers (Opus 4.7, Sonnet 4.6, Haiku 4.5) in the TruPath portfolio. Cost-per-artifact curves diverge sharply by task type: Sonnet 4.6 dominates greenfield drafting and structured extraction; Opus 4.7 dominates complex debugging and novel-architecture decisions; and for code review, Haiku 4.5 with escalation to Opus outperforms either single tier used alone. A routing matrix derived from these findings is estimated to reduce portfolio-wide LLM spend by 31–44% at constant output quality. All figures are illustrative; see §5.

1. Introduction

The proliferation of LLM model tiers within a single provider’s lineup has created a new class of cost-optimization decision for operators of agent-mediated workflows. Anthropic’s Claude family now spans three tiers — Haiku (fastest, cheapest), Sonnet (balanced), and Opus (most capable, most expensive) — with published pricing that differs by roughly an order of magnitude between Haiku and Opus [3]. The naive strategy defaults to a single tier for all tasks; more deliberate operators default to Sonnet as a middle ground. Neither strategy is cost-optimal.

The theoretical case for tier-routing is straightforward: task types differ in the capability requirements they impose on the model. A task that requires only template-completion (config authoring, structured field extraction) imposes minimal reasoning demands; the cost-per-artifact on Haiku will be low, and quality will be adequate. A task that requires multi-step causal reasoning under uncertainty (debugging a firmware-CV protocol mismatch; architecting a novel sensor fusion approach) places high demands on the model; Haiku will fail at high rates, and the true cost of using Haiku — including the cost of failed attempts, escalation, and rework — will exceed the cost of using Opus directly. The optimal routing strategy uses the cheapest tier that meets the quality bar for each task type [6].
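The escalation arithmetic above can be made concrete. The sketch below computes expected cost per completed artifact under a cheap-first, escalate-on-failure policy; the per-artifact costs and success rates are illustrative placeholders, not figures from this paper.

```python
# Expected cost per completed artifact under a cheap-first policy:
# attempt the cheap tier once; on failure, escalate to a fallback tier
# that is assumed to always succeed. All numbers below are assumptions.

def expected_cost(c_cheap: float, p_cheap: float, c_fallback: float) -> float:
    """Cheap attempt is always paid; fallback is paid only on failure."""
    return c_cheap + (1.0 - p_cheap) * c_fallback

# Illustrative per-artifact costs (USD) for the three tiers.
tiers = {"haiku": 0.02, "sonnet": 0.15, "opus": 0.90}

opus_only = tiers["opus"]
# Assume Haiku completes this task type 60% of the time.
haiku_then_opus = expected_cost(tiers["haiku"], 0.6, tiers["opus"])

print(f"Opus-only: ${opus_only:.2f}, Haiku-then-Opus: ${haiku_then_opus:.2f}")
```

Under these assumptions the escalation policy costs 0.02 + 0.4 × 0.90 = $0.38 per artifact versus $0.90 for Opus-only; in general the cheap-first policy wins whenever c_cheap < p_cheap × c_fallback, which is why it breaks down for task types where the cheap tier's success rate is very low.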

Empirical work on LLM cascade routing [7] and model selection [5] establishes that routing is effective in principle, but most published work uses open-ended benchmarks rather than the artifact-production workflows typical of a small-company operator. This paper presents cost-per-artifact curves from the TruPath portfolio — three ventures (Quantum Caddy, Mile High Golf, Parley / TruPath cross) spanning engineering, operations, and research work types — to derive a practical routing matrix. This extends the token-economics analysis in [8] with the addition of quality measurement and the Haiku tier.
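A routing matrix of the kind this paper derives can be sketched as a simple lookup from task type to an escalation chain. The tier assignments below follow the findings stated in the abstract (Sonnet for drafting and extraction, Opus for debugging and novel architecture, Haiku-then-Opus for code review, Haiku for config authoring); the key names and the chain representation are illustrative, not the paper's schema.

```python
# Routing matrix sketch: each task type maps to an escalation chain of tiers.
# Tier assignments follow the abstract's findings; labels are illustrative.

ROUTES = {
    "greenfield_drafting":     ("sonnet",),
    "structured_extraction":   ("sonnet",),
    "debugging":               ("opus",),
    "novel_architecture":      ("opus",),
    "code_review":             ("haiku", "opus"),  # cheap-first, escalate on failure
    "configuration_authoring": ("haiku",),
}

def route(task_type: str, attempt: int = 0) -> str:
    """Return the tier for a given attempt; later attempts walk the chain."""
    chain = ROUTES.get(task_type, ("sonnet",))  # unknown types: middle tier
    return chain[min(attempt, len(chain) - 1)]

print(route("code_review"))             # first attempt
print(route("code_review", attempt=1))  # after a failed Haiku attempt
```

Defaulting unknown task types to the middle tier mirrors the Sonnet-as-fallback behavior most operators already use, so the matrix degrades gracefully as new task types appear.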


The rest of TPL-2026-013 is for subscribers.


Cite as: TruPath Labs Research (2026). Cost-per-Artifact Curves Across Claude Model Tiers (Opus 4.7 / Sonnet 4.6 / Haiku 4.5). TruPath Labs Preprint TPL-2026-013. trupathventures.net/labs/research/model-tier-cost-curves