Discipline · shipping · Companion: Field Notes № 0007

The CV-for-Physical-Sports Wedge Playbook

Don't build computer vision for sports. Build it for one sport — the one with the most constrained geometry and the simplest rules — get it referee-grade, then generalize. Three filters and the trust-bar framework.

  • Three filters that turn 'CV for sports' into a winnable first product
  • The trust-bar framework — why 96% accurate is a dispute generator

Most people who try to build computer vision for sports lose before they start. Not because the technology isn't ready — it is, and the companion field note for this playbook lays out why. They lose because they pick the wrong scope.

"Computer vision for sports" is not a wedge. It is a category. A team that wants to be in the category has to first survive a product, and a product means picking exactly one sport — the right one — and getting it so right that the people on the wrong end of a call don't argue.

This playbook is the way I'd think about that choice if I were starting from scratch.

Why fixed geometry is everything

A standing rule, before anything else: the difficulty of a CV-for-sports problem scales with how chaotic the playfield is.

The hardest version of this problem is an open field shot by a camera that pans and zooms: the action moves, the camera moves, and the ground itself isn't a known plane. The easiest version is a rigid, fixed playfield with known geometry (a court of known dimensions, marked corners, an unmoving surface), because then you can lean on homography: a clean mathematical map from camera pixels to a known real-world plane.

Once you have that map, you've deleted an entire class of variable. Open-source pipelines do this routinely now: an Umpire AI research system maps camera views directly onto a standard basketball court, and a published Roboflow soccer pipeline does it with 32 fixed pitch keypoints. Locking the geometry down lets you spend your real effort on the genuinely hard problems (small-object tracking, occlusion, the trust bar) instead of fighting the environment.
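For a sense of how small that map is in code, here is a minimal sketch in plain Python. It assumes exactly four marked corners found in the image and a hypothetical 6.1 m by 13.4 m rectangular court. The function names, pixel coordinates and court size are all illustrative assumptions, not taken from any of the pipelines named above, which typically use more keypoints and a library solver.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a·x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(map(float, row)) + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate corner configuration")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_homography(pixels: Sequence[Point], court: Sequence[Point]) -> List[float]:
    """Return the 9 entries (row-major, last entry fixed to 1) of the
    homography that sends four pixel points to four court points."""
    if len(pixels) != 4 or len(court) != 4:
        raise ValueError("this sketch expects exactly four correspondences")
    a, b = [], []
    for (x, y), (u, v) in zip(pixels, court):
        # Two linear equations per correspondence, eight unknowns in total.
        a.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        a.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    return solve_linear(a, b) + [1.0]


def to_court(h: Sequence[float], pixel: Point) -> Point:
    """Map one pixel to court coordinates (metres) using homography h."""
    x, y = pixel
    w = h[6] * x + h[7] * y + h[8]
    return ((h[0] * x + h[1] * y + h[2]) / w,
            (h[3] * x + h[4] * y + h[5]) / w)


# Hypothetical one-time calibration: court corners as seen by a fixed camera
# (near-left, near-right, far-right, far-left), and the same corners in metres.
PIXEL_CORNERS = [(200.0, 600.0), (1080.0, 600.0), (880.0, 200.0), (400.0, 200.0)]
COURT_CORNERS = [(0.0, 0.0), (6.1, 0.0), (6.1, 13.4), (0.0, 13.4)]

H = fit_homography(PIXEL_CORNERS, COURT_CORNERS)
ball_on_court = to_court(H, (640.0, 350.0))  # a made-up ball detection
```

Once `H` is fitted, every in/out question becomes a comparison in metres on the court plane rather than a guess in pixels, which is the sense in which the geometry "deletes a class of variable". A production system would likely fit over many keypoints with a robust estimator and re-check calibration during a session.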

Every sport you might enter sits somewhere on a spectrum from "fixed geometry, known rules" to "open field, judgment calls." The closer to the first end, the better the wedge.

The three filters

A good first product passes all three. Skip one and the rest gets harder than it needs to.

Filter 1 — Rigid known geometry. The playfield is fixed in shape and dimensions. The relevant marks (lines, corners, target zones) are physically present and easy to find from one camera. You can calibrate once and trust the map for the duration of a session.

Filter 2 — Unambiguous rule set. Outcomes are binary or near-binary. There is no judgment call that a referee couldn't render from a single, fixed-angle replay. Sports where a call depends on intent, contact judgment, or "the spirit of the play" are bad first products — you will lose on the trust bar before you lose on the model.

Filter 3 — Real measurement gap. Elite versions of the sport are already measured by Hawk-Eye-class infrastructure. The amateur and recreational layer is not. That gap is the market — without it, even a perfect technical solution has nothing to sell against. Pick a sport where the gap is wide, not one already commoditized by a Pixellot or a Veo.

Three filters, three yes-or-no questions. A sport that fails any of the three is a worse wedge, full stop.

Why the trust bar matters more than accuracy

This is the part most engineering-led founders get wrong, and it's the part you'll regret most if you skip it.

A sports scoring system is not graded on average accuracy. It is graded on disputes. A system that is 96% right is, to the players on the wrong end of the other four percent of calls, a dispute generator. When the AI taekwondo review system was studied at the Paris Olympics, it matched international-level referees on the vast majority of decisions and still diverged on nine cases out of 241 (about 3.7%), and nearly every divergence was an occlusion or a blocked-view case. The headline accuracy was excellent. The product question is what happens to those nine.
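A rough back-of-envelope makes the point concrete. The sketch below assumes every call is independently correct with the same probability, which real matches will not satisfy exactly, and uses a hypothetical match of 50 scored calls. Only the 9-of-241 figure comes from the study mentioned above.

```python
def p_at_least_one_wrong(accuracy: float, calls: int) -> float:
    """Chance that a match contains at least one wrong call,
    assuming independent calls with identical per-call accuracy."""
    return 1.0 - accuracy ** calls


def expected_wrong_calls(accuracy: float, calls: int) -> float:
    """Average number of wrong calls per match under the same assumption."""
    return (1.0 - accuracy) * calls


# Divergence rate reported in the taekwondo study: 9 of 241 decisions.
study_divergence = 9 / 241

# Hypothetical match length, chosen only for illustration.
CALLS_PER_MATCH = 50
risk = p_at_least_one_wrong(0.96, CALLS_PER_MATCH)
wrong = expected_wrong_calls(0.96, CALLS_PER_MATCH)
```

Under those assumptions, a 96% system produces about two wrong calls per 50-call match, and roughly 87% of matches contain at least one. Almost every match has someone with a grievance.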

Two things follow from this:

Build for the dispute case, not the average case. Your model evaluation has to include disputed-call categories explicitly. Performance on the easy 96% is table stakes; performance on the contested cases is your product. A confusion matrix that aggregates over both is hiding the only number that matters.

Decide your behavior at the trust edge before you ship. When the model is below its confidence threshold, what happens? Does it abstain? Does it surface the contested frame to a human? Does it score the call but flag it? Research-tier deployments are moving toward exactly this pattern — high-confidence calls go through automatically; low-confidence calls trigger human review. The point isn't to be perfect. It's to be wrong in a way the user accepts.
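Both points can be sketched together. The example below is illustrative only: the category names, the confidence thresholds and the evaluation data are made up, and the three actions simply mirror the options listed above (pass automatically, score but flag, hand to a human).

```python
from collections import defaultdict
from enum import Enum
from typing import Dict, Iterable, NamedTuple


class Action(Enum):
    AUTO = "auto"                # high confidence: the call goes through
    FLAG = "score_and_flag"      # middling: score it, mark it contestable
    REVIEW = "human_review"      # low: surface the frame to a person


class Call(NamedTuple):
    category: str      # e.g. "clear" or "occluded" (hypothetical labels)
    predicted: str
    actual: str
    confidence: float


def accuracy_by_category(calls: Iterable[Call]) -> Dict[str, float]:
    """Accuracy per dispute category, plus the aggregate under 'overall'."""
    hits: Dict[str, int] = defaultdict(int)
    totals: Dict[str, int] = defaultdict(int)
    for call in calls:
        correct = int(call.predicted == call.actual)
        for key in (call.category, "overall"):
            totals[key] += 1
            hits[key] += correct
    return {key: hits[key] / totals[key] for key in totals}


def route(confidence: float, auto_at: float = 0.95, flag_at: float = 0.80) -> Action:
    """Decision rule at the trust edge. Thresholds are placeholders."""
    if confidence >= auto_at:
        return Action.AUTO
    if confidence >= flag_at:
        return Action.FLAG
    return Action.REVIEW


# Made-up evaluation set: 90 clear calls, all correct; 10 occluded calls,
# 6 correct and 4 wrong, all with low confidence.
calls = (
    [Call("clear", "in", "in", 0.99)] * 90
    + [Call("occluded", "in", "in", 0.70)] * 6
    + [Call("occluded", "in", "out", 0.70)] * 4
)

report = accuracy_by_category(calls)
wrong_and_automatic = sum(
    1 for c in calls
    if c.predicted != c.actual and route(c.confidence) is Action.AUTO
)
```

In this contrived data the aggregate is 96% while the occluded category sits at 60%, and no wrong call goes out automatically because every one falls below the review threshold. Real confidence scores are rarely that well behaved, so the thresholds would need tuning against the disputed categories specifically.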

Why the order matters: nail one, then generalize

The temptation, once a single-sport system works, is to widen immediately. Resist it for one cycle longer than feels comfortable.

NurivaTech said it cleanly when they announced their biomechanics platform: "We started with baseball because its technical demands make it the ultimate test, and mastering it means we have the foundation for every sport." That is the right shape of the argument. You're not narrowing your ambition by picking one sport — you're earning the right to generalize from a *solved* foundation instead of a fuzzy general system that's mediocre everywhere.

Practically: get one sport to the point where a referee would defer to your call on a disputed point. That is the bar. Until you've cleared it, "we could support these other sports too" is marketing, not product.

Once you've cleared it, the components that did the work — the homography pipeline, the small-object tracker, the disputed-call handoff, the calibration recovery — are the things you generalize. Not the sport, the *stack*.

What's in the bundle

  • The three-filter checklist as a single page.
  • The trust-bar framework, including the disputed-call evaluation matrix and the decision rule for what to do at the model's confidence edge.
  • A short reference on homography-as-product-decision, with the public pipelines (Umpire AI, the Roboflow soccer build) as worked examples.
  • The honest hard-parts list: the failure modes (lighting, occlusion, calibration drift, latency) that don't go away no matter how good the wedge, and the order to attack them in.

And the companion field note — issue 0007 — for the longer argument about why now and what the category looks like across sport after sport.

The wedge is the whole game. Pick it carefully.

— Michael, from the lab