Operating · Lesson 22 — Picking a computer-vision wedge: start with a fixed playfield

Picking a CV wedge — start with a fixed playfield

The hardest version of a CV problem is an open field. The easiest is geometry you can pin.

11 min read · 20 min apply · companion: CV for physical sports

The thesis

Every computer-vision project I’ve watched stall had the same shape: the team picked an ambitious, open, free-flowing environment as the v1 target and got buried under variables that the constrained version of the same problem would have given them for free.

The fix is a rule I’ve internalized the hard way: find the most spatially constrained version of the problem and start there. Not because the constrained version is the product (it usually isn’t), but because it’s the only version where you can demonstrate the hard parts of the system actually work before you generalize.

A fixed playfield with rigid known geometry — a court, a rink, a diamond, a pitch with painted keypoints — collapses an entire class of CV difficulty into a one-time calibration step. An open field doesn’t give you that. Choosing your wedge is choosing whether you spend the next year on the genuinely hard problems or on infrastructure the constrained version would have given you for free.

Why fixed geometry is dramatically easier

Computer vision is fundamentally an inference problem: from a stream of pixels, infer what’s happening in three-dimensional space. Every constraint you can pin reduces the inference burden. Geometry is the highest-leverage constraint you can pin because it’s the substrate every other inference depends on.

When the playfield is fixed and known (a FIBA basketball court is 28 meters by 15 meters, a soccer pitch has a standard set of marked keypoints, a baseball diamond has bases 90 feet apart), you can do one piece of math at setup time and reuse it for as long as the camera stays put. That math is called homography (next section), and what it gives you is a clean function: a pixel on the court goes in, a real-world coordinate on the court comes out.

The downstream consequences are large. Speed estimation becomes distance per frame in meters, instead of pixels per frame corrected by a perspective-dependent scaling factor that you have to re-derive for every part of the image. Possession logic becomes who is closest to the ball, in meters. Heatmaps become real spatial distributions instead of pixel-density artifacts. Re-identification across cameras becomes tractable because every camera maps to the same coordinate system.
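To make the "arithmetic in meters" point concrete, here is a minimal sketch of speed and possession computed on court coordinates. It assumes the positions have already been mapped into meters; the function names, the sample positions, and the 30 fps frame rate are hypothetical and chosen only for illustration.

```python
import math

def speed_mps(p0, p1, fps):
    """Speed in meters per second between two court positions one frame apart."""
    return math.dist(p0, p1) * fps

def possession(ball, players):
    """Return the id of the player closest to the ball, measured in meters."""
    return min(players, key=lambda pid: math.dist(players[pid], ball))

# Hypothetical court-space positions in meters.
prev_pos, curr_pos = (10.0, 7.0), (10.3, 7.4)    # one player, consecutive frames
print(speed_mps(prev_pos, curr_pos, fps=30))      # 0.5 m moved in one frame at 30 fps

players = {"A": (10.3, 7.4), "B": (14.0, 3.0)}
print(possession((11.0, 7.0), players))           # nearest player to the ball
```

Nothing here needs to know where the camera is or how the perspective warps the image. That knowledge was spent once, at calibration time.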

None of those are theoretical wins. They’re weeks-to-months of engineering effort that you don’t spend, because the constraint of a known playfield gave you the answer at setup time.

The public examples all use the same move. Umpire AI for basketball maps detected court keypoints to a standard 28-by-15-meter court and runs all downstream logic in meters. A widely-cited Roboflow soccer pipeline calibrates against 32 fixed pitch keypoints and then operates entirely in pitch coordinates. Both projects are explicitly leveraging the fact that the playfield is rigid, known, and measurable in advance — that’s the wedge.

Homography, in plain language

A homography is the mathematical relationship between two views of the same flat plane. In computer vision, the two views are usually (a) the playfield as it appears in your camera image, and (b) the playfield as it actually exists in the world, viewed from directly above. A homography is the 3-by-3 matrix that maps one to the other.

You compute it by picking at least four points whose real-world coordinates you know (the four corners of a court, for example, or specific intersections of painted lines), with no three of them on the same straight line, and noting where each point appears in your camera image. That gives you four pixel-to-coordinate pairs. Standard linear algebra solves for the matrix. The math has been textbook for decades; the libraries do it in one function call. The conceptual point is what it lets you do afterward.

Once you have the matrix, you have a function. Hand it any pixel that lies on the playfield and get back the real-world (x, y) coordinate of that point. Feed the pixel where a detector finds a player’s feet through the homography, and you have that player’s court position in meters. Feed the ball’s pixel through the same matrix and you have ball trajectory in real-world space, with one caveat: the matrix is only valid for points on the playing surface, so you have to account for the ball’s height whenever it is in the air.
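The calibrate-once, map-forever idea can be sketched in pure Python. This is a minimal illustration, not a production implementation: it solves the exact four-point case by fixing the bottom-right matrix entry to 1, and the function names and pixel values are hypothetical. In practice a library call such as OpenCV's `cv2.findHomography` does this job and handles more than four points and noisy detections.

```python
def solve_linear(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]    # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate points (are three of them collinear?)")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [v - f * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def fit_homography(pixels, court):
    """3x3 matrix mapping image pixels to court meters, from four point pairs."""
    a, b = [], []
    for (x, y), (X, Y) in zip(pixels, court):
        a.append([x, y, 1, 0, 0, 0, -X * x, -X * y])
        b.append(X)
        a.append([0, 0, 0, x, y, 1, -Y * x, -Y * y])
        b.append(Y)
    h = solve_linear(a, b) + [1.0]        # assumption: bottom-right entry fixed to 1
    return [h[0:3], h[3:6], h[6:9]]

def to_court(H, pixel):
    """Map one pixel on the playing surface to (x, y) in meters."""
    x, y = pixel
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return ((H[0][0] * x + H[0][1] * y + H[0][2]) / w,
            (H[1][0] * x + H[1][1] * y + H[1][2]) / w)

# Hypothetical pixel positions of the four court corners in one camera view.
pixel_corners = [(420, 310), (1500, 310), (1840, 980), (80, 980)]
# The same corners in court coordinates, in meters (28 x 15 court).
court_corners = [(0, 0), (28, 0), (28, 15), (0, 15)]

H = fit_homography(pixel_corners, court_corners)   # done once, at setup
feet_px = (960, 640)                               # hypothetical feet detection
print(to_court(H, feet_px))                        # court position in meters
```

`fit_homography` runs once at setup; `to_court` is the cheap per-detection call that every downstream step depends on.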

The reason this matters as a wedge decision: a project that starts on a fixed playfield gets to compute the homography once and reuse it. A project that starts in an open environment — a park, a beach, an unmarked yard — has no such calibration. Every frame has to infer its own spatial structure from whatever happens to be visible. That’s a much harder problem, and it’s usually the one that consumes the project’s engineering budget before the genuinely hard problems even get touched.

Three ways wedge-picking fails

The most common failure modes I’ve seen when teams choose the v1 target.

01

The open-field first attempt

claim looks like: Operator picks an open, free-flowing environment as the first target: outdoor pickup play, an unconstrained yard game, a park with no boundary lines.
what’s missing: Every hard CV problem fires at once: variable lighting, no spatial reference, no rigid geometry to anchor pixel-to-world math. The team spends six months on infrastructure that an indoor, court-based version would have given for free.
the move: Find the most-constrained version of the same problem and start there. Indoor before outdoor. Court before field. Painted lines before grass. Win the constrained version cleanly, then generalize.
02

The over-generalized v1

claim looks like: Operator builds the pipeline to handle three sports from day one, because "the architecture should be sport-agnostic."
what’s missing: Every sport-specific assumption gets abstracted out, which means none of them get exploited. The system that could have used a known 28-by-15-meter basketball court as a calibration anchor instead carries a generic geometry layer that doesn't get used well anywhere.
the move: One sport, one playfield, one pipeline. Exploit every constraint that sport gives you. Generalization is a refactor problem after v1 works, not an architectural decision before it does.
03

The trust-bar miscalibration

claim looks like: Operator ships when the model hits 90% accuracy on internal data and is surprised users reject it.
what’s missing: For a scorekeeper or officiating tool, 90% accuracy means one wrong call in ten, which is also one dispute in ten. The product fails on the trust dimension before it fails on the accuracy number.
the move: Set the trust bar before you start. For officiating-adjacent products, that bar is usually 99%+ on the decisions users see, not 90% on internal validation. Choose a wedge where you can plausibly hit the trust bar. Narrowness helps here too.

The fix in all three: start with the most-constrained version of the same problem. Constraints are gifts. They delete variables you don’t have to solve in v1.
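The arithmetic behind the trust-bar card is short enough to sketch. The example assumes each call is an independent decision with the same accuracy, and the figure of 100 user-visible calls per match is a hypothetical number picked for illustration; real errors tend to cluster, so treat this as a rough model.

```python
def expected_wrong_calls(accuracy, calls):
    """Average number of wrong calls in a match with `calls` decisions."""
    return (1 - accuracy) * calls

def p_at_least_one_wrong(accuracy, calls):
    """Chance that a match contains at least one wrong call (independent calls)."""
    return 1 - accuracy ** calls

CALLS_PER_MATCH = 100   # hypothetical number of user-visible decisions

for accuracy in (0.90, 0.96, 0.99, 0.999):
    print(accuracy,
          round(expected_wrong_calls(accuracy, CALLS_PER_MATCH), 1),
          round(p_at_least_one_wrong(accuracy, CALLS_PER_MATCH), 3))
```

Under these assumptions, even a 99% model is more likely than not to make at least one visible mistake per match, which is why the bar has to be set against the decisions users see rather than against a validation score.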

The most-constrained-version test

The diagnostic for evaluating any proposed CV target: ask whether you’ve actually found the narrowest version of the problem you could win first.

The most-constrained-version test
I want to build a computer-vision product in this domain: <describe>.

Walk through:
1. What's the most spatially constrained version of this problem?
   - Is there a fixed playfield with known geometry?
   - Are the boundaries rigid (painted lines, walls, fences)?
   - Can I anchor a homography from a small number of keypoints?

2. What variables can I control in v1?
   - Lighting: can I pick a venue with consistent illumination?
   - Camera: can I bolt-mount instead of tripod-mount?
   - Players/equipment: can I control re-identification signatures (uniforms)?

3. What's the trust bar for the user-visible decision?
   - Is this officiating-adjacent (needs 99%+) or analytics-adjacent
     (90% is fine if the user can audit)?
   - Can my wedge plausibly hit that bar?

4. What's the smallest end-to-end system that would actually be useful
   in the constrained wedge?

Output a wedge spec: venue, geometry, controlled variables, trust bar,
v1 scope. Anything broader than this gets cut.

The test forces every variable into a control question. Variables you can't control in v1 are the ones the wedge has to eliminate by venue choice.

Run this on whatever you’re currently planning. If the answer is “I’m already on the most-constrained version,” you’re probably right and you should ship it. If the answer surfaces a more-constrained venue you hadn’t considered, the wedge needs to move.

The test is also useful as an audit on a stuck project. When weeks of work haven’t produced a shippable v1, the question is usually not “what’s wrong with the model” — it’s “is the target environment exposing variables that should have been eliminated by the wedge.”

Annotated transcript

A real conversation walking from “I want to build CV for a sport” through the wedge decision, into homography, and ending on what doesn’t go away.

Picking the CV wedge

Four operator turns, four agent turns. Result: a wedge framework, a working mental model of homography, and a list of the hard problems that don't get eliminated.

  1. operator

    I want to build computer vision for a sport. The instinct is to pick the one with the biggest audience. But every team I've watched try this gets stuck for a year. What am I missing?

  2. agent

    You're skipping the wedge question. The instinct to pick the biggest audience is a market question, but the engineering question runs in the opposite direction: what's the most geometrically constrained version of this problem you can win first?

    An open field is the hardest CV problem. A fixed playfield with rigid known geometry is the easiest, because you can pin it with homography — a clean mathematical map from camera pixels to a known real-world plane. The minute you have that map, you've deleted a whole class of variable and you can spend effort on the genuinely hard problems instead.

  3. operator

    Homography — translate that. What does it actually do?

  4. agent

    Pretend you're standing courtside with a camera. The image you capture is a distorted, perspective-warped 2D picture of a 3D scene. The court itself, though, is a known flat rectangle — say 28 meters by 15 meters for basketball. Its corners and key lines are in fixed positions.

    Homography is the math that takes the four (or more) corners of the court as they appear in your camera image and computes a single 3-by-3 matrix that maps any pixel on the court surface to its real-world (x, y) position in meters. Once you have the matrix, a player's pixel location becomes a coordinate on the court. A ball's pixel location becomes a coordinate on the court. Distance, speed, possession — all of it becomes arithmetic on real-world coordinates instead of arithmetic on pixels.

    This is the wedge: you've turned a hard 2D-to-3D inference problem into a one-time calibration step. Public examples — Umpire AI mapping basketball court keypoints to that standard 28-by-15 grid, a Roboflow soccer pipeline calibrating against 32 fixed pitch keypoints — both use the same move.

  5. operator

    Okay, but a chaotic open field has hard problems too — small objects, occlusion, trust. Doesn't fixed geometry just defer them?

  6. agent

    No — and this is the key point. Fixed geometry doesn't eliminate the hard problems. It just lets you spend effort on the ones that matter.

    Three hard problems don't go away:

    1. Small-object tracking — a ball is a handful of pixels. Detectors miss it. Trackers lose it across frames.
    2. Occlusion — players cross in front of each other, equipment obscures the action, the ball goes behind a body.
    3. The trust bar — a 96%-accurate scorekeeper is still a dispute generator. Officiating-adjacent products fail on trust before they fail on accuracy.

    Fixed geometry gives you the budget to actually work on those. Open-field projects spend that budget on calibration drift, perspective inference, and lighting — and never get to the real problems. NurivaTech's public reasoning for starting with baseball captured this exactly: "its technical demands make it the ultimate test, and mastering it means we have the foundation for every sport." Start narrow on purpose.

  7. operator

    What about lighting? You said it doesn't go away. Why is it so hard?

  8. agent

    Because computer vision relies on visual consistency, and lighting breaks consistency in ways that are hard to anticipate. The padel-AI case is the textbook example — glass walls produced "ghost detections" from reflections; glare broke the models entirely. Add a player who takes off a shirt mid-match and your re-identification module fails because the visual signature it was tracking just changed.

    Calibration drift is the sibling problem. The homography matrix is only valid as long as the camera doesn't move. A bumped tripod, a temperature swing that warps a mount — and suddenly the court coordinates are wrong by half a meter and every downstream calculation inherits the error.

    The wedge answer to both is: pick a venue where you can control lighting and mount stability for v1. Indoor before outdoor. Bolted mount before tripod. Even fluorescent before mixed-source. Demonstrate the system works in controlled conditions, then generalize the lighting envelope as a second project — not as a v1 requirement.

What doesn’t go away — the honest hard parts

Picking a constrained wedge eliminates the spatial-inference problem. It doesn’t eliminate the rest. These are the problems that remain even after the homography is solid, and they’re the ones you actually want to be spending your time on.

  • Small-object tracking. A ball is often a handful of pixels. Detectors miss it. Trackers lose it across frames. The fix isn’t a better wedge — it’s better models, higher frame rates, and dedicated small-object pipelines.
  • Occlusion. Players cross in front of each other. Equipment hides the action. Even the best multi-object trackers fall back to confidence-based association when the visual evidence drops out. Real systems combine motion priors, depth cues, and re-identification to survive occlusion windows.
  • Lighting fragility. Reflections produce ghost detections. Glare blanks out detectors. Mixed-source lighting confuses color-based features. The wedge can control lighting in v1 by venue choice; it can’t make the underlying problem go away when you generalize.
  • Calibration drift. A bumped tripod, a temperature change that warps a mount, and your homography is wrong. Production systems detect drift by watching whether known landmarks stay in their expected pixel locations frame-to-frame, and they re-calibrate when they don’t.
  • The trust bar. For analytics products, 90% accuracy is fine — users see aggregate insights and can audit individual frames. For officiating-adjacent products, 96% is still a dispute generator: one wrong call in 25 is too many for a scorekeeper, line-caller, or judge. The product has to either hit a much higher bar or be explicit about its advisory role.

The wedge buys you the budget to work on these. It doesn’t solve them. A v1 that ships in a constrained venue and still wrestles with small-object tracking and occlusion is doing exactly the right work; a v1 that’s still wrestling with homography in month six is in the wrong wedge.
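The drift check described in the list above (watching whether known landmarks stay at their expected pixel locations) can be sketched as follows. This is an illustration under assumptions: the landmark detections are taken as given, and the function names and the 3-pixel threshold are hypothetical values, not figures from any production system.

```python
import math
from statistics import median

def drift_px(expected, observed):
    """Median pixel distance between calibrated and currently observed landmarks."""
    return median(math.dist(e, o) for e, o in zip(expected, observed))

def needs_recalibration(expected, observed, threshold_px=3.0):
    """True when the landmarks have moved further than the allowed threshold."""
    return drift_px(expected, observed) > threshold_px

# Hypothetical landmark pixels recorded at calibration time (court corners).
calibrated = [(420, 310), (1500, 310), (1840, 980), (80, 980)]

# Small detector jitter: the camera has not moved.
jitter = [(x + 0.5, y) for x, y in calibrated]
print(needs_recalibration(calibrated, jitter))    # False

# Every landmark shifted by (6, 8) pixels: the mount was bumped.
bumped = [(x + 6, y + 8) for x, y in calibrated]
print(needs_recalibration(calibrated, bumped))    # True
```

Using the median rather than the mean keeps one occluded or misdetected landmark from triggering a false recalibration.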

The wedge checklist

The decision rule, in checklist form. A wedge that fails any of these should be tightened before you start writing code.

  1. Is the playfield fixed and known? Standard dimensions, rigid boundaries, painted keypoints. If yes, homography is available. If no, you’re shipping perspective inference in v1 — that’s usually too much.
  2. Can you control lighting in v1? Indoor venue, consistent illumination, no glass walls or windows in the frame. Lighting variation is a generalization-phase problem, not a v1 problem.
  3. Can you control mount stability? Bolted mount or fixed rig, not a tripod that gets bumped. Calibration drift is the kind of problem that destroys trust quietly — easier to design out than to detect.
  4. Is the trust bar achievable? If the product is officiating-adjacent, the bar is high — set the threshold before you start and pick a wedge where you can plausibly hit it. If the product is analytics-adjacent, the bar is lower but you still need to set it explicitly.
  5. Is there at least one public precedent for your wedge shape? Not your exact product — your wedge shape. If no one has previously won by constraining geometry the way you are, suspect the wedge is wrong or the venue is too narrow to support a real product.
  6. Can you describe the generalization roadmap in three milestones? v1 (most constrained) → v2 (one constraint relaxed) → v3 (next constraint relaxed). If you can’t name the constraints you’ll relax, you don’t know what makes your wedge a wedge.

Six checks. None of them are about the model. All of them are about the environment you’re asking the model to operate in. That’s the wedge decision — and it dominates everything you do afterward.
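One way to keep the six checks honest is to write the wedge spec down as data and let a function report which checks fail. The sketch below is a hypothetical structure, not a prescribed format: the field names, the `plausible_accuracy` estimate, and the example values are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class WedgeSpec:
    venue: str
    fixed_playfield: bool              # check 1
    controlled_lighting: bool          # check 2
    stable_mount: bool                 # check 3
    trust_bar: Optional[float]         # check 4: required accuracy, None if never set
    plausible_accuracy: float          # check 4: what you believe the wedge can reach
    precedents: List[str] = field(default_factory=list)   # check 5
    roadmap: List[str] = field(default_factory=list)      # check 6

    def failed_checks(self) -> List[str]:
        """Names of the checklist items this wedge does not satisfy."""
        checks = [
            ("fixed playfield", self.fixed_playfield),
            ("controlled lighting", self.controlled_lighting),
            ("stable mount", self.stable_mount),
            ("achievable trust bar",
             self.trust_bar is not None
             and self.plausible_accuracy >= self.trust_bar),
            ("public precedent", len(self.precedents) >= 1),
            ("three-milestone roadmap", len(self.roadmap) >= 3),
        ]
        return [name for name, ok in checks if not ok]

spec = WedgeSpec(
    venue="indoor basketball gym",
    fixed_playfield=True,
    controlled_lighting=True,
    stable_mount=False,                # camera is on a tripod
    trust_bar=0.99,
    plausible_accuracy=0.99,
    precedents=["Umpire AI"],
    roadmap=["v1 indoor", "v2 outdoor with controlled lighting",
             "v3 outdoor uncontrolled"],
)
print(spec.failed_checks())            # ['stable mount']
```

An empty list means the wedge passes; anything else names what to tighten before writing pipeline code.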

Prompt kit

Three prompts for picking, auditing, and generalizing a CV wedge. Save in your project CLAUDE.md or a personal snippets file.

The most-constrained-version test
I want to build a computer-vision product in this domain: <describe>.

Walk through:
1. What's the most spatially constrained version of this problem?
   - Is there a fixed playfield with known geometry?
   - Are the boundaries rigid (painted lines, walls, fences)?
   - Can I anchor a homography from a small number of keypoints?

2. What variables can I control in v1?
   - Lighting: can I pick a venue with consistent illumination?
   - Camera: can I bolt-mount instead of tripod-mount?
   - Players/equipment: can I control re-identification signatures (uniforms)?

3. What's the trust bar for the user-visible decision?
   - Is this officiating-adjacent (needs 99%+) or analytics-adjacent
     (90% is fine if the user can audit)?
   - Can my wedge plausibly hit that bar?

4. What's the smallest end-to-end system that would actually be useful
   in the constrained wedge?

Output a wedge spec: venue, geometry, controlled variables, trust bar,
v1 scope. Anything broader than this gets cut.
Audit a CV project that's stuck
My CV project has been stuck for <N> weeks. The current target environment is: <describe>.

Walk through:
1. What variables in the current environment are causing the most rework?
   (lighting, geometry, occlusion, calibration drift, trust bar)
2. For each high-rework variable, is there a narrower venue or setup
   where I could control it for v1?
3. What would the system look like if I picked the most-constrained
   version of the same product and shipped that first?
4. What would I lose by narrowing — and would the loss matter to a
   v1 customer?

Output: either a "stay the course" recommendation with reasoning, or a
narrowed wedge proposal with the explicit trade-offs.
Plan the generalization path
I have a CV system that works in this constrained venue: <describe>.
I want to generalize to: <broader target>.

Walk through:
1. Which constraints did the v1 exploit? (geometry, lighting, mount, etc.)
2. For each constraint, what changes in the broader target?
3. Rank the constraint shifts by how much rework each implies.
4. Propose a sequence — which constraint to relax first, which second,
   which last.

Output a generalization roadmap with explicit milestones. Each milestone
is a venue or setup that relaxes exactly one constraint from the previous.

Apply this — pick your wedge

20-minute exercise. The wedge decision is cheap to make and expensive to defer. Do it before you write code, not after.

Pick your CV wedge

Each step takes 3-5 minutes.
  1. Write down the broad version of the CV problem you want to solve in one sentence. "I want to build CV for [sport / activity]." Don't optimize yet. Just state the ambition.
  2. List every variable that the broad version exposes you to. Open geometry, variable lighting, mount instability, occlusion, re-identification drift, trust bar. Aim for 6-10 variables.
  3. For each variable, write down the most-constrained version you could control in v1. Indoor instead of outdoor. Bolted mount instead of tripod. Single uniform set instead of streetwear. Painted-line court instead of open field.
  4. Write the wedge spec: venue, geometry, controlled variables, trust bar, v1 scope. 30-50 lines. If you can't write the wedge spec, you don't yet know what the wedge is.
  5. Sanity-check by finding two public examples that took the same wedge. Umpire AI on basketball, Roboflow on soccer, NurivaTech starting with baseball: each of these public projects picked a constrained wedge first. If you can't find two prior examples of your wedge shape, suspect the wedge is wrong.
  6. Write the generalization roadmap: which constraint relaxes first after v1 ships. You're not generalizing yet, but knowing the next step keeps the v1 honest. "v1 indoor → v2 outdoor with controlled lighting → v3 outdoor uncontrolled."