Field Notes

One prompt clause turned my analysis agent into an experimentalist.

Most analysis sessions return confident prose. This one returned reproduction scripts, error bars, and a ranked list of experiments. The difference was a one-page brief written like a lab contract.

2026-07-21 · 6 min

I asked Claude to audit my simulator. It killed every high score.

We had a tens-of-millions-of-throws simulator and an overnight Monte-Carlo run nearly three times larger. One Saturday-night physics audit showed the two models agree on every trajectory and disagree on half the outcomes — and every max-score throw was an artifact.

2026-07-09 · 8 min

I called three sign-recognition models failures. The recipe was the failure.

A Parley notebook reported three landmark architectures as broken on cross-signer ASL. A warmup and a gradient clip brought all three back, and two matched the best model. The ranking had measured my training recipe, not the models.

2026-06-08 · 7 min

Recipe beats architecture: lottery tickets in sign models

We trained seven landmark architectures three times each. Three of them worked on one seed and collapsed to near-random on the others. A single-seed comparison would have called two of those collapses a result.

2026-05-31 · 6 min

The 38-point gap: one accuracy number, twenty-one very different users

Our sign model averages 42% across signers. That average hides a range from 26% to 64%, and what decides where a person lands is not the signs they make but who they are.

2026-05-31 · 7 min

45%, not 90%: the only sign-recognition number I trust

Our best landmark-only sign model scores 45% on signers it has never seen. The field routinely reports numbers twice that high. The lower number is the honest one, and it is the one we publish.

2026-05-31 · 6 min

Six ways hearing-built sign-language AI fails the Deaf community

I keep a running catalog of how hearing-led sign-language AI fails. It is not a list of other people's sins. It exists so Parley can catch itself the moment it starts to look like one of them.

2026-05-31 · 7 min

Picking the glasses

One AR-glasses decision for two ventures. The criteria that survived the cut, and why doing it once was the right move.

2026-05-25 · 7 min

Why I'm running Parley

I started a Kaggle research project in Q2 2026 while running two startups. The decompression channel, the four open questions, and what makes it survive.

2026-05-25 · 8 min

ESP32 firmware with Claude: the gap between 'it compiles' and 'it works on the bench'

Claude is excellent at writing ESP32 firmware that compiles. It is not reliable at predicting what that firmware will do when the hardware is actually in front of you. Three incidents and the gate I added.

2026-05-17 · 7 min

31 million systematic throws or 91 million random ones. Which one taught us more?

The answer isn't obvious. Systematic grid search gives you coverage. Random sampling gives you reality. You need both, and you need to know what each one is telling you.

2026-05-17 · 6 min

We ran 124 million simulated cornhole throws. Here's what it cost and what we got.

A parametric physics engine, 8 parameters, two overnight runs. The database exists. Here's the honest accounting of what building it took and what we learned that we couldn't have learned any other way.

What it actually takes to build an AR overlay on a physical object in real time.

AR on physical objects is 80% coordinate system problems. Claude is great at helping you think through the geometry. You still have to understand it yourself.

Training a custom CV model with Claude: the data quality lesson we learned the hard way.

Clean data beats model size. Every time. Don't upgrade the model until you've audited the labels.

2026-05-17 · 7 min

We put a language model inside a hardware device. Here's every decision we made.

LLMs in real-time hardware aren't ChatGPT. The latency budget is the constraint that changes everything.

Claude built our CV pipeline. Then it lied about being done.

Agents are unreliable judges of their own work. Here's how a structural fix — not a smarter model — stopped QC's CV pipeline from shipping silent failures.

I pasted my session tokens into a chat. Here's the gate I built.

Sixteen cookies that together are my whole Google account, dropped into a chat window — while building a security playbook. The how is the whole point.

2026-05-14 · 5 min

Computer vision already runs elite sports. It's about to run the rec league too.

Twelve-camera tracking rigs and Hawk-Eye are infrastructure at the top. The same capability now fits on a $249 board and a commodity camera. What that unlocks across every sport — and why the smartest way in is the narrowest one.

2026-05-14 · 7 min

Playbook Update

I run three ventures from one Obsidian vault. Here's the 13-folder template.

Why a markdown vault outperforms a Notion plus Asana plus Drive plus Slack stack when AI agents are part of the work. Battle-tested across QC, MHG, and Parley.

2026-07-07 · 8 min

Playbook Update

Why I write a postmortem for every meaningful incident, and the template that makes it take 45 minutes.

Twenty-six postmortems across QC and Parley in six weeks. The template, four worked examples, and the discipline that makes the same mistake stop happening twice.

2026-06-23 · 7 min

Playbook Update

I shipped a Sprint Contract template. Here's why my AI agents kept declaring done when they weren't.

A 48-contract system born from agents that praised their own work. The fix wasn't a smarter model. It was structural.

2026-06-09 · 7 min

I'm starting a publication. Here's what it is and why.

Field notes from an operator running three ventures on Claude Code. Biweekly. No theory. Receipts only.

2026-05-26 · 5 min

How I run six AI agents across three ventures without them stepping on each other.

The routing model that holds up under load. Built for TruPath, tested on Mile High Golf, Quantum Caddy, and Parley.

2026-05-12 · 6 min