Postmortem · 2026-05-17 · 7 min

ESP32 firmware with Claude: the gap between 'it compiles' and 'it works on the bench'

Claude is excellent at writing ESP32 firmware that compiles. It is not reliable at predicting what that firmware will do when the hardware is actually in front of you. Three incidents and the gate I added.

We're using an ESP32 in the Quantum Caddy smart board. It handles sensor reads, local state, and WiFi communication with the Mac Studio inference pipeline. AXIOM writes the firmware using Claude Code. It compiles cleanly, the logic is sound, and then you put it on the actual hardware and find out what the simulation missed.

Three incidents from the past six weeks. All three followed the same pattern.

Incident one: interrupt timing and RF noise

The ESP32 firmware had interrupt-driven sensor reads. The timing was correct according to the datasheet. Under bench conditions, with WiFi inactive, it worked perfectly. Under real conditions, with WiFi running alongside the sensor reads, the interrupt handler occasionally fired during a WiFi stack operation and caused sensor data dropouts. Not a crash. A silent miss: the sensor fired, the interrupt missed it, the board showed no event.

Claude's firmware was theoretically correct. The issue was the interaction between the WiFi stack's interrupt priority and the sensor interrupt. That interaction is real hardware behavior that doesn't show up in simulation. The fix required moving the sensor reads to a dedicated FreeRTOS task with a priority higher than the WiFi stack's default, and adding a circular buffer to hold events that arrived while the task was briefly unable to service them.

Claude helped with the fix once I described the symptom. It couldn't have predicted the symptom from the code.
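A minimal sketch of the buffering half of that fix, assuming a single producer (the GPIO interrupt handler) and a single consumer (the sensor task). The names `sensor_event_t`, `event_ring_t`, `ring_push`, `ring_pop`, and the 16-slot capacity are hypothetical, not taken from the actual firmware. The FreeRTOS task and ISR wiring are left out so the buffer logic compiles and runs on a host; in ESP-IDF, a FreeRTOS queue filled with `xQueueSendFromISR` is the stock alternative to a hand-rolled ring.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical capacity; a power of two so indices can be masked. */
#define EVT_BUF_CAP 16u
#define EVT_BUF_MASK (EVT_BUF_CAP - 1u)
static_assert((EVT_BUF_CAP & EVT_BUF_MASK) == 0u, "capacity must be a power of two");

typedef struct {
    uint32_t timestamp_us; /* when the sensor edge was seen */
    uint8_t pin;           /* which sensor input fired */
} sensor_event_t;

/* Single-producer, single-consumer ring. head is advanced only by the
 * producer (ISR side), tail only by the consumer (task side). Indices run
 * freely and are masked on access; unsigned wraparound keeps head - tail
 * equal to the number of queued events. */
typedef struct {
    sensor_event_t slots[EVT_BUF_CAP];
    atomic_uint head;
    atomic_uint tail;
    atomic_uint dropped; /* events lost because the ring was full */
} event_ring_t;

/* Producer: records the event and returns immediately, keeping the ISR short. */
static bool ring_push(event_ring_t *r, sensor_event_t ev) {
    unsigned head = atomic_load_explicit(&r->head, memory_order_relaxed);
    unsigned tail = atomic_load_explicit(&r->tail, memory_order_acquire);
    if (head - tail == EVT_BUF_CAP) {
        /* Full: count the loss instead of failing silently. */
        atomic_fetch_add_explicit(&r->dropped, 1u, memory_order_relaxed);
        return false;
    }
    r->slots[head & EVT_BUF_MASK] = ev;
    /* Release publishes the slot write before the new head becomes visible. */
    atomic_store_explicit(&r->head, head + 1u, memory_order_release);
    return true;
}

/* Consumer: drains one event; returns false when the ring is empty. */
static bool ring_pop(event_ring_t *r, sensor_event_t *out) {
    unsigned tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    unsigned head = atomic_load_explicit(&r->head, memory_order_acquire);
    if (head == tail) {
        return false;
    }
    *out = r->slots[tail & EVT_BUF_MASK];
    atomic_store_explicit(&r->tail, tail + 1u, memory_order_release);
    return true;
}
```

The drop counter is the design choice worth keeping: the original failure was silent, so a buffer that quietly discards events on overflow would reintroduce the same blind spot.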


Incident two: WiFi reconnect at the 47-minute mark

The WiFi reconnect logic passed every simulated test. The reconnect function was called on disconnect events and re-established the connection; life was good. In real deployment, the board ran for 47 minutes and then hung. Not a crash. The WiFi stack was in an intermediate state after the access point had rebooted, and the reconnect function silently assumed the AP would be present and responsive when called. The AP wasn't. The function returned without connecting and without triggering another retry.

The fix was a watchdog pattern: a background task that checks connection state every 30 seconds and, if the connection is absent, triggers a full reconnect sequence (not just the reconnect function), with a backoff schedule for repeated failures.

The 47-minute mark wasn't magic. It was approximately how long a real deployment runs before someone in the building power-cycles the router.
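A sketch of that watchdog's decision logic, written as plain C so it can be tested off-device. The 30-second check interval comes from the fix described above; the doubling backoff, its 10-minute cap, and all names (`wifi_watchdog_t`, `watchdog_tick`, `backoff_delay_ms`) are assumptions for illustration. On the board, a FreeRTOS task would call `watchdog_tick` with a monotonic millisecond clock and the current link state, and run the full teardown-and-reinit sequence whenever it returns `WD_FULL_RECONNECT`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CHECK_INTERVAL_MS 30000u  /* from the post: check every 30 s */
#define BACKOFF_BASE_MS   30000u  /* assumption: first retry delay */
#define BACKOFF_MAX_MS   600000u  /* assumption: cap retries at 10 min */
static_assert(BACKOFF_BASE_MS <= BACKOFF_MAX_MS, "base delay exceeds cap");

typedef enum { WD_IDLE, WD_FULL_RECONNECT } wd_action_t;

typedef struct {
    uint32_t next_check_ms; /* earliest time the next check may run */
    uint32_t down_checks;   /* consecutive checks that found the link down */
} wifi_watchdog_t;

/* Doubling backoff: 30 s, 60 s, 120 s, ... capped at BACKOFF_MAX_MS. */
static uint32_t backoff_delay_ms(uint32_t down_checks) {
    uint32_t delay = BACKOFF_BASE_MS;
    for (uint32_t i = 0; i < down_checks && delay < BACKOFF_MAX_MS; i++) {
        delay *= 2u;
    }
    return delay > BACKOFF_MAX_MS ? BACKOFF_MAX_MS : delay;
}

/* Called periodically from a background task. It only decides; the caller
 * performs the full reconnect sequence when told to. */
static wd_action_t watchdog_tick(wifi_watchdog_t *wd, uint32_t now_ms, bool connected) {
    /* Wraparound-safe "now_ms < next_check_ms". */
    if ((int32_t)(now_ms - wd->next_check_ms) < 0) {
        return WD_IDLE;
    }
    if (connected) {
        wd->down_checks = 0; /* healthy again: reset the backoff */
        wd->next_check_ms = now_ms + CHECK_INTERVAL_MS;
        return WD_IDLE;
    }
    /* Link absent: request a full reconnect and always schedule the next
     * check, so a missing AP is retried later rather than abandoned. */
    wd->next_check_ms = now_ms + backoff_delay_ms(wd->down_checks);
    wd->down_checks++;
    return WD_FULL_RECONNECT;
}
```

The property that addresses the original bug is that a failed attempt cannot end the retry chain: the next check is scheduled on every path, regardless of what the reconnect sequence did.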

Incident three: thermal failure in the board enclosure

Power state transitions worked correctly at room temperature. The board runs several subsystems: sensor reads, LED state, WiFi, SD card logging. The power transition logic was designed to reduce draw during low-activity periods. At room temperature, clean. At the temperatures inside the closed board enclosure during a real session, the timing on one power state transition was unreliable. The issue was temperature-dependent propagation delay on the power rail that changed the effective timing of the state transition.

This is the class of failure that is hardest to predict from code. The firmware was correct for the hardware at 25 degrees C. The deployed hardware operates at 35-40 degrees C after 30 minutes of use.


The gate I added

After incident three I added a mandatory hardware-in-the-loop gate. Before any firmware is declared done, it must run for 4 hours on the actual board, in the actual enclosure, with WiFi active and the board in its realistic deployment position. Not on the bench with the lid off. On the board, closed, under real thermal and RF conditions.

This gate catches the incidents above because they only manifest under real conditions. It adds a day to any firmware sprint. It has paid for itself twice already.
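One way to keep the gate from being waived informally is to encode its conditions as a checklist that every soak-run record must satisfy. This is a hypothetical sketch: the struct, its field names, and the two pass criteria beyond the post's stated conditions (zero missed sensor events, zero hangs) are assumptions. The 4-hour duration, closed enclosure, active WiFi, and deployment position come from the gate described above.

```c
#include <assert.h>
#include <stdbool.h>

#define SOAK_MIN_MINUTES 240u  /* the 4-hour gate from the post */
static_assert(SOAK_MIN_MINUTES == 4u * 60u, "gate is four hours");

/* Hypothetical record of one soak run on the real board. */
typedef struct {
    unsigned minutes_on_board;  /* continuous runtime on the actual board */
    bool enclosure_closed;      /* lid on: real thermal conditions */
    bool wifi_active;           /* RF stack running throughout */
    bool deployment_position;   /* mounted where it will actually live */
    unsigned missed_events;     /* assumption: sensor events not registered */
    unsigned hangs;             /* assumption: lockups or watchdog resets */
} soak_run_t;

/* Firmware counts as done only if every condition holds. */
static bool firmware_gate_passes(const soak_run_t *run) {
    return run->minutes_on_board >= SOAK_MIN_MINUTES
        && run->enclosure_closed
        && run->wifi_active
        && run->deployment_position
        && run->missed_events == 0u
        && run->hangs == 0u;
}
```

A bench run with the lid off fails the check even if it lasts all four hours, which is exactly the case the gate exists to reject.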

The rule: never declare firmware done until it has passed 4 hours on the actual hardware. Compiles-and-passes-simulated-tests is the starting gate, not the finish line.