A 7-Class Failure Taxonomy for ASR-Glasses Coordination in AR Research Sprints
Abstract
Parley is an AR-glasses product for bidirectional deaf/hearing conversation; the research arm is in Phase 1/2 Kaggle-published exploration and the consumer hardware (Everysight Maverick AI) has not yet shipped. In this gap between research code and shipped hardware, LLM agents must coordinate two semi-independent subsystems: automatic speech recognition (ASR; Whisper-style transcription) and a glasses-render simulator that stands in for the not-yet-shipped HUD. We collect 119 agent-driven coordination attempts across 28 research sprints from 2026-02 through 2026-04 and classify failures into a 7-class taxonomy: timing-misalignment, schema-drift, latency-budget-overrun, render-format-mismatch, transcription-confidence-bypass, simulator-vs-hardware-divergence, and operator-context-leak. Frequency, mitigation effectiveness, and inter-rater agreement are reported per class. Three classes (timing-misalignment, schema-drift, render-format-mismatch) account for 71% of observed failures, and mitigations enforced as pre-flight gates outperform discipline-only mitigations 4.6× on subsequent recurrence rate. Because hardware has not shipped, all reported failures are simulation-mediated; the taxonomy will be re-validated against real-hardware sprints from Phase 4 onward.
1. Introduction
Parley is a category creator in ambient transcription: an AR-glasses product where two wearers, one deaf and one hearing, each see the other's expression rendered as text in their own HUD, in real time, without either party adapting. The research arm has shipped three Kaggle notebooks (Notebooks 00, 01, and 02 [10]) and is in Phase 1/2 of a 9-phase product arc. The consumer hardware target, Everysight Maverick AI [6], has not yet shipped. This is a structural feature of the project, not an accident: the research arm exists to build domain understanding before any hardware arrives.
Throughout this pre-shipment phase, agent-mediated coding work in the Parley repository must coordinate two semi-independent subsystems. The first is automatic speech recognition (ASR): a Whisper-style transcription path [1] that turns speech into time-stamped text with confidence scores. The second is a glasses-render simulator that stands in for the not-yet-shipped HUD and consumes ASR output to produce a rendered frame. The two sides have separate contracts, separate timing assumptions, and separate failure modes. A research sprint that touches both surfaces is exactly the kind of cross-system, multi-file work where coordination failures are most expensive.
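To make the coordination surface concrete, here is a minimal sketch of what the two contracts might look like in repository code. It is illustrative only: the type names and fields (`AsrSegment`, `start_ms`, `confidence`, `present_at_ms`, and so on) are assumptions for this paper, not the Parley repository's actual types.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AsrSegment:
    """One time-stamped span emitted by the ASR path (hypothetical shape)."""
    text: str
    start_ms: int      # span start, relative to capture start
    end_ms: int        # span end; a well-formed segment has end_ms >= start_ms
    confidence: float  # model confidence, expected in [0.0, 1.0]

@dataclass(frozen=True)
class RenderFrame:
    """One frame the glasses-render simulator derives from ASR output (hypothetical shape)."""
    lines: tuple[str, ...]  # wrapped caption lines sized for the HUD
    present_at_ms: int      # when the frame should appear
    expire_at_ms: int       # when the frame goes stale and must be replaced
```

On this reading, most taxonomy classes are statements about one of these two records or the handoff between them: schema-drift is a field changing shape, timing-misalignment is the `*_ms` fields disagreeing across the boundary, and render-format-mismatch is `lines` violating what the HUD can draw.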
We observed during 2026-Q1 that these failures clustered. The same kinds of bugs recurred across sprints, framed differently each time. A taxonomy was needed both for postmortem rigor (so a recurrence can be flagged as such [9]) and for mitigation pattern selection (some classes respond well to schema validation; others demand a runtime gate). This paper presents the taxonomy with frequencies, a confusion-matrix evaluation of the agent’s ability to self-classify, and per-class mitigation effectiveness. Because hardware has not shipped, all reported failures are simulation-mediated; this is disclosed throughout and is the largest threat to external validity.
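As a sketch of how the taxonomy drives mitigation selection, the gate below checks a raw ASR payload against a minimal contract before it reaches the renderer and returns the failure classes it would otherwise trigger. The class names are taken from the taxonomy; the contract keys, the confidence threshold, and the choice of which classes are statically checkable are assumptions made for illustration, not the repository's actual gate.

```python
from enum import Enum

class FailureClass(Enum):
    TIMING_MISALIGNMENT = "timing-misalignment"
    SCHEMA_DRIFT = "schema-drift"
    LATENCY_BUDGET_OVERRUN = "latency-budget-overrun"
    RENDER_FORMAT_MISMATCH = "render-format-mismatch"
    TRANSCRIPTION_CONFIDENCE_BYPASS = "transcription-confidence-bypass"
    SIMULATOR_VS_HARDWARE_DIVERGENCE = "simulator-vs-hardware-divergence"
    OPERATOR_CONTEXT_LEAK = "operator-context-leak"

# Assumed contract for a raw ASR segment; the real repository contract may differ.
REQUIRED_FIELDS = {"text": str, "start_ms": int, "end_ms": int, "confidence": (int, float)}

def preflight_gate(segment: dict, min_confidence: float = 0.5) -> list[FailureClass]:
    """Return the failure classes a payload would trigger if rendered as-is."""
    # schema-drift: a missing or mistyped field means the contract has moved
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in segment or not isinstance(segment[field], expected_type):
            return [FailureClass.SCHEMA_DRIFT]  # semantic checks are moot on a drifted payload
    violations: list[FailureClass] = []
    # timing-misalignment: an inverted span cannot be scheduled on the HUD
    if segment["end_ms"] < segment["start_ms"]:
        violations.append(FailureClass.TIMING_MISALIGNMENT)
    # transcription-confidence-bypass: low-confidence text must not reach the HUD silently
    if segment["confidence"] < min_confidence:
        violations.append(FailureClass.TRANSCRIPTION_CONFIDENCE_BYPASS)
    return violations
```

Run as a hard gate at the ASR-to-renderer boundary (in CI or as a runtime assertion), this is the pre-flight shape the abstract credits with the 4.6× advantage over discipline-only mitigations: the check fires before a bad payload propagates, so a recurrence is blocked rather than merely written up.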