Signer-Dialect Robustness in Landmark-Only Sign Recognition: A 21-Fold Leave-One-Signer-Out Study
Abstract
A single cross-signer accuracy number hides who a sign-recognition model actually serves. We run a 21-fold leave-one-signer-out study on the Google ISLR 250-sign dataset, holding out each of the 21 signers in turn and reporting the full per-signer distribution rather than a pooled average. Top-1 accuracy ranges from 25.6% on the worst-served signer to 64.2% on the best, a spread of 38.6 percentage points, with a mean of 41.7% and a per-signer standard deviation of 11.4 points, more than ten times the seed-to-seed deviation of the same architecture. Five of the 21 signers fall below 30%. We pre-registered three hypotheses: that the spread would exceed 25 points (confirmed: 38.6), that the worst signer would fall below 30% (confirmed: 25.6%), and that handshape complexity would predict per-signer degradation (refuted: r-squared of 0.008). The degradation is signer-driven, not sign-feature-driven: accuracy on the same sign swings by 60 to 82 points between the best and worst signers. We argue that the per-signer distribution, not the pooled mean, is the honest unit of report for sign recognition, and that signer-level metadata must be captured in any future data collection.

1. Introduction
A companion study established that the honest ceiling for landmark-only isolated-sign recognition, measured on held-out signers, is about 45% top-1 rather than the 80 to 90% reported on signer-mixed splits [7]. That number is a pooled average across a held-out test set. This study asks the question the average hides: when the model meets a signer it has never seen, how much does performance depend on which signer it is?
The answer is: enormously. When each of the 21 signers is held out in turn, top-1 accuracy ranges from 25.6% on the worst-served signer to 64.2% on the best. The mean of 41.7% describes almost none of them. For a technology whose entire purpose is to work for deaf people in the wild, a 38.6-point spread across signers is not a footnote to the accuracy number. It is the more important number.
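To make the protocol concrete, the following is a minimal sketch of leave-one-signer-out splitting and of the per-signer summary reported above. It is not the study's actual pipeline: the function names, the toy signer labels, and the toy accuracies are hypothetical, and the use of the sample (n - 1) standard deviation is an assumption, since the text does not say which estimator was used.

```python
from collections import defaultdict
from statistics import mean, stdev


def leave_one_signer_out(signer_ids):
    """Yield (held_out_signer, train_indices, test_indices), one fold per signer.

    signer_ids[i] is the signer label of example i. Every example from the
    held-out signer goes to the test side, so no signer appears on both sides.
    """
    by_signer = defaultdict(list)
    for idx, signer in enumerate(signer_ids):
        by_signer[signer].append(idx)
    for held_out in sorted(by_signer):
        test_idx = by_signer[held_out]
        train_idx = [i for s, idxs in by_signer.items() if s != held_out for i in idxs]
        yield held_out, train_idx, test_idx


def summarize_per_signer(accuracy_by_signer, floor=30.0):
    """Summarize the per-signer distribution instead of a single pooled number.

    Accuracies are top-1 percentages. `floor` is the threshold below which a
    signer counts as poorly served (30% mirrors the threshold in the text).
    """
    accs = list(accuracy_by_signer.values())
    return {
        "min": min(accs),
        "max": max(accs),
        "spread": max(accs) - min(accs),
        "mean": mean(accs),
        "std": stdev(accs),  # assumption: sample standard deviation
        "n_below_floor": sum(a < floor for a in accs),
    }


# Toy illustration only: hypothetical signers and accuracies, not study data.
toy_signer_ids = ["a", "a", "b", "c", "b"]
toy_folds = list(leave_one_signer_out(toy_signer_ids))
toy_summary = summarize_per_signer({"a": 30.0, "b": 50.0, "c": 40.0})
```

In a real run, each fold would train on `train_idx`, evaluate on `test_idx`, and store one accuracy per held-out signer before the summary is computed.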
We pre-registered three hypotheses before running the folds, and the one that failed is the most informative. We expected handshape complexity to predict which signers the model struggles with. It does not. The variable that predicts whether a sign is recognized is who is signing it, not how hard the handshape is.
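As a sketch of how the third hypothesis could be checked, the function below computes the r-squared of a simple least-squares line relating one predictor to one outcome. It assumes a single complexity score paired with a single degradation value per unit of analysis. How the study actually defined and aggregated handshape complexity is not specified here, so the function name and the toy inputs are purely illustrative.

```python
def r_squared(x, y):
    """Return r-squared for a simple linear fit of y on x.

    For one predictor this equals the squared Pearson correlation.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length, at least 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("r-squared is undefined when x or y is constant")
    return (sxy * sxy) / (sxx * syy)


# Toy inputs only (hypothetical, not study data).
perfect_fit = r_squared([1, 2, 3], [2, 4, 6])       # y is exactly linear in x
no_linear_fit = r_squared([1, 2, 3, 4], [1, -1, -1, 1])  # no linear trend
```

An r-squared near zero, like the 0.008 reported, means a linear fit on the predictor explains almost none of the variance in the outcome.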