TPL-2026-011·preprint·2026-04-30

Long-Context vs Targeted Retrieval for Compliance Drafting: An SBA Loan Packet Case Study

Tags: mhg · sba · compliance · long-context · retrieval · hallucination

Abstract

Modern long-context language models (1M-token windows) make it tempting to load an entire reference corpus and draft against it; the implicit premise is that more context is always better when accuracy matters. We test this premise on a class of work where accuracy carries real consequences: SBA 7(a) loan packet drafting for Mile High Golf, a pre-launch entertainment venue. We pair-draft 12 SBA-style packet sections (use-of-funds narrative, projections justification, market analysis, owner resume, etc.) under two conditions — (A) a full 1M-token context load of the SBA SOP, recent guidance, and prior-year MHG planning materials, and (B) targeted retrieval (scoped reads of only the SOP sections relevant to the section being drafted). Drafts were evaluated by a human reviewer with SBA-packet experience using a 5-class hallucination taxonomy. Long-context drafts produced 2.8× more total hallucinations per packet section than targeted-retrieval drafts (mean 4.2 vs 1.5; n=12 paired sections). The class breakdown is informative: long-context drafts hallucinate “plausible-but-fabricated” regulatory citations at 6× the rate, suggesting the failure mode is haystack pollution rather than a missing fact. Operator time per section is comparable across conditions; long-context appears cheaper by token cost but more expensive by reviewer-correction time. We argue that for regulatory drafting, scoped retrieval is the load-bearing primitive and long-context is a substitution that adds risk without measurable reward. Mile High Golf is pre-launch, so no actual SBA packet has yet been filed; all drafts are illustrative, and the Limitations section names this and other honesty items.

1. Background

1.1 Long-context as a substitute for retrieval

The 1M-token context windows now available on frontier models [7] create an operationally tempting substitution: instead of designing a retrieval pipeline that pulls only the relevant slices of a corpus into the prompt, an operator can simply load the entire corpus and trust the model to find what it needs. The substitution is appealing because it eliminates an engineering step (retrieval system design, embedding maintenance, chunk boundaries) and replaces it with a single prompt that the operator can audit by reading.
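The two conditions can be sketched in a few lines. This is an illustrative toy, not the paper's actual pipeline: the corpus section names, the keyword-overlap scorer, and the helper names (`full_load`, `targeted_retrieval`) are all hypothetical stand-ins for whatever retrieval mechanism an operator actually uses.

```python
# Toy contrast between the two conditions. Section names and the
# keyword-overlap scorer are illustrative, not the paper's pipeline.

def full_load(corpus: dict[str, str]) -> str:
    """Condition A: concatenate the entire corpus into one prompt."""
    return "\n\n".join(corpus.values())

def targeted_retrieval(corpus: dict[str, str], task: str, k: int = 2) -> str:
    """Condition B: include only the k sections with the highest
    keyword overlap with the drafting task."""
    task_terms = set(task.lower().split())

    def overlap(text: str) -> int:
        return len(task_terms & set(text.lower().split()))

    top = sorted(corpus, key=lambda name: overlap(corpus[name]), reverse=True)[:k]
    return "\n\n".join(corpus[name] for name in top)

corpus = {
    "sop-equity": "equity injection requirements for startup borrowers",
    "sop-useoffunds": "eligible use of funds and construction draw rules",
    "sop-fees": "packaging fee limits and agent disclosure rules",
}
prompt = targeted_retrieval(corpus, "draft the use of funds narrative", k=1)
```

The point of the sketch is the asymmetry: condition A's prompt grows with the corpus, while condition B's prompt grows only with k, so the model never sees the irrelevant sections that (the paper argues) pollute the haystack.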

The literature on long-context behavior offers reasons for skepticism. Liu et al. [1] documented the “lost in the middle” pattern: model performance on retrieval-from-context degrades substantially when the relevant fact is in the middle of a long context, even within the model’s formal context window. The retrieval-augmented-generation literature [2] [4] argues that scoped retrieval is the load-bearing primitive precisely because it sidesteps the haystack problem. Surveys of hallucination in NLP [3] classify haystack pollution as a distinct failure mode from out-of-distribution content.
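The lost-in-the-middle probe described above is straightforward to construct: place one relevant fact (the "needle") at varying depths in a long filler context and ask the same question against each variant. A minimal sketch, with the model call omitted and all names hypothetical:

```python
# Sketch of a lost-in-the-middle probe in the style of Liu et al. [1].
# Builds the probe contexts only; sending them to a model is omitted.
# Filler text and the needle sentence are illustrative.

def build_probe(needle: str, filler: list[str], depth: float) -> str:
    """Insert `needle` at fractional position `depth` (0.0 = start, 1.0 = end)."""
    i = round(depth * len(filler))
    return "\n".join(filler[:i] + [needle] + filler[i:])

filler = [f"Filler paragraph {n} about unrelated SOP sections." for n in range(100)]
needle = "The equity injection minimum for startups is stated in SOP 50 10 7.1."

# One probe per depth; each would be paired with the same question.
probes = {d: build_probe(needle, filler, d) for d in (0.0, 0.5, 1.0)}
```

The lost-in-the-middle pattern predicts the depth-0.5 probe performs worst, even though all three contexts contain the identical fact.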

1.2 The MHG / SBA setting

Mile High Golf is a pre-launch entertainment venue in the TruPath portfolio. The financing path includes an SBA 7(a) loan, currently being repointed to the new flagship site (7521 Eastern Medical Dr, Denver NC) following the 2026-04-28 site decision [10]. SBA 7(a) loan packets are heavily structured documents governed by SOP 50 10 7.1 [5] and the 7(a) program guide [6]. Each packet section (use-of-funds narrative, projections justification, market analysis, owner resume, etc.) must conform to specific format and content requirements; a hallucinated or stale citation is not just sloppy — it is a real procedural risk.

This setting is therefore a useful case study for the long-context-vs-retrieval question. The reference corpus is well-defined (a small set of SBA documents plus prior MHG planning materials). The accuracy stakes are real. And the operator (CEO) has the domain knowledge to evaluate hallucinations, which is the bottleneck for this kind of study in most other regulatory domains.

Cite as: TruPath Labs Research (2026). Long-Context vs Targeted Retrieval for Compliance Drafting: An SBA Loan Packet Case Study. TruPath Labs Preprint TPL-2026-011. trupathventures.net/labs/research/long-context-vs-targeted-read