Labs / Playbook / token-discipline
Discipline● live

Token Discipline

Context window hygiene. What belongs in CLAUDE.md vs memory vs code. The compaction strategy that keeps sessions fast after month 3.

Token Discipline
  • CLAUDE.md size target: under 200 lines — if it's longer, information leaked in that belongs elsewhere
  • The load-everything vs. read-on-demand decision, with the pointer pattern that solves it
GitHub publication pending

By week six of running four projects through a single Claude Code workflow, my sessions were slow. Not slow in a "the model is thinking" way — slow in a "we're re-establishing context we've established a dozen times" way.

The root cause was that I had treated CLAUDE.md as a place to put everything. The file had grown to 400 lines. Every session started by loading 400 lines of context. Most of it was irrelevant to what I was actually doing that session. When I was debugging a camera driver, the 80-line section on SBA loan status was noise. When I was reviewing a lease amendment, the 60-line CV pipeline reference was noise.

Token discipline is the practice of putting the right information in the right place — so sessions load what they need and nothing they don't.

Where information belongs

CLAUDE.md holds standing operational context: project identity, people, agents, rules, active work pointers. Under 200 lines. Relevant to most sessions.

memory/ holds depth: contact directories, project state, feedback history, external system pointers. Relevant to specific sessions. Loaded on demand, not at session start.

The code holds code patterns, architecture, dependency versions. The agent reads these by reading files. Don't duplicate them in CLAUDE.md.

Git holds history. git log, git blame. Don't summarize recent commits in CLAUDE.md — the agent can read the history directly.

Playbook/wiki holds reference material. If the agent needs to understand a methodology, point it to the reference page, don't paste the content inline.

The test: if the agent can find the information in under five seconds by reading a file or running a command, don't pre-load it. Pre-loading is for things that don't exist in any file.

The pointer pattern

The single most impactful token-discipline move: replace inline content with pointers.

Instead of pasting the full agent roster with descriptions, activation triggers, and authority levels:

markdown ## Agents See 05-Agents/README.md — roster, activation triggers, hierarchy. Default: APEX. Activate specialists by name.

The agent reads the pointer, loads the roster file when relevant, and the CLAUDE.md stays short. Same information, fraction of the session-start cost.

The pointer pattern works for: agent definitions, decision history, financial models, contact directories, architecture documentation, full glossaries, sprint contract templates. Anything that's larger than 10 lines and not needed in most sessions.

The compaction problem

Claude Code auto-compacts long conversations. When it does, prior messages are summarized and the summary replaces the raw transcript. This is usually helpful — but it means detailed content from early in the session may be compressed.

Work with compaction: write decisions and conclusions to files as they're made. Files survive compaction; chat history doesn't. If a sprint contract, postmortem, or handoff artifact exists as a file, reference the file — don't paste the content. At session start, re-establish the most critical context explicitly rather than assuming it survived.

Signs your context is bloated

The re-explanation tax is back — you're telling the agent things it should know. The CLAUDE.md is over 200 lines. Sessions feel slower past hour two. The agent asks clarifying questions you answered yesterday.

Each of these is diagnosable. The CLAUDE.md audit is the first pass: read every line and ask whether it was relevant to what you actually did last week. Lines that weren't relevant are candidates for the pointer pattern or deletion.

After two months of consistent token discipline, the average session in my workflow loads under 300 total lines of persistent context — CLAUDE.md plus the three or four memory files that are actually relevant. That's the target. The sessions that don't feel slow anymore.

— Michael, from the lab