# Context Dies, Documents Don't
_What AI-assisted development actually requires._
---
I built a working software system yesterday. Eight interactive tabs, a [SQLite](https://sqlite.org/) database with ten tables, four API integration scripts, a player identity resolution layer that solves a known industry problem, and a governance framework designed to sustain the project across dozens of AI sessions without losing coherence. Roughly fourteen hours. I cannot write Python.
That last sentence is the one people fixate on. It's the wrong sentence to fixate on.
The interesting question is not "can a non-developer use AI to write code?" — that's been answered, and the answer is yes, obviously, with caveats. The interesting question is: **what does a non-developer need to be good at to direct a sustained build, and why is almost nobody talking about that?**
I used [Claude Code](https://docs.anthropic.com/en/docs/agents-and-tools/claude-code/overview), Anthropic's command-line tool for agentic coding, across eight sequential build sessions. The UI framework is [Textual](https://textual.textualize.io/) — a Python library for building terminal-based interfaces. The data comes from [football-data.org](https://www.football-data.org/)'s API. But the tools are not the story. The story is what happens between sessions.
---
## The One-Shot Illusion
Most guides to building with AI describe a single session. You prompt well, the AI writes code, you get a thing. If the thing doesn't work, you debug together. If it does, you ship it.
This describes maybe 5% of what building software actually involves. The other 95% is what happens when:
- The session runs out of context and dies mid-documentation update.
- A new AI instance picks up where the last left off but doesn't know what was decided, only what was written down.
- You realize, three sessions in, that a design decision from session one was wrong — and the correction touches every file.
- A debrief surfaces a bug that isn't urgent now but will be catastrophic after the next feature is shipped.
- You're holding six unresolved architectural questions in your head while trying to decide which one to hand to the AI next.
These are not prompting problems. These are **project management problems**, and they are the actual bottleneck in AI-assisted development.
---
## The Core Lesson: What Survives a Dead Session
Here is the single most important thing I learned: **the AI has no memory between sessions.** Every new instance starts blank. The only continuity mechanism is what you've written into files that the next instance can read.
This sounds obvious. It is not obvious in practice. In practice, you have a rich conversation with an AI about your project's architecture. You make decisions. You discuss tradeoffs. The AI understands your goals, your constraints, your domain. Then the session ends — context window full, or you close the tab, or the day is over — and all of that understanding evaporates.
The next session, you start a new instance. It reads your code and maybe a README. It has no idea _why_ you chose SQLite over Postgres, why the coaches table exists as a separate entity rather than a column on the clubs table, or why you deliberately dropped all web scraping in favor of a single paid API. Those were strategic decisions. They lived in conversation. Now they live nowhere.
I learned this the hard way. My first long session with Claude Code ran for hours. We made significant architectural changes, researched API capabilities, restructured the build plan. Then the context window filled. The AI tried to batch-update three documents at the end. It failed. The strategic decisions from that session — representing hours of deliberation — nearly died with it.
**The fix is not better prompting. The fix is better documentation — written in real-time, as decisions are made, not batched at the end of a session.** The decision and its documentation must be the same act, not two separate steps.
---
## The Document Hierarchy
I ended up with seven governance documents. That sounds like a lot. Each one serves a distinct function, and the relationships between them are explicit:
```
PROJECT_BIBLE (WHY — strategy, vision, constraints)
      ↓
SPEC (WHAT — schema, scripts, UI, data sources)
      ↓
BUILD_GUIDE (HOW — sequenced conversations, progress;
             references SPEC, doesn't restate it)
      ↓
CLAUDE.md (RULES — technical rules for the AI)
      ↓
README (EXTERNAL — describes current reality only)
```
Plus **UPDATE_FLOW** (governs how changes route between documents) and **TODO** (an inbox for items that need triage).
The critical insight: **these documents have a hierarchy, and changes flow downward.** When the BIBLE changes (strategy shift), I check the SPEC. When the SPEC changes (new table, new script), I check the BUILD_GUIDE, CLAUDE.md, and README. This prevents the drift that kills projects — where documents contradict each other and the next AI instance doesn't know which to trust.
If README contradicts SPEC, README is wrong. If SPEC contradicts BIBLE, the decision needs to be re-examined. This rule saved me multiple times.
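That precedence rule is mechanical enough to write down. A minimal sketch in Python, using the project's document names; the functions themselves are hypothetical, and it glosses over the nuance that a SPEC-versus-BIBLE conflict triggers re-examination rather than an automatic win:

```python
# Governance documents ordered from most to least authoritative.
# A document earlier in the list wins any contradiction with a later one.
PRECEDENCE = ["PROJECT_BIBLE", "SPEC", "BUILD_GUIDE", "CLAUDE.md", "README"]

def authoritative(doc_a: str, doc_b: str) -> str:
    """Return whichever of two contradicting documents to trust."""
    return min(doc_a, doc_b, key=PRECEDENCE.index)

def downstream(doc: str) -> list[str]:
    """Documents to re-check after `doc` changes: changes flow downward."""
    return PRECEDENCE[PRECEDENCE.index(doc) + 1:]
```

So `authoritative("README", "SPEC")` is `"SPEC"`, and a SPEC change means re-checking BUILD_GUIDE, CLAUDE.md, and README, exactly the flow described above.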
---
## The Debrief: Feedback That Routes Itself
At the end of every build session, I run a `/debrief` command — a custom prompt stored in the project that the AI executes on request.[^1]
The AI answers five questions:
1. What technical rules should be added to prevent future mistakes?
2. What bugs or tech debt exist — with file and line number?
3. What architectural concerns should future sessions know about?
4. What went well — which patterns should we preserve?
5. What decisions were made — and which document does each belong in?
Each category has a routing protocol. Category 1 goes straight into CLAUDE.md, no approval needed. Category 2 gets triaged: quick fix now, or logged with priority and what it blocks. Category 3 goes to the backlog if it's a future concern, or triggers a strategic conversation if it changes what we're building. Category 5 is the most important — it catches decisions that would otherwise evaporate when the session ends.
One debrief surfaced that a football API doesn't return market values for players — contradicting what we'd assumed during architecture design. Without the routing protocol, that finding would have sat in a conversation log forever. With it, the finding got flagged for specification correction, documented in our API reference, and noted in the AI's rules so no future session writes code expecting that field.
**The debrief is not a summary. It is a structured feedback loop with explicit routing.**
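The routing protocol compresses to a small table. A sketch in Python, where the destinations come from the categories above, but the structure (and category 4's route, which the protocol leaves implicit) is my rendering, not the actual command file:

```python
# Debrief categories mapped to a destination and an approval level.
# Category 4's destination is an assumption; it isn't specified above.
ROUTING = {
    1: ("technical rule",        "CLAUDE.md",                      "autonomous"),
    2: ("bug / tech debt",       "fix now, or TODO with priority", "triage"),
    3: ("architectural concern", "backlog or strategy session",    "deliberate"),
    4: ("pattern worth keeping", "noted for future sessions",      "autonomous"),
    5: ("decision made",         "its owning document",            "per hierarchy"),
}

def route(category: int) -> str:
    """Human-readable routing instruction for one debrief category."""
    finding, destination, approval = ROUTING[category]
    return f"{finding} -> {destination} ({approval})"
```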
[^1]: Claude Code supports [custom slash commands](https://docs.anthropic.com/en/docs/agents-and-tools/claude-code/overview) — markdown files stored in your project's `.claude/commands/` directory that the AI can execute on demand. `/debrief`, `/teach`, and `/audit-rules` are all custom commands I created for this project.
---
## Arbitration: What the AI Decides vs. What I Decide
Not every update is equal. Some the AI can handle autonomously. Some need my judgment.
**The AI applies autonomously:** technical rules (a CSS gotcha, a library quirk), bug fixes under five lines, marking tasks as complete.
**The AI proposes, I approve:** any change to the specification, any new file not in the current scope, externally visible documentation.
**Requires deliberation with a separate strategic AI instance:** changes to the project's strategic direction, new database tables, dropping features, anything that affects how the project generates value.
This is a governance model. I didn't set out to build one. I set out to build a football intelligence dashboard. The governance emerged because without it, the project kept drifting — decisions made in conversation that never reached the documents, documents contradicting each other, new AI sessions making assumptions that undid previous work.
---
## The Ignorance Advantage
Something unexpected happened repeatedly during this build. I would ask, about some standard practice: **"Is this actually the best way to do this?"** And often, the answer was no — it was just the way everyone does it.
The football analytics community handles player identification across data providers by matching on name and club. When a player transfers, the match breaks. Everyone knows this. Everyone hacks around it. I didn't know it was the standard, so I asked why football doesn't have universal player IDs the way the NBA does. When the answer was "it just doesn't," I designed an identity resolution layer from first principles:
```sql
CREATE TABLE player_identities (
    id INTEGER PRIMARY KEY,
    player_id INTEGER REFERENCES players(id) ON DELETE CASCADE,
    provider TEXT NOT NULL,      -- 'football_data', 'opta', 'fm'
    external_id TEXT NOT NULL,   -- the provider's ID
    confidence TEXT,             -- 'confirmed', 'inferred', 'manual'
    UNIQUE(provider, external_id)
);
```
One table. One row per player per provider. Provider-agnostic by design. When a second data source arrives, the only new code is the matching script for that provider — the schema and lookup logic are already general.
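To make the lookup concrete, here is a minimal sketch of how resolution might work against that table using Python's built-in `sqlite3`. The function names, and anything beyond the schema above, are illustrative rather than the project's actual code:

```python
import sqlite3

def resolve_player(conn: sqlite3.Connection, provider: str, external_id: str):
    """Map a provider's external ID to our internal player_id, or None."""
    row = conn.execute(
        "SELECT player_id FROM player_identities "
        "WHERE provider = ? AND external_id = ?",
        (provider, external_id),
    ).fetchone()
    return row[0] if row else None

def link_player(conn: sqlite3.Connection, player_id: int,
                provider: str, external_id: str,
                confidence: str = "inferred") -> None:
    """Record a provider mapping; UNIQUE(provider, external_id) rejects dupes."""
    conn.execute(
        "INSERT INTO player_identities (player_id, provider, external_id, confidence) "
        "VALUES (?, ?, ?, ?)",
        (player_id, provider, external_id, confidence),
    )
```

Adding a new data source means calling `link_player` from that provider's matching script; `resolve_player` and the schema never change.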
This same pattern played out with web scraping ("everyone scrapes FBref" — but a paid API is more reliable and the scraping target just lost its advanced data), with document structure ("just use a README" — but a README can't govern a multi-session build), and with file organization ("keep it in one big file" — but the AI writes better code when files are modular and under 400 lines).
**Domain ignorance, combined with first-principles reasoning, is an asset.** The non-expert who asks "is this good?" often outperforms the expert who assumes "this is how it's done." I wrote about this pattern in football analysis — I called it [the Description Trap](https://www.ballerzbantz.com/p/the-description-trap), where the community describes what's happening without questioning whether it's the right approach. The same trap exists in software.
This extended beyond architecture into process. Late in the build, my TODO had accumulated six small refactoring tasks — duplicated code across files, a hardcoded sort order, a name-matching function that would collide on common names once I imported more data. Conventional wisdom says: don't refactor early, features are more valuable than cleanliness, engineering time is expensive.
I pushed back. Here's roughly how the exchange went:
> **Me:** There seem to be a handful of refactors down the line. Conventionally these wouldn't be priority since engineering effort is constrained and features are more important. But this doesn't seem to be the case with deployable agents.
>
> **Claude:** Your intuition is correct. The traditional argument against early refactoring is that engineering time is expensive. With Claude Code, a refactoring session takes 15-20 minutes of your time. The "engineering effort" constraint that made refactoring expensive has collapsed. What remains is the cognitive cost and the risk cost. The question becomes: when is the risk of refactoring lower than the risk of not refactoring?
The answer, we concluded, is: when the tech debt is actively multiplying. One of our duplicated code blocks had already caused a real bug — a league code that needed fixing in four files instead of one. The cost of the refactor was flat (one session). The cost of _not_ refactoring was compounding with every new script.
The broader principle: **refactor early and often, because the cost of refactoring just dropped by an order of magnitude but the cost of accumulated debt didn't.** This is a genuine shift in how to prioritize engineering work, and I don't think the industry has internalized it yet. When your builder works for free, cleanliness stops being a luxury and starts being the default.
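The flat-versus-compounding claim is simple arithmetic. An illustrative model, with invented numbers chosen only to show the shape:

```python
# Illustrative only: fixing duplicated code means repeating every fix in
# every copy, while a refactor is a one-time flat cost.
def debt_cost(files_with_copy: int, fixes_needed: int,
              minutes_per_fix: int = 10) -> int:
    """Minutes spent when each fix must be applied in every duplicated copy."""
    return files_with_copy * fixes_needed * minutes_per_fix

REFACTOR_COST = 20  # minutes: roughly one Claude Code session, per the exchange above

# One league-code fix across four files already costs 40 minutes; by the
# third fix the accumulated debt (120 min) dwarfs the flat refactor (20 min).
```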
---
## What the Build Looks Like
I share what I can without exposing proprietary content. The system is a terminal-based dashboard:
```
BB:INTEL — Football Intelligence System
[PLAYERS] [CLUBS] [COACHES] [OPPORTUNITIES] [MATCHES] [MARKET] [NEWS] [FRAMEWORKS]
```
Eight tabs. Full CRUD operations on each. The PLAYERS tab has pagination (50 per page across 645 imported players), a league filter dropdown, a toggle to show only players I've personally assessed, and column visibility controls. Data comes from a paid football API. My own assessments — proprietary ratings, observations, framework connections — layer on top of the biographical data.
The build progressed through a tracked sequence:
```
| Conversation | Status | What Was Built |
|--------------|-------------|---------------------------------------------|
| 1 | ✅ COMPLETE | Database layer, seed data |
| 2 | ✅ COMPLETE | Dashboard core — Players + Clubs tabs |
| 3 | ✅ COMPLETE | Remaining 5 tabs with full CRUD |
| 4 | ✅ COMPLETE | API score fetching scripts |
| 5 | ✅ COMPLETE | Schema migration + Coaches tab |
| 6 | ✅ COMPLETE | Squad import + match detail scripts |
| 7 | ✅ COMPLETE | Player tab pagination, filters, toggles |
| 7.5 | ✅ COMPLETE | Player identity resolution layer |
| 8 | ✅ COMPLETE | Housekeeping refactors |
```
Each conversation was a self-contained Claude Code session. Each produced tested, working code before the next began. The AI read the specification for context; the build guide told it which piece to work on; CLAUDE.md gave it the rules. I directed, reviewed, and made the judgment calls. The AI wrote the code.
Here's a sample of what the debrief feedback looks like after a build session — the output that feeds the routing protocol:
```
5. Decisions made
Decision: marketValue not available from API — manual entry only.
  Belongs in: SPEC.md section 1.5
Decision: Belgian Pro League code is BJL, not BSA.
  Belongs in: SPEC.md section 1.8. Already fixed in config.
Decision: Conversation 6 is complete.
  Belongs in: BUILD_GUIDE progress table.
```
Each decision states where it belongs. The routing protocol tells me whether to apply it directly or deliberate first. Nothing gets lost.
---
## The Real Skill
I am a Philosophy and Physics student. I write essays about [football scouting](https://www.ballerzbantz.com/). Before this project, I had never completed a technical build of any kind.
What I discovered is that the skill required to direct a sustained AI build is not programming. It is:
**Specification.** Can you describe what you want precisely enough that an agent can build it without asking clarifying questions every thirty seconds? This is the same skill as writing a clear argument: premises, structure, defined terms, explicit scope boundaries.
**Architectural thinking.** Can you see how components relate to each other and anticipate where a change in one place breaks something elsewhere? I caught a schema design issue — where a proposed solution solved the wrong problem — not because I can read Python, but because I understood what the data represented and how it would be used.
**Process design.** Can you build the system that sustains the work, not just the work itself? The debrief protocol, the document hierarchy, the arbitration model — these allow the project to survive context death, session turnover, and the accumulation of small decisions that individually seem trivial but collectively determine whether the system coheres.
**Quality control.** Can you test what was built and identify when the AI is wrong? I can't read the code line by line, but I can run the application, verify that data flows correctly, check that the API returns what the script claims, and push back when something feels off.
These are not programming skills. They are thinking skills applied to a programming context. I suspect they're the same skills applied to any context where you're directing complex work without doing the manual execution yourself.
---
## What I Would Tell Someone Starting
**Start with your knowledge, not with the code.** My first step wasn't "build a database." It was writing down everything I know about my domain and how I evaluate things within it. The code serves the knowledge. If you don't know what you know, you can't specify what to build.
**Design for context death from day one.** Your AI will forget everything. Your documents won't. Invest in the documents early, even when it feels like overhead. The third session is when it pays off.
**Run the debrief every session, no exceptions.** Even when the session went perfectly. Especially when it went perfectly — that's when you capture what worked so you can preserve it.
**Don't batch your documentation updates.** The session that dies mid-update is the session where decisions get lost.
**Question the standard.** When the AI tells you "the standard approach is X," ask whether X is actually good or just familiar. Your ignorance is an asset when paired with rigorous thinking.
**Ship something ugly.** My dashboard runs in a terminal. It has no web interface, no charts, no design polish. It works. I use it. The temptation to keep building instead of using is the same as the temptation to keep preparing instead of doing.
---
_This essay was written after a single day of building. The project is ongoing. I share it now, imperfect and in-progress, because the alternative is waiting until everything is done — and I have learned, through a longer and harder process than code, that "done" is a condition that never arrives on its own. You ship it, or it ships you._