Scratchpads That Survive, Stages That Verify


Two phases of kodr attack the same problem from two angles: a small local model can’t plan and execute a complex job in one shot, and it shouldn’t be trusted to declare that job finished. Phase 57 gives it carry-over memory; phase 58 stops it from lying about being done.

Phase 57: a scratchpad that survives the run

The response envelope has had a scratchpad field since I designed it - the model’s private run-local notes. But kodr wrote it to scratchpad.md and forgot it. The next run started blank. Phase 57 adds --prior-scratchpad <path|last>, which reads that file and appends it to the user message:

do a task

## Prior scratchpad

{"plan":["step1","step2"],"done":["step1"],"next":"step2"}

The last alias reads the most recent run’s scratchpad via the .kodr/last-run pointer - no path bookkeeping. I deliberately chose context injection over a read_scratchpad() tool for the base case: the prior plan is always small and always relevant, a tool adds a round-trip the model has to remember to make, and tool mode is off by default anyway. A small model can’t miss context that’s already in the message; it can easily forget a tool it was never trained to expect.
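As a rough illustration of the path|last resolution described above, here is a minimal Python sketch. It is not kodr’s code: the function name resolve_prior_scratchpad is made up, and it assumes that .kodr/last-run is a one-line text file holding the most recent run’s directory, with scratchpad.md inside that directory. The real pointer format may differ.

```python
from pathlib import Path
from typing import Optional


def resolve_prior_scratchpad(arg: str, root: Path) -> Optional[Path]:
    """Map a --prior-scratchpad value to a file, or None if nothing usable exists.

    Hypothetical helper, not part of kodr.
    """
    if arg != "last":
        # Explicit path: use it as given, but only if the file exists.
        path = Path(arg)
        return path if path.is_file() else None

    pointer = root / ".kodr" / "last-run"
    if not pointer.is_file():
        return None  # no previous run recorded

    # Assumption: the pointer holds the run directory as a single line of text.
    run_dir = Path(pointer.read_text(encoding="utf-8").strip())
    if not run_dir.is_absolute():
        run_dir = root / run_dir

    candidate = run_dir / "scratchpad.md"
    return candidate if candidate.is_file() else None
```

Returning None instead of raising keeps the caller simple: a missing scratchpad just means the run starts blank, as it did before phase 57.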

A few small-model accommodations matter here: truncate at 2000 chars so a runaway scratchpad can’t crowd out the actual task, skip the section entirely when empty, and inject at the bottom of the user turn - small models attend to user content more reliably than to system-prompt additions. The system prompt suggests a {plan, done, next, notes} structure, but kodr never parses it - it’s a convention the model can follow or ignore. The whole point: a two-run plan-then-execute flow (run 1 writes the plan and no code, run 2 reads its own plan and writes the patches) becomes natural, and far cheaper than replaying a full session. The plan is the only state that needs to carry.
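The three accommodations above fit in a few lines. This is a sketch under stated assumptions rather than kodr’s implementation: the function name and constant are hypothetical, and it assumes truncation keeps the first 2000 characters, which the text above does not specify.

```python
MAX_SCRATCHPAD_CHARS = 2000  # cap taken from the text above


def inject_prior_scratchpad(task: str, scratchpad: str,
                            limit: int = MAX_SCRATCHPAD_CHARS) -> str:
    """Append prior notes to the bottom of the user message (hypothetical helper)."""
    notes = scratchpad.strip()
    if not notes:
        # Empty scratchpad: skip the section entirely.
        return task
    # Assumption: keep the head of an oversized scratchpad and drop the tail.
    notes = notes[:limit]
    # The notes are passed through as opaque text; nothing here parses them.
    return f"{task}\n\n## Prior scratchpad\n\n{notes}"
```

Called with the task "do a task" and the JSON notes from the example earlier, this produces that same message layout: the task, a blank line, the heading, a blank line, then the notes.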

Phase 58: “done” is not the same as verified

This one came out of a Nemotron run that succeeded in the transport sense and failed in every sense that matters. The model produced one big response - a plan buried in reasoning, then fifteen full-file writes - and kodr marked it successful because the JSON was valid and the writes landed. The actual app was junk: one health test, a README referencing a missing migration, a test using an assertion API node:test doesn’t have. That exposed the gap between recording a task plan and using one to steer execution.

So phase 58 adds a staged path for complex work. Runs that look like service/API/database/dependency tasks now start with a plan-only turn, then proceed in small slices with a cap on touched paths and fresh context between stages. And kodr learned to distrust completion markers, in successive rounds of getting fooled:

  1. The model returned a no-op implementation turn (files: []) while claiming the stage was complete. Kodr now feeds that back as corrective input: “no files changed, give me concrete files or patches.”
  2. The model used the staged loop properly and ended with STAGED_DONE - but no verification had run, and the generated tests used Jest globals instead of node:test. So a staged run that reaches STAGED_DONE without verification is now marked StagedUnverifiedError: the files still land under --yes, but the run is machine-visibly failed.
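The two rules above can be sketched as a pair of checks. Only STAGED_DONE, StagedUnverifiedError and the feedback wording come from the text; StageTurn, corrective_feedback, finish_staged_run and the apply_files callback are hypothetical names for illustration, and kodr’s real control flow is likely more involved.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

NO_OP_FEEDBACK = "no files changed, give me concrete files or patches"


class StagedUnverifiedError(Exception):
    """A staged run reached STAGED_DONE but verification never ran."""


@dataclass
class StageTurn:
    """Hypothetical view of one implementation turn."""
    files: List[str] = field(default_factory=list)
    claims_complete: bool = False


def corrective_feedback(turn: StageTurn) -> Optional[str]:
    """Rule 1: a 'complete' stage that changed nothing gets pushed back."""
    if turn.claims_complete and not turn.files:
        return NO_OP_FEEDBACK
    return None


def finish_staged_run(marker: str, verification_ran: bool,
                      apply_files: Callable[[], None]) -> None:
    """Rule 2: files still land, but an unverified STAGED_DONE is a failure."""
    apply_files()  # assumption: stands in for the write step under --yes
    if marker == "STAGED_DONE" and not verification_ran:
        raise StagedUnverifiedError("STAGED_DONE reached without verification")
```

Applying the files before raising mirrors the behaviour described in the second item: the work is kept on disk, while the exception makes the failure visible to whatever is driving the run.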

That last rule is the whole lesson, and it’s a principle the project keeps returning to: a local model can say it’s done; applied work is only complete when verification has run. Better an honest failure that the dependency-install and repair phases can pick up than a false success that ships broken code.
