Two phases of kodr that pair up nicely: one teaches the harness what a given model can actually handle, and the other uses that to stop a long session from drowning a small model.
Phase 69: model configuration is harness behaviour
Kodr’s model defaults were scattered across the CLI - the default LM Studio URL, the default Qwen model, the long local timeout, separate assumptions about tool support and JSON behaviour. Fine with one main local model; fragile the moment runs started mixing Qwen, Nemotron, OpenRouter planners, and model-specific context windows. Phase 69 gathers it into a model profile registry.
Built-in profiles cover the default qwen/qwen3.6-35b-a3b, nvidia/nemotron-3-nano-omni, and Ollama/OpenRouter wildcards; projects override via .kodr/model-profiles.json, or you point KODR_MODEL_PROFILES at a file. A profile records model id, provider, base URL, context window, completion reserve, timeout, native tool-call support, and the recommended response-envelope mode. Kodr attaches the active profile to run summaries and subagent metadata, so a failed run can be inspected with the same capability context that shaped it. Two defaults moved behind the profile: timeouts now come from the active profile unless --timeout-ms is set, and session compaction defaults derive from context window minus completion reserve. The change to packing stayed deliberately conservative - profiles can reduce the cap for small windows, but the full token-budget assembly was its own phase, so this one didn’t balloon into a context-packer rewrite.
The lesson worth keeping: model configuration is harness behaviour, not a cosmetic setting. Local models differ enough that context budget, timeout, tool support, and output expectations have to be explicit and artifacted - otherwise you’re debugging a failure without knowing the constraints it ran under.
Phase 70: compaction without lying
Session continuation sent the complete prior conversation back every turn - simple and faithful, and eventually unusable for a small local model. Phase 70 adds deterministic compaction. When a continued transcript exceeds the character budget, kodr keeps the frozen system prompt and the newest user-led turns, and replaces the older ones with an extractive summary pulled from existing artifacts: user intent, constraints, changed file paths, remaining tasks, verification failures, key tool output, decisions.
It uses characters, not claimed tokens - there’s no provider-neutral tokenizer, so the default is an honest 48,000-char budget that --session-context-chars makes explicit and testable. The artifact split is the careful bit:
conversation.json- what’s actually sent to the model.conversation-raw.json- the complete, untouched chain.session-summary.json- the extractive summary and compaction metadata.
Future continuations prefer the raw transcript, so kodr never progressively summarizes an already-summarized conversation - the lossy version is for the model, the full version stays for browsing and debugging. Two safety details I want to call out. The summary is injected as explicitly untrusted historical user context, never a system message - prior user, assistant, tool, and artifact text must not get promoted to higher instruction priority just because they got compacted. And kodr never truncates the current user turn to fit; if the frozen prompt plus the live request are themselves over budget, it records overflowChars so the breach is visible rather than silently chopping what you just asked.
Links:
- Phase docs: 69-model-profile-capability-registry, 70-session-compaction-and-summaries
- Kodr blog: 69, 70
- Model profiles: src/model-profiles.mjs and compaction: src/session-compaction.mjs