Async Runs and Prompt Caching

Three phases of kodr about not making people wait, and not paying twice. The web channel sketch becomes a real control plane, and prompt caching arrives - carefully, because caching done wrong is just silent wrongness.

Phase 85: a control plane, not a blocking call

The phase-50 POST /turn blocked until the model finished and discarded all intermediate progress. Fine for tests, useless for a local model that thinks for three minutes. Phase 85 makes the surface task-shaped: submit a run, get a handle, watch events, fetch artifacts. POST /runs validates a strict field allowlist, maps each field onto the same typed options the CLI uses, and returns 202 with a runId, eventsUrl, and statusUrl before the model finishes. GET /runs/:id/events streams SSE built from the same onProgress events the CLI and TUI consume - with a replay buffer and Last-Event-ID so a reconnecting client still sees recent history. A new in-memory run-registry.mjs holds lifecycle transitions (queued → running → completed|failed|cancelled) and is deliberately not durable - artifacts under .kodr/runs stay the source of truth. By default only one run is active at a time; others queue with a recorded reason.
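To make the lifecycle concrete, here is a minimal sketch of an in-memory registry with the same states and a one-active-run default. It is not the code in src/run-registry.mjs: the class name, method names, record shape, and the queue reason string are all assumptions for illustration.

```javascript
// Hypothetical sketch; names and shapes are assumptions, not kodr's actual registry.
const TRANSITIONS = {
  queued: ["running", "cancelled"],
  running: ["completed", "failed", "cancelled"],
  completed: [],
  failed: [],
  cancelled: [],
};

class RunRegistry {
  constructor({ maxActive = 1 } = {}) {
    this.maxActive = maxActive;
    this.runs = new Map(); // in-memory only; lost on restart by design
    this.nextId = 1;
  }

  // Register a run. It starts right away if a slot is free, otherwise it queues.
  submit(request) {
    const runId = `run-${this.nextId++}`;
    const run = { runId, request, state: "queued", queueReason: null };
    this.runs.set(runId, run);
    if (this.activeCount() < this.maxActive) {
      this.transition(runId, "running");
    } else {
      run.queueReason = "another run is active";
    }
    return run;
  }

  activeCount() {
    let count = 0;
    for (const run of this.runs.values()) {
      if (run.state === "running") count++;
    }
    return count;
  }

  // Move a run to a new state, rejecting anything the lifecycle does not allow.
  transition(runId, next) {
    const run = this.runs.get(runId);
    if (!run) throw new Error(`unknown run: ${runId}`);
    if (!TRANSITIONS[run.state].includes(next)) {
      throw new Error(`illegal transition: ${run.state} -> ${next}`);
    }
    run.state = next;
    if (next === "running") run.queueReason = null;
    return run;
  }

  // Record a terminal outcome, then promote queued runs in submission order.
  finish(runId, outcome) {
    this.transition(runId, outcome);
    for (const run of this.runs.values()) {
      if (run.state === "queued" && this.activeCount() < this.maxActive) {
        this.transition(run.runId, "running");
      }
    }
  }
}
```

The transition table is the part worth copying: terminal states have no exits, so a late event cannot flip a completed run back to running.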

Two honesty notes I like. First, cancel is honest: queued runs cancel outright, while active runs only record cancelRequested with bestEffort: true, because threading an AbortSignal through model calls, tools, installers, and sandboxes is named follow-up work rather than a fake promise. Second, the HTTP layer stayed a thin adapter - the only new execution code is registry bookkeeping around the same run-turn channel request; the run itself is byte-for-byte the CLI path, dry-run default included. The phase even caught its own drift: the first cut passed install: true through to the channel options and nothing happened, because the real field is installDependencies. A field-mapping test exists precisely to catch that kind of silent HTTP-body-to-option mismatch.
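The install versus installDependencies drift is easy to picture with a small sketch. Only that one pairing comes from the phase; the table name, the other fields, the set of known options, and the error shape are hypothetical, and it assumes the HTTP body field is called install.

```javascript
// Hypothetical mapping table: HTTP body field -> channel option name.
// Only install -> installDependencies comes from the post; the rest is illustrative.
const FIELD_MAP = {
  prompt: "prompt",
  model: "model",
  install: "installDependencies",
  dryRun: "dryRun",
};

// Stand-in for the typed options the CLI already uses (assumed names).
const KNOWN_OPTIONS = new Set(["prompt", "model", "installDependencies", "dryRun"]);

// Strict allowlist: any field outside FIELD_MAP rejects the whole request.
function mapRunRequest(body) {
  const unknown = Object.keys(body).filter((key) => !Object.hasOwn(FIELD_MAP, key));
  if (unknown.length > 0) {
    return { ok: false, status: 400, error: `unknown fields: ${unknown.join(", ")}` };
  }
  const options = { dryRun: true }; // dry-run stays the default, as on the CLI
  for (const [field, value] of Object.entries(body)) {
    options[FIELD_MAP[field]] = value;
  }
  return { ok: true, options };
}

// The drift check: every mapped target must be an option the channel really reads.
function unmappedTargets() {
  return Object.values(FIELD_MAP).filter((target) => !KNOWN_OPTIONS.has(target));
}
```

A test that asserts unmappedTargets() is empty would fail the moment someone maps a field onto a name the channel never reads, which is exactly the failure that produced no error at runtime.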

Phase 86: cache what you can prove

Prompt caching, conservatively. The first implementation optimizes for correctness and inspectability over savings: kodr adds explicit cache control only where the model family is known to accept it, and otherwise just records cache usage when providers report it. In practice that means when cache mode is auto and the model id contains anthropic/, it adds root-level cache_control: { "type": "ephemeral" } - which neatly sidesteps the fragile problem of choosing a cache breakpoint inside OpenAI-style messages. OpenAI, DeepSeek, Gemini, Qwen, and Ollama-cloud are report-only for now - their request shapes differ enough that guessing at payload fields would be worse than waiting. Kodr normalizes cache counters (cachedTokens, cacheReadTokens, cacheWriteTokens) into summaries when providers return them, and only shows them when non-zero so local runs don’t sprout zero-valued cache fields. (A nice side-find: Ollama is no longer always local - :cloud model ids exist - so kodr stops auto-zeroing cost for those.)
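As a rough sketch of that decision, assuming a helper that receives the outgoing payload plus the model id and cache mode: the function names, the "auto" default, and the snake_case input counter names are hypothetical stand-ins, since real providers report cache usage under different field names.

```javascript
// Hypothetical helper: adds root-level cache_control only when the mode is "auto"
// and the model id contains "anthropic/". Everything else is report-only.
function applyCacheControl(payload, { model, cacheMode = "auto" }) {
  if (cacheMode === "auto" && model.includes("anthropic/")) {
    return { ...payload, cache_control: { type: "ephemeral" } };
  }
  return payload; // never guess at provider-specific payload fields
}

// Normalizes reported usage into the three counters named above.
// The input field names are placeholders, not any provider's real schema.
function summarizeCacheUsage(usage = {}) {
  const counters = {
    cachedTokens: usage.cached_tokens ?? 0,
    cacheReadTokens: usage.cache_read_tokens ?? 0,
    cacheWriteTokens: usage.cache_write_tokens ?? 0,
  };
  // Keep only non-zero counters so local runs show no cache fields at all.
  return Object.fromEntries(Object.entries(counters).filter(([, count]) => count > 0));
}
```

Returning a new object instead of mutating the payload keeps the report-only path trivially safe: for every other model family the request goes out exactly as it came in.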

Phase 87: keep the front of the prompt still

Caching only helps if the start of the prompt stays byte-identical between runs, and kodr’s system prompt mixed stable harness instructions with volatile workspace context - so a single changed source file could rewrite text near the beginning and miss the cache. Phase 87 has the context packer render four named sections in deterministic order:

stable: identity, safety rules, tool and envelope contract.
project: AGENTS.md.
semi-stable: memory and loaded skills.
volatile: file maps, packed source, inspection chunks.

For compatibility it’s still one system message (some local servers are stricter than hosted APIs, so multiple system messages would be a risky break) - but built from the sections in order, stable prefix first. Every run writes prompt-prefix.json with per-section hashes and char counts, and the tests prove the guarantee: changing a source file moves the volatile hash but not the stable or project ones. It doesn’t guarantee a provider cache hit - it makes the prefix measurable and stable by construction, ready for the day a model profile can safely opt into a smarter message layout.
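A small sketch of the idea, under loud assumptions: the function names and report shape are invented here, not the actual format of prompt-prefix.json, and the hash is a dependency-free 32-bit FNV-1a stand-in where a real implementation would more plausibly use something like SHA-256.

```javascript
// Fixed section order: stable prefix first, volatile content last.
const SECTION_ORDER = ["stable", "project", "semi-stable", "volatile"];

// Stand-in hash (32-bit FNV-1a over UTF-16 code units), used only to keep
// this sketch free of imports. Not a cryptographic hash.
function fnv1a(text) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash.toString(16).padStart(8, "0");
}

// Builds one system message plus a per-section report of hashes and char counts.
function buildSystemPrompt(sections) {
  const parts = SECTION_ORDER.map((name) => ({ name, text: sections[name] ?? "" }));
  return {
    // Still a single system message, assembled in deterministic order.
    systemMessage: parts.map((part) => part.text).join("\n\n"),
    // Assumed report shape: one entry per section.
    prefixReport: parts.map((part) => ({
      name: part.name,
      hash: fnv1a(part.text),
      chars: part.text.length,
    })),
  };
}
```

Building the same prompt twice with only the volatile text changed and comparing the two reports is the whole test: the stable and project hashes must match, and the volatile hash must not.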

Links:

Phase docs: 85, 86, 87
Kodr blog: 85, 86, 87
The run registry: src/run-registry.mjs and the server: src/server.mjs