Last time I made the case that generating example apps isn’t an eval - it’s a way to exercise kodr until something breaks. Phases 21 and 23 are exactly that, twice more, and both broke it in instructive ways. Phase 22 is the odd one out: a permission policy gate that follows straight on from the hooks layer.
App two: a Markdown blog (phase 21)
The second example is a Markdown blog generator - frontmatter, escaping, sorting, static HTML out. A clear step up from the todo CLI.
The local model failed to generate it. Twice. The first dry run died with fetch failed and left nothing useful behind - no context, no prompt, no run evidence at all. The missing evidence is the real bug, not the failed fetch. A model run that falls over should still leave a paper trail, otherwise you can’t tell a flaky network from a bad prompt from a model that simply gave up.
So kodr now writes failure artifacts even when the model run itself fails: context.md, prompt.md, summary.json, error.json, and empty response/write/test files so the run directory has a consistent shape. The retry failed too - but this time it failed legibly, with a full artifact directory I could go read. I finished the blog generator by hand so the repo still gets the fixture, but the win was the closed gap, not the app.
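To make the "consistent shape" idea concrete, here is a minimal sketch, not kodr's actual code. The function name buildFailureArtifacts, the summary fields, and the three placeholder file names (response.md, writes.json, test-output.txt) are assumptions for illustration; only context.md, prompt.md, summary.json and error.json are named above. It returns a plain filename-to-contents map and leaves the disk write to the caller.

```javascript
// Hypothetical sketch: builds the artifact set for a failed model run.
// Placeholder file names and summary fields are assumptions, not kodr's real ones.
function buildFailureArtifacts({ context, prompt, error }) {
  const message = error instanceof Error ? error.message : String(error);
  const name = error instanceof Error ? error.name : 'Error';

  return {
    // Whatever was assembled before the model call is kept as evidence.
    'context.md': context ?? '',
    'prompt.md': prompt ?? '',
    'summary.json': JSON.stringify({ status: 'failed', stage: 'model', message }, null, 2),
    'error.json': JSON.stringify({ name, message }, null, 2),
    // Empty placeholders keep the directory shape identical to a successful run.
    'response.md': '',
    'writes.json': '',
    'test-output.txt': '',
  };
}

// Example: the kind of error a dropped connection produces from fetch().
const artifacts = buildFailureArtifacts({
  context: '# Context\n(project files summarised here)',
  prompt: 'Build a Markdown blog generator.',
  error: new TypeError('fetch failed'),
});

console.log(Object.keys(artifacts).join('\n'));
```

Keeping the builder pure means the same map can be asserted on in a unit test and written to a run directory by whatever already handles successful runs.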
App three: a notes API (phase 23)
The third example is a small HTTP notes API - routing, JSON parsing, validation, status codes, persistence, real integration tests. More moving parts again.
This time the model produced a valid proposal. The proposal just contained nothing but package.json. And kodr reported the run as verified, because the verification command was node --test, and node --test against a project with zero tests finds nothing, runs nothing, and exits 0. Green on an empty suite. That’s the worst kind of pass: confident and meaningless.
The verification runner now treats a Node test run that reports tests 0 as a failure, while leaving plain node --check behaviour alone. An example with no tests is no longer allowed to claim it passed. The finished notes API lives under examples/notes-api with actual integration tests that make real HTTP requests.
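The check is small enough to sketch. This is an illustration under assumptions, not the code in src/verification-runner.mjs: the function name verifyRun and its result shape are hypothetical, and it assumes the test runner's summary contains a line of the form "# tests N" (TAP reporter) or "ℹ tests N" (spec reporter).

```javascript
// Hypothetical sketch of an empty-suite guard for `node --test`.
// Assumes the runner output includes a summary line such as "# tests 0".
function verifyRun({ command, exitCode, output }) {
  if (exitCode !== 0) {
    return { ok: false, reason: `exit code ${exitCode}` };
  }

  // Only `node --test` gets the extra check; `node --check` and others pass through.
  const isNodeTest = /^node\s+--test(?:\s|$)/.test(command.trim());
  if (isNodeTest) {
    const match = output.match(/^(?:#|ℹ)\s+tests\s+(\d+)\s*$/m);
    if (match && Number(match[1]) === 0) {
      return { ok: false, reason: 'node --test ran zero tests' };
    }
  }

  return { ok: true };
}

const emptySuite = verifyRun({
  command: 'node --test',
  exitCode: 0,
  output: '# tests 0\n# pass 0\n# fail 0\n',
});
console.log(emptySuite);
```

The guard only fires on an explicit zero count. If the summary line is missing or the format changes, this sketch falls back to trusting the exit code, which is a deliberate choice to avoid failing runs on output it cannot parse.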
Two apps, two harness bugs that only surfaced because I tried to build something real. That’s the whole reason these example trials exist.
The policy gate (phase 22)
Phase 22 comes from a different angle. Hooks gave kodr a deterministic lifecycle boundary - code that runs around every tool call. Permission policy is the first real consumer of that boundary: a policy object that makes allow/deny decisions before tool effects happen.
The important part is that it changes none of the defaults. File writes are still jailed by the safe-write path logic. Verification commands still go through the allowlist. Public fetches still get private-address blocking. Writes still dry-run unless you explicitly apply. The policy gate only lets you narrow that - deny reads, writes, apply, commands, network, or specific hosts - and it sits in front of the existing hardening, not in place of it.
It lands in ToolRunner, the same chokepoint everything else converges on. There’s no CLI or project-file configuration for it yet; that’s later. For now the contract exists and is tested, which is the order I like: build the boring, certain layer first, expose the knobs once the shape is proven.
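As a rough picture of what "narrow only, in front of the existing hardening" means, here is a sketch under assumptions. The names createPermissionPolicy, decide and runTool, the request shape, and the option names deny and denyHosts are all hypothetical; the real contract in src/permission-policy.mjs may look quite different.

```javascript
// Hypothetical sketch of a deny-only policy gate. All names are illustrative.
// kinds mirror the list above: 'read', 'write', 'apply', 'command', 'network'.
function createPermissionPolicy(options = {}) {
  const deny = new Set(options.deny ?? []);
  const denyHosts = new Set((options.denyHosts ?? []).map((h) => h.toLowerCase()));

  return {
    decide(request) {
      if (deny.has(request.kind)) {
        return { allowed: false, reason: `${request.kind} denied by policy` };
      }
      if (request.kind === 'network' && denyHosts.size > 0) {
        let host;
        try {
          host = new URL(request.url).hostname.toLowerCase();
        } catch {
          return { allowed: false, reason: 'unparseable URL' };
        }
        if (denyHosts.has(host)) {
          return { allowed: false, reason: `host ${host} denied by policy` };
        }
      }
      // "Allowed" here only means the policy has no objection; the existing
      // checks (path jail, allowlist, private-address blocking) still run after.
      return { allowed: true };
    },
  };
}

// Stand-in for the chokepoint: ask the policy first, then run the tool effect.
function runTool(policy, request, effect) {
  const decision = policy.decide(request);
  if (!decision.allowed) return { ok: false, reason: decision.reason };
  return { ok: true, value: effect(request) };
}

const policy = createPermissionPolicy({ deny: ['apply'], denyHosts: ['example.com'] });
console.log(runTool(policy, { kind: 'apply' }, () => 'applied'));
console.log(runTool(policy, { kind: 'read', path: 'README.md' }, () => 'contents'));
```

An empty policy allows everything, so adding the gate changes no defaults; every option can only remove permissions.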
That’s the through-line across all three phases, really. Let the model try to build things and watch where the harness lets you down. Then make the harness deterministically harder to fool.
Links:
- Phase docs: 21-markdown-blog-example.md, 22-permission-policy.md, 23-notes-api-example.md
- Kodr posts: blog/21, blog/22, blog/23
- Failure artifacts on a failed run: src/app.mjs
- The empty-suite fix: src/verification-runner.mjs
- The policy gate: src/permission-policy.mjs