Subagent Orchestration: Divide Judgment, Not Work

Way back I wrote about subagents starting small - a bounded child context with a regex inside it, and a promise that the model-backed version was coming. This is that version. Five phases turn the planner/implementer/reviewer idea into a real pipeline of isolated kodr conversations, and the whole arc converges on one lesson: subagents should divide judgment, not duplicate deterministic work.

Phase 80: three conversations

Staged execution had already split complex work into a plan turn and bounded implementation turns. Phase 80 makes planner, implementer, and reviewer separate model conversations. The planner explores and produces a concise plan; the implementer receives that plan and emits the normal proposal; the reviewer receives the plan plus the proposed writes and returns a pass/fail without applying anything. Each prompt opens with the same roster, so you can target a stage with a plain prefix like “reviewer: run tests after”. kodr strips the directive from the shared prompt and injects it only into that agent’s turn. This is deliberately one pass for now: reviewer failures surface in artifacts but don’t yet loop back into a repair agent.
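To make the prefix routing concrete, here is a minimal sketch of how a stage directive could be pulled out of a shared prompt. The function names, the one-directive-per-line rule, and the "Directive for ..." wording are my assumptions for illustration, not kodr's actual code.

```javascript
// Hypothetical helper: split "stage: instruction" lines out of a shared prompt.
// Stage names come from the post; everything else here is an assumption.
const STAGES = ["planner", "implementer", "reviewer"];

function extractStageDirectives(prompt) {
  const directives = { planner: [], implementer: [], reviewer: [] };
  const shared = [];
  for (const line of prompt.split("\n")) {
    // A directive is a line of the form "<stage>: <text>".
    const match = /^\s*(\w+):\s*(.+)$/.exec(line);
    const stage = match ? match[1].toLowerCase() : null;
    if (stage && STAGES.includes(stage)) {
      directives[stage].push(match[2].trim());
    } else {
      // Anything else stays in the prompt every stage sees.
      shared.push(line);
    }
  }
  return { shared: shared.join("\n").trim(), directives };
}

// Build the text one stage receives: shared prompt plus its own directives only.
function promptForStage(stage, parsed) {
  const extra = parsed.directives[stage];
  if (extra.length === 0) return parsed.shared;
  return `${parsed.shared}\n\nDirective for ${stage}: ${extra.join("; ")}`;
}
```

With the input "add a health endpoint" followed by a "reviewer: run tests after" line, the planner and implementer see only the first line, and the reviewer sees it with the directive appended.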

Phase 81: say something

The first Nemotron subagent run exposed an obvious gap - the terminal sat silent while the local model ground away, which feels broken even when it’s working. So Phase 81 adds structured progress events: the TUI prints grey “planner started / implementer finished / reviewer finished” lines, and because they’re channel events, a future web UI can consume the same stream. A test then caught that the plain CLI path had no renderer attached, so non-JSON runs now print the same feed to stderr as compact “info:” lines while keeping JSON output on stdout clean. One boundary stays firm: kodr surfaces visible artifacts (the plan, the reviewer summary, proposal messages, progress) and never hidden model reasoning. This phase also added AgentStart/SubagentStart hooks that fire before the model call - useful for logging or a policy block before any tokens burn.
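The “one stream, many renderers” idea is easy to sketch. The event shape and function names below are hypothetical, not kodr’s real channel API; the point is only that the plain CLI renderer is one more subscriber that writes “info:” lines somewhere other than stdout.

```javascript
// Hypothetical progress channel. Event shape { type, agent, status } is assumed.
function createProgressChannel() {
  const listeners = [];
  return {
    subscribe(listener) {
      listeners.push(listener);
    },
    emit(agent, status) {
      const event = { type: "progress", agent, status };
      // Every attached renderer (TUI, plain CLI, a future web UI) gets the same event.
      for (const listener of listeners) listener(event);
    },
  };
}

// Plain-CLI renderer: formats each event as a compact "info:" line.
// `write` is injected so the caller decides where the line goes.
function attachPlainRenderer(channel, write) {
  channel.subscribe((event) => write(`info: ${event.agent} ${event.status}\n`));
}

// Usage: send progress to stderr so JSON on stdout stays parseable.
const demoChannel = createProgressChannel();
attachPlainRenderer(demoChannel, (text) => process.stderr.write(text));
demoChannel.emit("planner", "started");
```

Injecting the writer also makes the “no renderer attached” bug testable: a test can pass an array push instead of stderr and assert on the lines.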

Phase 82: the right model for the job

Planner, implementer, and reviewer do different work, so why force them onto one model? Phase 82 adds per-agent specs:

kodr run -p "task" --subagent-stages \
  --model lmstudio/qwen/qwen3.6-35b-a3b \
  --agent-model planner=openrouter/anthropic/claude-opus \
  --agent-model reviewer=lmstudio/nvidia/nemotron-3-nano-omni

The spec parser splits only on the first slash, so provider routing is unambiguous while provider-native model ids keep their own slashes: lmstudio/qwen/qwen3.6-35b-a3b routes to the lmstudio provider with the model id qwen/qwen3.6-35b-a3b. Existing --model/--openrouter usage stays compatible. A run that supplies --agent-model without --subagent-stages now warns that the overrides are inactive. And a Nemotron run surfaced the next boundary - the OpenRouter planner did good work but the local implementer returned valid JSON in the wrong schema - so kodr now sends structured-output response_format schemas for proposal and review turns, giving local models server-side shape pressure instead of prompt-only pleading. (Plus an opt-in --max-thinking-tokens for reasoning models that otherwise stall.)
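The first-slash rule fits in a few lines. This is a sketch under my own assumptions, not the contents of src/model-specs.mjs: the function names are made up, and treating a slash-free spec as “no explicit provider” is a guess at how legacy --model values might stay compatible.

```javascript
// Hypothetical parser for "provider/model" specs. Splits on the FIRST slash only,
// so model ids that contain slashes survive intact.
function parseModelSpec(spec) {
  const slash = spec.indexOf("/");
  if (slash === -1) {
    // Assumption: a bare model id leaves provider selection to the caller.
    return { provider: null, model: spec };
  }
  if (slash === 0 || slash === spec.length - 1) {
    throw new Error(`expected provider/model, got "${spec}"`);
  }
  return { provider: spec.slice(0, slash), model: spec.slice(slash + 1) };
}

// Hypothetical parser for one --agent-model value: "agent=provider/model".
function parseAgentModel(arg) {
  const eq = arg.indexOf("=");
  if (eq <= 0 || eq === arg.length - 1) {
    throw new Error(`expected agent=provider/model, got "${arg}"`);
  }
  return { agent: arg.slice(0, eq), ...parseModelSpec(arg.slice(eq + 1)) };
}
```

Fed the planner override from the command above, parseAgentModel returns agent "planner", provider "openrouter", and model "anthropic/claude-opus".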

Phase 83: stop making the reviewer redo everything

The first all-OpenRouter run was damning: a three-file task burned 35,005 tokens, over half in the reviewer - which received the full plan and proposal in both its system and user messages, reread every generated file, and ran the tests itself. It also failed for the wrong reason (npm test, no package.json) while the top-level summary said tested: false, because orchestration had bypassed the normal install/verify pipeline. Phase 83 fixes the shape: the implementer gets the plan once, kodr applies writes and runs install + verification before review, and the reviewer receives a compact write manifest and verification evidence instead of full file contents, with read-only inspection tools. The lesson in one line: planning and review benefit from separate model contexts; applying files, installing packages, and running tests belong to the harness. Don’t pay a model to redo what code does deterministically.
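Here is roughly what “compact write manifest plus verification evidence” could look like in code. The field names, the 20-line output tail, and the input shapes are all assumptions of mine; the real reviewer payload may differ. What matters is that file bodies never enter the reviewer’s context unless it asks for them through a read-only tool.

```javascript
// Hypothetical builder for the reviewer's input.
// `writes` is assumed to be [{ path, content }], and `verification` is assumed to be
// { command, exitCode, output } as produced by the harness, not by a model.
function countLines(content) {
  if (content === "") return 0;
  const parts = content.split("\n").length;
  // A trailing newline terminates the last line rather than starting a new one.
  return content.endsWith("\n") ? parts - 1 : parts;
}

function buildReviewInput(plan, writes, verification, tailLines = 20) {
  // The manifest describes each write without carrying its contents.
  const manifest = writes.map((w) => ({
    path: w.path,
    bytes: Buffer.byteLength(w.content, "utf8"),
    lines: countLines(w.content),
  }));
  return {
    plan,
    manifest,
    verification: {
      command: verification.command,
      exitCode: verification.exitCode,
      passed: verification.exitCode === 0,
      // Only the tail of the output: enough evidence, bounded token cost.
      outputTail: verification.output.split("\n").slice(-tailLines).join("\n"),
    },
  };
}
```

The reviewer’s token cost now scales with the number of files written, not with their size, and the pass/fail evidence comes from the harness actually running the command.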

Phase 89: inherit the actual contract

A subtle mismatch lurked the whole time: the API request included tool schemas, but the subagent system prompt didn’t inherit kodr’s standard harness preamble. So stages could call tools, but their system messages only described the orchestration role - missing the shared identity, the untrusted-input warning, the proposal envelope, AGENTS.md and memory handling, and exact tool-name discipline. Phase 89 splits the reusable core prompt from workspace packing: subagents now get the core contract first, then the roster, then a generated “Available Tools” section naming the exact tools for that stage (and warning against invented names like read or write_file), then the role prompt. Bulky workspace context stays in the user message. The takeaway: tool schemas aren’t enough - the model needs an accurate contract in the same place as its highest-priority instructions, matching the tools the harness actually registered.
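The assembly order is the whole point, so here is a small sketch of it. The function name, the input shape, and the tool names in the usage note are placeholders I made up; only the ordering (core contract, roster, “Available Tools”, role prompt) comes from the phase itself.

```javascript
// Hypothetical system-prompt assembly for one subagent stage.
// `tools` is assumed to be the list the harness actually registered for this stage,
// as [{ name, description }], so the prose and the schemas cannot drift apart.
function buildSubagentSystemPrompt({ corePrompt, roster, tools, rolePrompt }) {
  const toolSection = [
    "Available Tools",
    ...tools.map((tool) => `- ${tool.name}: ${tool.description}`),
    "Call only the tool names listed above, exactly as written. Do not invent tool names.",
  ].join("\n");

  // Order matters: shared contract first, stage-specific role last.
  return [corePrompt, roster, toolSection, rolePrompt].join("\n\n");
}
```

Calling it with, say, two placeholder tools named inspect_file and search_text yields a prompt whose tool list is generated from the same array that would be sent as tool schemas. Bulky workspace context is deliberately absent here, since it belongs in the user message.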

Links:

Phase docs: 80, 81, 82, 83, 89
The orchestrator: src/orchestration.mjs and per-agent specs: src/model-specs.mjs