Do coding agents need an OS-like control plane? I built a prototype and want critique.
I’ve been experimenting with a local control-plane for coding agents, and I’d love serious critique from people building real agent workflows.
The problem I kept running into:
- agents forget the original project intent after long sessions
- “done” is often claimed without eval/test/postflight evidence
- MCP/tool/subagent calls are invisible unless you manually inspect logs
- old projects accumulate stale generated files, broken hooks, and mismatched state
- multi-agent work gets messy because there is no durable task/spec/lifecycle record
So I built a prototype called KnowledgeOS.
The idea is not to replace the operating system. It is more like a project-local governance layer for agents.
Current pieces:
- `.agent-os/` control plane per project
- `create-task` for formal task intake
- `create-spec` / `align-spec` so runs bind to durable user intent
- `route-task` and `check-route-write` to prevent uncontrolled file mutation
- `context-pack` and `plan-task` before execution
- mandatory lifecycle phases: route, plan, review, dispatch, execute, report
- visible `CHECKPOINT_OK`, `CAPABILITY_OK`, and `TRACE_OK` markers
- `capability-event` for MCP / skill / subagent / shell / script visibility
- `eval-task`, `verify-context`, `verify-lifecycle`, `complete-task`
- postflight hook that must return `[SYNC_OK]`
- local tool registry for MCPs, skills, orchestrators, and subagents
- recently integrated Maestro Orchestrate as a local specialist-agent catalog via MCP
The design philosophy is:
- small kernel
- pluggable modules
- optional apps/workbench
- each project decides strictness
- every important agent claim needs command evidence
What I’m unsure about:
Is “OS-like control plane for agents” the right abstraction, or is this just workflow tooling with a fancy name?
Should lifecycle gates be strict by default, or opt-in per project?
Is spec-first / checkpoint-first work too much friction for everyday coding?
How should subagent registries be represented without turning into prompt soup?
Are there existing systems that solve this more cleanly?
I’m not looking for stars as much as architecture feedback. If this is over-engineered, I’d love to hear where. If the abstraction is useful, I’d love suggestions on what should be kernel vs plugin/module.