Most agent systems trust the agent.
If the agent says “task complete”, the system accepts it.
I’ve been experimenting with the opposite idea:
what if the system treated the agent as untrusted?
Built a small kernel that does one thing:
→ the agent can propose an outcome
→ the kernel decides if it’s true
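A minimal sketch of what I mean, in Python. Names like `Kernel`, `Verdict`, `propose`, `record`, and `required_evidence` are illustrative assumptions, not the actual implementation:

```python
from dataclasses import dataclass, field

@dataclass
class Verdict:
    accepted: bool
    reason: str = ""

@dataclass
class Kernel:
    # Event kinds that must be on the log before SUCCESS is accepted.
    required_evidence: set[str]
    # The kernel's own append-only log; agents never write to it directly.
    events: list[str] = field(default_factory=list)

    def record(self, event: str) -> None:
        # In a real system the kernel would record only events it can
        # verify itself (it observed the side effect, or ran the check).
        self.events.append(event)

    def propose(self, outcome: str) -> Verdict:
        # The agent's claim is untrusted input; the log decides.
        if outcome != "SUCCESS":
            return Verdict(False, f"unknown outcome: {outcome}")
        if self.required_evidence - set(self.events):
            return Verdict(False, "outcome does not match recorded events")
        return Verdict(True)
```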
Example:
Agent declares SUCCESS (without sufficient evidence)
Kernel:
REJECTED
reason: outcome does not match recorded events
Agent adds more evidence (still insufficient)
Kernel:
REJECTED
Agent provides the required evidence
Kernel:
ACCEPTED
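The same exchange, run against the sketch above (evidence event names are made up for the example):

```python
kernel = Kernel(required_evidence={"tests_passed", "artifact_written"})

kernel.propose("SUCCESS")          # REJECTED: outcome does not match recorded events

kernel.record("tests_passed")      # some evidence, still insufficient
kernel.propose("SUCCESS")          # REJECTED

kernel.record("artifact_written")  # required evidence now on the log
kernel.propose("SUCCESS")          # ACCEPTED
```

The design point: `record` is the only path onto the log, and `propose` is pure validation against it.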
The key difference:
the outcome is derived from recorded state, not the agent’s claim.
The kernel maintains its own event log and evaluates outcomes independently.
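Taken to its conclusion, the kernel doesn't even need the claim; the outcome can be a pure function of the log. Again, just a sketch on the same assumed types:

```python
def derive_outcome(kernel: Kernel) -> str:
    # No agent input at all: the outcome is computed from recorded state.
    if kernel.required_evidence <= set(kernel.events):
        return "SUCCESS"
    return "INCOMPLETE"
```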
Curious whether others have explored systems that treat agent output as untrusted input.