u/Ok_Bet7598

▲ 22 · r/AIDeveloperNews

TL;DR: I built TDD-Guard a year ago. I’m now working on Conduct, a more general policy engine for coding agents (Claude Code, Codex, GitHub Copilot CLI, and VS Code Chat). It includes a TDD rule that works with any language and test runner out of the box, supports parallel sessions, and handles refactoring properly.

Hi all,

The demo shows me prompting Claude Code to build a shopping cart in an empty project with Conduct’s TDD rule installed. I make no mention of TDD because I want to show how it is enforced out of the box. Hooks intercept each agent action, and a separate agent reviews the recent session, the pending action, and the current file before allowing it through. That extra context also helps it handle refactoring cleanly.

Repository: https://github.com/nizos/conduct

The project is in an early state. Feedback is welcome!

Background

I started using Claude Code about a year ago and was immediately convinced I could make it follow Test-Driven Development (TDD), which was a requirement if I was ever going to use it for production. I tried different prompts and, like everyone else, found out how unreliable that was. The agent would drift as the context rotted, take shortcuts, and I had to keep supervising its practices.

Luckily, Claude Code introduced hooks around that time. You can think of them as events that fire automatically when an agent wants to perform an action, like writing a file or running a command. The information they carry lets you determine whether the agent is, for example, trying to write multiple tests at once, and block the action with feedback on how to course-correct. So I decided to use this to enforce TDD: I created a custom test reporter to capture test run output, combined it with the hook data, and fed it to a separate agent that judged whether the pending action violated TDD.
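To make the mechanics concrete, here is a rough sketch of what a PreToolUse hook handler looks like. I'm writing the field names and the blocking convention (exit code 2, with feedback on stderr) from memory, so check the hooks docs for the exact shape, and the real validators hand this data to a model rather than a regex; this only shows the plumbing:

```typescript
// Rough sketch of a PreToolUse hook handler (Node + TypeScript).
// Claude Code pipes the pending action to the hook as JSON on stdin;
// exiting with code 2 blocks the action and feeds stderr back to the agent.
// Field names are from memory and may differ from the current docs.

interface HookInput {
  tool_name: string;                      // e.g. "Write", "Edit", "Bash"
  tool_input: { file_path?: string; content?: string; command?: string };
}

async function main() {
  const chunks: Buffer[] = [];
  for await (const chunk of process.stdin) chunks.push(chunk as Buffer);
  const input: HookInput = JSON.parse(Buffer.concat(chunks).toString("utf8"));

  // Toy check: block a file write that adds more than one test at once.
  if (input.tool_name === "Write" && input.tool_input.content) {
    const newTests = (input.tool_input.content.match(/\b(it|test)\(/g) ?? []).length;
    if (newTests > 1) {
      console.error("TDD violation: add one failing test at a time.");
      process.exit(2); // block the action; the message goes back to the agent
    }
  }
  process.exit(0); // allow the action
}

main();
```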

It worked really well. I called the project TDD-Guard. The community contributed support for several languages, and I’ve kept working on it since.

TDD-Guard has its quirks, though. It needs a dedicated reporter per test runner, which makes new language support slow. It can't handle parallel sessions because reporter output gets overwritten. And the validator only sees the latest test output and the pending change, which isn't always enough context to tell refactoring apart from new behavior, so validation ends up either too strict or too permissive.

Over time I noticed gaps in my workflow outside of TDD that I still had to supervise, and friction when teams used different agents in the same project with overlapping instructions and plugins. So I started a new project, Conduct, that takes a more general approach.

Conduct makes it easy to define rules that get enforced through hooks across all supported agents: Claude Code, Codex, GitHub Copilot CLI, and VS Code Chat, with more to come. It ships with deterministic rules for forbidding commands or content using string or regex matching, and it includes a TDD rule that addresses the limitations above.
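To give a rough idea, a deterministic rule boils down to a matcher plus the feedback to return when the action is blocked. The shape below is illustrative only, not Conduct's actual schema; the repo has the real config format:

```typescript
// Illustrative shape of a deterministic "forbid" rule, not Conduct's schema.
// A rule pairs a matcher (string or regex) with the feedback the agent
// receives when its pending action is blocked.
interface ForbidRule {
  name: string;
  appliesTo: "command" | "content";
  match: RegExp | string;
  message: string; // fed back to the agent when the action is blocked
}

const exampleRules: ForbidRule[] = [
  {
    name: "no-force-push",
    appliesTo: "command",
    match: /git push\s+(--force|-f)\b/,
    message: "Force pushes are not allowed in this project.",
  },
  {
    name: "no-skipped-tests",
    appliesTo: "content",
    match: /\b(it|test|describe)\.skip\(/,
    message: "Do not skip tests; fix or remove them instead.",
  },
];
```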

The TDD rule reads recent session history instead of relying on a sidecar reporter, so it works with any language or test runner out of the box, parallel sessions don't collide, and the validator has enough context to handle refactoring properly. Validation is done by an AI model, reusing your existing subscription via the official SDKs. The validation instructions can be customized, and you can scope which files TDD applies to.
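The validation step itself is conceptually simple: build a prompt from the recent session history, the pending action, and the current file, and ask a model for a verdict. A minimal sketch of the idea, using the plain Anthropic SDK for brevity (Conduct goes through the agent SDKs instead, so it runs against your existing subscription), looks roughly like this:

```typescript
import Anthropic from "@anthropic-ai/sdk";

// Sketch only: shows the shape of the validation call, nothing more.
const client = new Anthropic();

async function validate(sessionHistory: string, pendingAction: string, currentFile: string) {
  const res = await client.messages.create({
    model: "claude-sonnet-4-5", // placeholder model name
    max_tokens: 256,
    messages: [
      {
        role: "user",
        content: [
          "You are enforcing TDD. Decide whether the pending action violates it.",
          "Recent session:\n" + sessionHistory,
          "Pending action:\n" + pendingAction,
          "Current file:\n" + currentFile,
          'Reply with a JSON object: {"allow": boolean, "reason": string}.',
        ].join("\n\n"),
      },
    ],
  });

  const block = res.content[0];
  return block.type === "text"
    ? JSON.parse(block.text)
    : { allow: true, reason: "no verdict" };
}
```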

I’ve been using Conduct over the past week in production with Claude Code and I’m genuinely impressed by how well it works. It catches real oversights without the friction TDD-Guard sometimes caused.

u/Ok_Bet7598 — 14 days ago