u/Devinchy02

Hi everyone,

We are a small startup team of 4 developers, mainly working on SaaS products with microservices. Our projects are relatively small-to-medium in scope and we care a lot about maintainability, testing, security, and keeping the architecture simple.

We are thinking about setting up a multi-agent AI development workflow with 6 specialized agents:

  1. Orchestrator / Task Planner: breaks down specs into implementation tasks, defines acceptance criteria, keeps scope under control, and decides what should happen next.
  2. Builder: implements the task, writes/updates code, follows the acceptance criteria, and does not redefine the scope.
  3. Test Writer: generates unit/integration tests for the new code.
  4. Acceptance Tester: validates whether the implementation actually meets the acceptance criteria. Output would be something like Pass / Fail / Blocked.
  5. Code Reviewer / QA Agent: reviews the diff for correctness, maintainability, edge cases, and possible architectural issues.
  6. Security Agent: reviews the changes from a security perspective: OWASP-style checks, secrets, auth issues, unsafe data handling, logging of sensitive data, etc.

The rough idea is:

Orchestrator → Builder → Test agents → Security review → final acceptance

Right now, we are considering:

  • Claude Max / Claude Opus or Sonnet for the Orchestrator, because planning and task decomposition seem to benefit from stronger reasoning.
  • Codex for the Builder, because we like the coding workflow and implementation quality.
  • Claude Max / Claude Sonnet for testing and review agents.

My questions:

  • Does this agent split make sense, or are we over-engineering it for a small 4-dev startup?
  • Would you merge some of these agents?
  • Which models would you use for each role?
  • Is Claude Max a good choice for the Orchestrator and Tester roles?
  • Is Codex a good choice for the Builder role?
  • Are there cheaper alternatives that are good enough for this kind of scope? I've heard DeepSeek v4 or Qwen are good alternatives, but I need real feedback.
  • For small SaaS/microservice projects, would you use premium models only for planning/review and cheaper models for implementation/testing?
  • Any practical advice from people already using multi-agent workflows in real projects?

We are not trying to build a huge autonomous system. The goal is more pragmatic: consistent AI-assisted development across our team, better specs, better tests, fewer regressions, and a repeatable workflow that is easy to maintain.

Would love to hear what architecture and model choices you would recommend.

IMPORTANT: we are all sharing a single Claude account and a single Codex account, not one account per seat, which means four developers are drawing on the same model's quota.

Thanks! :D

u/Devinchy02 — 6 days ago