Hi everyone,
We are a small startup team of 4 developers, mainly working on SaaS products with microservices. Our projects are relatively small-to-medium in scope and we care a lot about maintainability, testing, security, and keeping the architecture simple.
We are thinking about setting up a multi-agent AI development workflow with 6 specialized agents:
- Orchestrator / Task Planner: breaks down specs into implementation tasks, defines acceptance criteria, keeps scope under control, and decides what should happen next.
- Builder: implements the task, writes/updates code, follows the acceptance criteria, and does not redefine the scope.
- Test Writer: generates unit/integration tests for the new code.
- Acceptance Tester: validates whether the implementation actually meets the acceptance criteria. Output would be something like Pass / Fail / Blocked (sketched after this list).
- Code Reviewer / QA Agent: reviews the diff for correctness, maintainability, edge cases, and possible architectural issues.
- Security Agent: reviews the changes from a security perspective: OWASP-style checks, secrets, auth issues, unsafe data handling, logging of sensitive data, etc.
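To make the hand-offs concrete, here is a minimal sketch of the verdict type and role table we have in mind. Everything in it (Verdict, AgentRole, the model labels) is hypothetical naming for illustration, not any particular framework's API:

```python
# Minimal sketch, assuming a custom orchestration layer.
# Verdict, AgentRole, and the model labels are illustrative names only.
from dataclasses import dataclass
from enum import Enum

class Verdict(Enum):
    PASS = "pass"
    FAIL = "fail"        # criteria not met -> retry the Builder
    BLOCKED = "blocked"  # unclear spec / missing dependency -> human

@dataclass
class AgentRole:
    name: str
    model: str    # which underlying model serves this role
    charter: str  # scope, output format, hand-off rules

ROLES = [
    AgentRole("orchestrator", "opus", "Break specs into tasks with acceptance criteria."),
    AgentRole("builder", "codex", "Implement the task; never redefine scope."),
    AgentRole("test_writer", "sonnet", "Write unit/integration tests for the diff."),
    AgentRole("acceptance_tester", "sonnet", "Return Pass/Fail/Blocked against the criteria."),
    AgentRole("reviewer", "sonnet", "Review the diff for correctness and maintainability."),
    AgentRole("security", "sonnet", "OWASP-style review: secrets, auth, data handling."),
]
```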
The rough idea is:
Orchestrator → Builder → Test agents → Security review → final acceptance
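As a hedged sketch only, that flow could be a plain sequential loop over the definitions above; run_agent() is a hypothetical wrapper around whatever API/CLI each role actually runs on:

```python
# Continues the sketch above (Verdict, ROLES). run_agent() is hypothetical.
def run_agent(role: str, payload):
    """Route `payload` to the model behind `role` and return its output."""
    ...

def run_pipeline(spec: str, max_retries: int = 2) -> Verdict:
    tasks = run_agent("orchestrator", spec)       # spec -> tasks + criteria
    for task in tasks:
        for attempt in range(max_retries + 1):
            diff = run_agent("builder", task)     # implement one task
            run_agent("test_writer", diff)        # tests for the new code
            run_agent("reviewer", diff)           # correctness, edge cases
            run_agent("security", diff)           # OWASP-style checks
            verdict = run_agent("acceptance_tester", (task, diff))
            if verdict is Verdict.PASS:
                break                             # move on to the next task
            if verdict is Verdict.BLOCKED:
                return verdict                    # escalate to a human
        else:
            return Verdict.FAIL                   # retry budget exhausted
    return Verdict.PASS
```

Even as pseudocode, pinning down the retry budget and the "Blocked escalates to a human" rule is what would keep a loop like this from silently burning through a shared quota.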
Right now, we are considering:
- Claude Max (Opus or Sonnet) for the Orchestrator, because planning and task decomposition seem to benefit from stronger reasoning.
- Codex for the Builder, because we like the coding workflow and implementation quality.
- Claude Max (Sonnet) for the testing and review agents.
My questions:
- Does this agent split make sense, or are we over-engineering it for a small 4-dev startup?
- Would you merge some of these agents?
- Which models would you use for each role?
- Is Claude Max a good choice for the Orchestrator and Tester roles?
- Is Codex a good choice for the Builder role?
- Are there cheaper alternatives that are good enough for this kind of scope? I've heard DeepSeek v4 and Qwen are solid options, but I'd like real-world feedback.
- For small SaaS/microservice projects, would you use premium models only for planning/review and cheaper models for implementation/testing?
- Any practical advice from people already using multi-agent workflows in real projects?
We are not trying to build a huge autonomous system. The goal is more pragmatic: consistent AI-assisted development across our team, better specs, better tests, fewer regressions, and a repeatable workflow that is easy to maintain.
Would love to hear what architecture and model choices you would recommend.
IMPORTANT: we are all using a single shared account for Claude and another for Codex, not an account per seat, which means four developers drawing on the same model quota.
Thanks! :D