Hi everyone,
We are a small startup team of 4 developers, mainly working on SaaS products with microservices. Our projects are relatively small-to-medium in scope and we care a lot about maintainability, testing, security, and keeping the architecture simple.
We are thinking about setting up a multi-agent AI development workflow with 6 specialized agents:
- Orchestrator / Task Planner: breaks down specs into implementation tasks, defines acceptance criteria, keeps scope under control, and decides what should happen next.
- Builder: implements the task, writes/updates code, follows the acceptance criteria, and does not redefine the scope.
- Test Writer: generates unit/integration tests for the new code.
- Acceptance Tester: validates whether the implementation actually meets the acceptance criteria. Output would be something like Pass / Fail / Blocked (sketched after this list).
- Code Reviewer / QA Agent: reviews the diff for correctness, maintainability, edge cases, and possible architectural issues.
- Security Agent: reviews the changes from a security perspective: OWASP-style checks, secrets, auth issues, unsafe data handling, logging of sensitive data, etc.
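To make the hand-offs concrete, here is a minimal sketch of the verdict type and role table we have in mind. Everything in it (Verdict, AgentRole, the model labels) is hypothetical naming for illustration, not any particular framework's API:

```python
# Minimal sketch, assuming a custom orchestration layer.
# Verdict, AgentRole, and the model labels are illustrative names only.
from dataclasses import dataclass
from enum import Enum

class Verdict(Enum):
    PASS = "pass"
    FAIL = "fail"        # criteria not met -> retry the Builder
    BLOCKED = "blocked"  # unclear spec / missing dependency -> human

@dataclass
class AgentRole:
    name: str
    model: str    # which underlying model serves this role
    charter: str  # scope, output format, hand-off rules

ROLES = [
    AgentRole("orchestrator", "opus", "Break specs into tasks with acceptance criteria."),
    AgentRole("builder", "codex", "Implement the task; never redefine scope."),
    AgentRole("test_writer", "sonnet", "Write unit/integration tests for the diff."),
    AgentRole("acceptance_tester", "sonnet", "Return Pass/Fail/Blocked against the criteria."),
    AgentRole("reviewer", "sonnet", "Review the diff for correctness and maintainability."),
    AgentRole("security", "sonnet", "OWASP-style review: secrets, auth, data handling."),
]
```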
The rough idea is:
Orchestrator → Builder → Test agents → Security review → final acceptance
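As a hedged sketch only, that flow could be a plain sequential loop over the definitions above; run_agent() is a hypothetical wrapper around whatever API/CLI each role actually runs on:

```python
# Continues the sketch above (Verdict, ROLES). run_agent() is hypothetical.
def run_agent(role: str, payload):
    """Route `payload` to the model behind `role` and return its output."""
    ...

def run_pipeline(spec: str, max_retries: int = 2) -> Verdict:
    tasks = run_agent("orchestrator", spec)       # spec -> tasks + criteria
    for task in tasks:
        for attempt in range(max_retries + 1):
            diff = run_agent("builder", task)     # implement one task
            run_agent("test_writer", diff)        # tests for the new code
            run_agent("reviewer", diff)           # correctness, edge cases
            run_agent("security", diff)           # OWASP-style checks
            verdict = run_agent("acceptance_tester", (task, diff))
            if verdict is Verdict.PASS:
                break                             # move on to the next task
            if verdict is Verdict.BLOCKED:
                return verdict                    # escalate to a human
        else:
            return Verdict.FAIL                   # retry budget exhausted
    return Verdict.PASS
```

Even as pseudocode, pinning down the retry budget and the "Blocked escalates to a human" rule is what would keep a loop like this from silently burning through a shared quota.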
Right now, we are considering:
- Claude Max (Opus or Sonnet) for the Orchestrator, because planning and task decomposition seem to benefit from stronger reasoning.
- Codex for the Builder, because we like the coding workflow and implementation quality.
- Claude Max (Sonnet) for the testing and review agents.
My questions:
- Does this agent split make sense, or are we over-engineering it for a small 4-dev startup?
- Would you merge some of these agents?
- Which models would you use for each role?
- Is Claude Max a good choice for the Orchestrator and Tester roles?
- Is Codex a good choice for the Builder role?
- Are there cheaper alternatives that are good enough for this kind of scope? I've heard DeepSeek v4 and Qwen are solid options, but I'd like real-world feedback.
- For small SaaS/microservice projects, would you use premium models only for planning/review and cheaper models for implementation/testing?
- Any practical advice from people already using multi-agent workflows in real projects?
We are not trying to build a huge autonomous system. The goal is more pragmatic: consistent AI-assisted development across our team, better specs, better tests, fewer regressions, and a repeatable workflow that is easy to maintain.
Would love to hear what architecture and model choices you would recommend.
IMPORTANT: we are all using a single shared account for Claude and another for Codex, not an account per seat, which means four developers drawing on the same model quota.
Thanks! :D