How are you actually running Codex at scale? Worktrees are theoretically perfect and practically painful. What's your setup?
Been running 4 to 6 Codex agents concurrently and I still haven't found a clean architecture. Wanted to ask how others are doing it.
The worktree trap
Worktrees sound ideal. Each agent gets isolation, you're not stomping on each other. But in practice:
- Dependencies are missing in each new worktree unless you actively set them up.
- You have to maintain a mental map of what's merged to main and what isn't.
- You spot a bug while running the product from main, but is that bug also present in the worktrees? Who knows.
- You spot a bug inside a worktree (for example, while testing a Telegram bot there) and now you can't branch off main. You have to branch from that worktree, which means the fix has to get merged back through an extra hop before it reaches main.
Scale this to 6 agents and the coordination overhead alone starts eating your throughput. I have a main branch and a consumer branch, so some PRs go to main and some to consumer, and that's where it gets genuinely messy.
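The "what's merged and what isn't" map can at least be generated instead of memorized. Below is a minimal sketch, assuming plain git and Python 3.9+: it parses `git worktree list --porcelain` and asks `git merge-base --is-ancestor` whether each worktree's HEAD is already reachable from a target branch. The function names and the default `("main", "consumer")` targets are illustrative assumptions, not part of any existing tool.

```python
import subprocess


def parse_worktrees(porcelain: str) -> list[dict[str, str]]:
    """Parse `git worktree list --porcelain` output into one dict per worktree."""
    entries: list[dict[str, str]] = []
    current: dict[str, str] = {}
    for line in porcelain.splitlines():
        if not line.strip():
            # A blank line ends the current worktree record.
            if current:
                entries.append(current)
                current = {}
            continue
        # Lines look like "HEAD <sha>" or a bare flag such as "detached".
        key, _, value = line.partition(" ")
        current[key] = value
    if current:
        entries.append(current)
    return entries


def is_merged(commit: str, target: str, repo: str = ".") -> bool:
    """True if `commit` is reachable from `target`, i.e. already merged.

    Any non-zero exit (not an ancestor, or the target branch does not
    exist) is treated as "not merged" in this sketch.
    """
    result = subprocess.run(
        ["git", "-C", repo, "merge-base", "--is-ancestor", commit, target],
        check=False,
    )
    return result.returncode == 0


def report(repo: str = ".", targets: tuple[str, ...] = ("main", "consumer")) -> None:
    """Print one line per worktree with its merge status against each target."""
    out = subprocess.run(
        ["git", "-C", repo, "worktree", "list", "--porcelain"],
        capture_output=True, text=True, check=True,
    ).stdout
    for wt in parse_worktrees(out):
        if "HEAD" not in wt:
            continue  # bare repository entry, nothing checked out
        branch = wt.get("branch", "(detached)").removeprefix("refs/heads/")
        status = ", ".join(
            f"{t}: {'merged' if is_merged(wt['HEAD'], t, repo) else 'NOT merged'}"
            for t in targets
        )
        print(f"{wt['worktree']} [{branch}] {status}")
```

Calling `report()` from the repo root prints one line per worktree. It only answers "is this HEAD already in main or consumer", so it won't tell you whether a bug seen on main also exists in a worktree, but it removes one piece of the bookkeeping.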
What I've tried
One orchestrator agent running in a tmux session, inside a worktree. It spawns sub-agents into new tmux panes via the CLI, sometimes giving them their own worktrees, sometimes running them in the same one.
Promising in theory. Annoying in practice.
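For anyone who hasn't tried the tmux route, the spawning step looks roughly like this. A minimal sketch, assuming tmux is installed and the orchestrator runs inside an existing session. The helper names are hypothetical, and the agent command is passed in as a parameter because the exact CLI invocation depends on your setup.

```python
import shlex
import subprocess


def spawn_pane_cmd(target: str, workdir: str, agent_cmd: list[str]) -> list[str]:
    """Build the tmux argv that opens a new pane in `workdir` running `agent_cmd`."""
    return [
        "tmux", "split-window",
        "-d",            # leave focus on the orchestrator's pane
        "-t", target,    # pane or window to split, e.g. "agents:0"
        "-c", workdir,   # start directory: the worktree the sub-agent works in
        shlex.join(agent_cmd),  # tmux hands this string to a shell
    ]


def spawn(target: str, workdir: str, agent_cmd: list[str]) -> None:
    """Actually open the pane (requires a running tmux server)."""
    subprocess.run(spawn_pane_cmd(target, workdir, agent_cmd), check=True)
```

Giving a sub-agent its own worktree versus sharing one is then just a matter of which `workdir` you pass, which is also why it's so easy to end up with an inconsistent mix of both.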
Where I'm converging
One integrator agent in a single worktree. All sub-agents it spawns run inside that same worktree. One level of isolation. Ship PRs directly from there to main or consumer. No nested worktree graph to untangle.
Saw Peter Steinberger mention he doesn't use worktrees at all and I'm starting to understand why. With one worktree you get clarity. With six, you spend half your mental cycles just keeping the map in your head. The whole point of running agents is to offload cognitive load, not add to it.
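With a single worktree, the main-versus-consumer decision can be made mechanical. A minimal sketch, assuming a hypothetical convention where branches prefixed `consumer/` target the consumer branch and everything else targets main, and assuming the GitHub CLI (`gh`) is what opens the PR. Neither assumption is a given; adjust both to your repo.

```python
import subprocess

# Hypothetical naming convention: the branch prefix decides the PR target.
BASE_BY_PREFIX = {"consumer/": "consumer"}
DEFAULT_BASE = "main"


def base_for(branch: str) -> str:
    """Pick the target branch for a PR from the feature branch's name."""
    for prefix, base in BASE_BY_PREFIX.items():
        if branch.startswith(prefix):
            return base
    return DEFAULT_BASE


def pr_cmd(branch: str) -> list[str]:
    """Build the `gh pr create` argv for `branch`."""
    return [
        "gh", "pr", "create",
        "--base", base_for(branch),
        "--head", branch,
        "--fill",  # take title and body from the commits
    ]


def open_pr(branch: str) -> None:
    """Open the PR (requires an authenticated gh and a pushed branch)."""
    subprocess.run(pr_cmd(branch), check=True)
```

The point is less the code than the rule: if the target is derivable from the branch name, neither you nor the integrator agent has to remember where a given fix is supposed to land.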
The session length problem
Something else I've been wondering about. When Codex finds a bug and fixes it, then immediately surfaces another issue, do you keep going in that same session or do you spin up a fresh one?
My experience is that the longer a session runs the worse the output gets. Context bloat makes the model noticeably slower and dumber. What should be a quick precise fix turns into the agent going in circles or making weird choices. At some point the session just becomes unusable.
So the question becomes: one long session per task, or short focused sessions per bug, even if that means more context setup overhead? And does your answer change depending on whether you're using worktrees or not?
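One way the setup overhead of short sessions might be reduced is a small handoff note: the ending session records what was fixed and what is still open, and the fresh session starts from that instead of the full history. A minimal sketch only. The `Handoff` class and its fields are hypothetical, and whether this preserves enough context in practice is an open question.

```python
from dataclasses import dataclass, field


@dataclass
class Handoff:
    """Compact state carried from one agent session to the next (hypothetical)."""
    task: str
    branch: str
    done: list[str] = field(default_factory=list)
    open_issues: list[str] = field(default_factory=list)

    def to_prompt(self) -> str:
        """Render the note as plain text to paste into a fresh session."""
        lines = [f"Task: {self.task}", f"Branch: {self.branch}"]
        if self.done:
            lines.append("Already fixed:")
            lines += [f"- {item}" for item in self.done]
        if self.open_issues:
            lines.append("Still open:")
            lines += [f"- {item}" for item in self.open_issues]
        return "\n".join(lines)
```

For example, `Handoff("stabilize bot", "fix/telegram-bot", done=["retry on timeout"], open_issues=["duplicate replies"]).to_prompt()` yields a six-line note that a new session can start from.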
What's your setup?
How are you running multi-agent Codex in practice? Pure main branch, worktrees, tmux orchestration, something else entirely? Especially curious if anyone's found a clean solution for concurrent agents plus multiple target branches plus keeping sessions tight enough to stay useful.