The one prompting change that made multi-model debates actually work
If you're anything like me, you ask Claude, GPT, and then Gemini, and suddenly you're scrolling between three tabs trying to remember what you gave each model. Then you dump all three answers into a fourth chat to summarise and get back a weird answer that mostly rehashes one of them, but you aren't sure which.
The thing that fixed it for me wasn't just better role prompts, i.e. giving each model a different role such as skeptic, subject-matter expert, and analyst. It was also separating the evidence frame from the stance. Here's how it works: the skeptic gets failure modes, constraints, and what breaks. The subject-matter expert gets upside, momentum, and what could compound. The analyst gets comparables, priors, and boring historical context. Same question, different briefs going in (rough sketch below).
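For concreteness, here's roughly what the debate step looks like if you script it. This is a minimal sketch, not a canonical template: the example question, the brief wording, and the prompt layout are all mine, and you'd pipe each prompt into whatever client you use per model.

```python
# Minimal sketch of the stance + evidence-frame split.
# The question and brief wording are illustrative, not prescriptive.

QUESTION = "Should we migrate the billing service to event sourcing?"

# Same question for every model; only the stance and the evidence brief differ.
BRIEFS = {
    "skeptic": "Focus on failure modes, constraints, and what breaks.",
    "subject-matter expert": "Focus on upside, momentum, and what could compound.",
    "analyst": "Focus on comparables, priors, and boring historical context.",
}

def build_prompt(stance: str, evidence_frame: str, question: str) -> str:
    return (
        f"You are the {stance} in a three-model debate.\n"
        f"Evidence brief: {evidence_frame}\n"
        f"Question: {question}\n"
        "Argue only from your brief. Be specific."
    )

# One prompt per model; send each to a different model in its own chat.
for stance, frame in BRIEFS.items():
    print(build_prompt(stance, frame, QUESTION))
    print("---")
```

The point of keeping the briefs as data rather than baked into prose is that you can swap evidence frames without touching the stances, which is exactly the separation that made it work.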
Then the synthesis prompt needs a fixed rubric, not "summarize and tell me what you think". I ask for the strongest argument from each side, the real disagreement, the current best answer, what condition would flip the call, and the next step. The "what would flip the call" part is the key: it stops the model hiding behind vague uncertainty. If the answer is conditional, it has to name the condition. (Template below.)
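Here's the synthesis step as a template. Again, the exact wording is my own and not gospel; the part that matters is that the five fields are fixed and the model has to fill all of them.

```python
# Sketch of the fixed synthesis rubric. The three answers would come from
# the debate step above; placeholders stand in for them here.

SYNTHESIS_RUBRIC = """You are synthesising a three-model debate.

Skeptic said:
{skeptic}

Subject-matter expert said:
{expert}

Analyst said:
{analyst}

Answer with exactly these five sections:
1. Strongest argument from each side.
2. The real disagreement.
3. Current best answer.
4. What condition would flip the call. If the answer is conditional, name the condition.
5. Next step.
"""

synthesis_prompt = SYNTHESIS_RUBRIC.format(
    skeptic="<paste skeptic's answer>",
    expert="<paste expert's answer>",
    analyst="<paste analyst's answer>",
)
print(synthesis_prompt)
```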
So the actual unlock was this: don't just diversify the models, diversify the evidence each model sees. I've been using this enough that I ended up building a UI for it (www.serno.ai), but honestly prompting and patience get you most of the way there. The important structure is stance, evidence frame, then forced synthesis.
Curious what other stance and evidence frame combinations people have found useful.