u/Empty_Satisfaction_4

The one prompting change that made multi-model debates actually work

If you're anything like me, you ask Claude, GPT, and then Gemini, and suddenly you're scrolling between three tabs trying to remember what you gave each model. Then you dump all three answers into a fourth chat to summarise, and get back a weird answer that mostly rehashes one of them, but you aren't sure which.

The thing that fixed it for me wasn't just better role prompts. Giving each model a different role (skeptic, subject-matter expert, analyst) helps, but the real change was separating the stance from the evidence frame each model gets. The skeptic gets failure modes, constraints, and what breaks. The subject-matter expert gets upside, momentum, and what could compound. The analyst gets comparables, priors, and boring historical context. Same question, different briefs going in.
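Here's a minimal sketch of that stance/brief split. The brief wording and the `build_prompt` helper are my own illustration, not a specific product's API:

```python
# Same question goes to every model; only the stance brief differs.
QUESTION = "Should we migrate the billing service to a new queue system?"

BRIEFS = {
    "skeptic": (
        "Argue against. Focus on failure modes, constraints, "
        "and what breaks if we proceed."
    ),
    "expert": (
        "Argue for. Focus on upside, momentum, and what could "
        "compound if we proceed."
    ),
    "analyst": (
        "Stay neutral. Focus on comparables, priors, and boring "
        "historical context for decisions like this."
    ),
}

def build_prompt(stance: str, question: str) -> str:
    # One prompt per model: a stance brief plus the shared question.
    return f"Stance brief: {BRIEFS[stance]}\n\nQuestion: {question}"

prompts = {stance: build_prompt(stance, QUESTION) for stance in BRIEFS}
```

Each prompt then goes to a different model; the point is that the diversity is in the brief, not just in which model you pick.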

Then the synthesis prompt needs a fixed rubric, not "summarize and tell me what you think." I ask for the strongest argument from each side, the real disagreement, the current best answer, what condition would flip the call, and the next step. The "what would flip the call" part is the key: it stops the model hiding behind vague uncertainty. If the answer is conditional, it has to name the condition.
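The rubric can be hard-coded into the synthesis prompt so the fourth model can't fall back to a generic summary. This is a rough sketch; the heading wording is pulled from the post, the function itself is my own:

```python
# Fixed rubric the synthesis model must answer under, in order.
RUBRIC = [
    "Strongest argument from each side",
    "The real disagreement",
    "Current best answer",
    "What condition would flip the call",
    "Next step",
]

def synthesis_prompt(answers: dict) -> str:
    # answers maps role name -> that model's full response text.
    transcript = "\n\n".join(
        f"[{role}]\n{text}" for role, text in answers.items()
    )
    sections = "\n".join(f"{i + 1}. {item}" for i, item in enumerate(RUBRIC))
    return (
        "You are synthesizing independent analyses of the same question.\n"
        "Answer ONLY under these headings, in order:\n"
        f"{sections}\n"
        "If your answer is conditional, name the condition explicitly.\n\n"
        f"{transcript}"
    )
```

Forcing the headings is what makes the rehash problem visible: if one section is empty or parroted, you know which input dominated.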

So the actual unlock was this: don't just diversify the models, diversify the evidence each model sees. I've been using this enough that I ended up building a UI for it (www.serno.ai), but honestly prompting and patience get you most of the way there. The important structure is stance, evidence frame, then forced synthesis.

Curious what other stance and evidence frame combinations people have found useful.

reddit.com
u/Empty_Satisfaction_4 — 4 hours ago

Has anyone built a consumer AI agent that isn't just a chatbot wrapper?

Genuine question. Most consumer-facing things called "AI agents" right now are chat UIs with system prompts. The actual agent stuff (multi-model coordination, structured adversarial setups, forced outputs, real planning) has mostly stayed on the dev and enterprise side.

We tried building a consumer version. Serno is an AI agent for hard decisions and contested claims. You bring a question. Two opposing investigators run in parallel on different AI models. One builds the strongest yes case. The other builds the strongest no case. The system then forces a verdict with a confidence color (green, yellow, red) and names the worst case if it's wrong.
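As a sketch, "forcing a verdict" amounts to an output contract the system has to fill in completely. The fields and colors are from the description above; the class itself is a hypothetical illustration, not Serno's actual schema:

```python
from dataclasses import dataclass

ALLOWED_CONFIDENCE = {"green", "yellow", "red"}

@dataclass
class Verdict:
    answer: str               # the forced call, e.g. "yes" or "no"
    confidence: str           # traffic-light confidence color
    worst_case_if_wrong: str  # the named downside if the verdict is wrong

    def __post_init__(self) -> None:
        # Reject outputs that dodge the contract.
        if self.confidence not in ALLOWED_CONFIDENCE:
            raise ValueError(
                f"confidence must be one of {sorted(ALLOWED_CONFIDENCE)}"
            )
        if not self.worst_case_if_wrong.strip():
            raise ValueError("worst case must be named, not left blank")
```

The validation step is the "forced" part: a response without a color or a worst case is rejected rather than passed through.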

What I want to find out: is there a meaningful consumer agent category here, or is consumer AI permanently going to be chatbots?

u/Empty_Satisfaction_4 — 2 days ago
▲ 35 r/ChatGPT

Spent the last few months talking to people running heavy questions through chat (investments, career moves, technical decisions). I kept hearing the same thing: chat threads get lost in the weeds, multiple tabs of model comparison are super tedious, and deep research is too much to read and unreliable.

So we built a canvas mode. You ask one question. Three agent instances (normally chat 5.5, Opus 4.7, and Gemini 3.1) investigate different angles in parallel: different framings, different evidence bases. Then they actually debate each other. You watch the disagreement and steer.

Test question we ran: "Will the AI bubble pop in 2026?" You can see the results for yourself. They disagreed in interesting ways, and I feel it covered depth you don't normally get from one LLM. You can check it out here.

Would love for you all to try it: drop your hardest question in the comments and I'll run the canvas and share the verdict back. Or try it yourself (free, credits on us, and you can reach out for more).

u/Empty_Satisfaction_4 — 10 days ago