
My AI Bot Generator Needed Its Own Scoring System
I originally built a visual flowchart editor for Discord bots.
Commands, workflows, events, variables, permissions, databases, interactions... everything already had a strict structure behind it.
At one point I caught myself thinking: "If humans can build inside this structure... what happens if AI can understand it too?"
That question ended up turning into Sefrum.
What I didn't expect was that generation itself would become its own engineering problem. The model wasn't the hard part; the hard part was getting consistent, usable output.
That meant keeping context broad enough for the AI to understand the full project without feeding it so much that it starts making bad assumptions, teaching it how commands relate to workflows, how state moves between systems, how variables stay consistent, and how to keep naming from drifting halfway through generation.
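For a sense of what the AI has to keep consistent, the structure behind the editor boils down to something roughly like this (illustrative TypeScript, not Sefrum's actual schema): commands trigger workflows, workflows hand off to other workflows, and everything reads and writes named variables.

```ts
// Illustrative types only; the names here are assumptions, not Sefrum's real schema.
interface Variable {
  name: string;
  scope: "server" | "user" | "channel";
}

interface Workflow {
  id: string;
  name: string;
  next: string[];   // IDs of workflows this one hands off to
  reads: string[];  // variable names this workflow reads
  writes: string[]; // variable names this workflow writes
}

interface Command {
  name: string;             // e.g. "/ban"
  triggersWorkflow: string; // ID of the workflow this command starts
}

interface Project {
  commands: Command[];
  workflows: Workflow[];
  variables: Variable[];
}
```

Every one of those references is a place where a model can quietly go wrong: a command pointing at a workflow that was never generated, or a variable spelled one way in workflow three and another way in workflow five.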
At some point I even had to build a scoring system for the generation itself.
Not just "did it generate?"
But:
- Does the logic actually make sense?
- Are relationships valid?
- Are workflows connected correctly?
- Will this still scale after more features get added?
- Did the AI accidentally introduce weird edge cases?
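Concretely, a first pass at those checks can be expressed as plain graph validation over the structure sketched above. This is a simplified sketch reusing the illustrative types from earlier, not Sefrum's actual scoring code, and the weighting is made up:

```ts
// A rough sketch of scoring a generated project.
// Checks and weights are illustrative; the real system is more involved.
function scoreProject(project: Project): { score: number; issues: string[] } {
  const issues: string[] = [];

  const workflowIds = new Set(project.workflows.map(w => w.id));
  const variableNames = new Set(project.variables.map(v => v.name));

  // 1. Relationships: every command must point at a workflow that exists.
  for (const cmd of project.commands) {
    if (!workflowIds.has(cmd.triggersWorkflow)) {
      issues.push(`${cmd.name} triggers missing workflow ${cmd.triggersWorkflow}`);
    }
  }

  for (const wf of project.workflows) {
    // 2. Connectivity: workflow hand-offs must resolve, no dangling edges.
    for (const next of wf.next) {
      if (!workflowIds.has(next)) {
        issues.push(`workflow ${wf.name} links to missing workflow ${next}`);
      }
    }
    // 3. Variable consistency: reads/writes must reference declared variables,
    //    which also catches names drifting mid-generation ("score" vs "scores").
    for (const name of [...wf.reads, ...wf.writes]) {
      if (!variableNames.has(name)) {
        issues.push(`workflow ${wf.name} references undeclared variable ${name}`);
      }
    }
  }

  // 4. Reachability: a workflow nothing ever triggers is usually an accidental edge case.
  const triggered = new Set([
    ...project.commands.map(c => c.triggersWorkflow),
    ...project.workflows.flatMap(w => w.next),
  ]);
  for (const wf of project.workflows) {
    if (!triggered.has(wf.id)) {
      issues.push(`workflow ${wf.name} is never triggered`);
    }
  }

  // Crude aggregate: start at 100, subtract per issue, floor at 0.
  const score = Math.max(0, 100 - issues.length * 10);
  return { score, issues };
}
```

Even a crude score like this catches dangling workflow links and variable names that drift partway through generation, before a user ever runs the bot.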
And sometimes a small project is harder than a big one.
A prompt like "Create a playable Minesweeper Discord bot" only generated around 6 workflows.
But getting those 6 workflows right was harder than some full 30+ workflow moderation suites.
That's when I realized AI generation isn't really about prompts.
It's about architecture, and building that architecture took longer than the rest of the platform itself.
Curious for other builders: What feature in your product looked straightforward... until you actually started building it?
If you want to try it for free: https://sefrum.com