u/WhichYoung6026

Lesson from rebuilding my scoring engine: domain correctness > general accuracy

I'm building MixDoctor — an AI mix/mastering analyzer for iOS/macOS (Swift, SwiftUI, Claude API). Just finished a significant overhaul of the core scoring system and wanted to share what I learned.

The original engine used flat thresholds for loudness, dynamic range, and frequency balance. It worked for mainstream genres but was actively wrong for edge cases — Metal, Classical, EDM all have "correct" values that look like problems by general standards.

Rebuilt it around 9 genre groups with research-backed thresholds. The engineering wasn't the hard part — the calibration and prompt engineering to get reliable, genre-appropriate feedback from the AI layer took most of the time.
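
A minimal Swift sketch of what a per-genre threshold table could look like. The type names, field names, and numeric ranges are assumptions for illustration, not MixDoctor's actual code or calibrated values; only the nine group names come from the post.

```swift
import Foundation

// The nine genre groups named in the post.
enum GenreGroup: String, CaseIterable {
    case metal, edm, hipHopTrap, pop, rockIndie
    case classical, aCappella, jazz, livePerformance
}

// Hypothetical shape for per-genre targets; the real app's fields are not public.
struct GenreThresholds {
    let integratedLUFS: ClosedRange<Double>  // acceptable integrated loudness
    let dynamicRange: ClosedRange<Double>    // acceptable DR value
}

// Placeholder numbers for illustration only, not calibrated values.
let thresholdTable: [GenreGroup: GenreThresholds] = [
    .metal: GenreThresholds(integratedLUFS: -9.0 ... -6.0, dynamicRange: 4.0 ... 8.0),
    .pop: GenreThresholds(integratedLUFS: -14.0 ... -9.0, dynamicRange: 6.0 ... 10.0),
]

// Used when a genre has no dedicated entry yet (also a placeholder).
let fallbackThresholds = GenreThresholds(integratedLUFS: -14.0 ... -9.0,
                                         dynamicRange: 6.0 ... 12.0)

// Returns the genre's own targets, or the fallback if none are defined.
func lookupThresholds(for genre: GenreGroup) -> GenreThresholds {
    thresholdTable[genre] ?? fallbackThresholds
}
```

Keeping the table as plain data makes calibration a matter of editing numbers, with no scoring logic to touch.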

Key takeaway: if you're building any domain-specific AI tool, your scoring/evaluation layer has to speak the domain's language. In my experience, a generalist model given domain-specific prompting and thresholds beats the same model used generically end to end.
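
One way to picture "domain-specific prompting" is to put the genre targets directly into the text sent to the model. The sketch below only builds a string; it makes no API call, and the `Measurement` and `Target` types, the wording, and the numeric ranges are all hypothetical.

```swift
import Foundation

// Hypothetical measurement and target types; names are illustrative.
struct Measurement {
    let integratedLUFS: Double
    let dynamicRange: Double
}

struct Target {
    let genre: String
    let lufsRange: ClosedRange<Double>
    let drRange: ClosedRange<Double>
}

// Builds prompt text that asks the model to judge the mix against the
// supplied genre targets instead of a general notion of "good".
func buildPrompt(measurement: Measurement, target: Target) -> String {
    let lines = [
        "You are reviewing a \(target.genre) master.",
        "Genre target loudness: \(target.lufsRange.lowerBound) to \(target.lufsRange.upperBound) LUFS.",
        "Genre target dynamic range: DR\(Int(target.drRange.lowerBound)) to DR\(Int(target.drRange.upperBound)).",
        "Measured loudness: \(measurement.integratedLUFS) LUFS.",
        "Measured dynamic range: DR\(Int(measurement.dynamicRange)).",
        "Judge the mix only against the genre targets above and explain in plain English.",
    ]
    return lines.joined(separator: "\n")
}

// Placeholder ranges, not calibrated values.
let prompt = buildPrompt(
    measurement: Measurement(integratedLUFS: -7.0, dynamicRange: 5.0),
    target: Target(genre: "Metal", lufsRange: -9.0 ... -6.0, drRange: 4.0 ... 8.0)
)
print(prompt)
```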

What are others building where this kind of domain-specific calibration has come up?

reddit.com
u/WhichYoung6026 — 1 day ago

MixDoctor just shipped genre-aware mix scoring: 9 genre groups, research-backed thresholds, AI feedback in plain English

MixDoctor is an AI-powered mix and mastering analyzer for iOS, iPadOS, and macOS. Instead of throwing technical graphs at you, it gives you specific, actionable feedback in plain language.

Latest update is a full scoring overhaul: 9 genre groups (Metal, EDM, Hip-Hop/Trap, Pop, Rock/Indie, Classical, A Cappella, Jazz, Live Performance), each with dedicated thresholds sourced from iZotope, Mastering The Mix, the Dynamic Range Database, and other mastering references.

The original system was genre-blind — a loud Metal master would get the same feedback as an over-compressed Pop track. Now it understands context.

Thinking about the next major feature — what would make an audio analysis tool genuinely indispensable to your workflow?

u/WhichYoung6026 — 1 day ago

I made an AI mix analyzer for iOS — just shipped genre-aware scoring so Metal doesn't get judged like Pop

MixDoctor — iOS/macOS app that analyzes your audio mixes and gives you plain-English feedback powered by AI.

Spent the last few weeks overhauling the scoring engine because I kept getting reports that professional-sounding mixes were scoring poorly. Turned out a Metal track at -7 LUFS looks "wrong" by Pop standards, but is completely correct for the genre.

Built out 9 genre groups (Metal, EDM, Hip-Hop/Trap, Pop, Rock/Indie, Classical, A Cappella, Jazz, Live Performance) with thresholds from actual mastering research. Now the feedback reflects genre reality instead of a generic average.

Solo project, launched in January. Still figuring out what to build next — open to suggestions from anyone who works with audio.

u/WhichYoung6026 — 1 day ago

Rebuilt a scoring engine to be domain-aware — lessons from getting it wrong the first time

Built MixDoctor, an AI-powered audio analysis app (iOS/macOS, Swift/SwiftUI, Claude API on the backend). The app analyzes mixes and masters and returns plain-English feedback.

First version of the scoring system used flat thresholds across all genres. Worked okay for Pop/Rock. Was actively wrong for Metal, EDM, Classical — genres where "correct" values look like outliers by general standards.

The fix: 9 genre groups, each with dedicated thresholds sourced from mastering research. The architecture change wasn't huge but the prompt engineering and threshold calibration took real iteration.

Lesson for anyone building domain-specific AI tools: your evaluation criteria need to be as domain-specific as your use case. Generic benchmarks will mislead you.
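
A rough sketch of how calibration could be checked: run known-good reference masters through a set of targets and count how many get wrongly flagged. The reference tracks and all ranges below are made up for illustration; they are not measurements or MixDoctor's values.

```swift
import Foundation

// A known-good reference master; the values used below are invented examples.
struct ReferenceTrack {
    let genre: String
    let integratedLUFS: Double
}

// Counts how many known-good references a set of targets would wrongly flag.
func falseFlags(references: [ReferenceTrack],
                targetFor: (String) -> ClosedRange<Double>) -> Int {
    references.filter { !targetFor($0.genre).contains($0.integratedLUFS) }.count
}

let references = [
    ReferenceTrack(genre: "Metal", integratedLUFS: -7.0),
    ReferenceTrack(genre: "Pop", integratedLUFS: -10.0),
    ReferenceTrack(genre: "Classical", integratedLUFS: -20.0),
]

// Placeholder targets, not calibrated values.
let flat: (String) -> ClosedRange<Double> = { _ in -14.0 ... -9.0 }
let genreAware: (String) -> ClosedRange<Double> = { genre in
    switch genre {
    case "Metal": return -9.0 ... -6.0
    case "Classical": return -23.0 ... -16.0
    default: return -14.0 ... -9.0
    }
}

let flatMisses = falseFlags(references: references, targetFor: flat)
let genreMisses = falseFlags(references: references, targetFor: genreAware)
print("flat: \(flatMisses) wrongly flagged, genre-aware: \(genreMisses)")
```

With these invented numbers the flat targets flag two of the three references, while the genre-aware targets flag none.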

What are you all building where domain-specific scoring or evaluation has been a challenge?

u/WhichYoung6026 — 1 day ago

How genre-aware scoring changed everything for mix feedback (and why "one size fits all" was hurting producers)

If you've ever run your Metal track through a loudness analyzer and gotten told your LUFS is "too hot" — you know the problem.

I build MixDoctor, an AI-powered mix analysis app for iOS/macOS. For months the scoring engine treated every genre the same. A Metal track sitting at -7 LUFS with DR5 would get penalized. A Jazz track with lots of dynamic range would score the same as a compressed Pop track. It was giving wrong feedback to the people who needed it most.

So I rebuilt the entire scoring system around 9 genre groups — Metal, EDM, Hip-Hop/Trap, Pop, Rock/Indie, Classical, A Cappella, Jazz, and Live Performance — each with thresholds pulled from research sources like iZotope, Mastering The Mix, and the Dynamic Range Database.

Now a Metal mix at -7 LUFS gets told it's sitting right where it should. A Jazz mix with wide dynamics gets credit for it.
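
To make the difference concrete, here is a small Swift sketch that judges the same -7 LUFS Metal master against a flat target and against a genre target. The `Verdict` type and both ranges are assumptions for illustration, not the app's real thresholds.

```swift
import Foundation

// Possible outcomes of a loudness check; the offset is in loudness units (LU).
enum Verdict: Equatable {
    case onTarget
    case tooLoud(byLU: Double)
    case tooQuiet(byLU: Double)
}

// Compares a measured integrated loudness against an acceptable range.
func loudnessVerdict(measuredLUFS: Double, target: ClosedRange<Double>) -> Verdict {
    if measuredLUFS > target.upperBound {
        return .tooLoud(byLU: measuredLUFS - target.upperBound)
    }
    if measuredLUFS < target.lowerBound {
        return .tooQuiet(byLU: target.lowerBound - measuredLUFS)
    }
    return .onTarget
}

// Placeholder ranges, not calibrated values.
let flatTarget = -14.0 ... -9.0   // one range applied to every genre
let metalTarget = -9.0 ... -6.0   // range specific to Metal

let metalMaster = -7.0
let flatResult = loudnessVerdict(measuredLUFS: metalMaster, target: flatTarget)
let genreResult = loudnessVerdict(measuredLUFS: metalMaster, target: metalTarget)
print(flatResult, genreResult)
```

Under these assumed ranges the flat check reports the master as 2 LU too loud, while the Metal check reports it as on target.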

Question for the community: What's the next thing you'd want an AI mix analysis tool to tackle? I'm scoping the next feature and genuinely curious what gaps producers feel aren't being served.

u/WhichYoung6026 — 1 day ago