▲ 8 r/AIQuality
Who are the developers here who care about AI quality?
Something I keep running into: shipping LLM features is easy, but knowing whether they're actually good is not.
Curious how people are handling this. Do you:
- maintain a golden dataset and re-run it on every prompt change?
- use LLM-as-judge? If so, how do you trust the judge?
- ship and watch user feedback?
- something else?
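To make the first option concrete, here's a rough sketch of what I mean by re-running a golden dataset on a prompt change. Everything in it is made up for illustration: `fake_model` stands in for a real LLM call, the `GOLDEN` cases are toy examples, and substring matching is just the simplest scoring I could think of, not a recommendation.

```python
from typing import Callable, Tuple

# Hypothetical golden dataset: (question, substring the answer must contain).
GOLDEN = [
    ("What is 2 + 2?", "4"),
    ("Capital of France?", "Paris"),
    ("Opposite of hot?", "cold"),
]

def fake_model(prompt_template: str, question: str) -> str:
    """Stand-in for a real LLM call; ignores the template and returns canned text."""
    canned = {
        "What is 2 + 2?": "The answer is 4.",
        "Capital of France?": "Paris is the capital.",
        "Opposite of hot?": "warm",  # deliberately wrong, to show a failing case
    }
    return canned[question]

def pass_rate(model: Callable[[str, str], str], prompt_template: str) -> float:
    """Fraction of golden cases whose output contains the expected substring."""
    hits = sum(
        expected.lower() in model(prompt_template, question).lower()
        for question, expected in GOLDEN
    )
    return hits / len(GOLDEN)

def check_prompt_change(
    model: Callable[[str, str], str],
    prompt_template: str,
    baseline: float,
    tolerance: float = 0.0,
) -> Tuple[float, bool]:
    """Return (score, ok); ok is False when the score falls below baseline - tolerance."""
    score = pass_rate(model, prompt_template)
    return score, score >= baseline - tolerance

if __name__ == "__main__":
    score, ok = check_prompt_change(fake_model, "Answer briefly: {question}", baseline=0.9)
    print(f"pass rate {score:.2f}, ok={ok}")
```

The idea is to run something like `check_prompt_change` in CI whenever the prompt changes and fail the build on a regression. Swapping the substring check for an LLM judge is where the "how do you trust the judge?" question comes in.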
I've been going back and forth on starting a focused group chat for developers who care about this stuff: just an open place to compare notes and experiences. What do you all think?
Regardless, super interested in how folks here are approaching AI quality.
u/ajdevrel — 3 days ago