u/ajdevrel

Who are the developers here who care about AI quality?

Something I keep running into: shipping LLM features is easy, but knowing whether they're actually good is not.

Curious how people are handling this. Do you...

  • maintain a golden dataset and re-run it on every prompt change? (rough sketch of what I mean below)
  • use LLM-as-judge? If so, how do you trust the judge?
  • ship and watch user feedback?
  • something else?
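
For concreteness, this is roughly the kind of bare-bones golden-set runner I have in mind. Just a stdlib sketch, not a real harness: `call_model`, `golden.jsonl`, and the substring check are all placeholders for whatever client, data, and grading (exact match, fuzzy match, LLM judge) you actually use.

```python
import json

def call_model(prompt: str) -> str:
    # Placeholder: swap in your actual LLM client call here.
    raise NotImplementedError

def run_golden_set(path: str) -> float:
    """Re-run every golden example and report the pass rate."""
    with open(path) as f:
        # Each line: {"input": "...", "expected": "..."}
        cases = [json.loads(line) for line in f]

    passed = 0
    for case in cases:
        output = call_model(case["input"])
        # Crude substring check; this is where a fuzzy match or
        # an LLM-as-judge call would plug in instead.
        if case["expected"].strip().lower() in output.strip().lower():
            passed += 1

    return passed / len(cases) if cases else 0.0

if __name__ == "__main__":
    print(f"pass rate: {run_golden_set('golden.jsonl'):.1%}")
```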

I've been going back and forth on opening a focused group chat for developers who care about this stuff. Just a place for comparing notes and experiences. What do you all think?

Regardless, super interested in how folks here are approaching AI quality.
