u/CaleHenituse1

Hello everyone, I recently built an automated setup for myself that uses deepeval to run offline evals. Because I get all the scores, reasons, and traces offline, it's been really easy to have agents iterate over them. Thank you to the deepeval team for this!

Anyway, back to my question. Even though this automated setup works pretty well, when I took a deeper look all I saw was overfitting, spitting in my face. All the changes made to my prompts and agent configs were completely overfitted to my dataset. I'm wondering how others run these self-iteration loops while avoiding overfitting. Do you sample random inputs from a huge dataset for each iteration, or do you keep a separate validation set?
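To make the validation-set option concrete, here is a rough sketch of what I have in mind, in plain Python. Everything in it is hypothetical: `split_dataset`, `mean_score`, `accept_change`, and the scoring callables are stand-ins I made up for illustration, not deepeval APIs. The idea is that the agents only ever see the dev split while iterating, and a change is kept only if it also holds up on a held-out split they never saw.

```python
import random
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def split_dataset(
    cases: Sequence[T], holdout_fraction: float = 0.3, seed: int = 0
) -> tuple[list[T], list[T]]:
    """Shuffle once with a fixed seed and return (dev, holdout).

    The fixed seed keeps the holdout split stable across iterations,
    so the iterating agent never gets to see those cases.
    """
    if not 0.0 < holdout_fraction < 1.0:
        raise ValueError("holdout_fraction must be between 0 and 1")
    if len(cases) < 2:
        raise ValueError("need at least two cases to split")
    shuffled = list(cases)
    random.Random(seed).shuffle(shuffled)
    # Keep at least one case on each side of the split.
    n_holdout = min(len(shuffled) - 1, max(1, round(len(shuffled) * holdout_fraction)))
    return shuffled[n_holdout:], shuffled[:n_holdout]


def mean_score(cases: Sequence[T], score_fn: Callable[[T], float]) -> float:
    """Average a per-case score (hypothetical stand-in for an eval metric)."""
    return sum(score_fn(case) for case in cases) / len(cases)


def accept_change(
    holdout: Sequence[T],
    baseline_fn: Callable[[T], float],
    candidate_fn: Callable[[T], float],
    min_gain: float = 0.0,
) -> bool:
    """Keep a prompt/config change only if it does not regress on the holdout."""
    baseline = mean_score(holdout, baseline_fn)
    candidate = mean_score(holdout, candidate_fn)
    return candidate >= baseline + min_gain
```

In this sketch the agent would iterate using scores from `dev` only, and `accept_change` would be called on `holdout` before a change is merged. A large gap between the dev score and the holdout score would be the overfitting signal.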

Any help is very much appreciated, thank you!

How do you guys avoid overfitting with vibe coding?
