u/Previous-Ad9186

I analyzed every public LeRobot dataset on HuggingFace. Almost half would fail training.

Got tired of burning GPU hours on data that looked fine but trained terribly. So I wrote something to check datasets before training — grades A through F based on dead joints, action divergence, episode consistency, etc.

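For the dead-joint part, the core idea is simple enough to sketch. This is my guess at a minimal version (hypothetical function, not the actual tool): a servo that's dead or disconnected reports a near-constant position, so its total range of motion over an episode collapses to ~0.

```python
import numpy as np

def find_dead_joints(joint_positions: np.ndarray, tol: float = 1e-4) -> list[int]:
    """joint_positions: (timesteps, num_joints) array for one episode.
    Returns indices of joints whose total range of motion is below tol."""
    ranges = joint_positions.max(axis=0) - joint_positions.min(axis=0)
    return [j for j, r in enumerate(ranges) if r < tol]

# Example: joint 1 never moves while joints 0 and 2 do
ep = np.zeros((100, 3))
ep[:, 0] = np.linspace(0, 1, 100)           # joint 0 sweeps
ep[:, 2] = np.sin(np.linspace(0, 3, 100))   # joint 2 oscillates
print(find_dead_joints(ep))  # → [1]
```

You'd want a tolerance scaled to each joint's normal range in practice, but even this catches the "flatlined servo" failure mode before you burn GPU hours on it.
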
Ran it across 45+ public datasets. Some findings:

  • 42% actually ready to train
  • 35% have critical issues (dead servos, contradictory demos)
  • Action divergence is the single biggest predictor of training failure
  • Several high-download datasets have problems nobody's flagged
  • 50 consistent demos reliably beat 200 sloppy ones

The thing that surprised me most: datasets with high action divergence (demonstrator doing different things in the same state) fail even with Diffusion Policy. You basically need to filter to one strategy or the policy just averages them into mush.

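To make "action divergence" concrete, here's a rough sketch of one way to score it (my guess at a reasonable metric, not the tool's actual one): for each timestep, find other timesteps with nearly identical states and measure how much the demonstrated actions disagree.

```python
import numpy as np

def action_divergence(states: np.ndarray, actions: np.ndarray,
                      state_eps: float = 0.05) -> float:
    """states: (N, ds), actions: (N, da). Returns the mean action
    disagreement among timesteps whose states are within state_eps."""
    spreads = []
    for i in range(len(states)):
        d = np.linalg.norm(states - states[i], axis=1)
        near = d < state_eps
        near[i] = False  # don't compare a timestep with itself
        if near.any():
            # disagreement between this action and actions taken in similar states
            spreads.append(np.linalg.norm(actions[near] - actions[i], axis=1).mean())
    return float(np.mean(spreads)) if spreads else 0.0

# Two strategies from the same state score high; one strategy scores 0:
s = np.zeros((10, 2))
a = np.vstack([np.ones((5, 1)), -np.ones((5, 1))])  # half push +1, half push -1
print(action_divergence(s, a))                 # high: demos contradict each other
print(action_divergence(s, np.ones((10, 1))))  # 0.0: demos agree
```

A BC or Diffusion Policy loss treats both actions at that state as valid targets, so a high score here is exactly the "averages them into mush" situation: filtering episodes down to one strategy is what drives it back toward zero.
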
Anyone else checking their data quality before training, or just yolo-ing it?
