u/FBI_memegod

Hi, I'm new to computer vision and would like some help deciding how to move forward. Currently I use RF DETR by Roboflow for damage segmentation on videos and still images of cars, as I want a shared backbone but separate heads for class detection.

The issue is that my performance is piss poor: with around 550 training images, my model is averaging mAP 20% and mAP 30%. This is likely due to a labelling issue, since the dent and shattered glass classes reach 60-70% mAP while scratches and car fragmentation only reach 10-19% mAP. After evaluating the model further, I found that whenever it predicts a class it's usually right, but most of the time it doesn't predict anything at all. If I lower the confidence threshold it's still right most of the time, but not consistently enough.
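To make the "usually right but rarely predicts" pattern concrete, here is a rough sketch of how I'd check precision and recall per class at a few confidence thresholds. The function names and the numbers are made up for illustration, and it assumes each prediction has already been matched against ground truth (e.g. by IoU) elsewhere.

```python
def precision_recall_at(preds, num_gt, threshold):
    """Precision and recall for one class at one confidence threshold.

    preds:  list of (confidence, is_true_positive) pairs; the true-positive
            flag is assumed to come from an IoU matching step not shown here.
    num_gt: number of ground-truth instances of that class.
    """
    kept = [is_tp for conf, is_tp in preds if conf >= threshold]
    true_positives = sum(kept)
    precision = true_positives / len(kept) if kept else 0.0
    recall = true_positives / num_gt if num_gt else 0.0
    return precision, recall


def sweep(preds, num_gt, thresholds=(0.1, 0.3, 0.5, 0.7)):
    """Map each threshold to its (precision, recall) pair."""
    return {t: precision_recall_at(preds, num_gt, t) for t in thresholds}


# Invented example data for a weak class such as "scratch":
# 6 predictions in total, 10 labelled scratches in the validation set.
scratch_preds = [
    (0.90, True), (0.60, True), (0.40, True),
    (0.35, False), (0.20, True), (0.15, False),
]
result = sweep(scratch_preds, num_gt=10)

for t, (p, r) in result.items():
    print(f"threshold={t:.1f}  precision={p:.2f}  recall={r:.2f}")
```

In this toy data, precision stays high at every threshold while recall never gets past 0.4, which is the same shape as what I'm seeing: the model misses most instances rather than mislabelling them.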

To combat this, I'm trying to train RF DETR on bounding boxes for detection, as it is natively designed around bounding boxes, and then use SAM 2 for instance segmentation by prompting it with the predicted boxes. But using SAM 2 on Roboflow has made me distrustful of it for labelling data and for general application, as it often labelled regions incorrectly for the given class names because of how irregular the classes are.
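The two-stage idea would look roughly like the sketch below. It is only the control flow: `detect` and `segment` are hypothetical callables standing in for RF DETR and a box-prompted SAM 2, and the stubs at the bottom are fake so the example runs without either library. The class label comes only from the detector here; the segmenter just turns a box into a mask.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Box = Tuple[int, int, int, int]   # (x1, y1, x2, y2) in pixels
Mask = List[List[bool]]           # True where a pixel belongs to the instance


@dataclass
class Detection:
    box: Box
    label: str
    confidence: float


@dataclass
class Instance:
    label: str
    confidence: float
    mask: Mask


def detect_then_segment(
    image,
    detect: Callable[[object], List[Detection]],
    segment: Callable[[object, Box], Mask],
    min_conf: float = 0.3,
) -> List[Instance]:
    """Run the detector, then prompt the segmenter with each confident box."""
    instances = []
    for det in detect(image):
        if det.confidence < min_conf:
            continue  # skip low-confidence boxes before segmenting
        mask = segment(image, det.box)
        instances.append(Instance(det.label, det.confidence, mask))
    return instances


# --- Stand-in stubs (NOT the real RF DETR or SAM 2 APIs) ---
def fake_detect(image) -> List[Detection]:
    return [
        Detection((1, 1, 3, 3), "dent", 0.8),
        Detection((0, 0, 2, 2), "scratch", 0.2),
    ]


def fake_segment(image, box: Box) -> Mask:
    # Pretend the mask is just the filled box.
    x1, y1, x2, y2 = box
    height, width = len(image), len(image[0])
    return [[x1 <= x < x2 and y1 <= y < y2 for x in range(width)]
            for y in range(height)]


image = [[0] * 4 for _ in range(4)]   # dummy 4x4 "image"
instances = detect_then_segment(image, fake_detect, fake_segment)
print([(inst.label, inst.confidence) for inst in instances])
```

With real models, the detector stub would be replaced by RF DETR inference and the segmenter stub by a SAM 2 call prompted with the box; the `min_conf` value is a placeholder I would still need to tune per class.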

My question is: which direction should I pursue? Should I keep training RF DETR for segmentation, or try an RF DETR + SAM 2 approach? Currently these two seem like good options, mainly due to their generous licensing.

Also, do you have any general advice or sources on how to label data and improve models in this scenario?

Damage segmentation model choices