u/ryan7ait

I'm working on a computer vision project where merchandisers take pictures of store shelves. My task is to detect the products in each image so I can tell competitors' products apart from my company's.

I thought about two approaches:

1. Use YOLO to detect products on the shelves, annotate them, and train a model to classify which products belong to my company.
2. Create folders with images of each company's products, generate embeddings for them (possibly using OCR to extract and embed text), and when a new image arrives use vector search to identify which company the product belongs to.
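
To make the vector search idea more concrete, here is a rough sketch of just the lookup step. It assumes the embeddings already exist (from whatever image or OCR text model ends up being used) and that each product has already been cropped out of the shelf photo. The names `identify_company`, `reference`, and `min_similarity`, the company labels, and the 3-dimensional toy vectors are all made up for illustration; real embeddings would have hundreds of dimensions, and a real setup would probably use a vector index instead of a linear scan.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (lists of floats)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def identify_company(query, reference, min_similarity=0.8):
    """Return (company, score) for the reference embedding closest to `query`.

    `reference` is a list of (company_name, embedding) pairs.
    `min_similarity` is a hypothetical cutoff: below it the product is
    reported as "unknown" instead of being forced into a company.
    """
    best_company, best_score = None, -1.0
    for company, embedding in reference:
        score = cosine_similarity(query, embedding)
        if score > best_score:
            best_company, best_score = company, score
    if best_score < min_similarity:
        return "unknown", best_score
    return best_company, best_score


# Toy vectors standing in for real embeddings (assumption, not real data).
reference = [
    ("my_company", [0.9, 0.1, 0.0]),
    ("my_company", [0.8, 0.2, 0.1]),
    ("competitor_a", [0.1, 0.9, 0.2]),
    ("competitor_b", [0.0, 0.2, 0.9]),
]

company, score = identify_company([0.85, 0.15, 0.05], reference)
print(company, round(score, 3))
```

The "unknown" fallback is there because shelves will contain products that are in nobody's folder yet, and a plain nearest-neighbor lookup would otherwise always assign them to some company.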

Does this make sense, or is there a better approach for this problem?

(Note that I don't have the resources to train a large model.)

Thanks in advance.

Hello, I have a question.