The hardest part of AI UGC nobody talks about: product consistency across scenes. Thoughts? AMA.
Been working on a fully AI-generated UGC video for a French handbag brand and wanted to share what actually went wrong, not just the final result.
The concept: a woman showing off the bag in different settings (haul style). Talking head scenes, lifestyle shots, a quick jump cut sequence on her outfit. Classic UGC format.
The final result looks clean. But getting there was painful.
Here's the real problem with AI: the product itself changes between scenes.
When you're doing talking head shots, you need volume. Multiple scenes, different angles, different moments. And every time you generate a new scene, the model can reinterpret the product. Even with reference images.
I was using Nano Banana Pro for the image generation and there were moments where I genuinely wanted to throw my laptop. You feed it the exact reference image of the bag — the shape, the strap, the hardware — and it just... decides to make it slightly different. Wrong proportions. Different buckle. Strap too thin. Sometimes it nails it, sometimes it's way off.
This is probably the biggest unsolved challenge in AI UGC right now. The character consistency problem is mostly solved. But product consistency? Still a nightmare.
What worked for me:
- Generate more scenes than you need, then cherry-pick the ones where the bag looks right
- The animation step was actually the easy part. Kling 3 handles that beautifully once you have a clean base image
- Did everything inside Hoox except one section (the outfit jump cut sequence), which I edited externally and re-imported
Total time: about 2 hours including all the failed generations and editing.
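For anyone who wants it concrete, the over-generate-and-cherry-pick loop is basically this. Everything here is a hypothetical sketch: `generate_scene` stands in for whatever image model you're calling (Nano Banana Pro in my case, but there's no real SDK call here), and `looks_right` stands in for the manual review pass.

```python
def generate_scene(prompt, reference_image, seed):
    # Placeholder: call your image model here with the product reference.
    # Returns a fake "image" record for illustration only.
    return {"prompt": prompt, "seed": seed, "ref": reference_image}

def looks_right(image):
    # Stand-in for the manual review step: in practice this is you,
    # eyeballing the bag's shape, strap, and hardware against the reference.
    return image["seed"] % 3 == 0  # arbitrary deterministic placeholder

def cherry_pick(prompt, reference_image, needed=3, oversample=4):
    """Generate up to needed*oversample candidates, keep the first
    `needed` that pass review."""
    keepers = []
    for seed in range(needed * oversample):
        img = generate_scene(prompt, reference_image, seed)
        if looks_right(img):
            keepers.append(img)
        if len(keepers) == needed:
            break
    return keepers

scenes = cherry_pick("woman showing handbag, cafe lighting", "bag_ref.png")
print(f"kept {len(scenes)} scenes")
```

The only real insight is the `oversample` factor: budget roughly 3-4 generations per scene you actually need, because you can't predict which ones will mangle the product.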
Genuinely curious: has anyone found reliable techniques for maintaining product accuracy across multiple AI-generated scenes? Reference images only get you so far.
Would love to hear what's working for others.