gemini's image understanding is so far ahead of everything else I've used and I don't see enough people talking about it
I keep seeing posts comparing gemini to chatgpt on text and coding, and I feel like everyone is sleeping on the thing gemini actually does better than anything else: understanding images and visual content.
I do video and design work, and about a month ago I started using gemini for something specific: I'll take a screenshot of a video frame, a design comp, or a visual reference I found online and ask gemini to analyze the composition, color palette, lighting, and mood, and tell me how to recreate or riff on that visual style.
and it's genuinely incredible at this. I showed it a frame from a wes anderson film and asked it to break down exactly what makes it feel like a wes anderson shot, and it identified the specific color relationships, the symmetry, the depth of field choices, and the prop placement in a way that was actually useful to me as someone trying to achieve a similar feel. chatgpt gave me generic film school stuff when I tried the same thing, and claude just described what was in the image without any of the compositional analysis.
where this has become really practical for me is in my actual production workflow: I'll generate visual concepts in midjourney, run style references through magic hour and runway to test different looks in motion, and then when I need to understand why a certain reference image or video frame works, I bring it to gemini because it can articulate the visual principles in a way I can actually apply to my own work.
it's become this weird thing where gemini isn't the tool I use to create anything but it's the tool that makes me better at using every other tool because it helps me see what I'm looking at more precisely.
the other thing it does that I haven't been able to replicate anywhere else is comparing two images and telling me specifically what's different about them compositionally, not just content-wise. I'll show it two versions of the same shot with different color grading, and it'll tell me exactly how the warm tones in version A create intimacy while the cooler tones in version B create distance, and why that's happening technically.
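if anyone wants to run that kind of two-image comparison outside the chat window, something like this should work with the google-genai python sdk. this is just a rough sketch based on the docs, not my actual setup: the model name, file paths, and prompt wording are all placeholders you'd swap for your own.

```python
# rough sketch, assuming the google-genai sdk (pip install google-genai)
# model name, file paths, and prompt text are placeholders, not recommendations
from pathlib import Path

from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_API_KEY")  # or set GOOGLE_API_KEY in your env


def image_part(path: str) -> types.Part:
    # wrap a local image file as an inline part the model can read
    return types.Part.from_bytes(data=Path(path).read_bytes(), mime_type="image/jpeg")


prompt = (
    "these are two grades of the same shot. compare them compositionally, "
    "not just content-wise: color relationships, contrast, where the eye goes, "
    "and what mood each grade creates and why, technically."
)

response = client.models.generate_content(
    model="gemini-2.0-flash",  # placeholder, use whatever current model you prefer
    contents=[prompt, image_part("grade_a.jpg"), image_part("grade_b.jpg")],
)
print(response.text)
```

the same idea covers the single-image breakdowns too, just pass one image part instead of two and change the prompt.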
has anyone else found gemini's visual analysis to be way ahead of the other models, or am I just not prompting chatgpt and claude correctly for this kind of thing?