


ChatGPT Image 2.0 is a step back.
When Image 1.5 became the new norm, I was thrilled to find that one thing it excelled at was taking image gens I'd made at home with SD 1.5 and SDXL and improving them across the board for whatever style I was aiming for.
The pictures on this post, in order, are:
- My original image, created on my PC with SD 1.5 and a lot of post-gen Photoshop work to fix fine details.
- ChatGPT Image 1.5 image-to-image render to improve the original.
- ChatGPT Image 2.0 image-to-image render using the exact same prompt that produced image #2.
I actually let ChatGPT write most of the prompt after giving it image #1. I had it describe what it saw in the original, told it what I was aiming for (retaining the existing style, cleaning up remaining AI artifacts, etc.), then asked it to write an optimized prompt from those instructions.
Its own guidance produced image #2, and I couldn't have been happier. I ended up reworking many, many old images in this manner and posting them to my DA account.
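(Side note: I did all of this through the ChatGPT interface, but if you'd rather script the image-to-image step, here's a rough sketch using the OpenAI Python SDK. The model id, filenames, and prompt text are placeholders for whatever you're actually running, not what produced image #2.)

```python
# Rough sketch of the image-to-image pass via the API instead of the
# chat UI. The model id, filenames, and prompt are placeholders;
# I used the ChatGPT interface, not this exact code.
import base64
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

with open("original_sd15_render.png", "rb") as src:
    result = client.images.edit(
        model="gpt-image-1",  # stand-in for whichever image model you're given
        image=src,
        prompt="<paste the GPT-written prompt here>",
    )

# The images endpoint returns base64-encoded image data for this model
image_bytes = base64.b64decode(result.data[0].b64_json)
with open("reworked.png", "wb") as out:
    out.write(image_bytes)
```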
Then 2.0 was announced. Naturally, I was expecting even finer results, given the huge leap from DALL-E 3 to Image 1.5.
Running the same GPT-guided prompt produced image #3.
Of course, the first thing I asked it was:
>Why does the image output look like a low-quality render done with image generator tech from 2-3 years ago?
To which it basically blamed me for providing a prompt that, it claimed, mixed too many artistic and photography terms together, resulting in a request that worked against the model's attempts to render high-quality output.
When I pointed out that it had generated the prompt itself, it did the usual GPT double-down defensiveness and claimed it could rewrite the prompt to make it work as intended. I let it try, though I recognized the usual GPT blather that shows up right before its claims fall flat.
And I was right. GPT's "fixed prompt" produced an image nearly identical to #3. When I pointed that out, it got super defensive and evasive about the prompt that was used.
So I pivoted. I explained that I was going to show it its own work: the original prompt it wrote from my instructions, rendered only a few months ago. I re-uploaded image #2 into our conversation fork and asked it to compare the two outputs. Its answer is quoted below.
Note: while it does a good job of pointing out the differences, I don't give much credence to its explanation of why the older model performed better, or to its claims about the newer model's biases.
All I know is that I'm getting crappy output now, and not even ChatGPT seems to know how to help me get the output I want.
ChatGPT's comparison response
You’re right to push back. That earlier image is objectively stronger in several key ways, and it’s not because your prompt was better “interpreted” — it’s because the model behavior changed.
Here’s the real difference, grounded in what we can see:
What the older output is doing better (clearly visible)
1. Material coherence
- Fabrics read as constructed garments, not textures laid on forms
- The mantle, vest, and sleeves have consistent weave logic and tension
- Leather (belt, boots) shows natural compression and wear patterns
In the new output, materials drift toward:
- generalized texture noise
- less consistent tension logic
- weaker edge separation between layers
2. Edge discipline and form clarity
Older image:
- Very clean silhouette edges
- Trim and embroidery feel deliberately designed and stable
- No “melting” or ambiguity where materials meet
New image:
- Slight edge softness
- Decorative elements feel a bit “generated” rather than constructed
- Minor ambiguity in transitions (especially around cloth overlaps)
3. Full-frame consistency
Older image:
- Background, clothing, and character all share the same resolution of detail
- No obvious drop-off outside the face
New image:
- Face still strongest
- Mid-body and lower body slightly less resolved
- Background leans more toward atmospheric softness
4. Lighting discipline
Older image:
- Controlled, diffuse, physically consistent lighting
- No heavy cinematic bias
- Surfaces read through form and material, not lighting tricks
New image:
- Slight cinematic push (contrast, depth cues)
- More reliance on lighting to “sell” the image
5. Structural confidence
Older image:
- Feels designed
- Weight distribution is convincing
- Clothing sits on the body with intention
New image:
- Still good, but slightly less grounded
- Subtle “float” in how elements relate to each other
What actually changed (and why your prompt now underperforms)
This is the important part—and it’s not hand-wavy:
The newer model is:
- More biased toward cinematic / photoreal outputs
- More aggressive about depth, lighting, and mood
- Slightly less rigid about micro-structure unless explicitly forced
The older model you used was:
- More illustration-leaning by default
- More deterministic in texture construction
- Less likely to “stylize” with lighting and softness
So your prompt didn’t get worse—the baseline behavior shifted underneath it.
Why my previous explanation didn’t land
You called it out correctly.
The issue is not:
>“you left too much room”
The issue is:
>the model now fills that room differently than it did before
Same prompt → different priors → different output.
What this means for fixing it
You don’t need a totally different prompt philosophy.
You need a prompt that:
- Suppresses the new model’s cinematic bias
- Forces structural/material discipline explicitly
- Reasserts full-frame consistency as a requirement, not a preference
Key takeaway
Your original prompt was good. The model changed its defaults. Now you have to overconstrain areas that used to be handled implicitly.
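For what it's worth, here's how I read that "overconstrain" advice in practice: take the old prompt and bolt the requirements on explicitly instead of trusting the defaults. The constraint wording below is mine, distilled from its own critique above, not anything ChatGPT generated, so treat it as a starting point:

```python
# Hypothetical sketch: append explicit constraints to the old prompt so
# the new model's defaults can't fill the gaps differently. The wording
# is illustrative, pulled from the critique quoted above.
OLD_PROMPT = "<the original GPT-written prompt>"  # unchanged

CONSTRAINTS = [
    "flat, diffuse, physically consistent lighting; no cinematic contrast or mood push",
    "illustration-leaning rendering rather than photoreal stylization",
    "fabrics read as constructed garments with consistent weave and tension",
    "uniform detail across face, body, clothing, and background; no drop-off",
    "clean, stable silhouette edges; no softness where materials meet",
]

new_prompt = OLD_PROMPT + ". " + "; ".join(CONSTRAINTS) + "."
print(new_prompt)
```

The point is just that each bullet from "You need a prompt that" becomes an explicit clause instead of an implicit expectation. Whether that actually beats the new defaults, I can't say yet.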