


ChatGPT Image 2.0 is a step back.
When Image 1.5 became the new norm, I was thrilled to find that one thing it excelled at was taking image gens I'd made at home with SD 1.5 and SDXL and improving them across the board for whatever style I was aiming for.
The pictures on this post, in order, are:
- My original image, created on my PC with SD 1.5 and a lot of post-gen Photoshop work to fix fine details.
- ChatGPT Image 1.5 image-to-image render to improve the original.
- ChatGPT Image 2.0 image-to-image render using the exact same prompt that produced image #2.
I actually let ChatGPT write most of the prompt after giving it image #1. I had it describe what it saw in the original, told it what I was aiming for (retaining the existing style, cleaning up remaining AI artifacts, etc.), then asked it to write an optimized prompt from those instructions.
Its own guidance produced image #2, and I couldn't have been happier. I ended up reworking many, many old images in this manner and posting them to my DA account.
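(Side note: I did all of this through the ChatGPT interface, but if you'd rather script the image-to-image step, here's a rough sketch using the OpenAI Python SDK. The model id, filenames, and prompt text are placeholders for whatever you're actually running, not what produced image #2.)

```python
# Rough sketch of the image-to-image pass via the API instead of the
# chat UI. The model id, filenames, and prompt are placeholders;
# I used the ChatGPT interface, not this exact code.
import base64
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

with open("original_sd15_render.png", "rb") as src:
    result = client.images.edit(
        model="gpt-image-1",  # stand-in for whichever image model you're given
        image=src,
        prompt="<paste the GPT-written prompt here>",
    )

# The images endpoint returns base64-encoded image data for this model
image_bytes = base64.b64decode(result.data[0].b64_json)
with open("reworked.png", "wb") as out:
    out.write(image_bytes)
```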
Then 2.0 was announced. Naturally, I was expecting even finer results, given the huge leap from DALL-E 3 to Image 1.5.
Running the same GPT-guided prompt produced image #3.
Of course, the first thing I asked it was:
>Why does the image output look like a low-quality render done with image generator tech from 2-3 years ago?
To which it basically blamed me for providing a prompt that, it claimed, mixed too many artistic and photography terms together, resulting in a request that worked against the model's attempts to render high-quality output.
When I pointed out that it had generated the prompt itself, it did the usual GPT double-down defensiveness and claimed it could rewrite the prompt to make it work as intended. I let it try, though I recognized the usual GPT blather that shows up right before its claims fall flat.
And I was right. GPT's "fixed prompt" produced an image nearly identical to #3. When I pointed that out, it got super defensive and evasive about the prompt that was used.
So I pivoted. I explained that I was going to show it its own work: the original prompt it wrote from my instructions, rendered only a few months ago. I re-uploaded image #2 into our conversation fork and asked it to compare the two outputs. Its answer is quoted below.
Note: while it does a good job of pointing out the differences, I don't give much credence to its explanation of why the older model performed better, or to its claims about the newer model's biases.
All I know is that I'm getting crappy output now, and not even ChatGPT seems to know how to help me get the output I want.
ChatGPT's comparison response
You’re right to push back. That earlier image is objectively stronger in several key ways, and it’s not because your prompt was better “interpreted” — it’s because the model behavior changed.
Here’s the real difference, grounded in what we can see:
What the older output is doing better (clearly visible)
1. Material coherence
- Fabrics read as constructed garments, not textures laid on forms
- The mantle, vest, and sleeves have consistent weave logic and tension
- Leather (belt, boots) shows natural compression and wear patterns
In the new output, materials drift toward:
- generalized texture noise
- less consistent tension logic
- weaker edge separation between layers
2. Edge discipline and form clarity
Older image:
- Very clean silhouette edges
- Trim and embroidery feel deliberately designed and stable
- No “melting” or ambiguity where materials meet
New image:
- Slight edge softness
- Decorative elements feel a bit “generated” rather than constructed
- Minor ambiguity in transitions (especially around cloth overlaps)
3. Full-frame consistency
Older image:
- Background, clothing, and character all share the same resolution of detail
- No obvious drop-off outside the face
New image:
- Face still strongest
- Mid-body and lower body slightly less resolved
- Background leans more toward atmospheric softness
4. Lighting discipline
Older image:
- Controlled, diffuse, physically consistent lighting
- No heavy cinematic bias
- Surfaces read through form and material, not lighting tricks
New image:
- Slight cinematic push (contrast, depth cues)
- More reliance on lighting to “sell” the image
5. Structural confidence
Older image:
- Feels designed
- Weight distribution is convincing
- Clothing sits on the body with intention
New image:
- Still good, but slightly less grounded
- Subtle “float” in how elements relate to each other
What actually changed (and why your prompt now underperforms)
This is the important part—and it’s not hand-wavy:
The newer model is:
- More biased toward cinematic / photoreal outputs
- More aggressive about depth, lighting, and mood
- Slightly less rigid about micro-structure unless explicitly forced
The older model you used was:
- More illustration-leaning by default
- More deterministic in texture construction
- Less likely to “stylize” with lighting and softness
So your prompt didn’t get worse—the baseline behavior shifted underneath it.
Why my previous explanation didn’t land
You called it out correctly.
The issue is not:
>“you left too much room”
The issue is:
>the model now fills that room differently than it did before
Same prompt → different priors → different output.
What this means for fixing it
You don’t need a totally different prompt philosophy.
You need a prompt that:
- Suppresses the new model’s cinematic bias
- Forces structural/material discipline explicitly
- Reasserts full-frame consistency as a requirement, not a preference
Key takeaway
Your original prompt was good. The model changed its defaults. Now you have to overconstrain areas that used to be handled implicitly.
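For what it's worth, here's how I read that "overconstrain" advice in practice: take the old prompt and bolt the requirements on explicitly instead of trusting the defaults. The constraint wording below is mine, distilled from its own critique above, not anything ChatGPT generated, so treat it as a starting point:

```python
# Hypothetical sketch: append explicit constraints to the old prompt so
# the new model's defaults can't fill the gaps differently. The wording
# is illustrative, pulled from the critique quoted above.
OLD_PROMPT = "<the original GPT-written prompt>"  # unchanged

CONSTRAINTS = [
    "flat, diffuse, physically consistent lighting; no cinematic contrast or mood push",
    "illustration-leaning rendering rather than photoreal stylization",
    "fabrics read as constructed garments with consistent weave and tension",
    "uniform detail across face, body, clothing, and background; no drop-off",
    "clean, stable silhouette edges; no softness where materials meet",
]

new_prompt = OLD_PROMPT + ". " + "; ".join(CONSTRAINTS) + "."
print(new_prompt)
```

The point is just that each bullet from "You need a prompt that" becomes an explicit clause instead of an implicit expectation. Whether that actually beats the new defaults, I can't say yet.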