u/Jazzlike_History89

Both datasets: r = 0.70. Same correlation coefficient. But one looks noticeably more clustered around the regression line.

The difference is purely in the standard deviations - not the strength of the relationship. Because Pearson's r converts everything into standard units before measuring, it's blind to how physically spread out the data is. Smaller SDs → visually compact plot → same r.

It's a surprisingly easy trap. Your eyes read the raw coordinate space. r operates in standardized space. Those two views can look totally different.

I put this exact question to ChatGPT (with Thinking Mode) as a test - it fell for it too. Made a short video breaking down the full explanation here: https://youtu.be/GA7DQcc-ouo

u/Jazzlike_History89 — 14 days ago

Scatter plots can fool you. The way data clusters around a line doesn't always mean what you think it does - which is why the correlation coefficient exists in the first place. Here's a clean example of that trap, and what happened when I put it to ChatGPT.

The prompt: "Here are two scatter plots. Compare their correlation coefficients."

Wrong answer.

I switched on Thinking Mode and tried again.

Still wrong.

https://preview.redd.it/bx0e80pgbhyg1.png?width=1059&format=png&auto=webp&s=043c5caa8a847941d643e12933bde0ce018d1832

So I gave it a nudge: "Keep in mind that the appearance of a scatter diagram depends on the standard deviations. Check the numbers - not just how the plots look."

That did it.

What's actually going on:

Both plots have the same correlation coefficient r. But one looks noticeably more clustered around the regression line - and that's purely because its standard deviations are smaller. The points don't spread as far from the mean, so the plot looks tighter.

But r isn't fooled by that. The formula converts everything into standard units first - deviations from the mean are divided by the SD before anything is calculated. So r measures clustering relative to the spread, not in absolute terms.
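
For anyone who wants that in symbols, here's a sketch of the standard-units form of r. The notation is an assumption for illustration, not something from the plots: n points, means x̄ and ȳ, and SD_x, SD_y as the standard deviations (computed with divisor n).

```latex
r = \frac{1}{n}\sum_{i=1}^{n}
    \left(\frac{x_i - \bar{x}}{\mathrm{SD}_x}\right)
    \left(\frac{y_i - \bar{y}}{\mathrm{SD}_y}\right)
```

Rescale x or y by any positive factor and the deviations and the SD grow by the same factor, so each ratio stays the same, and so does r.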

Smaller SDs → visually compact plot → same r.
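
If you'd rather check this numerically, here's a minimal Python sketch using simulated data (not the data behind the plots in this post). The function names, the SDs of 3.0 and 1.0, the sample size, and the seed are all illustrative assumptions.

```python
import numpy as np


def make_dataset(r, sd_x, sd_y, n=500, seed=0):
    """Simulate n points with target correlation r, then scale to the given SDs."""
    rng = np.random.default_rng(seed)
    zx = rng.standard_normal(n)
    noise = rng.standard_normal(n)
    # Mix zx with independent noise so the population correlation is r.
    zy = r * zx + np.sqrt(1 - r**2) * noise
    return sd_x * zx, sd_y * zy


def pearson_r(x, y):
    """Pearson's r as the average product of standard units."""
    zx = (x - x.mean()) / x.std()
    zy = (y - y.mean()) / y.std()
    return float(np.mean(zx * zy))


def rms_residual(x, y):
    """Root-mean-square vertical distance from the least-squares line."""
    slope, intercept = np.polyfit(x, y, 1)
    residuals = y - (slope * x + intercept)
    return float(np.sqrt(np.mean(residuals**2)))


# Same seed, so both datasets share the same standard-unit pattern;
# only the SDs differ.
wide_x, wide_y = make_dataset(0.70, sd_x=3.0, sd_y=3.0)
tight_x, tight_y = make_dataset(0.70, sd_x=1.0, sd_y=1.0)

r_wide = pearson_r(wide_x, wide_y)
r_tight = pearson_r(tight_x, tight_y)
spread_wide = rms_residual(wide_x, wide_y)
spread_tight = rms_residual(tight_x, tight_y)

print(f"r (wide):  {r_wide:.3f}   RMS distance from line: {spread_wide:.3f}")
print(f"r (tight): {r_tight:.3f}   RMS distance from line: {spread_tight:.3f}")
```

The two r values come out the same (near 0.70, not exactly, since it's a finite sample), while the typical distance from the regression line is about three times larger in the wide dataset. That absolute distance is what your eye picks up on a shared set of axes; r ignores it.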

It's an easy trap. You see a tight cluster and assume stronger correlation. But r already accounts for how spread out your data is - a compact-looking plot can have the exact same correlation as a loose one.

I walked through the full exchange - both plots and the ChatGPT conversation - in a short video here, if you want to see it.

What I find interesting:

ChatGPT didn't flag any uncertainty in its wrong answers. Both were confident, well-structured, and incorrect. It only corrected course when nudged toward the math explicitly.
