u/Meduty


[D] The Star-Rating Dilemma: A simple mathematical model for when "more stars" collides with "fewer votes"

Picture a familiar choice when looking at reviews:

  • Option A: 4.0★, 100 ratings
  • Option B: 4.5★, 10 ratings

Option B has a higher average, but its much smaller sample makes it less trustworthy. We all intuitively discount ratings backed by few votes; I wanted to make that intuition explicit and tunable without reaching for complex Bayesian priors right out of the gate.

I wrote up a tiny article using three short functions to formalize this trade-off. Here is a small excerpt of the approach:

1. Normalise the rating. Map any rating scale (like 1 to 5 stars) linearly onto [0, 1].
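Step 1 as a tiny Python sketch (the function name and the 1-to-5 default bounds are just illustrative):

```python
def normalise(rating: float, lo: float = 1.0, hi: float = 5.0) -> float:
    """Linearly map a rating from [lo, hi] onto [0, 1]."""
    return (rating - lo) / (hi - lo)
```

On a 1-to-5 scale this sends 1★ to 0.0, 5★ to 1.0, and 4★ to 0.75.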

2. Calculate confidence from the vote count. We need a function c(n) that takes a vote count n and returns a confidence value in [0, 1). It should have diminishing returns. An exponential or arctangent curve would work, but a simple rational function is the most conservative:

c(n) = n / (n + c_h)

Here, c_h is the "half-confidence point" — the number of reviews at which you consider the rating exactly 50% trustworthy. (For Google Maps, maybe c_h = 50).
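In code (again just a sketch, with c_h = 50 as an illustrative default):

```python
def confidence(n: int, c_h: float = 50.0) -> float:
    """Rational confidence curve: 0 votes -> 0, c_h votes -> 0.5,
    approaching (but never reaching) 1 as n grows."""
    return n / (n + c_h)
```

So with c_h = 50, 100 votes only buys you about 0.67 confidence — the diminishing returns are built in.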

3. Merge both via a risk-aversion parameter (ρ). Instead of just multiplying rating by confidence, we can weight them with a risk-aversion parameter ρ:

V = (r + ρ * c(n)) / (1 + ρ)

  • ρ = 0: Pure star-gazing (risk-seeking). Vote count is ignored.
  • ρ → ∞: Maximum caution. Only sample size matters.
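Putting all three steps together (defaults here — c_h = 50, ρ = 0.25, a 1-to-5 scale — are just illustrative choices, not canonical values):

```python
def score(rating: float, n: int, c_h: float = 50.0, rho: float = 0.25,
          lo: float = 1.0, hi: float = 5.0) -> float:
    r = (rating - lo) / (hi - lo)     # step 1: normalise onto [0, 1]
    c = n / (n + c_h)                 # step 2: confidence from vote count
    return (r + rho * c) / (1 + rho)  # step 3: risk-averse blend
```

With these settings, Option A (4.0★, 100 ratings) and Option B (4.5★, 10 ratings) both land at about 0.733 — ρ = 0.25 happens to be right at their crossover.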

The Tipping Point: When you map this out, even mild risk aversion (weighting sample size at roughly a fifth of star quality) is enough to flip the lead to Option A. The 0.5-star advantage of Option B simply cannot overcome its confidence deficit.
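You can actually solve for the crossover in closed form: setting V_A = V_B and cancelling the (1 + ρ) denominator gives ρ* = (r_B − r_A) / (c_A − c_B). A sketch (variable names are mine; the exact number depends on how the scale is normalised and on c_h):

```python
def crossover_rho(r_a: float, n_a: int, r_b: float, n_b: int,
                  c_h: float = 50.0, lo: float = 1.0, hi: float = 5.0) -> float:
    """Solve V_A == V_B for rho: the risk aversion at which the
    high-volume option A overtakes the high-average option B."""
    r_diff = (r_b - r_a) / (hi - lo)                 # normalised rating gap
    c_diff = n_a / (n_a + c_h) - n_b / (n_b + c_h)   # confidence gap
    return r_diff / c_diff

# Option A: 4.0 stars, 100 ratings; Option B: 4.5 stars, 10 ratings
print(crossover_rho(4.0, 100, 4.5, 10))  # ≈ 0.25 with c_h = 50
```

With this particular normalisation and c_h the flip happens around ρ ≈ 0.25 — a fifth to a quarter, depending on the mapping you pick.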

AI chatbots tell me the classic statistical approach to this is a Bayesian average (shrinking low-sample averages toward a global mean). I'm not deeply familiar with that approach, but I liked the transparency of keeping the half-confidence point c_h and the risk-aversion parameter ρ as distinct, tunable pieces.
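For comparison, the Bayesian average is usually sketched as mixing in a fixed number of "phantom votes" at the global mean (the prior mean 3.5 and weight 20 below are made-up illustrative values, not tuned):

```python
def bayesian_average(rating: float, n: int,
                     prior_mean: float = 3.5, prior_weight: float = 20.0) -> float:
    """Classic shrinkage: pretend `prior_weight` phantom votes at the
    global mean were mixed in with the n real ones."""
    return (prior_weight * prior_mean + n * rating) / (prior_weight + n)
```

Interestingly, with these made-up priors it reaches the same verdict on the example: Option B's 4.5★ shrinks to about 3.83 while Option A's 4.0★ only shrinks to about 3.92, so A wins.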

I'd love to hear this community's critiques of this model. How would you improve it? Do you see any specific flaws?

If you want to see the full story, the visual graphs for the tipping points, the sensitivity analysis of c_h, and a workaround for handling vote counts spanning several orders of magnitude, you can read my full article here.

https://frequently-asking-questions.com/2026/04/18/the-star-rating-dilemma/
