ARL CPU bug: ring downbin adjusting frequency based on factory VF curves, not actual real-time voltage?
I'm seeing odd ring frequency behavior on a 275HX CPU when "ring down bin" (RDB from here on) is enabled.
Under stock conditions
RDB adjusts the ring frequency so that the uncore voltage doesn't exceed the maximum core voltage (e.g. if one core runs at 1.1 V, RDB limits the cache multiplier so that Vuncore <= 1.1 V).
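A minimal sketch of the stock behavior as I understand it, not Intel's actual algorithm. The VF points and the function name `rdb_ring_ratio` are made up for illustration; they are not real 275HX data.

```python
# Hypothetical factory ring VF points: ring ratio -> millivolts.
# These values are invented for illustration, not read from a real CPU.
FACTORY_RING_VF_MV = {30: 850, 32: 920, 34: 990, 36: 1060, 38: 1150}


def rdb_ring_ratio(core_mv, ring_vf_mv=FACTORY_RING_VF_MV):
    """Return the highest ring ratio whose voltage does not exceed core_mv."""
    fitting = [ratio for ratio, mv in ring_vf_mv.items() if mv <= core_mv]
    # Falling back to the lowest bin when nothing fits is an assumption
    # of this sketch, not documented behavior.
    return max(fitting) if fitting else min(ring_vf_mv)


# One core at 1.1 V: the 38x bin would need 1.15 V, so the ring is capped lower.
print(rdb_ring_ratio(1100))
```

With these invented points, the 38x bin is dropped because it would push Vuncore above the 1.1 V core voltage.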
When you UV CPU cores only
You apply a 100 mV core UV, so the previous 1.1 V on the cores becomes 1.0 V. However, RDB still follows the old value of 1.1 V and runs the ring at the same frequency as before. In this scenario the ring may hoist the total VID up to 1.1 V, greatly increasing power loss in the DLVRs.
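To put a rough number on that loss, here is a sketch that assumes the shared VID is simply the highest voltage any domain requests, and that each DLVR behaves like a linear regulator dissipating (Vin - Vout) * I. The 50 A core current is a made-up figure, and both function names are hypothetical.

```python
def shared_vid_mv(core_mv, ring_mv):
    """Assumed model: the shared input rail follows the highest request."""
    return max(core_mv, ring_mv)


def dlvr_loss_w(vid_mv, out_mv, current_a):
    """Linear-regulator dissipation: voltage drop times load current."""
    return (vid_mv - out_mv) * current_a / 1000


CORE_CURRENT_A = 50  # hypothetical core load, not a measurement

# Cores undervolted to 1.0 V while the ring still asks for 1.1 V.
vid = shared_vid_mv(core_mv=1000, ring_mv=1100)
loss = dlvr_loss_w(vid, out_mv=1000, current_a=CORE_CURRENT_A)
print(vid, loss)
```

Under these assumptions the core DLVRs burn an extra 100 mV * 50 A = 5 W that would disappear if the ring request also dropped to 1.0 V.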
When you UV both by the same amount
You also UV the ring by 100 mV; both voltages drop, so everything is fine with VID = 1.0 V.
In a realistic scenario, the cores can be undervolted by 50 mV and the cache by 100-150 mV, meaning that the ring will run at a lower voltage most of the time, but won't gain or lose any frequency.
When you limit CPU frequency
This is the main problem I'm dealing with:
- P/E core frequency 4.8/4.4 GHz at 1.01V VID (50 mV UV applied)
- Ring undervolted by 150 mV, with a maximum of 1.00 V at 3.8 GHz
The ring frequency gets limited to 3.5 GHz at 880 mV, even though every bin up to 3.8 GHz fits within 1.00 V, which is below the 1.01 V core VID. RDB appears to scale the frequency according to the factory VF curves, even though the real voltages are completely different.
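The numbers above are consistent with that hypothesis, as this sketch shows. Only the 3.5 GHz and 3.8 GHz ring points come from my measurements; the 3.6 and 3.7 GHz points are linearly interpolated, and I assume each undervolt is a flat offset, so factory voltage = actual voltage + offset. The helper name is hypothetical.

```python
CORE_UV_MV = 50      # core undervolt offset
RING_UV_MV = 150     # ring undervolt offset
CORE_VID_MV = 1010   # actual core VID with the frequency limit applied

# Actual (undervolted) ring voltage per ratio. 35x and 38x are measured;
# 36x and 37x are linear interpolation, i.e. an assumption.
ACTUAL_RING_MV = {35: 880, 36: 920, 37: 960, 38: 1000}


def highest_fitting_ratio(ring_mv_by_ratio, limit_mv):
    """Highest ring ratio whose voltage is at or below limit_mv, else None."""
    fitting = [r for r, mv in ring_mv_by_ratio.items() if mv <= limit_mv]
    return max(fitting) if fitting else None


# Undo the offsets to estimate what the factory curves would say.
factory_ring_mv = {r: mv + RING_UV_MV for r, mv in ACTUAL_RING_MV.items()}
factory_core_mv = CORE_VID_MV + CORE_UV_MV

factory_based = highest_fitting_ratio(factory_ring_mv, factory_core_mv)
realtime_based = highest_fitting_ratio(ACTUAL_RING_MV, CORE_VID_MV)
print(factory_based, realtime_based)
```

Comparing factory values picks the 35x bin, which matches the 3.5 GHz I observe, while comparing the real voltages would allow the full 38x.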
Why not disable RDB
Because with RDB disabled the ring will always try to boost the cache to its maximum of 3.8 GHz, hoisting the VID to 1.00 V, while the cores run at 800-900 mV when power limited above 3.8 GHz. This causes a 10-15% performance regression, unless the cores themselves also run maxed out at 1 V without a power limit.
So what's the point
RDB should scale the ring frequency based on real-time voltage to improve efficiency. The current behavior makes the ring run at a much lower frequency than it needs to, decreasing performance and increasing DLVR power losses.
I'm ultimately hoping this reaches Intel reps and that a microcode update fixing this behavior gets released.