u/IamBatman91939

NVIDIA Thermal Margins


I wanted to ask about the semantics behind these values. My NVIDIA RTX 6000 Ada Generation reports the following temperature values:

Temperature

GPU Current Temp : 40 C

GPU T.Limit Temp : 51 C

GPU Shutdown T.Limit Temp : -7 C

GPU Slowdown T.Limit Temp : -2 C

GPU Max Operating T.Limit Temp : 0 C

GPU Target Temperature : 85 C

Memory Current Temp : N/A

Memory Max Operating T.Limit Temp : N/A

The issue is that I can't work out an absolute slowdown temperature to monitor. I want to use monitoring tools to track how long and how often the GPU reaches the slowdown temperature, but the T.Limit values are margins relative to some limit rather than plain temperature readings, so I can't derive an absolute slowdown value from them.
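To make the question concrete, this is roughly the check I'd like to run. It is only a sketch and rests on my reading of the docs [3], which I have not been able to confirm: that GPU T.Limit Temp is a margin that counts down as the GPU heats up, and that the Slowdown, Shutdown, and Max Operating T.Limit values are thresholds on that same margin scale. The function names are made up for illustration, and the sample text is the output pasted above.

```python
import re

# Sample copied from the temperature output in this post.
SAMPLE = """\
GPU Current Temp : 40 C
GPU T.Limit Temp : 51 C
GPU Shutdown T.Limit Temp : -7 C
GPU Slowdown T.Limit Temp : -2 C
GPU Max Operating T.Limit Temp : 0 C
GPU Target Temperature : 85 C
Memory Current Temp : N/A
Memory Max Operating T.Limit Temp : N/A
"""


def parse_temperature_section(text):
    """Map each 'Name : value C' line to an int, or to None for N/A."""
    readings = {}
    for line in text.splitlines():
        match = re.match(r"\s*(.+?)\s*:\s*(-?\d+)\s*C\s*$", line)
        if match:
            readings[match.group(1)] = int(match.group(2))
        elif ":" in line:
            name, _, _ = line.partition(":")
            readings[name.strip()] = None
    return readings


def degrees_until(readings, threshold_name):
    """Headroom in degrees before the margin reaches the given threshold.

    ASSUMPTION (unconfirmed): the threshold is expressed on the same
    margin scale as 'GPU T.Limit Temp', so headroom = margin - threshold.
    """
    margin = readings["GPU T.Limit Temp"]
    threshold = readings[threshold_name]
    if margin is None or threshold is None:
        return None
    return margin - threshold


readings = parse_temperature_section(SAMPLE)
print(degrees_until(readings, "GPU Slowdown T.Limit Temp"))       # 53
print(degrees_until(readings, "GPU Max Operating T.Limit Temp"))  # 51
```

If that reading is right, the frequency and duration of slowdown could be tracked by sampling the margin and counting the samples where the headroom is 0 or below, without ever needing an absolute temperature.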

Even from support I haven't seen any reasonable answers.

[1]: https://forums.developer.nvidia.com/t/nvidia-smi-gpu-t-limit-gpu-shutdown-t-limit-temp/292006

[2]: https://forums.developer.nvidia.com/t/nvidia-smi-gpu-target-temperature-maximum-operating-temperature/229325/2

[3]: https://docs.nvidia.com/deploy/nvidia-smi/index.html

[4]: https://www.bluechip.de/media/86/92/28/1726579902/Datenblatt_98216.pdf?ts=1726579902

I came across two threads. In [2], a moderator explains what the Max Operating and Target values do, which I understand, but that thread uses absolute values for the GPUs. In a newer thread [1], someone asks the same thing, but the answer is unsatisfactory. The docs [3] are also too abstract for me.

I tried to find a datasheet on NVIDIA's site but couldn't. I did find one at [4], and it states absolute values.

But what if I can't find a datasheet with the temperatures? I'd like to reverse engineer what the formula could be, or is that not possible because some hidden values are involved? Either way, I just want to be able to interpret these three values: Max Operating, Target, and Slowdown. Please guide.
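The naive formula I can come up with is below. It assumes that GPU Current Temp and GPU T.Limit Temp come from the same sensor, so that current temperature plus margin gives the absolute max operating point, and that each threshold is an offset from that point. I have no confirmation of either assumption (the margin may be based on a different, hotter sensor), so treat the results as guesses to check against a datasheet such as [4]. The function name is hypothetical.

```python
def estimate_absolute_threshold(current_temp_c, t_limit_margin_c, threshold_margin_c):
    """Guess an absolute threshold temperature in degrees C.

    ASSUMPTION (unconfirmed): current temperature and T.Limit margin are
    measured by the same sensor, so
        absolute = current + margin - threshold
    """
    return current_temp_c + t_limit_margin_c - threshold_margin_c


# Values taken from the output in this post.
current, margin = 40, 51
for name, threshold in [("Max Operating", 0), ("Slowdown", -2), ("Shutdown", -7)]:
    print(name, estimate_absolute_threshold(current, margin, threshold), "C")
# Max Operating 91 C
# Slowdown 93 C
# Shutdown 98 C
```

If the estimate changes noticeably when recomputed at different loads, that would suggest the two readings are not from the same sensor and the formula does not hold.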

u/IamBatman91939 — 4 days ago