MI50 16GB or V100 16GB?
Hey everyone! I'm checking out the GPU market for a local LLM setup. I'm interested in the MI50 16GB and the V100 16GB (the 32GB versions of both are unjustifiably expensive).
Here’s what I’ve noticed while researching the topic:
V100 - the "safe" option that just works. But there's a catch: it's SXM2, so you need to buy a PCIe adapter plus cooling. Ideally you could mount a cooler from a 4090/5090 (or something simpler), and then you can probably forget about overheating.
The only downside is that everything will cost more, but it'll work fine if you set it up right.
MI50 - on paper, its specs beat the V100's, but I see some serious (in my view) problems:
- Different BIOS versions need to be flashed depending on the task - e.g., the Radeon VII BIOS to make the card work in consumer motherboards. Sellers usually ship them pre-flashed, though, so that shouldn't be an issue.
- "Insufficient multithreading" - https://www.reddit.com/r/LocalAIServers/comments/1koltfb/comment/mt1ihpe/?utm_source=share&utm_medium=web3x&utm_name=web3xcss&utm_term=1&utm_content=share_button - the commenter is likely talking about vLLM.
- Old ROCm - it requires some tricks with environment variables (not a problem in itself), but if you need anything beyond LLM inference (for example, fine-tuning a model), big problems start to arise. With the V100 these issues are much rarer (CUDA, after all).
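For context, the environment-variable tricks people report for gfx906 cards (the MI50's architecture) usually look something like the sketch below. This is an assumption based on community write-ups, not something I've verified on this exact card, and the right values depend on your ROCm version:

```shell
# Hypothetical .env entries for an MI50 (gfx906) under ROCm -- verify against
# your own ROCm release before relying on these.
export HSA_OVERRIDE_GFX_VERSION=9.0.6   # pin the gfx906 target (MI50 / Radeon VII)
export ROCR_VISIBLE_DEVICES=0           # select the MI50 in a mixed-GPU box
echo "gfx target: ${HSA_OVERRIDE_GFX_VERSION}"
```

You'd source that before launching llama.cpp or vLLM; whether it's actually needed seems to vary by ROCm version.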
On the plus side, the MI50 is cheaper than a bare V100 SXM2, and it comes with a heatsink and a PCIe connector out of the box.
Also, a downside for both is the lack of FlashAttention-2 support, which means newer models might simply not work (though it's unclear whether that applies to vLLM, llama.cpp, or both).
So the question stands: knowing these nuances, which is the better choice? Keep in mind that I'll likely buy several GPUs.