
RX 7900 XTX vs Radeon AI PRO R9700 — llama.cpp Vulkan vs ROCm (6 models, token-gen)
Setup: llama.cpp llama-bench, -fa 1 -ngl 99 -ctk q8_0 -ctv q8_0 -p 512,2048 -n 128,256 -r 3.
Both cards capped at 300 W. Models are unsloth GGUFs (UD-IQ4_XS / UD-Q4_K_XL);
gpt-oss-20b is the ggml-org native MXFP4. R9700 = RDNA4/gfx1201, 7900 XTX = RDNA3/gfx1100.
The R9700 runs were measured one day earlier with an identical config.
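For anyone who wants to repeat the runs, here is a rough sketch of how the command line from the setup could be assembled. The flags are the ones listed above; the model path and the build_bench_cmd helper are made up for illustration. It also assumes the backend (Vulkan or ROCm) is fixed by which llama.cpp build the binary path points at, and that the power cap is applied outside llama-bench.

```python
import shlex

# Flags copied from the setup line of the post.
BENCH_FLAGS = [
    "-fa", "1",          # flash attention on
    "-ngl", "99",        # offload all layers to the GPU
    "-ctk", "q8_0",      # K cache type
    "-ctv", "q8_0",      # V cache type
    "-p", "512,2048",    # prompt (prefill) sizes
    "-n", "128,256",     # token-gen lengths
    "-r", "3",           # repetitions per test
]


def build_bench_cmd(model_path, binary="llama-bench"):
    """Return the argv list for one llama-bench run on a single model.

    `binary` is assumed to point at either a Vulkan or a ROCm build.
    """
    return [binary, "-m", model_path, *BENCH_FLAGS]


if __name__ == "__main__":
    # Hypothetical model file name, for illustration only.
    cmd = build_bench_cmd("models/example-model.gguf")
    print(shlex.join(cmd))
```

The printed line can be pasted into a shell, or the list can be handed to subprocess.run to loop over several models.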
Takeaways:
- 7900 XTX beats the R9700 by 24–29% on token-gen across the whole slate, most likely
because of memory bandwidth (384-bit vs 256-bit bus).
- Vulkan > ROCm for token-gen on both architectures, with the biggest gap on MoE models (XTX: +33–64%).
- Prefill flips it: ROCm pp2048 is ~8–17% faster on dense models (e.g. Qwen-27B IQ4: ROCm
1022 vs Vulkan 870 t/s).
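The percentages in the takeaways are plain relative differences in tokens per second. A small sketch of the arithmetic, using only the Qwen-27B pp2048 numbers quoted above; the speedup_pct helper is my own naming. The bus-width comparison at the end assumes both cards run their memory at the same per-pin data rate, which the post does not state, so treat it as an upper bound rather than a measurement.

```python
def speedup_pct(faster, slower):
    """Relative advantage of `faster` over `slower`, in percent."""
    return (faster / slower - 1.0) * 100.0


# Qwen-27B IQ4, pp2048, tokens/s as quoted in the post.
rocm_pp2048 = 1022.0
vulkan_pp2048 = 870.0
print(f"ROCm vs Vulkan pp2048: +{speedup_pct(rocm_pp2048, vulkan_pp2048):.1f}%")
# prints: ROCm vs Vulkan pp2048: +17.5%

# Bus width only (assumption: equal per-pin memory speed on both cards).
bus_advantage = speedup_pct(384, 256)
print(f"384-bit vs 256-bit bus: +{bus_advantage:.0f}%")
# prints: 384-bit vs 256-bit bus: +50%
```

Under that assumption the wider bus allows up to 50% more bandwidth, so the measured 24–29% token-gen lead fits a bandwidth explanation without token-gen being purely bandwidth-bound.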
Greetings, Ginmarr
u/Ginmarr — 19 hours ago