u/Limp_Doubt6411

▲ 26 r/ROCm

AMD RX 7900 XTX + ROCm + Gemma 4 26B — here's what actually worked for me

Recent AMD/ROCm updates finally made local AI inference stable and I couldn't be happier.

Back in early 2025, I was running Mistral 7B's CUDA code through a custom HIP converter I built myself, just to get it working on AMD. Now it runs natively without any of that. What a difference.

The system choice was intentional — RX 7900 XTX + Ryzen 9, partly for the price, but mainly because AMD's FP throughput and memory characteristics worked better for my specific workload. Some parts of my experimental pipeline were unstable on NVIDIA for reasons I still need to investigate.

Context length is still the limiting factor on a single local machine. My plan is to keep the core logic local and connect to a server for heavier lifting. The biggest win is keeping my AI in a safe place, insulated from upstream model updates and external changes.

One thing I'd like to see: better quantization support in vLLM. I understand it's server-oriented by design, but native quantization support for consumer GPUs would go a long way.

Setup

  • GPU: AMD Radeon RX 7900 XTX (24GB / gfx1100)
  • CPU: AMD Ryzen 9 9950X3D
  • OS: Ubuntu 24.04.2 LTS
  • ROCm: 7.2.3
  • Stack: llama.cpp (GGML_HIP=ON) + vLLM (ROCm)

Benchmark Results

  • Gemma 4 26B A4B — llama.cpp (HIP) Q4_K_M — PP: ~3355 t/s / TG: ~102 t/s
  • Qwen2.5-7B — vLLM (ROCm) FP16 — PP: ~3410 t/s / TG: ~56 t/s
  • Gemma 2 9B — llama.cpp (HIP) Q4_K_M — PP: ~2773 t/s / TG: ~79 t/s

PP = Prompt Processing (prefill), TG = Token Generation (decode)
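If you want to reproduce this kind of PP/TG split yourself, llama.cpp ships a llama-bench tool alongside llama-server. A rough sketch (the model path is a placeholder, and exact token counts are my own choice, not the settings used for the numbers above):

```shell
# Benchmark prefill and decode separately with llama.cpp's bundled llama-bench.
# -p 512 measures prompt processing (prefill) over 512 tokens,
# -n 128 measures token generation (decode) over 128 tokens,
# -ngl 99 offloads all layers to the GPU.
HIP_VISIBLE_DEVICES=0 ./build/bin/llama-bench \
  -m /workspace/your-model.gguf \
  -ngl 99 -p 512 -n 128
```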

The critical flag for llama.cpp

Building without -DGGML_HIP=ON compiles fine but silently falls back to CPU. No warning.

cmake -B build \
  -DGGML_HIP=ON \
  -DAMDGPU_TARGETS="gfx1100" \
  -DCMAKE_BUILD_TYPE=Release \
  -DCMAKE_C_COMPILER=/opt/rocm/bin/hipcc \
  -DCMAKE_CXX_COMPILER=/opt/rocm/bin/hipcc \
  -DCMAKE_PREFIX_PATH=/opt/rocm-7.2.3

cmake --build build --config Release -j$(nproc)
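Since a CPU-only build compiles without complaint, one quick sanity check I'd suggest (assuming a default dynamically linked build) is to confirm the binary actually pulls in ROCm's HIP runtime:

```shell
# If the HIP backend was compiled in, the binary (or its ggml libraries)
# should link against ROCm's HIP runtime library, libamdhip64.
ldd build/bin/llama-server | grep -i amdhip64 \
  && echo "HIP backend linked" \
  || echo "no HIP runtime found - likely a CPU-only build"
```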

Docker setup

docker run -it \
  --device=/dev/kfd \
  --device=/dev/dri/card0 \
  --device=/dev/dri/renderD128 \
  --group-add video \
  -v /your/model/path:/workspace \
  rocm/pytorch:latest bash
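Before pulling models into the container, it's worth confirming ROCm actually sees the card from inside it. Both tools ship with the rocm/pytorch image:

```shell
# Inside the container: confirm the card is visible to ROCm.
# The RX 7900 XTX should show up as a gfx1100 agent.
rocminfo | grep -i gfx

# VRAM and utilization overview.
rocm-smi
```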


Running


HIP_VISIBLE_DEVICES=0 ./build/bin/llama-server \
  -m /workspace/your-model.gguf \
  -ngl 99 \
  --host 0.0.0.0 \
  --port 8000

  • HIP_VISIBLE_DEVICES=0 — stops ROCm from picking up the Ryzen's integrated GPU as a second device
  • -ngl 99 — offloads all layers to the GPU. Without this, inference runs on CPU regardless of how you built it
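Once the server is up, it speaks the OpenAI-compatible API, so a quick smoke test from the host looks like this (prompt and max_tokens are arbitrary):

```shell
# Smoke test against llama-server's OpenAI-compatible chat endpoint.
curl -s http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "messages": [{"role": "user", "content": "Say hello in one word."}],
        "max_tokens": 16
      }'
```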

Lazy startup script

Got tired of typing the same commands every time:

#!/bin/bash
docker start gemma2-vllm
docker exec -it gemma2-vllm bash -c "
cd /workspace/llama.cpp && \
HIP_VISIBLE_DEVICES=0 ./build/bin/llama-server \
  -m /workspace/your-model.gguf \
  -ngl 99 \
  --host 0.0.0.0 \
  --port 8000
"

Save as start_model.sh, chmod +x, done.
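If you chain start_model.sh into anything else, llama-server also exposes a /health endpoint, so a small wait loop avoids racing the model load (the 120-second timeout is an arbitrary value; tune it for your model's load time):

```shell
#!/bin/bash
# Wait until llama-server reports healthy before using it.
for i in $(seq 1 120); do
  if curl -sf http://localhost:8000/health >/dev/null; then
    echo "server is up"
    exit 0
  fi
  sleep 1
done
echo "timed out waiting for llama-server" >&2
exit 1
```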

Model

Quantized Gemma 4 26B A4B on this setup — original 48GB → 16GB Q4_K_M.

https://huggingface.co/rakisis-core/Gemma-4-26B-A4B-Q4K_M-GGUF

---

**Full setup, scripts & guides:**

https://github.com/xinkanglabs/rocm-local-ai-stack

---

— XinXin-Kang / Xinkang Labs 🌐 xinkanglabs.com.au
