Questions about moving to Linux from Windows for a Linux newbie (I work in IT but have always used Windows and only ever tinkered with Linux on a Raspberry Pi years ago)
Hi
Lots of previous discussions have suggested that instead of Windows 11 I try Linux to get better local LLM speeds on my Corsair AI Workstation 300 (AMD Ryzen AI Max+ 395, 128GB RAM).
I have some questions, if you don't mind, so I can make sure I do all of this correctly, as some of my initial tests didn't go so well (see bottom of post):
1) Choice of Distro?
Ubuntu or Fedora?
2) Shared VRAM settings in grub and BIOS
A lot of sites talk about setting ttm.pages_limit and amdgpu.gttsize.
Options seem to be:
a) editing grub and adding:
amd_iommu=off amdgpu.gttsize=131072 ttm.pages_limit=33554432
or
b) installing AMD's debug tools and using amd-ttm to set the shared VRAM:
sudo apt install pipx
pipx install amd-debug-tools
amd-ttm
amd-ttm --set 100
A lot of the articles I found are older, so what is the current best way to do this, and what should I set both via these settings and in the BIOS?
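For option (a), my understanding (please correct me if I have this wrong) is that on Ubuntu you edit /etc/default/grub and regenerate the config, something like this (parameter values taken from the articles I found):

```shell
# Open the GRUB defaults file (assumes Ubuntu with GRUB as bootloader)
sudo nano /etc/default/grub

# In that file, append the parameters to GRUB_CMDLINE_LINUX_DEFAULT, e.g.:
# GRUB_CMDLINE_LINUX_DEFAULT="quiet splash amd_iommu=off amdgpu.gttsize=131072 ttm.pages_limit=33554432"

# Regenerate the GRUB config and reboot
sudo update-grub
sudo reboot

# After reboot, check the parameters actually made it onto the kernel command line
cat /proc/cmdline
```

Is that the right procedure, or does option (b) with amd-ttm make the GRUB edit unnecessary?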
3) ROCm or Vulkan?
Do I use ROCm or Vulkan with Ollama / LM Studio / Lemonade etc.?
And if so, what's the best way to install and configure it, e.g. for Ollama, which environment variables need setting?
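For context, the way I've been setting Ollama variables so far is via a systemd drop-in (assuming the standard install where Ollama runs as ollama.service) - is this the right mechanism, and which variables actually matter on this hardware?

```shell
# Create a drop-in override for the Ollama service
# (assumes the standard systemd-managed Ollama install)
sudo mkdir -p /etc/systemd/system/ollama.service.d
sudo tee /etc/systemd/system/ollama.service.d/override.conf <<'EOF'
[Service]
Environment="OLLAMA_VULKAN=1"
Environment="OLLAMA_FLASH_ATTENTION=1"
EOF

# Pick up the new override and restart the service
sudo systemctl daemon-reload
sudo systemctl restart ollama
```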
Previous tests and issues
I initially installed Ubuntu 26.04 but had issues with the ROCm drivers, and found lots of posts saying 24.04 is the better choice, so I installed that instead.
Running models in Ollama seemed to work with Vulkan after adding the lines below:
Environment="OLLAMA_VULKAN=1"
Environment="OLLAMA_FLASH_ATTENTION=1"
Environment="ROCR_VISIBLE_DEVICES="
But without the Environment="ROCR_VISIBLE_DEVICES=" entry I got errors trying to run models with ROCm:
ollama run llama3.3:70b
Error: 500 Internal Server Error: llama runner process has terminated: cudaMalloc failed: out of memory
error loading model: unable to allocate ROCm0 buffer
panic: unable to load model: /usr/share/ollama/.ollama/models/blobs/sha256-4824460d29f2058aaf6e1118a63a7a197a09bed509f0e7d4e2efb1ee273b447d
I then tried LM Studio, and it worked fine with Vulkan set as the runtime, but with ROCm I just keep getting "Failed to load model" with no further error info.
I also tried Lemonade Server, and again it works with Vulkan but not with ROCm.
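In case it helps with diagnosis, these are the checks I know how to run and can post output from (my understanding is rocminfo ships with ROCm and vulkaninfo comes from the vulkan-tools package - correct me if there are better commands):

```shell
# Confirm the GPU is visible to ROCm at all
rocminfo | grep -i "gfx"

# Confirm a Vulkan device is present
vulkaninfo --summary

# Check how much GTT (shared) memory the kernel driver reports
sudo dmesg | grep -i "amdgpu.*gtt"
```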
Summary
So what was I doing wrong in my initial tests, and based on the answers to my questions, what is my best option to get the best model performance on this system?
Thanks for reading - sorry it's a long post, but I wanted to give as much detail as possible.
If there's anything else you need to know to help, just ask!