
Questions about moving from Windows to Linux as a Linux newbie (I work in IT but have always used Windows, and only ever tinkered with Linux on a Raspberry Pi years ago)

Hi

Lots of previous discussions have suggested that, instead of Windows 11, I try Linux to get better local LLM speeds on my Corsair AI Workstation 300 with an AMD Ryzen AI Max+ 395 and 128GB RAM.

I have some questions, if you don't mind, so I can make sure I do all of this correctly; some of my initial tests didn't go so well (see the bottom of the post):

1) Choice of Distro?

Ubuntu or Fedora?

2) Shared VRAM settings in GRUB and BIOS

A lot of sites mention setting ttm.pages_limit and amdgpu.gttsize.

Options seem to be:

a) editing GRUB and adding:

amd_iommu=off amdgpu.gttsize=131072 ttm.pages_limit=33554432

or

b) installing AMD's debug tools and using amd-ttm to set the shared VRAM:

sudo apt install pipx

pipx install amd-debug-tools

amd-ttm

amd-ttm --set 100

A lot of the articles I found are older, so what is the current best way to do this, and what should I set both via these parameters and in the BIOS? (My understanding of how option (a) would be applied is sketched below.)
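
For reference, here is how I understand option (a) would be applied on Ubuntu (this assumes the standard GRUB setup, so corrections welcome):

# edit /etc/default/grub and append the parameters to GRUB_CMDLINE_LINUX_DEFAULT, e.g.
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash amd_iommu=off amdgpu.gttsize=131072 ttm.pages_limit=33554432"

# gttsize is in MiB and ttm pages are 4 KiB, so as I understand it both values above work out to 128 GiB

# regenerate the GRUB config and reboot
sudo update-grub
sudo reboot

# after rebooting, confirm the parameters took effect
cat /proc/cmdline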

3) ROCm or Vulkan?

Do I use ROCm or Vulkan with Ollama / LM Studio / Lemonade, etc.?

And if so, what is the best way to install / configure each, e.g. for Ollama, which environment variables need setting? (A sketch of how I have been setting them is below.)
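
For context, my understanding is that Ollama installed via the official Linux script runs as a systemd service, so environment variables go into a drop-in override (this is how I set mine; happy to be corrected):

sudo systemctl edit ollama.service

# in the editor that opens, add:
[Service]
Environment="OLLAMA_VULKAN=1"
Environment="OLLAMA_FLASH_ATTENTION=1"

# then reload and restart the service
sudo systemctl daemon-reload
sudo systemctl restart ollama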

Previous tests and issues

I initially installed Ubuntu 26.04 but had issues with ROCm drivers, and found lots of posts saying 24.04 was the better choice, so I installed that instead.

Running models in Ollama seemed to work with Vulkan after adding the lines below to the service configuration:

Environment="OLLAMA_VULKAN=1"
Environment="OLLAMA_FLASH_ATTENTION=1"
Environment="ROCR_VISIBLE_DEVICES="

But without the Environment="ROCR_VISIBLE_DEVICES=" entry, I got errors trying to use models with ROCm:

ollama run llama3.3:70b
Error: 500 Internal Server Error: llama runner process has terminated: cudaMalloc failed: out of memory
error loading model: unable to allocate ROCm0 buffer
panic: unable to load model: /usr/share/ollama/.ollama/models/blobs/sha256-4824460d29f2058aaf6e1118a63a7a197a09bed509f0e7d4e2efb1ee273b447d

I then tried LM Studio, and it worked fine with Vulkan set as the runtime, but with ROCm I just kept getting "Failed to load model" with no further error info.

I also tried Lemonade Server, and again it works with Vulkan but not ROCm.
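
One check I have not yet done, assuming I am reading the amdgpu sysfs docs correctly, is confirming how much VRAM / GTT the kernel actually allocated; if the ROCm0 allocation failure is because the GTT limit never got raised, I would expect this to show it:

# values are in bytes; card0 may be card1 on some systems
cat /sys/class/drm/card0/device/mem_info_vram_total
cat /sys/class/drm/card0/device/mem_info_gtt_total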

Summary

So, what was I doing wrong in my initial tests, and based on the answers to my questions, what is my best option for getting the best model performance on this system?

Thanks for reading. Sorry it is a long post, but I wanted to give as much detail as possible.

If there's anything else you need to know to help, just ask.

u/wingers999 — 2 days ago

Dual booting Linux alongside Windows on my Corsair AI Workstation 300

Lots of advice in my previous question about best tools / models for coding seems to indicate I would be better off running Linux on my AMD Ryzen™ AI Max+ 395 / 128GB RAM device.

I work with Windows and have always used it, so that is my area of expertise, but I am more than happy to give Linux a try. I just want some advice on the best distro to use on this device, one where I can run Ollama or LM Studio (or whatever is best on Linux) and get the most out of my hardware.

I just ordered a separate NVMe SSD for my device so I can install Linux on a separate drive for ease.

So please recommend any Linux distros, and software to use once installed.

Also, any step-by-step guides that would help me initially configure it would be great.

thanks

u/wingers999 — 5 days ago

Hi, I am looking for some advice on the best local models I can use for coding via VS Code or similar, running on a Corsair AI Workstation 300 with an AMD Ryzen AI Max+ 395 / 128GB RAM (96GB shared for VRAM).

Please suggest the best models I can use, along with tips on the best extensions for VS Code or similar to pair with them.

So far I have managed to get some models running okay on this device (notes below), but I would appreciate comments / help from others using similar-spec hardware.

qwen3.6:35b-a3b - works quickly, but the generated code often has issues; it regularly gets "stuck" and goes in circles trying to fix obvious errors

qwen3.6:27b - slower than the above, as expected, but produces better output, although there are still often a lot of coding issues to fix before projects will build / run

I have tested with VS Code (using the Roo Code and Cline extensions), and also tried OpenCode but kept getting errors.

I found that the VS Code extensions worked better when accessing models via Ollama rather than LM Studio; in testing, LM Studio does seem quicker on this system, but it is definitely less reliable via Roo Code.

In all cases I have increased the context size as much as VRAM allowed (my approach is sketched below).
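
For reference, this is how I have been raising the context size in Ollama (as I understand it, num_ctx is the relevant parameter, and newer Ollama builds also read an OLLAMA_CONTEXT_LENGTH environment variable):

# one-off, inside an interactive ollama run session
/set parameter num_ctx 32768

# or baked into a named variant via a Modelfile containing:
#   FROM qwen3.6:27b
#   PARAMETER num_ctx 32768
# (the variant name below is just my example)
ollama create qwen3.6-27b-32k -f Modelfile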

Any other suggestions on:

a) Models I could try that would give better coding results than qwen3.6:27b?

b) different tweaks / settings to try to improve performance and output

c) different apps / extensions / IDEs, etc. to try

u/wingers999 — 8 days ago