u/tatertots89

OpenWebUI Desktop auto-updates keep breaking GPU inference — how do I stop it or automate the fix?

Every time the OpenWebUI desktop app auto-updates, it replaces the bundled llama.cpp binary with a CPU-only version. To get GPU acceleration working again I have to manually:

  1. Download **Windows x64 (CUDA 13)** and **CUDA 13.1 DLLs** from the llama.cpp releases page

  2. Extract and drag/drop the files into:

     `C:\Users\UserName\AppData\Roaming\open-webui\llama.cpp\b8999\`

This happens every single day because the app updates very frequently, and each update changes the build folder (e.g. b8996 → b8999), so I have to find the new folder and replace the files every time.
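For what it's worth, I could probably script the copy step myself. Something like this rough sketch (Python, where `C:\llama-cuda` is just a placeholder for wherever you keep the extracted CUDA build and DLLs):

```python
# cuda_patch.py - rough sketch: copy a stashed CUDA build of llama.cpp
# into whatever bNNNN build folder the OpenWebUI desktop app used last.
# Assumes you've already extracted the CUDA release zip plus the CUDA
# runtime DLLs into STASH_DIR (hypothetical path, adjust to taste).
import os
import re
import shutil
from pathlib import Path

STASH_DIR = Path(r"C:\llama-cuda")  # your stash of CUDA binaries/DLLs
LLAMA_DIR = Path(os.environ["APPDATA"]) / "open-webui" / "llama.cpp"

def newest_build(root: Path) -> Path:
    # Build folders look like b8996, b8999, ...; pick the highest number.
    builds = [p for p in root.iterdir()
              if p.is_dir() and re.fullmatch(r"b\d+", p.name)]
    if not builds:
        raise SystemExit(f"no bNNNN build folders found under {root}")
    return max(builds, key=lambda p: int(p.name[1:]))

def main() -> None:
    target = newest_build(LLAMA_DIR)
    for src in STASH_DIR.iterdir():
        if src.is_file():
            shutil.copy2(src, target / src.name)  # overwrite CPU-only files
            print(f"copied {src.name} -> {target}")

if __name__ == "__main__":
    main()
```

I could run that at login or on a schedule via Task Scheduler, but it feels fragile (and it would still break if the app updates mid-session), which is why I'm hoping there's a supported setting instead.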

I'm running the OpenWebUI desktop app on Windows 11 with an RTX 3090.

I've confirmed CUDA inference works correctly after the manual replacement — flash attention auto-enables and all 65 layers offload to the GPU.

What I want to know:

- Is there a setting inside OpenWebUI Desktop to disable auto-updates?

- If not, is there a way to make it automatically use the CUDA binary after each update without manual intervention?

Also, bonus question: how do I get vision working? If I try to paste a picture I get the following error (using Qwen3.6-27B-Q4_K_M):

"image input is not supported - hint: if this is unexpected, you may need to provide the mmproj"

Any help is greatly appreciated!
