OpenWebUI Desktop auto-updates keep breaking GPU inference — how do I stop it or automate the fix?
Every time the OpenWebUI desktop app auto-updates, it replaces the bundled llama.cpp binary with a CPU-only build. To get GPU acceleration working again I have to manually:

1. Download **Windows x64 (CUDA 13)** and the **CUDA 13.1 DLLs** from the llama.cpp releases page.
2. Extract them and drag/drop the files into:
   `C:\Users\UserName\AppData\Roaming\open-webui\llama.cpp\b8999\`

This happens every single day because the app updates very frequently, and each update changes the build folder (e.g. b8996 → b8999), so I have to find the new folder and replace the files each time.
I'm on the OpenWebUI Desktop App on Windows 11 with an RTX 3090.
I've confirmed CUDA inference works correctly after the manual replacement — flash attention auto-enables and all 65 layers offload to the GPU.
What I want to know:
- Is there a setting inside OpenWebUI Desktop to disable auto-updates?
- If not, is there a way to make it automatically use the CUDA binary after each update without manual intervention?
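For reference, this is roughly what I'd like to automate. A minimal Python sketch that copies a local "stash" of the CUDA files into whichever `bNNNN` build folder is newest — the stash location, function names, and the `bNNNN` naming convention are assumptions based on my paths above, not anything official from OpenWebUI:

```python
# Hypothetical helper: keep a stash of the CUDA binaries/DLLs locally and
# re-copy them into the newest llama.cpp build folder after each update.
# All paths below are assumptions; adjust them to your own setup.
import re
import shutil
from pathlib import Path

def newest_build_dir(llama_dir: Path):
    """Return the bNNNN subfolder with the highest build number, or None."""
    builds = [d for d in llama_dir.iterdir()
              if d.is_dir() and re.fullmatch(r"b\d+", d.name)]
    return max(builds, key=lambda d: int(d.name[1:]), default=None)

def sync_cuda_files(stash_dir: Path, llama_dir: Path) -> int:
    """Copy every file from the stash into the newest build folder.

    Returns the number of files copied (0 if no build folder exists).
    """
    target = newest_build_dir(llama_dir)
    if target is None:
        return 0
    copied = 0
    for f in stash_dir.iterdir():
        if f.is_file():
            shutil.copy2(f, target / f.name)  # overwrite the CPU-only files
            copied += 1
    return copied

if __name__ == "__main__":
    stash = Path.home() / "cuda-stash"  # where you keep the CUDA build + DLLs
    llama = Path.home() / "AppData" / "Roaming" / "open-webui" / "llama.cpp"
    print(f"copied {sync_cuda_files(stash, llama)} file(s)")
```

Something like this could be run on login via Task Scheduler, but it still races against the updater, which is why a real "disable auto-update" setting would be preferable.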
Bonus question: how do I get vision working? If I try to paste a picture I get the following error (using Qwen3.6-27B-Q4_K_M):
"image input is not supported - hint: if this is unexpected, you may need to provide the mmproj"
Any help is greatly appreciated!