r/OpenWebUI


Getting a lot of garbage results with Qwen3.6-27B :(

I'm running Qwen3.6-27B with vLLM at FP16. There are a few known issues with the chat template (I think), and I do get occasional stops in OpenCode and other harnesses.

But in OpenWebUI it's 100x worse. The model stops, sometimes loops on garbage words, and other times fails tool calls due to bad JSON. It's a 50% chance whether I actually manage to use it or not.

I don't get it. I'm using the default values and, yes, native tool calls. In vLLM I'm using the recommended params.

What else can I try?

u/nunodonato — 23 hours ago

Nvidia GPU CUDA missing in OpenWebUI Desktop version on Linux

Hey guys,

Currently trying to set up the native version of OpenWebUI with the desktop version. I installed many different file versions and app build versions (AppImage and Flatpak), and I do see references to "Nvidia" while it's installing, but when I go to the Llama.cpp menu inside the app, Nvidia/CUDA is not an option. The only options are CPU, ROCm and the other two. I enabled the software with full permissions, but it still doesn't show as an option. I tried on Windows and both CUDA 12 and 13 were options there. I am using Fedora 43. The issue seems isolated to Linux, and I haven't seen anyone else mention it.

Are you guys having this issue? Is there a way to modify the llama.cpp install to enable/recompile CUDA support, instead of having to do a separate install of llama.cpp and add it manually in the server area? I currently use LM Studio and want to scrap it to use OpenWebUI directly, without additional software/overhead.

Specs

Fedora 43

RTX 5090 32 GB - Nvidia drivers: NVIDIA-SMI 595.71.05, Driver Version 595.71.05, CUDA Version 13.2 (direct repo, not RPM Fusion)

u/iwannaredditonline — 18 hours ago

OpenWebUI Desktop auto-updates keep breaking GPU inference — how do I stop it or automate the fix?

Every time the OpenWebUI desktop app auto-updates, it replaces the bundled llama.cpp binary with a CPU-only version. To get GPU acceleration working again I have to manually:

  1. Download **Windows x64 (CUDA 13)** and **CUDA 13.1 DLLs** from the llama.cpp releases page

  2. Extract and drag/drop the files into:

C:\Users\UserName\AppData\Roaming\open-webui\llama.cpp\b8999\

This happens every single day because the app updates very frequently, and each update changes the build folder (e.g. b8996 → b8999), so I have to find the new folder and replace the files every time.

I'm on the OpenWebUI Desktop App on Windows 11 with a 3090 GPU.

I've confirmed CUDA inference works correctly after the manual replacement — flash attention auto-enables and all 65 layers offload to the GPU.

What I want to know:

- Is there a setting inside OpenWebUI Desktop to disable auto-updates?

- If not, is there a way to make it automatically use the CUDA binary after each update, without manual intervention? (A rough sketch of what I'm imagining is below.)
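What I'd like is something like this, run by Task Scheduler after each update. This is a rough sketch, not tested against the real app layout; the stash folder and the bNNNN naming assumption are mine:

```python
# patch_llamacpp.py -- copy a stashed CUDA build into the newest bNNNN folder.
# Assumes you keep the extracted CUDA binaries/DLLs in CUDA_STASH (hypothetical).
import os
import shutil
from pathlib import Path

CUDA_STASH = Path(r"C:\llama-cuda-stash")  # hypothetical folder with the extracted CUDA files
LLAMA_DIR = Path(os.environ["APPDATA"]) / "open-webui" / "llama.cpp"

def latest_build_dir() -> Path:
    """Return the highest-numbered bNNNN build folder."""
    builds = [p for p in LLAMA_DIR.iterdir()
              if p.is_dir() and p.name.startswith("b") and p.name[1:].isdigit()]
    return max(builds, key=lambda p: int(p.name[1:]))

def patch_build(build: Path) -> None:
    """Overwrite the bundled CPU-only files with the CUDA ones."""
    for f in CUDA_STASH.iterdir():
        if f.is_file():
            shutil.copy2(f, build / f.name)
    print(f"Patched {build}")

if __name__ == "__main__":
    patch_build(latest_build_dir())
```

Scheduled at login or on a timer, that would at least remove the daily manual step, but a real "don't clobber my GPU build" option would be much nicer.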

Also, a bonus question: how do I get vision working? If I try to paste a picture I get the following error (using Qwen3.6-27B-Q4_K_M):

"image input is not supported - hint: if this is unexpected, you may need to provide the mmproj"

Any help is greatly appreciated!

u/tatertots89 — 22 hours ago

Open Relay v3.4 — Native Live Inline Visualizations are here! 🎉

Hey everyone! Version 3.4 of Open Relay is live, bringing live inline visualizations to the app natively!

🖼️ Native Inline Live Visualization

Charts, graphs, SVGs, and interactive visualizations now render LIVE directly inside chat messages as the AI streams them. No need for the Inline Visualizer plugin (though it still works with it too!). Just add "output the code in one file" at the end of your prompt and watch it get built in real-time.

Other improvements:

  • Grouped tool calls — consecutive tool calls from different tools now collapse into a single row, matching the web UI
  • Better citations — badges now show domain names by default instead of full page titles, and grouped citations like [1, 2, 3] render correctly. Toggle between domain/title in Settings → Chat Behavior
  • Math rendering fix — formulas inside code blocks now correctly restore their original delimiters
  • Smoother tables — links in table cells handle taps more reliably, cells reuse more efficiently

Bug fixes:

  • JavaScript now executes properly in HTML previews (Kanban boards, games, dashboards all work including drag-and-drop & localStorage)
  • Account switching instantly clears and reloads everything for the new account
  • Action buttons that depend on JS now work correctly
  • Tool call Rich UI embeds (music players, dashboards, etc.) should now be fully interactive
  • AI message content no longer clips at the bottom
  • Quick action pills no longer disappear after starting a new chat
  • Built-in tools (web search, image gen, code interpreter) no longer reset to model defaults after sending a message

📱 App Store | 💻 GitHub

Thanks to u/ClassicMain for such a great idea! As always, feedback welcome! Let me know if you run into any issues here or through GitHub (preferred).

u/Zealousideal_Fox6426 — 2 days ago

Sharing experience and an advice request

I have been using OpenWebUI for a while with local models, but it seems like I am missing something. There are so many features, but none of them feels native to me. A small thing, but statistics during and after generation would be useful for me.

Updating model params? Not comfortable at all.

Setup:

- Models served from LM Studio, pointed at OWUI. Tried llama.cpp as well.

- Mac M2, 24 GB unified memory.

- Since it's a work laptop, I run smaller models (4-20B, depending on the task).

Skills? Installed several - not working.

System prompts? Not using them, since my LLMs have different use cases and I need to switch fast; that's not really possible here.

What works:

- scheduled tasks

- web search

I know the community likes this tool. I will try testing it for another week to give it a second chance.

What are your use cases? Help me like it as a daily driver.

u/Right-Ice-6850 — 2 days ago

Trying a different approach to “memory” in OpenWebUI — looking for sanity checks

PRE: I wasn't going to share here, as I didn't believe end-user critique would be relevant (yet), and was hoping to get a more technical lens on this. It's functional and working in my own stack, and I've made significant gains in usability and reliability.

The post was removed where I originally wanted to put it :( Sorry, you're second.

I’ve been hacking on a side project to fix something that’s always bugged me with local assistants:

They either forget obvious things (like where I live), or they “remember” by pulling vaguely similar text and hoping it’s right.

I wanted memory to behave a bit more like a human’s:

  • conversation context is short‑term
  • only some things become long‑term memory
  • long‑term memory should be facts, not chunks of chat
  • new facts shouldn’t silently overwrite old ones

So I built a small memory layer that sits in front of OpenWebUI.

At a high level:

  • conversations stay short‑term
  • anything that looks like a fact gets extracted into a simple structured form
  • that fact is checked (basic rules + conflicts)
  • if it passes, it’s stored long‑term in Postgres
  • vectors are only used as a “this might be relevant” hint, never as the source of truth

Postgres is the authority, Qdrant can be rebuilt any time, and memory is strictly per‑user.
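To make "simple structured form" concrete, here's a trimmed-down sketch of the storage step (the table name and schema are simplified illustrations, not my actual code):

```python
# Trimmed sketch of the long-term storage step (simplified schema):
#   CREATE TABLE memory_facts (
#       user_id   TEXT,
#       subject   TEXT,
#       predicate TEXT,
#       object    TEXT,
#       PRIMARY KEY (user_id, subject, predicate)
#   );
from dataclasses import dataclass

import psycopg2

@dataclass
class Fact:
    user_id: str
    subject: str    # e.g. "user"
    predicate: str  # e.g. "lives_in"
    obj: str        # e.g. "Lisbon"

def store_fact(conn, fact: Fact) -> bool:
    """Store a fact long-term; refuse to silently overwrite a conflicting one."""
    with conn.cursor() as cur:
        cur.execute(
            "SELECT object FROM memory_facts"
            " WHERE user_id = %s AND subject = %s AND predicate = %s",
            (fact.user_id, fact.subject, fact.predicate),
        )
        row = cur.fetchone()
        if row and row[0] != fact.obj:
            return False  # conflict: keep the old fact and flag it, don't overwrite
        cur.execute(
            "INSERT INTO memory_facts (user_id, subject, predicate, object)"
            " VALUES (%s, %s, %s, %s)"
            " ON CONFLICT (user_id, subject, predicate) DO NOTHING",
            (fact.user_id, fact.subject, fact.predicate, fact.obj),
        )
    conn.commit()
    return True
```

The conflict branch is the "new facts shouldn't silently overwrite old ones" rule from the list above.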

Concrete payoff example: Without this, you get:

>

With it, location was already validated and stored earlier, so the model can just answer.

I’m not really looking for users yet — more interested in architectural pushback:

  • Is drawing a hard line between “facts” and vector recall reasonable?
  • Does using a relational store as the memory authority make sense here?
  • Where would this break in practice?
  • Am I overthinking conflict detection, or underthinking it?

If folks are interested I’m happy to share a diagram or trimmed README — just didn’t want to drop a repo uninvited.

Appreciate any gut checks from people who’ve thought about LLM memory systems before.

-- Yes I used AI to help me write this, not to offend but just to be transparent.

u/Dry_Inspection_4583 — 2 days ago

Is OWUI native RAG good enough or should I just use Azure AI Search?

So I built an internal AI tool at work using Open WebUI as the frontend with Azure on the backend (Blob Storage, Azure OpenAI GPT-4o, Azure AI Search). The tool takes acceptance criteria for new features, does RAG over 10,000+ indexed Gherkin test cases, and generates structured Gherkin feature files for our automation team. It works really well so far.

Now I've been assigned a second project: a department-wide Q&A assistant that can search through training materials, white papers, workflows, architecture docs, onboarding content, video walkthroughs, basically everything. Potentially 100+ users across multiple teams.

The corpus is a mix of PDFs (some are lengthy white papers with architecture diagrams), DOCX, PPTX, MP4 videos (some with and some without transcripts), OneNote notebooks, and PowerShell scripts.

For the Gherkin tool I bypassed OWUI's native Knowledge Base entirely and did all retrieval through Azure AI Search from a custom Pipe (roughly the pattern sketched below). It worked great but took extra setup.
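For reference, the Pipe boiled down to this shape. This is a trimmed sketch, not my production code; the endpoint, index, and field names are placeholders, and the final LLM call is elided:

```python
# A minimal OWUI Pipe that retrieves from Azure AI Search before prompting.
# Requires: pip install azure-search-documents
from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient

class Pipe:
    def __init__(self):
        self.search = SearchClient(
            endpoint="https://<your-search>.search.windows.net",  # placeholder
            index_name="gherkin-tests",                           # placeholder
            credential=AzureKeyCredential("<api-key>"),           # placeholder
        )

    def pipe(self, body: dict) -> str:
        query = body["messages"][-1]["content"]
        hits = self.search.search(search_text=query, top=5)
        # "content" is a placeholder field name from my index schema.
        context = "\n\n".join(doc["content"] for doc in hits)
        # From here, prepend the retrieved test cases and forward to the model
        # (Azure OpenAI in my case); that call is omitted for brevity.
        return f"Context:\n{context}\n\nAcceptance criteria:\n{query}"
```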

For this one, I am wondering if OWUI's built-in RAG is actually strong enough to handle this, or if I am going to hit the usual issues people talk about: chunking problems, weak default embeddings, ChromaDB not scaling. Has anyone used it successfully at this scale with mixed-format documents? We have a lot of documents.

Or should I just go Azure AI Search again from the start since I already have the infrastructure and know the pattern. Just wondering if that is overkill for what is basically a document Q&A bot.

EDIT: I've been seeing this LLM Wiki idea on the internet; would that work for my use case, or is simple RAG better?

u/Boring-Baker-3716 — 3 days ago

Websearch API / omlx - owu - home assistant

My goal is to use a local LLM with Home Assistant.

oMLX provides the backend, and Open WebUI acts as the API proxy to Home Assistant.

I’ve enabled web search as a default capability for the model. It works fine when testing directly in Open WebUI, but the web search is not triggered when the model is called through Home Assistant.

Is Web Search through API supported?

Any ideas how to set this up?
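For context, the call from the Home Assistant side boils down to this; the `features` field is a guess on my part, and I don't know whether the API honors it, which is essentially what I'm asking:

```python
# What the Home Assistant -> OWUI proxy call boils down to.
# The "features" field below is a guess; I don't know if OWUI's
# chat-completions API supports it, which is essentially my question.
import requests

resp = requests.post(
    "http://openwebui.local:8080/api/chat/completions",  # placeholder host
    headers={"Authorization": "Bearer <owui-api-key>"},   # placeholder key
    json={
        "model": "my-omlx-model",  # placeholder model id
        "messages": [{"role": "user", "content": "What's in the news today?"}],
        "features": {"web_search": True},  # guess: may or may not be supported
    },
    timeout=120,
)
print(resp.json()["choices"][0]["message"]["content"])
```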

Thanks for any suggestions.

u/pr3ddi — 1 day ago

Need help setting something up for my sister.

So I made this post in the SillyTavern sub (https://www.reddit.com/r/SillyTavernAI/comments/1szeewu/comment/oj7kh76/).

I ended up using Open WebUI because it's closest to Claude's web interface, which she's used to. She has only used Claude so far. It was a colossal pain in the ass to set up with OpenRouter, though, and I had to get help from ChatGPT on how to add the models, force a certain cheaper provider, and enable web search.

Her main AI to use is Claude.
What she wants is very, very specific, and she claims ONLY Claude can do it. The issue is that Claude paid for through OpenRouter, or anywhere I can limit censorship, is EXTREMELY expensive, especially considering what she wants to do.

Right now she is using GLM 5.1 because that's what I use and it's very close to Claude quality while being significantly cheaper.

Here are the problems:

Web search:

She has Claude web search a LOT.

The way she makes her stories is that she tells Claude, for example, "Look up EVERYTHING on Gachiakuta. Every single episode, character, lore, powers, settings, everything from the wiki. All of it! Make sure you have everything!"
Then once it grabs all that, she starts a story with something like "This is how Riyo and ____ met, everything before is canon and this is before _____"

The problem is web search is very expensive, especially with the amount of it she does. It's fine with free Claude because it's, well, free, but paying for it...
Claude is able to grab it all at once no problem, but other AIs say they are limited in how much they can scrape at once, and they are also worried about "copyright" and legal issues of taking all of that data and text verbatim.

GLM 5.1, when I figured out how to enable web search, costs a LOT with what she wants to do.
In the span of 15 minutes she had spent $1.28 from all the web searches. Just giving it link after link after link from the Gachiakuta wiki for it to remember so she can do the story.

I tried to get around this by having ChatGPT compile all the data from the wiki on my end and put it in a file she can then give to the AI, but it basically refused and said that violates copyright, so it's only able to give me brief summaries of what's in the wiki, and mere lists of character names, which is useless to her.

Extremely specific:

This issue I think is just flat out impossible to solve.

She wants everything to very very closely follow the lore, character personalities, story and all that. That's why she does the web search and wiki scraping thing. If it gets something wrong about a character or plot point she gets very upset.

She has many rules for what she wants the AI to do, but can't really explain them well to me and gets frustrated when I ask.

She wants it to write stories for her, but she doesn't want it to "take control", as in starting to do a bunch of stuff on its own.
When she wants Riyo and someone to meet, she wants Riyo and someone to meet. She doesn't want it to throw in that farmer John in the distance yells out help because a monster or whatever is attacking his barn. She doesn't want Riyo to be like "we should go meet your sick dad" or something.

She wants it to aid her in making a story and expand on what she types, not do its own whole thing. She wants it to do some of its own thing, but not to steer the story too much.
She gets extremely frustrated when she gives it a bunch of text and it starts off using that, but then does its own thing for like 4 paragraphs to try and forcefully advance the story.

It's hard to explain exactly what she wants here because whenever I ask her she just yells and gets frustrated saying I "should know" what she wants, and also she doesn't know how to explain.

Claude gets it right more often because it's run by a giant megacorporation with tons of money to train it to be good in most fields, including interpreting things and understanding people like my sister. It still messes up sometimes though.
Other AI doesn't do this well. She says not even ChatGPT does this well.

Timeout and unavailable errors:

GLM 5.1 sometimes just times out and gives nothing, or sometimes just won't give a generation at all and outputs blank every once in a while. I guess because so many people are using it?

In SillyTavern this is fine, it tells me the error in the top right and I can just click to regenerate, or swipe.
With Open WebUI, the message becomes something like "Error" or "Role" and then you cannot make any more messages unless you delete it. It locks the entire chat up. Sometimes it locks it up so badly that you can't even scroll up until you get rid of all the error messages.

Arguing with the AI:

Not sure if I can do anything about this either.

She does this sometimes. She gets frustrated with it and then completely drops the story to start typing at it and arguing, and it doesn't really understand.

She'll get super frustrated and type something like "soppt" or "st[[po" and then it's all "I'm not sure what you're saying, I think you are asking for the definition of soap. Soap is a cleaning-"

This then keeps devolving with her constantly arguing with it and then it fucks up the whole thing because now it has a bunch of arguments and insults thrown at it and it will never be able to do the story now.

Claude is still the best, despite its issues:

Everything I've tried so far, she just keeps going back to
"Claude wouldn't mess up like this"
"Claude doesn't do this stupid shit"
"Claude is better"
"Claude understands what I mean"
"Claude does what I ask"

Other models are not as smart, or as able to understand exactly what she's saying and asking for. Claude, somehow, is trained in a way that makes it very good at understanding people with her level of autism, learning disability and dyslexia.

The problem though is... Claude is WAY, WAY too expensive.

When I used Sonnet 4.5 in SillyTavern through OpenRouter, which is amazing, even without web search it cost around $10 every 3-4 days. Sometimes, if I kept using a long chat, it would cost $10 every 1-2 days. It's why I don't use Claude anymore. It's amazing, but it's absurdly expensive.
Web search would make this WAY more expensive and not affordable at all.

I'm sure paying for Claude directly would be cheaper, but the issue with that is that it will censor her. She hates the censorship. She wants to do NSFW and other things that Claude will normally 100% block. I don't want to jailbreak it through an API either, because then Anthropic will just ban her account and waste our money.

So this is where I'm at right now.

u/Dogbold — 2 days ago

LLM Wiki

Hey everyone,

I've been thinking a lot about Andrej Karpathy's idea of moving away from classic RAG (and from our outdated internal Confluence wiki) toward what he calls an "LLM Wiki" for all employees in our company, one that could combine both layers (personal + business) in one semi-personal LLM Wiki.

Has anyone implemented something similar? A curated knowledge layer that sits between your raw documents and the LLM, rather than using OWUI's built-in Knowledge/RAG directly? Has anyone maybe built a custom MCP server or function/tool that does graph-based retrieval (following wikilinks to find related context, roughly as sketched below) instead of pure vector similarity?
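To be concrete, by graph-based retrieval I mean something like this toy sketch over markdown notes with [[wikilinks]]; `vector_search` here is a stand-in for whatever embedding store you already run:

```python
# Toy "vector hit, then expand along wikilinks" retrieval over markdown notes.
import re
from pathlib import Path

WIKI = Path("wiki")  # placeholder folder of .md notes
LINK = re.compile(r"\[\[([^\]|]+)")  # matches [[Note]] and [[Note|alias]]

def vector_search(query: str) -> list[str]:
    """Stand-in for the embedding store; returns note names."""
    return ["Architecture Overview"]  # placeholder result

def load(note: str) -> str:
    return (WIKI / f"{note}.md").read_text()

def retrieve(query: str, hops: int = 1) -> dict[str, str]:
    """Start from vector hits, then pull in everything one wikilink hop away."""
    context = {name: load(name) for name in vector_search(query)}
    for _ in range(hops):
        linked = {m for text in context.values() for m in LINK.findall(text)}
        for name in linked - context.keys():
            try:
                context[name] = load(name)
            except FileNotFoundError:
                pass  # broken link; a gardener pass could flag these
    return context
```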

For those running this at company scale: How do you handle content freshness? RAG doesn't know when a chunk is outdated. The LLM Wiki approach solves this with review_cycle_days and a gardener agent, but I'm curious about other solutions.

I think the biggest gap in current OWUI RAG is quality control: you upload a PDF and hope for the best. An LLM Wiki gives you auditable, versioned, interlinked knowledge with explicit confidence scores. But it's obviously more work to set up.

Would love to hear if anyone's gone down this path or has thoughts on whether this is overengineering vs. a real improvement over vanilla RAG.

u/Dimitri_Senhupen — 4 days ago

local ai on basic computer requirements

Weird that publishing system requirements is fading.

I have installed a smolm2 model on a non-Nvidia laptop; it's a model that can run in 8GB. I plan to upgrade to 16GB for jeffgreen311/eve-qwen-8b-consciousness-liberated, to use as an assistant. I hope to install OpenWebUI mainly for TTS. Am I dreaming, or is it possible? I'm also aiming for an implementation that can access local tools such as MySQL or SQLite databases. Any thoughts appreciated. I doubt I will ever use more than a fraction of the tool capabilities.

u/openwebui78 — 5 hours ago

When are we going to get proper MCP support on OpenWebUI?

All resolved now - for anyone running into this later: there is a Function Calling setting under Advanced Params which needs to be changed from "default" to "native". This resolves function calling only being executed before the prompt is generated, and makes OWUI work great.

Many thanks to u/overand, u/Stike_1, u/anengineerdude and everyone else who corrected me here. This is not a missing feature, just some confusion on my end.

>!I really love this platform, but it is held back significantly by the fact that it doesn't support the traditional method of MCP calls. From my observations, the calls appear to occur in a separate CoT from the main response, and the MCP response is presented as a file source rather than something that can be requested mid-response (as the AI model realises it needs certain tool usages / data for its reasoning process). I had the same issue using the in-house Python-based tool interface, where I can't (1) have my AI model make a tool call after it has started the prompt, or (2) use multiple tool calls in one prompt.!<

>!It would be really great if this could be addressed in an upcoming update, seeing as this is one of the best ways to expand AI model capabilities and is available on most other inference providers at this point. OWUI has been such an amazing thing to find and to play around with, but the current 'experimental' implementation leaves a lot to be desired.!<

u/WittyAmbassador7340 — 5 days ago

Question and Answer functionality

Claude.ai is able to ask the user questions and seek clarification before generating the final response. Can I implement something like this in Open WebUI?

u/Ecstax — 4 days ago

How to make chat work unattended?

So, I created an MCP server with like 14 tools. My task is a bit complicated, easily takes 3-4 hours, and goes through multiple feedback loops. I added the Claude API and am using Sonnet, not local models.

If a user in chat asks it to do the work, it currently gets stopped at some point and I have to ask it to "continue" again; and if the browser shuts down or something, I have to start all over again.

How can I make it work unattended, looping continuously by itself, and ideally send an email notification once it's done? How do I achieve this? (Something like the driver sketched below is what I have in mind.)
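The direction I'm considering is moving the loop server-side so nothing depends on a browser tab. A rough sketch; the model id, the "DONE" convention, and the SMTP details are all assumptions, and the real version would dispatch MCP tool calls inside the loop:

```python
# Server-side driver: keep prompting until the model says DONE, then email.
import smtplib
from email.message import EmailMessage

import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
messages = [{"role": "user", "content": "Run the full task. Reply DONE when finished."}]

while True:
    resp = client.messages.create(
        model="claude-sonnet-4-5",  # assumed model id
        max_tokens=4096,
        messages=messages,
    )
    text = "".join(block.text for block in resp.content if block.type == "text")
    messages.append({"role": "assistant", "content": text})
    if "DONE" in text:
        break
    # Stand-in for the feedback loop: the real version would execute the
    # MCP tool calls here and feed their results back, not just say "continue".
    messages.append({"role": "user", "content": "continue"})

msg = EmailMessage()
msg["Subject"] = "Task finished"
msg["From"] = "bot@example.com"  # placeholder
msg["To"] = "me@example.com"     # placeholder
msg.set_content(text[-2000:])    # last chunk of output as the summary
with smtplib.SMTP("localhost") as s:  # assumes a local mail relay
    s.send_message(msg)
```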

u/Upper-Advantage-6156 — 3 days ago

New QR Code Tool Created By Qwen3.6-27B

https://openwebui.com/posts/qr_code_generator_for_open_webui_fb931955

I got rate limited after one prompt using Claude and decided: why not just create the Open WebUI tool with Open WebUI itself! I used Qwen3.6-27B-Q5 and the "attach a webpage" feature to give it the full docs on how to create OpenWebUI tools, and it created a fully working tool that generates any QR code in seconds, embedded in chat and easily shareable.

It created the code, the docs, everything; I did nothing special. The tool boils down to roughly the shape below.
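(A trimmed sketch of the pattern, not the exact published code; see the link above for the real one.)

```python
# Minimal OWUI tool: generate a QR code and embed it inline as a data URI.
# Requires: pip install qrcode[pil]
import base64
from io import BytesIO

import qrcode

class Tools:
    def generate_qr_code(self, text: str) -> str:
        """
        Generate a QR code for the given text and embed it in the chat.
        :param text: The text or URL to encode.
        """
        img = qrcode.make(text)
        buf = BytesIO()
        img.save(buf, format="PNG")
        encoded = base64.b64encode(buf.getvalue()).decode()
        # OWUI renders markdown, so a data-URI image shows up inline.
        return f"![QR code](data:image/png;base64,{encoded})"
```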

Pretty cool knowing that I can someday just let the AI run, improve itself, and create new tools and capabilities for itself!

Now we just need Open WebUI to have the ability to let the LLM enable tools by itself when needed, to avoid needing to enable 100 different tools.

u/iChrist — 4 days ago

Anyone using Hermes with a Web UI instead of terminal?

I've been using Hermes with managed hosting for a while, and I like the model itself, but I noticed something in my own usage. When I was using it through the terminal, I'd only really open it for specific tests or structured runs. It worked fine, but it didn't feel quick to use, and it was basically impossible to show to non-tech friends.

I switched to a simple web style UI (chat interface + history) recently and my usage increased a lot without me intentionally changing anything else. It just feels more natural to open quickly.

It made me wonder if the interface layer is actually a bigger factor in adoption than we usually think, for agents that are otherwise CLI-first.

Has anyone else here seen the same thing with Hermes or other agents? Do you end up using them more once there's a proper UI on top, or does it not really matter in your workflow?

u/Zestyfar_Chat_8 — 6 days ago

How do I have the model (gemma4) search the web via native tool calling, without specifically choosing the web search toggle within the chat?

Perhaps I'm not understanding what the native tool calling setting actually does, but I have SearXNG added as an MCP server, which successfully searches the web when I choose it. I was hoping the model would choose to use it by itself when needed, once I've enabled native tool calling.

The top result is without selecting the web search toggle (in hopes it would still use it), and for the bottom one I did select web search, which is my usual way of forcing it to use the web.

https://preview.redd.it/1ur1oq4uboxg1.png?width=1130&format=png&auto=webp&s=8ec90a00e253f001f395b8a0489f4038b445b195

I have this model served via the LM Studio server with the full 256k context. Not sure if I have to enable anything else?

https://preview.redd.it/nh1hdgz9doxg1.png?width=348&format=png&auto=webp&s=8f3375b565eabd3b4f28bf0d4d329f7c36dc4a52

u/chimph — 6 days ago

Web Search in OpenWebUi

I have a test prompt that I use for figuring out whether a setup can provide me with reliable information: "What can you tell me about the 1974 FIFA World Cup final?". I find that information like the nations playing, date, location, and final score is generally accurate, but when it comes to who scored and when, it's usually a hallucination fest if web search does not function properly.

I am using llama.cpp via llama-server and Qwen3.6-35B-A3B-UD-Q4_K_XL.gguf as the backend. Despite the fact that I see https://en.wikipedia.org/wiki/1974_FIFA_World_Cup_final in the search results returned in OpenWebUI, it always gives the wrong details, while I get a perfect answer every time from PicoClaw (a tiny OpenClaw-like agentic system), which has been able to do this pretty much right after install without much fuss. Not only that, PicoClaw seems to be able to tell if and when a web search is even needed in the first place, whereas OpenWebUI seems to trigger a search for every prompt when web search is enabled. Is PicoClaw the better tool for the job, or did I just not configure OpenWebUI correctly?

Edit: thanks to all who responded! Enabling native function calling is the key. There are at least three places where this can be done:

  1. Admin Panel -> Models -> [Model Name] -> Advanced Params -> Function Calling -> native
  2. Workspace -> [Workspace Model] -> Advanced Params -> Function Calling -> native
  3. Admin Panel -> Models -> Model Parameters -> Function Calling -> native

I found that changing the setting for the base model in 1. does not automatically also change it for the derived Workspace models. The last one seems to be the default for all. The relevant pieces of the documentation can be found here.

Beyond that, prompting for accuracy and grounding the response in verifiable sources is still key to getting it right on the first shot.

u/_Wheres_the_Beef_ — 6 days ago

Fixed Model Selection

How do I configure Open WebUI so that it only shows the models I select as options in the chat view?

I'm using the OpenRouter API, and any time new models are added they become visible in the model selection dropdown. As a result, I'm routinely having to deselect models in the settings, which is annoying.

Anyone aware of a fix?

u/softplus- — 5 days ago