u/Dry_Inspection_4583

Trying a different approach to “memory” in OpenWebUI — looking for sanity checks

PRE: I wasn't going to share here, since I don't think end-user critique is relevant (yet), and was hoping for a more technical lens on this. It's functional and working in my own stack, and I've made significant gains in usability and reliability.

The post was removed from the sub where I originally wanted to put it :( sorry, you're second.

I’ve been hacking on a side project to fix something that’s always bugged me with local assistants:

They either forget obvious things (like where I live), or they “remember” by pulling vaguely similar text and hoping it’s right.

I wanted memory to behave a bit more like a human’s:

  • conversation context is short‑term
  • only some things become long‑term memory
  • long‑term memory should be facts, not chunks of chat
  • new facts shouldn’t silently overwrite old ones

So I built a small memory layer that sits in front of OpenWebUI.

At a high level:

  • conversations stay short‑term
  • anything that looks like a fact gets extracted into a simple structured form
  • that fact is checked (basic rules + conflicts)
  • if it passes, it’s stored long‑term in Postgres
  • vectors are only used as a “this might be relevant” hint, never as the source of truth
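Roughly, the promotion step looks like this. A minimal sketch in Python; all the names (`Fact`, `validate`, the rule set) are made up for illustration, not the actual project code:

```python
# Hypothetical sketch of the short-term -> long-term promotion step:
# extract a structured fact, run basic rules plus a conflict check,
# and only then let it into the long-term store.
from dataclasses import dataclass

@dataclass
class Fact:
    user_id: str
    key: str         # e.g. "home_city"
    value: str       # e.g. "Halifax"
    source_msg: str  # the chat line it was extracted from

def validate(fact: Fact, existing: dict[str, str]) -> str:
    """Decide what to do with a candidate fact against what's already stored."""
    if not fact.key or not fact.value:
        return "reject"          # rule: no empty fields
    old = existing.get(fact.key)
    if old is None:
        return "insert"          # brand-new fact, store it
    if old == fact.value:
        return "duplicate"       # already known, nothing to do
    return "conflict"            # new value disagrees; don't silently overwrite

# Toy run: the user already has a stored home_city
existing = {"home_city": "Halifax"}
print(validate(Fact("u1", "home_city", "Toronto", "..."), existing))  # conflict
print(validate(Fact("u1", "timezone", "AST", "..."), existing))       # insert
```

A "conflict" result is surfaced rather than auto-resolved, which is the "new facts shouldn't silently overwrite old ones" rule from above.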

Postgres is the authority, Qdrant can be rebuilt any time, and memory is strictly per‑user.
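To make the authority/rebuild split concrete, here's a toy sketch. sqlite stands in for Postgres and a plain dict stands in for Qdrant so it's self-contained; the schema and `rebuild_index` are illustrative, not the real stack:

```python
# Sketch of "the relational store is the authority, the vector index is disposable".
# sqlite3 is a stand-in for Postgres; rebuild_index() stands in for re-embedding
# every authoritative row back into Qdrant.
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("""
    CREATE TABLE memory (
        user_id  TEXT NOT NULL,
        key      TEXT NOT NULL,
        value    TEXT NOT NULL,
        PRIMARY KEY (user_id, key)   -- one authoritative value per fact, per user
    )
""")
db.execute("INSERT INTO memory VALUES ('u1', 'home_city', 'Halifax')")

def rebuild_index(conn):
    """Re-derive the whole vector index from the relational store.
    In the real stack this would re-embed each row into a fresh Qdrant collection."""
    return {(u, k): f"embedding({v})"    # placeholder for an embedding call
            for u, k, v in conn.execute("SELECT user_id, key, value FROM memory")}

index = rebuild_index(db)                # Qdrant can be dropped and rebuilt any time
print(index[("u1", "home_city")])
```

The point of the sketch: nothing lives only in the vector index, so losing or corrupting it costs a re-embedding pass, not data.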

Concrete payoff example: ask something that depends on where you live. Without this, the model either asks you again or guesses from vaguely similar chat chunks. With it, location was already validated and stored earlier, so the model can just answer.
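The recall side, showing the "vectors are a hint, never the source of truth" rule, looks roughly like this. Again a hedged sketch with made-up names; `FACTS` stands in for the Postgres table and `vector_hint` for a Qdrant similarity search:

```python
# Sketch of hint-only vector recall: similarity search proposes candidate fact
# keys, but the value handed to the prompt is always read back from the
# authoritative store, scoped to the requesting user.
FACTS = {("u1", "home_city"): "Halifax"}   # stand-in for the Postgres table

def vector_hint(query: str) -> list[tuple[str, str]]:
    """Stand-in for a Qdrant search: returns candidate (user_id, key) pairs,
    possibly including stale or irrelevant hits."""
    return [("u1", "home_city"), ("u1", "stale_deleted_fact")]

def recall(user_id: str, query: str) -> dict[str, str]:
    out = {}
    for uid, key in vector_hint(query):
        if uid != user_id:
            continue                       # memory is strictly per-user
        value = FACTS.get((uid, key))      # authoritative lookup
        if value is not None:              # stale vector hits simply drop out
            out[key] = value
    return out

print(recall("u1", "what's the weather where I live?"))  # {'home_city': 'Halifax'}
```

Because every hit is re-checked against the relational store, a stale vector point can never inject a fact that was deleted or corrected.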

I’m not really looking for users yet — more interested in architectural pushback:

  • Is drawing a hard line between “facts” and vector recall reasonable?
  • Does using a relational store as the memory authority make sense here?
  • Where would this break in practice?
  • Am I overthinking conflict detection, or underthinking it?

If folks are interested I’m happy to share a diagram or trimmed README — just didn’t want to drop a repo uninvited.

Appreciate any gut checks from people who’ve thought about LLM memory systems before.

-- Yes, I used AI to help me write this; not to offend, just to be transparent.
