▲ 6 r/Rag

replaced my RAG pipeline with a memory layer and my agent actually got smarter over time

been building an agent that runs autonomously (openclaw loop, every 30 min). classic setup — vector db, chunk + embed documents, retrieve top-k on every query.

problem was my agent kept re-learning the same stuff. it would extract that "user prefers dark mode" from a conversation, embed it, and then next session extract it again from a different conversation. after 2 weeks my vector db had like 40 near-duplicate chunks about dark mode preferences.

i also noticed something weird — my agent was great at recalling facts but terrible at recalling how it did things. like if it successfully debugged a deployment issue through 5 steps, that workflow was gone next session. RAG only gave back fragments, not the full sequence.

ended up ripping out the whole chunking pipeline and replacing it with something that separates memory into types — facts (user likes X), events (meeting happened on tuesday), and procedures (here's how I fixed the deploy). the procedures part is what surprised me most. the agent now reuses its own workflows and they actually improve over time as it encounters variations.
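a minimal sketch of what "memory separated by type" could look like, assuming a plain keyed store (the class and field names here are illustrative, not from any particular library):

```python
# Hypothetical sketch: typed memories instead of uniform embedded chunks.
from dataclasses import dataclass, field

@dataclass
class Fact:
    subject: str
    statement: str          # e.g. "prefers dark mode"

@dataclass
class Event:
    when: str
    description: str        # e.g. "meeting happened on tuesday"

@dataclass
class Procedure:
    goal: str
    steps: list             # ordered steps kept as one unit, so the
                            # whole workflow survives across sessions

@dataclass
class MemoryStore:
    facts: dict = field(default_factory=dict)       # keyed: one slot per fact
    events: list = field(default_factory=list)
    procedures: dict = field(default_factory=dict)  # keyed by goal

    def add_fact(self, f: Fact):
        # re-extracting the same fact overwrites the slot instead of
        # piling up near-duplicate chunks
        self.facts[(f.subject, f.statement)] = f

store = MemoryStore()
store.add_fact(Fact("user", "prefers dark mode"))
store.add_fact(Fact("user", "prefers dark mode"))  # second extraction is a no-op
assert len(store.facts) == 1
```

the keyed-dict trick is the whole point: the 40-duplicate problem goes away because identity, not similarity, decides whether something is new.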

i know this isn't traditional RAG but figured this sub would appreciate the comparison since i came from a pure RAG setup. anyone else experimenting with structured memory vs pure vector retrieval?

u/No_Advertising2536 — 4 hours ago
▲ 18 r/Rag

Where Is “Zero-Hallucination” RAG Actually Required in Production?

I’m exploring building a commercially licensed RAG system for high-stakes, regulated domains where the cost of being wrong is far higher than the cost of abstaining.

The goal is strict faithfulness: near-zero hallucination, and responses that are always grounded in verifiable citations (or no answer at all).

Typical in-house RAG setups don’t seem sufficient for this level of reliability, especially in areas like insurance, healthcare, or legal.

For those who’ve worked in such environments:

  • Which domains actually need this level of rigor?
  • Where have you seen real pain from hallucinations or weak retrieval?
  • Any specific use cases where “answer only if provably correct” would be a game changer?

Looking for practical insights more than theoretical ideas.

u/EnvironmentalFix3414 — 14 hours ago
▲ 3 r/ollama+1 crossposts

Looking for community help testing/breaking/improving a memory-integrated AI hub

I was going to use AI to write this post, but I thought it would be best to write it myself, so forgive my spelling and grammar mistakes 😬.

I’ve been fixated on AI memory for the past few years. After countless failed attempts and RAG reskins, I finally designed something new: “Viidnessmem and Mimir” (you may have seen my post about Mimir a few weeks ago).

I wanted to make something that’s simple to use, completely free, and local, without the hassle of figuring out how to set up my system. This led to Mimir’s Memory Hub, an open-source, fully local AI agent hub designed to work with any framework you may already use (Ollama, vLLM, APIs, local GGUF with llama.cpp, and more). The aim of this hub is to bring open-source AI to everyone through a community-driven project, “built for the community, by the community”. I’m currently looking for anyone who’d be interested in testing/breaking/improving it.

Now, for anyone still reading that's interested in the technical side, here's a brief overview of what makes Mimir's Memory Hub different:

The Memory System (Mimir)

Memory isn't a vector database dump. Every memory has 34 fields including emotion, importance, stability, encoding mood, novelty score, narrative arc position, drift history, and more.

Memory lifecycle:

  1. Encoding: new memories are scored for novelty (compared to last 20 memories), deduplicated (Jaccard ≥ 0.55 = merge), checked for flashbulb conditions, and indexed in both a BM25 inverted index and a semantic embedding index
  2. Consolidation: Huginn (pattern detection) runs every ~15 memories, Muninn (merge/prune/strengthen) runs periodically, gist compression kicks in after 90 days
  3. Recall: 5-stage hybrid retrieval: BM25 keyword → semantic search → spreading activation through the memory graph → mood-congruent filtering → composite reranking
  4. Decay: exponential decay based on spaced-repetition stability. Each time a memory is accessed with sufficient spacing (≥12 hours), stability grows by ×1.8 with diminishing returns. Cap at 180 days
  5. Death: memories below 0.01 vividness are archived to the "attic" (recoverable, not deleted)
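The dedup and decay rules above can be sketched in a few lines, using the numbers the post states (Jaccard ≥ 0.55 merge; ×1.8 stability growth for recalls spaced ≥12 hours, capped at 180 days; archive below 0.01 vividness). This is an illustrative reconstruction, not the actual Mimir code:

```python
import math

def jaccard(a: str, b: str) -> float:
    # token-set Jaccard similarity between two memory texts
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

def should_merge(new_text: str, existing_text: str, threshold: float = 0.55) -> bool:
    # encoding step: near-duplicates get merged instead of stored twice
    return jaccard(new_text, existing_text) >= threshold

def on_access(stability_days: float, last_access_s: float, now_s: float,
              growth: float = 1.8, cap: float = 180.0) -> float:
    # spaced repetition: only recalls spaced >= 12h strengthen the memory
    if now_s - last_access_s >= 12 * 3600:
        stability_days = min(stability_days * growth, cap)
    return stability_days

def vividness(importance: float, days_since_access: float,
              stability_days: float) -> float:
    # exponential decay governed by the memory's current stability
    return importance * math.exp(-days_since_access / stability_days)

def is_dead(v: float, floor: float = 0.01) -> bool:
    return v < floor   # sent to the "attic" (recoverable), not deleted
```

Note the diminishing-returns behaviour falls out of the cap: repeated well-spaced recalls multiply stability until it saturates at 180 days.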

Special memory types:

  • Flashbulb: high arousal (≥0.6) + high importance (≥8) = locked in with 120-day stability floor and 85% minimum vividness. Like how you remember exactly where you were on 9/11
  • Anchored: identity-level foundational memories. 90-day stability floor, 30% vividness floor. Never fully fade
  • Cherished: sentimental favourites, decay-resistant
  • Gist: after 90 days, non-protected memories compress to first 15 words

Retrieval scoring weights:

  • 30% BM25 keyword match
  • 30% semantic similarity (all-MiniLM-L6-v2, 384-dim vectors)
  • 20% vividness (decayed importance)
  • 10% mood congruence (you recall happy memories when happy)
  • 10% recency (5-day half-life)
  • Plus bonuses for cherished (×1.1), temporal relevance, visual memories, primed memories, spreading activation discoveries
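The composite reranking step can be written down directly from those weights. A minimal sketch, assuming the component scores are already normalised to [0, 1] (only the cherished ×1.1 bonus is shown; the other bonuses would stack the same way):

```python
# Composite reranking weights as listed above.
WEIGHTS = {"bm25": 0.30, "semantic": 0.30, "vividness": 0.20,
           "mood": 0.10, "recency": 0.10}

def recency_score(days_old: float, half_life: float = 5.0) -> float:
    # 5-day half-life: a 5-day-old memory scores 0.5, a 10-day-old 0.25
    return 0.5 ** (days_old / half_life)

def composite(scores: dict, cherished: bool = False) -> float:
    # weighted sum of the five normalised components
    base = sum(WEIGHTS[k] * scores[k] for k in WEIGHTS)
    return base * (1.1 if cherished else 1.0)  # cherished bonus x1.1
```

So a perfect match on every component scores 1.0, or 1.1 if cherished, which is why cherished memories win ties.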

Other systems like RAG/Letta/Mem0 etc. are planned to be added as standalone systems or additional memory, but currently Mimir is the default.

Neurochemistry Engine (5 Neurotransmitters)

Real-time simulation of 5 chemicals that actually affect behaviour:

| Chemical | Baseline | Decay rate | What it controls |
|---|---|---|---|
| Dopamine | 0.50 | Fast (20 min) | Memory encoding strength (±30% importance) |
| Cortisol | 0.30 | Slow (46 min) | Attention width, flashbulb triggering (>0.70), Yerkes-Dodson performance curve |
| Serotonin | 0.60 | Very slow (69 min) | Mood stability: low serotonin = moods stick, high = moods pass quickly |
| Oxytocin | 0.40 | Moderate (35 min) | Social memory encoding boost (up to +40%) |
| Norepinephrine | 0.50 | Fastest (17 min) | Alert attention: high NE = more focused, low NE = better consolidation |
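A common way to implement decay like this is exponential relaxation of each chemical toward its baseline. A sketch, assuming the decay times are half-lives (the post doesn't say exactly how "decay rate" is defined, so treat this as an illustration):

```python
# name: (baseline, half_life_minutes) -- values from the table above
CHEMICALS = {
    "dopamine":       (0.50, 20),
    "cortisol":       (0.30, 46),
    "serotonin":      (0.60, 69),
    "oxytocin":       (0.40, 35),
    "norepinephrine": (0.50, 17),
}

def relax(level: float, name: str, minutes_elapsed: float) -> float:
    # exponentially relax the current level back toward baseline
    baseline, half_life = CHEMICALS[name]
    frac = 0.5 ** (minutes_elapsed / half_life)
    return baseline + (level - baseline) * frac

# e.g. a dopamine spike to 0.9 falls halfway back to baseline (0.7) in 20 min
```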

10 event types trigger specific chemical profiles: surprise_positive, surprise_negative, conflict, warmth, novelty, resolution, achievement, loss, humor, stress.

Mood System (PAD Model)

42 emotion labels mapped to 3D vectors: Pleasure-Arousal-Dominance. Mood updates via exponential moving average (α = 0.3 × serotonin-adjusted decay). Real-time tracking with persistent mood history and trajectory analysis (improving/declining/stable, variability detection, breakthrough patterns).
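The EMA update is simple to write down. A sketch of one plausible reading of "α = 0.3 × serotonin-adjusted decay" (the exact serotonin scaling is my assumption; only the base α = 0.3 comes from the post):

```python
def update_mood(mood: tuple, emotion: tuple, serotonin: float,
                base_alpha: float = 0.3) -> tuple:
    # Exponential moving average over PAD (pleasure-arousal-dominance)
    # vectors. High serotonin -> larger effective alpha -> old moods pass
    # quickly; low serotonin -> moods stick. The multiplicative scaling
    # here is an assumed interpretation, not the confirmed formula.
    alpha = base_alpha * serotonin
    return tuple(m + alpha * (e - m) for m, e in zip(mood, emotion))

mood = (0.0, 0.0, 0.0)
mood = update_mood(mood, (1.0, 0.5, 0.0), serotonin=1.0)
# mood has moved 30% of the way toward the new emotion vector
```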

Mood-reactive UI: 46 emotions mapped to HSL accent colors. The entire UI shifts color smoothly in real-time as the AI's mood changes.

Presets & How They Use Memory

Mimir's Memory Hub comes with 6 preset modes, each designed to get the most out of Mimir for those use cases.

| Preset | Memory focus | Chemistry | Key tags |
|---|---|---|---|
| Companion | Emotional bonds, social impressions, cherished moments | ✅ On | <remember> <cherish> <social> <remind> |
| Agent | Tasks, solutions, lessons learned, artifacts | Off | <task> <solution> <remind> |
| Character | Full emotional range, narrative arcs, dreaming | ✅ On | <remember> <cherish>, all emotion tags |
| Writer | Story tracking, chapters, characters, world rules | ✅ On | <remember> <task>, creative memory |
| Assistant | Appointments, notes, files, daily planning | Off | <task> <remind> <solution> |
| Custom | User-configured | ✅ On | All available |

Companion uses high emotion weight (0.8), social priority, and neurochemistry to build genuine relationships. Tracks people you mention, remembers feelings, cherishes meaningful moments.

Agent uses low emotion weight (0.2), task priority, 21 tools (file R/W, shell, code execution, web search, HTTP requests, screenshots, clipboard, etc.), and solution pattern matching. Learns from past failures via the Zeigarnik-boosted lesson system.

Character maxes emotion weight (1.0) for full immersive roleplay. The AI's mood genuinely influences responses, chemistry creates real emotional dynamics, and the rage quit mechanic means sustained negativity causes the AI to walk out.

Writer balances creativity (0.5 emotion) with project tracking. Remembers your story's characters, plot threads, chapters completed, world rules, and writing style.

Assistant is pure utility (0.15 emotion) with full tool access for appointments, reminders, file management, and daily planning.

Platform Features

10 LLM backends: Ollama, OpenAI, Anthropic, Google, OpenRouter, vLLM, OpenAI-Compatible, Custom, Local GGUF (llama-cpp-python), HuggingFace Transformers (SafeTensors GPU)

21 tools for Agent/Assistant: file read/write/search/grep, web search (DuckDuckGo or SearXNG), fetch pages, HTTP requests, shell exec, Python code execution, screenshot, clipboard, system info, diff, PDF read, CSV query, regex replace, weather, date/time, JSON parse, open apps

MCP support: Model Context Protocol with stdio and SSE transports. Auto-discovers tools from connected servers.

Vision: VL model detection (llava, moondream, qwen-vl, etc.), mmproj/CLIP for GGUF models, BLIP fallback text description for non-vision models

TTS: Edge TTS (free, many voices), HuggingFace Maya1 (GPU local), llama-server GGUF. Per-agent voice override. Browser SpeechSynthesis fallback.

STT: faster-whisper with push-to-hold mic button. Model sizes from tiny to large-v3.

Multi-agent chat: Multiple agents in one conversation. Three turn modes (address by name, sequential, all respond). Three view modes (combined, tabs, columns).

Character/Agent editor: Full creation interface + SillyTavern character card import (single or bulk). Per-agent model, backend, voice, and preset override. Isolated memory per agent.

8 visualizations: Yggdrasil graph, memory landscape, mood timeline, cherished wall, neurochemistry chart, relationships graph, topic clusters, memory attic.

See the repo for more info: Kronic90/Mimirs-Memory-Hub (Mimir's Memory Hub: multi-agent AI chat with persistent memory and SillyTavern compatibility).

u/Upper-Promotion8574 — 10 hours ago
▲ 1 r/Rag

Provenance is what people ask for after a document case gets messy

Something I keep noticing: teams talk about provenance only after a case gets disputed internally.

Before that, the workflow is often fine with just extracted output. After that, everyone wants to know which file was used, whether a revised version arrived later, what changed, and what the reviewer actually saw.

What breaks

  • Revised files are not linked clearly to earlier versions
  • Structured output is kept, but the path that produced it is thin
  • Ops and engineering end up holding different fragments of the story

What I’d do

  • Preserve relationships between current and prior document versions
  • Keep field-to-page context for flagged cases
  • Record routing and reviewer outcomes in a way people can inspect later

Options shortlist

  • Version-aware storage plus internal review UI
  • Extraction tools that retain field context
  • Separate lineage tracking before approval or downstream posting
  • Lightweight case history views for reviewers and ops

I don’t think provenance has to mean collecting endless logs. It just has to mean the workflow keeps enough evidence to support internal review without making people reconstruct the timeline from memory.

Happy to be corrected if others have found a simpler pattern.

u/Careless_Diamond7500 — 21 hours ago
▲ 0 r/Rag

Mixed document packs probably need triage before deeper extraction

A lot of document workflows seem to assume each file is a clean, self-contained unit.

In reality, many ops teams receive mixed packs: invoice + receipt + cover letter, or KYC form + ID + supporting page. When all of that goes into one extraction path unchanged, confusion starts early.

What breaks

  • Supporting pages get treated like primary pages
  • Partial packets are handled as if they’re complete
  • Reviewers spend time figuring out page role before they can judge the output

What I’d do

  • Add a lightweight page/document triage step first
  • Preserve packet structure so the workflow knows which pages belong together
  • Route unclear packs into review before forcing full schema mapping

Options shortlist

  • Document classification before extraction
  • Page segmentation plus a packet-aware schema layer
  • Reviewer triage queues for mixed submissions
  • General OCR pipelines only for the cleaner, simpler portion of intake

My take is that many teams try to solve this by making extraction logic more complex, when the real fix is earlier intake discipline.

Would love to hear how others handle packet structure without turning the workflow into a giant custom rules maze.

u/Careless_Diamond7500 — 21 hours ago