u/Fit_Expression3641

Built a Real-Time Agentic RAG for an AI Streamer (LanceDB RAM-Disk, Hybrid Search, Knowledge Graph & Semantic Deduplication)

Hey r/Rag,

I’ve been developing an autonomous AI streamer/avatar and quickly realized that standard "vanilla" vector databases aren't enough to handle real-time context and long-term memory without hallucinations during multi-hour live broadcasts.

I ended up building a custom Agentic RAG engine using LanceDB as the core. Wanted to share the architecture and the specific features I’ve implemented to keep the character consistent.

Here is what's running under the hood right now:

1. Hybrid Search + Reranking (RRF): Relying solely on dense vectors was causing precision issues. Every query now runs against both the vector index (ANN, HNSW) and a Full-Text Search (FTS) index in parallel. The two result lists are merged using Reciprocal Rank Fusion (RRF), and the merged candidates are passed through a BGE-Reranker cross-encoder to filter out false positives.
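A minimal sketch of the fusion step, in plain Python with no LanceDB calls. The function name `rrf_merge`, the sample ids, and the constant `k=60` (a commonly used default for RRF) are assumptions for illustration; the post does not state which value is used.

```python
from collections import defaultdict


def rrf_merge(rankings, k=60):
    """Fuse several ranked lists of document ids with Reciprocal Rank Fusion.

    Each document scores 1 / (k + rank) per list it appears in (rank is 1-based).
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id so the order is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical result lists from the vector index and the FTS index.
vector_hits = ["a", "b", "c"]
fts_hits = ["b", "d", "a"]
fused = rrf_merge([vector_hits, fts_hits])
print(fused)
```

Documents that appear in both lists ("a" and "b") rise to the top, and the fused list is what would then go to the cross-encoder for reranking.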

2. Knowledge Graph Integration: Instead of just chunking text, the engine extracts entities and edges. For complex context queries, the system performs a "multi-hop" graph traversal to find relationships between distant concepts that a standard semantic search would easily miss.
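One way to picture the multi-hop traversal is a breadth-first search with a hop limit. This is a sketch under assumptions: edges are `(source, relation, target)` triples, they are treated as undirected, and the name `multi_hop` and the sample entities are made up for illustration.

```python
from collections import deque


def multi_hop(edges, start, max_hops=2):
    """Return {entity: hop_distance} for every entity within max_hops of start."""
    adjacency = {}
    for src, _relation, dst in edges:
        adjacency.setdefault(src, set()).add(dst)
        adjacency.setdefault(dst, set()).add(src)  # assumption: undirected edges

    seen = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if seen[node] == max_hops:
            continue  # do not expand past the hop limit
        for neighbor in adjacency.get(node, ()):
            if neighbor not in seen:
                seen[neighbor] = seen[node] + 1
                queue.append(neighbor)
    return seen


# Hypothetical triples extracted from chat.
edges = [
    ("Alice", "donated_to", "stream"),
    ("stream", "features", "GameX"),
    ("GameX", "made_by", "StudioY"),
]
reach = multi_hop(edges, "Alice", max_hops=2)
print(reach)
```

With `max_hops=2`, "GameX" is linked to "Alice" through "stream" even though no single chunk mentions both, while "StudioY" (three hops away) is left out.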

3. Agentic "Dream" Phase: The AI doesn't just write raw chat logs to the database. A background orchestrator daemon triggers a "Dream" phase that asynchronously analyzes recent dialogues, extracts new facts, updates graph relations, and rewrites summary documents to prevent memory rot.

4. Hierarchical Wiki: Core lore, stream rules, and personality guidelines are stored in a parent-child tree structure. The AI can autonomously read and update this Wiki during its Dream phase.
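A parent-child tree like this can be sketched with a small dataclass. Everything here (`WikiPage`, `upsert`, `find`, the sample pages) is hypothetical and kept in memory only; the post does not describe how the Wiki is actually stored.

```python
from dataclasses import dataclass, field


@dataclass
class WikiPage:
    title: str
    body: str = ""
    children: dict = field(default_factory=dict)  # child title -> WikiPage

    def upsert(self, path, body):
        """Create or overwrite the page at `path` (titles below this node)."""
        node = self
        for title in path:
            node = node.children.setdefault(title, WikiPage(title))
        node.body = body
        return node

    def find(self, path):
        """Return the page at `path`, or None if any step is missing."""
        node = self
        for title in path:
            node = node.children.get(title)
            if node is None:
                return None
        return node


root = WikiPage("wiki")
root.upsert(["lore", "origin"], "First draft of the origin story.")
root.upsert(["rules"], "No spoilers in chat.")
# A later pass rewrites the same page instead of adding a duplicate.
root.upsert(["lore", "origin"], "Revised origin story.")
print(root.find(["lore", "origin"]).body)
```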

5. RAM-Disk Execution with Smart Sync: To keep retrieval latency as low as possible on live streams, the whole LanceDB instance runs in RAM. I built a background "rsync-style" incremental synchronizer that periodically commits diffs to the SSD, so a crash loses at most the changes made since the last sync.
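A rough sketch of what one incremental sync pass could look like, using only the Python standard library. It assumes the database is a directory of files on the RAM disk, that a size or modification-time change is a good enough signal to copy a file, and it ignores deletions; `sync_incremental` and its arguments are hypothetical names, not the author's code.

```python
import os
import shutil
from pathlib import Path


def sync_incremental(src_root, dst_root):
    """Copy files that are new or changed under src_root into dst_root.

    Returns the relative paths that were copied in this pass.
    """
    src_root, dst_root = Path(src_root), Path(dst_root)
    copied = []
    for src in sorted(p for p in src_root.rglob("*") if p.is_file()):
        rel = src.relative_to(src_root)
        dst = dst_root / rel
        src_stat = src.stat()
        if dst.exists():
            dst_stat = dst.stat()
            unchanged = (
                dst_stat.st_size == src_stat.st_size
                and dst_stat.st_mtime_ns >= src_stat.st_mtime_ns
            )
            if unchanged:
                continue
        dst.parent.mkdir(parents=True, exist_ok=True)
        tmp = dst.with_name(dst.name + ".tmp")
        shutil.copy2(src, tmp)  # copy2 also copies the modification time
        os.replace(tmp, dst)    # rename over the old copy in one step
        copied.append(rel.as_posix())
    return copied
```

Writing to a temporary file and renaming it means a crash in the middle of a copy leaves the previous SSD copy intact rather than a half-written file. A daemon would call this on a timer, for example every few seconds.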

6. Universal Isolated Collections: To prevent hallucinations in specific domains, I isolated regular chat memory from structured data. The RAG has dedicated semantic collections for things like product catalogs and specific software rules, allowing the AI to query strict factual data when needed.

7. Semantic Deduplication: If the AI learns a fact it already knows, it doesn't clone the vector. The system uses a similarity threshold during ingestion: if it finds a near-exact match, it simply updates the timestamp and boosts the importance_score of the existing record. The more a topic is discussed, the higher its priority becomes.
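The ingestion rule can be sketched as follows. The cosine threshold of 0.95, the boost of 1.0, the `last_seen` field name, and the linear scan over an in-memory list are all assumptions for illustration; a real setup would query the vector index for the nearest neighbor instead. Only `importance_score` comes from the description above.

```python
import math
import time


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def ingest(memory, text, vector, threshold=0.95, boost=1.0, now=None):
    """Insert a fact, or refresh the closest existing one if it is a near-duplicate."""
    now = time.time() if now is None else now
    best = max(memory, key=lambda r: cosine(r["vector"], vector), default=None)
    if best is not None and cosine(best["vector"], vector) >= threshold:
        best["last_seen"] = now             # refresh the timestamp
        best["importance_score"] += boost   # repeated topics gain priority
        return best
    record = {"text": text, "vector": vector,
              "last_seen": now, "importance_score": 1.0}
    memory.append(record)
    return record


memory = []
ingest(memory, "viewer likes cats", [1.0, 0.0], now=100.0)
ingest(memory, "viewer loves cats", [0.99, 0.05], now=200.0)  # near-duplicate
ingest(memory, "viewer hates rain", [0.0, 1.0], now=300.0)    # new fact
print(len(memory), memory[0]["importance_score"])
```

The second call does not add a row; it bumps the first record's score and timestamp, so the store ends up with two records instead of three.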

8. Real-time Viewer "Karma" System: The RAG is directly tied to the stream chat. It tracks individual viewers and their donations, and assigns "vector tags" (e.g., "toxic", "generous") so the avatar dynamically alters its attitude toward specific users based on their historical behavior.

Would love to hear your thoughts on combining Graph and Vector retrieval for live agents, or if anyone else is tackling real-time RAG for interactive avatars!

u/Fit_Expression3641 — 8 hours ago