
I built a system that uses Ollama's local embeddings to give ChatGPT, Claude, and Gemini persistent memory across chats.
Why local embeddings matter:
Instead of relying on OpenAI's embedding API, I use nomic-embed-text via Ollama. This means:
- Zero API costs
- Privacy: no embedding data ever leaves your machine
- Low-latency inference on your own GPU/CPU
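If you're wondering what the embedding call actually looks like, it's a single POST to Ollama's REST API. Minimal sketch, assuming Ollama is running on its default port (11434) and you've already pulled nomic-embed-text; the helper name is mine, not the repo's:

```typescript
// Get a local embedding from Ollama's REST API.
// Assumes: `ollama serve` is running and `ollama pull nomic-embed-text` is done.
async function embed(text: string): Promise<number[]> {
  const res = await fetch("http://localhost:11434/api/embeddings", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ model: "nomic-embed-text", prompt: text }),
  });
  if (!res.ok) throw new Error(`Ollama returned ${res.status}`);
  const { embedding } = (await res.json()) as { embedding: number[] };
  return embedding; // 768 floats for nomic-embed-text
}
```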
The pipeline:
- Chrome extension captures conversations
- Backend chunks them into 300-word windows with 80-word overlap (sketch after this list)
- Ollama generates embeddings locally (nomic-embed-text outputs 768-dimensional vectors)
- Embeddings and their chunks are stored in ChromaDB (vector DB)
- On new prompts: embed the prompt → semantic search → inject top-3 chunks
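The chunking step is simple enough to sketch inline: slide a 300-word window forward 220 words at a time, so consecutive chunks share 80 words. Function and parameter names are illustrative, not the actual repo's code:

```typescript
// Sliding-window chunking: 300-word chunks, 80-word overlap.
function chunkText(text: string, size = 300, overlap = 80): string[] {
  const words = text.split(/\s+/).filter(Boolean);
  const step = size - overlap; // window advances 220 words with the defaults
  const chunks: string[] = [];
  for (let start = 0; start < words.length; start += step) {
    chunks.push(words.slice(start, start + size).join(" "));
    if (start + size >= words.length) break; // final window already covers the tail
  }
  return chunks;
}
```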
The result:
When you ask ChatGPT a question about your project, it automatically gets context from your entire conversation history. No re-explaining. No manual effort.
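Here's roughly what that retrieval path looks like with the chromadb JS client. The collection name and prompt format are made up for the example, embed() is the Ollama helper from the first snippet, and the Chroma server is assumed to be on its default port:

```typescript
import { ChromaClient } from "chromadb";

const chroma = new ChromaClient(); // assumes Chroma server on its default port

// Retrieval sketch: embed the new prompt locally, pull the 3 nearest
// chunks from the vector store, and prepend them as context.
async function buildAugmentedPrompt(userPrompt: string): Promise<string> {
  const collection = await chroma.getOrCreateCollection({ name: "chat_memory" });
  const results = await collection.query({
    queryEmbeddings: [await embed(userPrompt)], // semantic search key
    nResults: 3,                                // top-3 chunks
  });
  const context = (results.documents[0] ?? [])
    .filter((d): d is string => d !== null)
    .join("\n---\n");
  return `Relevant context from earlier conversations:\n${context}\n\n${userPrompt}`;
}
```

One knob worth knowing: Chroma defaults to L2 distance, and you can pass `metadata: { "hnsw:space": "cosine" }` when creating the collection if you'd rather rank by cosine similarity.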
Tech:
- Chrome extension (MV3)
- Node.js backend
- Ollama for embeddings
- ChromaDB for vector storage
- Neo4j for knowledge graphs (optional but powerful)
Works offline. MIT licensed. Self-hosted.
Would love feedback from anyone using Ollama!