
I built a system that uses Ollama's local embeddings to give ChatGPT, Claude, and Gemini persistent memory across chats.
Why local embeddings matter:
Instead of relying on OpenAI's embedding API, I use nomic-embed-text via Ollama. This means:
- Zero API costs
- Privacy: no embedding data ever leaves your machine
- Low-latency inference on your own GPU/CPU
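If you're wondering what the embedding call actually looks like, it's a single POST to Ollama's REST API. Minimal sketch, assuming Ollama is running on its default port (11434) and you've already pulled nomic-embed-text; the helper name is mine, not the repo's:

```typescript
// Get a local embedding from Ollama's REST API.
// Assumes: `ollama serve` is running and `ollama pull nomic-embed-text` is done.
async function embed(text: string): Promise<number[]> {
  const res = await fetch("http://localhost:11434/api/embeddings", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ model: "nomic-embed-text", prompt: text }),
  });
  if (!res.ok) throw new Error(`Ollama returned ${res.status}`);
  const { embedding } = (await res.json()) as { embedding: number[] };
  return embedding; // 768 floats for nomic-embed-text
}
```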
The pipeline:
- Chrome extension captures conversations
- Backend chunks them into 300-word windows with 80-word overlap (sketch after this list)
- Ollama generates embeddings locally (nomic-embed-text outputs 768-dimensional vectors)
- Embeddings and their chunks are stored in ChromaDB (vector DB)
- On new prompts: embed the prompt → semantic search → inject top-3 chunks
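The chunking step is simple enough to sketch inline: slide a 300-word window forward 220 words at a time, so consecutive chunks share 80 words. Function and parameter names are illustrative, not the actual repo's code:

```typescript
// Sliding-window chunking: 300-word chunks, 80-word overlap.
function chunkText(text: string, size = 300, overlap = 80): string[] {
  const words = text.split(/\s+/).filter(Boolean);
  const step = size - overlap; // window advances 220 words with the defaults
  const chunks: string[] = [];
  for (let start = 0; start < words.length; start += step) {
    chunks.push(words.slice(start, start + size).join(" "));
    if (start + size >= words.length) break; // final window already covers the tail
  }
  return chunks;
}
```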
The result:
When you ask ChatGPT a question about your project, it automatically gets context from your entire conversation history. No re-explaining. No manual effort.
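Here's roughly what that retrieval path looks like with the chromadb JS client. The collection name and prompt format are made up for the example, embed() is the Ollama helper from the first snippet, and the Chroma server is assumed to be on its default port:

```typescript
import { ChromaClient } from "chromadb";

const chroma = new ChromaClient(); // assumes Chroma server on its default port

// Retrieval sketch: embed the new prompt locally, pull the 3 nearest
// chunks from the vector store, and prepend them as context.
async function buildAugmentedPrompt(userPrompt: string): Promise<string> {
  const collection = await chroma.getOrCreateCollection({ name: "chat_memory" });
  const results = await collection.query({
    queryEmbeddings: [await embed(userPrompt)], // semantic search key
    nResults: 3,                                // top-3 chunks
  });
  const context = (results.documents[0] ?? [])
    .filter((d): d is string => d !== null)
    .join("\n---\n");
  return `Relevant context from earlier conversations:\n${context}\n\n${userPrompt}`;
}
```

One knob worth knowing: Chroma defaults to L2 distance, and you can pass `metadata: { "hnsw:space": "cosine" }` when creating the collection if you'd rather rank by cosine similarity.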
Tech:
- Chrome extension (MV3)
- Node.js backend
- Ollama for embeddings
- ChromaDB for vector storage
- Neo4j for knowledge graphs (optional but powerful)
Works offline. MIT licensed. Self-hosted.
Would love feedback from anyone using Ollama!