u/Better-Platypus-3420

Posted in r/projects (crossposted to 4 other communities)

I built a system that uses Ollama's local embeddings to give ChatGPT, Claude, and Gemini persistent memory across chats.

Why local embeddings matter:

Instead of relying on OpenAI's embedding API, I use nomic-embed-text via Ollama. This means:

  • Zero embedding API costs
  • Privacy: no embedding data leaves your machine
  • Low-latency inference with no network round trip (runs on your own GPU/CPU)
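For anyone who wants to try the Ollama part on its own, here is a minimal sketch of the embedding call from a Node.js backend. It assumes Ollama is running on its default port (11434), the model has been pulled with `ollama pull nomic-embed-text`, and Node 18+ is available for the built-in `fetch`. It uses Ollama's `/api/embed` endpoint. `embedLocal` and `FetchLike` are hypothetical names for illustration, not the project's actual code.

```typescript
// Minimal sketch: request embeddings from a local Ollama server.
// Assumptions: default Ollama port, nomic-embed-text already pulled,
// Node 18+ (global fetch). Names here are hypothetical.

const OLLAMA_URL = "http://localhost:11434/api/embed";

// Narrow fetch-like type so a fake can be injected in tests.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

// Embed several texts in one request; Ollama returns one vector per input
// (768 numbers each for nomic-embed-text).
async function embedLocal(
  texts: string[],
  fetchImpl: FetchLike = fetch,
): Promise<number[][]> {
  const res = await fetchImpl(OLLAMA_URL, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ model: "nomic-embed-text", input: texts }),
  });
  if (!res.ok) {
    throw new Error(`Ollama returned HTTP ${res.status}`);
  }
  const data = (await res.json()) as { embeddings: number[][] };
  return data.embeddings;
}
```

Batching all chunks of a conversation into one `input` array saves a request per chunk, which matters when a long chat is split into many windows.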

The pipeline:

  1. Chrome extension captures conversations
  2. Backend chunks them (300-word windows, 80-word overlap)
  3. Ollama generates 768-dimensional embeddings locally
  4. Chunks and their embeddings are stored in ChromaDB (vector DB)
  5. On each new prompt: embed the prompt → semantic search → inject the top-3 matching chunks into the prompt
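Step 2 (chunking) can be sketched as a sliding window over words. The defaults mirror the numbers above (300-word windows, 80-word overlap); splitting on whitespace is an assumption, since the post doesn't say how the backend defines a "word", and `chunkWords` is a hypothetical name.

```typescript
// Sketch of step 2: split text into overlapping word windows.
// Whitespace splitting is an assumption about how "words" are counted.
function chunkWords(text: string, windowSize = 300, overlap = 80): string[] {
  if (overlap >= windowSize) {
    throw new Error("overlap must be smaller than windowSize");
  }
  const words = text.split(/\s+/).filter((w) => w.length > 0);
  const step = windowSize - overlap; // 220 words between window starts by default
  const chunks: string[] = [];
  for (let start = 0; start < words.length; start += step) {
    chunks.push(words.slice(start, start + windowSize).join(" "));
    // Stop once this window already reaches the end of the text.
    if (start + windowSize >= words.length) break;
  }
  return chunks;
}
```

With the defaults, each window starts 220 words after the previous one, so a 1,000-word conversation yields 5 chunks, and any passage of up to 80 words that straddles a window boundary still appears whole in at least one chunk.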

The result:

When you ask ChatGPT a question about your project, it automatically receives the most relevant excerpts retrieved from your entire conversation history. No re-explaining. No manual effort.
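The retrieve-and-inject step behind this can be sketched with a brute-force cosine-similarity scan standing in for the ChromaDB query (ChromaDB does the search with an index and its own configured distance metric). `StoredChunk`, `topK`, and `buildPrompt` are hypothetical names, and the context header wording is an assumption, not how the extension actually formats injected text.

```typescript
// Sketch of retrieval + injection. A linear cosine-similarity scan is
// swapped in for the ChromaDB query; all names here are hypothetical.

interface StoredChunk {
  text: string;
  embedding: number[];
}

function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // avoid dividing by zero
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Return the k stored chunks most similar to the query embedding.
function topK(query: number[], store: StoredChunk[], k = 3): StoredChunk[] {
  return store
    .map((chunk) => ({ chunk, score: cosineSimilarity(query, chunk.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((scored) => scored.chunk);
}

// Prepend the retrieved chunks to the user's prompt as numbered excerpts.
function buildPrompt(userPrompt: string, context: StoredChunk[]): string {
  if (context.length === 0) return userPrompt;
  const excerpts = context.map((c, i) => `[${i + 1}] ${c.text}`).join("\n\n");
  return `Relevant context from earlier conversations:\n${excerpts}\n\n${userPrompt}`;
}
```

The query embedding has to come from the same model that embedded the stored chunks; similarity scores between vectors from different embedding models are meaningless.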

Tech:

  • Chrome extension (MV3)
  • Node.js backend
  • Ollama for embeddings
  • ChromaDB for vector storage
  • Neo4j for knowledge graphs (optional but powerful)

GitHub

Embedding, storage, and retrieval all work offline. MIT licensed. Self-hosted.

Would love feedback from anyone using Ollama!

u/Better-Platypus-3420 — 14 days ago