u/Final-Data-1410 · r/LangChain (+1 crosspost)

Been working on an alternative to float32 vector databases for RAG pipelines.

The core problem: standard RAG expands your documents 10× in size and needs an expensive managed vector DB running 24/7.

My approach — convert each float32 embedding into a 128-byte binary fingerprint, then search using Multi-Index Hashing (MIH) with Hamming distance instead of cosine similarity.
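For readers who want the mechanics in code, here is a minimal sketch of sign-threshold binarization plus a brute-force Hamming scan, assuming numpy and 1024-dim embeddings. The function names and the thresholding rule are my illustration of the general technique, not NodeMind's actual codec; the repo is the source of truth.

```python
import numpy as np

def binarize(embs: np.ndarray) -> np.ndarray:
    """Sign-threshold float32 embeddings into packed binary codes.
    A 1024-dim float32 vector (4096 bytes) becomes a 1024-bit
    fingerprint (128 bytes)."""
    bits = (embs > 0).astype(np.uint8)   # (n, 1024) array of 0/1
    return np.packbits(bits, axis=1)     # (n, 128) packed bytes

# Popcount lookup table for one byte (0..255 -> number of set bits).
_POPCOUNT = np.unpackbits(np.arange(256, dtype=np.uint8)[:, None], axis=1).sum(axis=1)

def hamming_scan(query: np.ndarray, codes: np.ndarray) -> np.ndarray:
    """Hamming distance of one packed query (128,) against all codes (n, 128)."""
    return _POPCOUNT[np.bitwise_xor(codes, query)].sum(axis=1)  # XOR, then popcount

# Usage: top-10 nearest fingerprints to a query embedding.
docs = binarize(np.random.randn(1000, 1024).astype(np.float32))
q = binarize(np.random.randn(1, 1024).astype(np.float32))[0]
top10 = np.argsort(hamming_scan(q, docs))[:10]
```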

Results (measured at >100k chunks):

• 32× smaller index vs float32 RAG (BGE-M3, 1024-bit; size arithmetic spelled out after this list)
• 96× smaller at BGE-base + 256-bit
• 75× faster search — pure POPCNT arithmetic, no GPU
• R@10 = 1.000 on a 500K-chunk benchmark — matches exact float32
• Beats FAISS Fixed Binary on every same-size BEIR cell
• Runs completely offline from a single .pkl file
• No Pinecone, no Weaviate, no Qdrant needed
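Spelling out those ratios: BGE-M3 embeddings are 1024-dim, so one float32 vector is 1024 × 4 = 4096 bytes against 128 bytes for a 1024-bit code, exactly 32×. BGE-base is 768-dim, so 3072 bytes against 32 bytes at 256 bits, exactly 96×.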

Honest caveats:

• Float32 cosine still wins on raw recall at small corpus sizes — the gap closes as the corpus grows and matches perfectly at 500K chunks
• Below ~100k chunks the 75× speed advantage disappears — both methods hit the ~1ms floor and perform the same

Full benchmarks, roadmap, live demo and all downloadable indexes: github.com/QLNI/NodeMind

---

What's next: HiveMind.

NodeMind solves storage. HiveMind is the next layer — agents deposit compressed reasoning traces, register semantic watches on ideas, and connect through MCP. A reasoning node is 32 bytes vs 4 KB float32. A 1M-node graph fits in 32 MB instead of 4 GB.

Concept → https://nodemind.space/hivemind/

Follow @QLNI_AI — one post when it ships/updates.

Happy to answer any questions or doubts about the benchmark results.

— Sai

u/Final-Data-1410 — 9 days ago

---

r/LangChain (+1 crosspost)

Been working on an alternative to float32 vector databases for RAG pipelines.

The core problem: standard RAG expands your documents 10× in size and needs an expensive managed vector DB running 24/7.

My approach — convert each float32 embedding into a 128-byte binary fingerprint, then search using Multi-Index Hashing (MIH) with Hamming distance instead of cosine similarity.
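For the curious, here is a minimal sketch of the candidate-generation step in the spirit of Norouzi et al.'s Multi-Index Hashing: split each 1024-bit code into M disjoint substrings and index each substring in its own exact-match hash table. The table count and probe radius below are illustrative assumptions, not NodeMind's actual parameters.

```python
from collections import defaultdict
import numpy as np

M = 8  # sub-tables; a 128-byte code splits into 16-byte substrings

def build_mih(codes: np.ndarray):
    """Index packed codes of shape (n, 128) into M exact-match hash tables."""
    step = codes.shape[1] // M
    tables = [defaultdict(list) for _ in range(M)]
    for i, code in enumerate(codes):
        for t in range(M):
            tables[t][code[t * step:(t + 1) * step].tobytes()].append(i)
    return tables

def candidates(query: np.ndarray, tables, radius: int = 1):
    """Ids whose substring lies within `radius` bits of the query's substring.

    Pigeonhole argument: a code within full Hamming distance r of the
    query must agree with it to within floor(r / M) bits in at least one
    of the M substrings, so shallow per-table probes find all near codes.
    """
    step = len(query) // M
    hits = set()
    for t in range(M):
        sub = bytearray(query[t * step:(t + 1) * step].tobytes())
        hits.update(tables[t].get(bytes(sub), []))        # radius 0: exact match
        if radius >= 1:                                   # radius 1: flip one bit
            for b in range(len(sub) * 8):
                sub[b // 8] ^= 1 << (b % 8)
                hits.update(tables[t].get(bytes(sub), []))
                sub[b // 8] ^= 1 << (b % 8)               # restore the bit
    return hits  # re-rank candidates with full 1024-bit Hamming distance
```

With these numbers a query touches 8 × (1 + 128) = 1032 buckets instead of scanning every code, and probing at radius 1 per table is guaranteed to find any code within 2M − 1 = 15 bits of the query.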

Results (measured at >100k chunks):

• 48× smaller index vs float32 RAG
• 75× faster search — pure POPCNT arithmetic, no GPU
• Runs completely offline from a zip file
• No Pinecone, no Weaviate, no Qdrant needed

Honest caveats:

• On small corpora (<10k chunks) compression is ~31× due to fixed MIH sub-table overhead — fully amortises at production scale (see the note after this list)
• Speed gap collapses below ~100k chunks where both methods hit the ~1ms floor
• 100× image compression is a projection, not yet in production
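One way to read that first caveat, under a simplifying assumption of my own (not numbers from the repo): if the binary index body for n chunks is B(n) bytes and the MIH sub-tables add a roughly fixed overhead C, the measured ratio is float_size(n) / (B(n) + C). At small n the constant C dominates the denominator, dragging the ratio down to ~31×; as n grows, C becomes negligible and the ratio approaches the headline figure.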

Live demo: [nodemind.space](https://nodemind.space)
GitHub: [github.com/QLNI/NodeMind](https://github.com/QLNI/NodeMind)
X/Twitter: [Follow @Qlnix4E49 for updates](https://x.com/Qlnix4E49)

Two provisional patents filed in Australia. Built solo on community hardware in regional NSW.

Happy to answer technical questions about the MIH architecture or binary codec.

Full benchmark now live on GitHub: 500,000 chunks — Wikipedia + arXiv + Project Gutenberg books. Both NodeMind and float32 RAG indexes are downloadable so you can verify the compression ratios yourself.

➡️ github.com/QLNI/NodeMind

u/Final-Data-1410 — 11 days ago