
u/Final-Data-1410

Been working on an alternative to float32 vector databases for RAG pipelines.
The core problem: standard RAG expands your documents 10× in size and needs an expensive managed vector DB running 24/7.
My approach — convert each float32 embedding into a 128-byte binary fingerprint, then search using Multi-Index Hashing (MIH) with Hamming distance instead of cosine similarity.
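For intuition, here's a minimal Python sketch of the binarize-then-popcount idea. It uses plain sign quantization, the simplest possible codec; NodeMind's actual codec isn't spelled out in this post, so treat this as an illustration of the sizes, not the implementation. The ratios fall out directly: 1024 float32 dims are 4096 bytes and 1024 bits are 128 bytes, hence 32×; BGE-base at 768 dims (3072 bytes) against a 256-bit (32-byte) code gives 96×.

```python
import numpy as np

def binarize(embedding: np.ndarray) -> np.ndarray:
    """Sign-quantize a float32 embedding into a packed bit code.
    1024 float32 dims (4096 bytes) -> 1024 bits = 128 bytes: the 32x ratio.
    (Simplest possible codec; NodeMind's real codec may differ.)"""
    return np.packbits((embedding > 0).astype(np.uint8))

def hamming(a: np.ndarray, b: np.ndarray) -> int:
    """Hamming distance between two packed codes: XOR, then popcount."""
    return int(np.unpackbits(a ^ b).sum())

rng = np.random.default_rng(0)
q = binarize(rng.standard_normal(1024).astype(np.float32))
d = binarize(rng.standard_normal(1024).astype(np.float32))
print(q.nbytes)        # 128 bytes per code
print(hamming(q, d))   # distance in bits, no floating point involved
```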
Results (measured at >100k chunks):
• 32× smaller index vs float32 RAG (BGE-M3, 1024-bit)
• 96× smaller at BGE-base + 256-bit
• 75× faster search — pure POPCNT arithmetic, no GPU
• R@10 = 1.000 on 500K-chunk benchmark — matches exact float32
• Beats FAISS Fixed Binary on every same-size BEIR cell
• Runs completely offline from a single .pkl file
• No Pinecone, no Weaviate, no Qdrant needed
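Since the results above lean entirely on Hamming search, here's roughly how MIH makes that fast without a GPU: split each code into m disjoint substrings and give each substring its own exact-match hash table. By the pigeonhole principle, any code within Hamming distance m-1 of the query matches it exactly on at least one substring, so probing the m tables yields a small candidate set that is re-ranked with full popcount distances. A toy sketch (my reading of the published MIH scheme, not NodeMind's code; `ToyMIH` and its parameters are made up for illustration):

```python
import numpy as np
from collections import defaultdict

class ToyMIH:
    """Toy Multi-Index Hashing over packed binary codes.
    A sketch of the standard MIH scheme, not NodeMind's implementation."""

    def __init__(self, codes: np.ndarray, m: int = 8):
        self.codes = codes                      # (N, 128) uint8, packed bits
        self.m = m
        sub = codes.reshape(len(codes), m, -1)  # m substrings per code
        self.tables = [defaultdict(list) for _ in range(m)]
        for i in range(len(codes)):
            for j in range(m):
                self.tables[j][sub[i, j].tobytes()].append(i)

    def search(self, q: np.ndarray, k: int = 10) -> list[int]:
        # Probe each substring table with the query's matching substring.
        # Guaranteed to find every code within Hamming distance m-1 of q.
        qsub = q.reshape(self.m, -1)
        cand = set()
        for j in range(self.m):
            cand.update(self.tables[j].get(qsub[j].tobytes(), ()))
        dist = lambda i: int(np.unpackbits(self.codes[i] ^ q).sum())
        return sorted(cand, key=dist)[:k]       # re-rank by full popcount

codes = np.random.default_rng(1).integers(0, 256, (100_000, 128), dtype=np.uint8)
index = ToyMIH(codes)
print(index.search(codes[42])[:3])   # code 42 is its own nearest neighbour
```

A production MIH additionally probes buckets within a small Hamming radius of each query substring, trading more probes for guaranteed recall at larger search radii.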
Honest caveats:
• Float32 cosine still wins on raw recall at small corpus sizes — the gap closes as the corpus grows and recall matches exactly at 500K chunks
• Below ~100k chunks the 75× speed advantage disappears — both methods hit a ~1 ms floor and perform the same
• On small corpora (<10k chunks) compression drops to ~31× due to fixed MIH sub-table overhead — it fully amortises at production scale
• 100× image compression is a projection, not yet in production
Full benchmarks, roadmap, live demo and all downloadable indexes:
github.com/QLNI/NodeMind
---
What's next: HiveMind.
NodeMind solves storage. HiveMind is the next layer — agents
deposit compressed reasoning traces, register semantic watches
on ideas, and connect through MCP. A reasoning node is 32 bytes
vs 4 KB float32. A 1M-node graph fits in 32 MB instead of 4 GB.
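The node arithmetic checks out if you assume a reasoning node is a 256-bit fingerprint and the float32 baseline is a 1024-dim embedding (the actual HiveMind node layout isn't specified here):

```python
import numpy as np

# Back-of-envelope for the node sizes quoted above. Assumes a reasoning
# node is a 256-bit binary fingerprint vs a 1024-dim float32 vector;
# the real HiveMind layout isn't published yet.
N = 1_000_000
binary_graph = np.zeros((N, 32), dtype=np.uint8)   # 32 B per node
float_bytes  = N * 1024 * 4                        # 4096 B per node

print(binary_graph.nbytes / 1e6)   # 32.0  -> 32 MB
print(float_bytes / 1e9)           # 4.096 -> ~4 GB
```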
Concept → https://nodemind.space/hivemind/
Follow @QLNI_AI — one post when it ships/updates.
Happy to answer any questions or doubts about the benchmark results, the MIH architecture, or the binary codec.
— Sai
---
Update: the full benchmark is now live on GitHub.
500,000 chunks — Wikipedia + arXiv + Project Gutenberg books. Both the NodeMind and float32 RAG indexes are downloadable, so you can verify the compression ratios yourself.
➡️ [github.com/QLNI/NodeMind](https://github.com/QLNI/NodeMind)
Live demo: [nodemind.space](https://nodemind.space)
X/Twitter: [Follow @Qlnix4E49 for updates](https://x.com/Qlnix4E49)
Two provisional patents filed in Australia. Built solo on community hardware in regional NSW.