Memory just turned a goldfish into a research beast.
I've been building Nyx, a persistent memory layer for local AI, and today I got the first real benchmark numbers worth sharing.
The test: the same long civic investigation task, run twice. Each run builds a full politician profile, then asks follow-up questions that require remembering details established earlier. One run with Nyx active, one cold start. Same model, same hardware.
**(eTPS = Effective Tokens Per Second: a measure of useful output, not just raw generation speed.)**
**The difference was ridiculous:**
- **With Nyx**: 37.70 eTPS • 0.950 Continuity
- **Cold start**: 3.87 eTPS • 0.138 Continuity
- **Score jump: +84 points**
That's roughly 10x more useful output and 7x better context retention.
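For anyone who wants to check the math, those multipliers are plain ratios of the numbers above. A quick sketch (the variable and function names are just for illustration, not part of Nyx):

```python
# Figures copied from the benchmark results above.
with_nyx = {"etps": 37.70, "continuity": 0.950}
cold_start = {"etps": 3.87, "continuity": 0.138}


def ratio(metric: str) -> float:
    """How many times higher the Nyx run scored on a given metric."""
    return with_nyx[metric] / cold_start[metric]


etps_ratio = ratio("etps")
continuity_ratio = ratio("continuity")

print(f"eTPS: {etps_ratio:.1f}x")              # eTPS: 9.7x
print(f"Continuity: {continuity_ratio:.1f}x")  # Continuity: 6.9x
```

So "10x" and "7x" are rounded up from about 9.7x and 6.9x.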
**Plain English:** Without memory the AI acts like a goldfish. With every message it forgets what we already established, wastes tokens reconstructing context, and loses the thread. With Nyx it remembers the whole case like it's been working on it for weeks.
The use case that made this obvious is CivicLens, an evidence-first politician research tool I'm building alongside Nyx. Long investigations spanning dozens of exchanges fall apart completely without persistent memory. With it, the session behaves like a single coherent investigation instead of a string of disconnected queries.
Still early. Claude Code keeps going rogue and touching repos it shouldn't. But the core memory layer works and the numbers back it up.
Does anybody benchmark whether AI can actually finish a job across multiple sessions?