u/axendo

Memory just turned a goldfish into a research beast.

I've been building Nyx, a persistent memory layer for local AI, and today I got the first real benchmark numbers worth sharing.

The test: same long civic investigation task twice. Building a full politician profile, then asking follow-up questions that required remembering details established earlier. One run with Nyx active, one cold start. Same model, same hardware.

**(eTPS = Effective Tokens Per Second — measures useful output quality, not just raw speed.)**

**The difference was ridiculous:**

- **With Nyx**: 37.70 eTPS • 0.950 Continuity

- **Cold start**: 3.87 eTPS • 0.138 Continuity

- **Score jump: +84 points**

That's roughly 10x more useful output and 7x better context retention.
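For anyone who wants to check the multipliers, here is a quick sketch of the arithmetic. It uses only the four figures reported above; the variable names are illustrative and not part of Nyx.

```python
# Figures reported in the post for the two runs.
with_nyx = {"etps": 37.70, "continuity": 0.950}
cold_start = {"etps": 3.87, "continuity": 0.138}

# Ratio of the memory-enabled run to the cold-start run.
etps_ratio = with_nyx["etps"] / cold_start["etps"]
continuity_ratio = with_nyx["continuity"] / cold_start["continuity"]

print(f"eTPS ratio: {etps_ratio:.1f}x")
print(f"Continuity ratio: {continuity_ratio:.1f}x")
```

Both ratios land just under the rounded 10x and 7x quoted above.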

**Plain English:** Without memory the AI acts like a goldfish. Every message it forgets what we already established, wastes tokens reconstructing context, and loses the thread. With Nyx it remembers the whole case like it's been working on it for weeks.

The use case that made this obvious — CivicLens, an evidence-first politician research tool I'm building alongside Nyx. Long investigations spanning dozens of exchanges fall apart completely without persistent memory. With it, the session behaves like a single coherent investigation instead of disconnected queries.

Still early. Claude Code keeps going rogue and touching repos it shouldn't. But the core memory layer works and the numbers back it up.

Does anybody benchmark whether AI can actually finish a job across multiple sessions?

reddit.com
u/axendo — 10 hours ago

Tech's Push to Be the Next Public Utility

Amazon didn't ask permission to become critical infrastructure. They built AWS until enough of the economy depended on it that regulation became almost impossible. You can't turn off the internet's backbone.

Now the same playbook is running with AI and data centers.

Build the infrastructure everywhere. Create dependency at scale. Make yourself essential to healthcare, finance, government, and defense before anyone agrees you should be. Then negotiate from a position where shutting you down costs more than regulating you.

The data center fights happening in communities right now — zoning battles, water usage protests, grid capacity fights — aren't about data centers. They're about who controls the next utility layer before the rules are written.

Historical utilities — power, water, telecom — eventually got regulated because they became too essential to leave unaccountable. The window between "essential" and "regulated" is where the real money gets made.

That window is open right now.

Who should have the authority to decide whether AI infrastructure is a public utility — and what happens if we don't decide before the decision gets made for us?

u/axendo — 4 days ago

Jane Doe v. Bank of America (SDNY): FBI and USAO Moving to Quash Subpoenas

The picture on this case is getting clearer.

A Jane Doe plaintiff (almost certainly an Epstein victim) is suing Bank of America in the Southern District of New York in front of Judge Rakoff. Through her lawyers, she subpoenaed the FBI and the U.S. Attorney’s Office for the Southern District of Florida for records.

Interestingly, it’s not Bank of America fighting the subpoenas — it’s the government (FBI + USAO-SDFL) that filed motions to quash them.

Key details:

- Amended complaint is 115 pages, filed March 5, 2026 (original filing January 2, 2026 as 1:25-cv-08520-JSR).

- FBI Unit Chief William L. Harris submitted a declaration arguing why the government shouldn’t have to produce the records.

This is playing out through standard civil discovery rather than big headlines. I’m pulling the amended complaint and the motion to quash filings now.

Does this feel unusual to anyone else? The government actively resisting discovery in a civil case involving potential Epstein connections. What’s your take?

u/axendo — 5 days ago

eTPS Site Plan – Simple Leaderboard + What You’ll Actually See

Building on the last post, here’s what the first version of effectiveTPS will look like.

**Core display (v1):**

- Clean table comparing popular local models

- Raw TPS (the marketing number everyone shows)

- eTPS (the new metric that actually measures useful output in real conversations)

- Time to First Token (how long you wait before it starts replying)

- Effectiveness Index = (eTPS ÷ Raw TPS) × 100 — higher is better

**Example leaderboard (early test data):**

| Model | Raw TPS | eTPS | Time to First Token | Effectiveness Index |
|--------------------|---------|--------|---------------------|---------------------|
| Llama 3.1 70B | 45.2 | 38.7 | 1.4s | **86** |
| Qwen2.5-32B | 68.4 | 52.1 | 0.8s | **76** |
| Gemma 2 27B | 71.3 | 44.6 | 0.6s | **63** |
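The Effectiveness Index column can be reproduced from the other two numeric columns. The sketch below applies the (eTPS ÷ Raw TPS) × 100 formula to the example rows. The function name `effectiveness_index` and the rounding to whole numbers are assumptions made for illustration, not part of any published eTPS tooling.

```python
def effectiveness_index(etps: float, raw_tps: float) -> float:
    """Effectiveness Index = (eTPS / Raw TPS) * 100, per the v1 definition."""
    if raw_tps <= 0:
        raise ValueError("raw_tps must be positive")
    return etps / raw_tps * 100


# (model, raw TPS, eTPS) copied from the example leaderboard.
rows = [
    ("Llama 3.1 70B", 45.2, 38.7),
    ("Qwen2.5-32B", 68.4, 52.1),
    ("Gemma 2 27B", 71.3, 44.6),
]

# Round to the nearest whole number, as the table appears to do (assumption).
index_by_model = {
    name: round(effectiveness_index(etps, raw)) for name, raw, etps in rows
}

for name, idx in index_by_model.items():
    print(f"{name}: {idx}")
```

Rounded this way, the three results match the index values shown in the table.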

I’ve been running these tests through a structured multi-turn analysis framework I built to evaluate complex workflows. That’s how eTPS was stress-tested — not just single-turn benchmarks, but real back-and-forth sessions.

Advanced mode (toggle) will add latency percentiles, cost-per-quality, and consistency scoring later. For v1 the goal is to keep it dead simple and immediately useful, even if you’re not deep into AI.

The whole point is to cut through the noise and show which models actually deliver useful work, not just raw speed.

What do you think should be added (or removed) for the first version? Any metrics you’d want to see front-and-center?

**TL;DR:** Simple leaderboard with Raw TPS, eTPS, Time to First Token, and a clear Effectiveness Index. Advanced stuff stays hidden until you want it. Feedback welcome.

u/axendo — 12 days ago
▲ 62 r/Epstein

Everyone talks about the creepy mansion and flight logs at Zorro Ranch.

Almost no one talks about the professional operational backbone that kept it running smoothly for decades — even after the FBI showed up.

The Gordon Factor:

- Brice and Karen Gordon managed Zorro Ranch (and Little St. James) for ~17 years (roughly 2003–2020).

- In February 2007, the FBI visited the ranch and specifically interviewed Brice Gordon about flown-in “masseuses.”

- Despite that interview + the 2008 Florida plea deal, the Gordons stayed in place for another 13 years.

- Flight activity to the private airstrip dropped sharply after 2007, but the core infrastructure (5,000-ft paved airstrip, hangar, helipad, and the massive professionally-built mansion) remained untouched.

This wasn’t random luck. After the first real exposure, they reduced the noisy visible activity (smart de-risking) while keeping the professional staffing and physical node completely intact — exactly like the Indyke/Kahn continuity layer in the 1953 Trust.

The flashy guy could be removed. The operational spine was built to survive.

Sources:

NM AG flight records (90+ flights 1997–2006)

2007 FBI interview notes

EFTA estate and ranch management documents

What am I missing? Has anyone found similar long-term professional staffing continuity on other Epstein properties? Drop any relevant EFTA IDs or threads.

TL;DR: The Gordons ran Zorro Ranch professionally for 17 years — including 13 years after the FBI interviewed them on-site. That quiet continuity layer explains way more about how the network lasted than the usual headlines.

u/axendo — 13 days ago
▲ 52 r/Epstein

Most people are still arguing over names on a flight log, but if you want to know how the "machine" actually survived 2019, you have to look at the architecture of his estate.

The EFTA (Epstein Files Transparency Act) documents reveal that Epstein didn't just sign a will two days before he died—he signed a Continuity Plan.

The Smoking Gun: Section 2.5

Epstein amended The 1953 Trust (EFTA01266204) to include a specific clause that is rarely discussed. It basically mandated that his key inner circle—specifically those tied to HBRK Associates and Indyke’s entities—had to stay in their roles for two full years after his death to receive their massive bequests.

The "Service Contract" Inheritance:

Darren Indyke (Lawyer): ~$50M + co-trustee

Richard Kahn (Accountant): ~$25M + co-trustee

Karyna Shuliak (Girlfriend): ~$100M in properties/cash

Why this is chilling:

This wasn’t a man settling his affairs; it was a CEO modeling "removal as a risk." By making the money conditional on them staying in place, he ensured the professionals who handled the wires, the offshore entities, and the "Southern Trust" shell games couldn't just walk away when the heat got turned up.

He didn't just leave them money; he bought their silence and their labor for the exact window of time needed to stabilize the network. Even now, in 2026, Indyke and Kahn remain the sole executors.

The Receipts:

The 1953 Trust (32 pages): EFTA01266204

The Feb 4 Amendment: EFTA00128921

The network was built to survive the node. While we were looking for a "client list," they were busy executing the "continuity clause" to keep the lights on.

If the estate was legally engineered to keep humming for years after his death, was the "operation" ever actually shut down, or did it just change management?

u/axendo — 13 days ago

We're obsessed with raw tokens per second. Every hardware post leads with it. Every quantization comparison is ranked by it. It's the one number everyone agrees to report.

It's also measuring the wrong thing.

Raw TPS tells you how fast tokens hit the screen. It tells you almost nothing about how quickly you get a correct, usable answer. On sustained, multi-turn workflows, that gap becomes massive.

A faster model that hallucinates, requires multiple corrections, and forgets context you gave it earlier can easily be less useful than a slower model that gets it right the first time.

eTPS (Effective Tokens Per Second) is a complementary metric that measures actual progress toward a useful answer, not just token throughput.

The basic idea: weight the final accepted output by how clean the path to that answer was — first-pass correct scores highest — then divide by total time. Correction loops, hallucinations, and repeated explanations all reduce the score. A response that never reaches a correct answer scores zero regardless of speed.
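A minimal sketch of that idea, under loud assumptions: the final weights are not locked yet, so the `PATH_WEIGHTS` values, the path labels, and the `Run` structure below are all hypothetical placeholders rather than the v0.1 scoring protocol.

```python
from dataclasses import dataclass

# Hypothetical path-quality weights; the real weights are not published yet.
PATH_WEIGHTS = {
    "first_pass_correct": 1.0,    # clean answer on the first try
    "needed_clarification": 0.75, # user had to ask a follow-up
    "needed_correction": 0.5,     # user had to fix a wrong answer
    "never_correct": 0.0,         # no usable answer was ever reached
}


@dataclass
class Run:
    accepted_tokens: int   # tokens in the final accepted answer
    total_seconds: float   # wall-clock time, including every correction loop
    path: str              # key into PATH_WEIGHTS


def etps(run: Run) -> float:
    """Weighted accepted output divided by total time."""
    if run.total_seconds <= 0:
        raise ValueError("total_seconds must be positive")
    return run.accepted_tokens * PATH_WEIGHTS[run.path] / run.total_seconds


clean = Run(accepted_tokens=500, total_seconds=10.0, path="first_pass_correct")
looped = Run(accepted_tokens=500, total_seconds=25.0, path="needed_correction")
failed = Run(accepted_tokens=500, total_seconds=5.0, path="never_correct")

for label, run in [("clean", clean), ("looped", looped), ("failed", failed)]:
    print(f"{label}: {etps(run):.2f} eTPS")
```

In this toy example, correction loops hurt twice: they lower the weight and they add to total time. The fastest run scores zero because it never reached a correct answer.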

It doesn't replace raw TPS. It sits next to it.

Results — same prompt, four runs, same hardware:

  • gemma-4-e2b (4.6B): 53.2 raw TPS → eTPS 53.18 ✓
  • qwen3.5-0.8b: 173.1 raw TPS → eTPS 86.57 ✗ partial
  • qwen3.5-9b (optimized): 1.8 raw TPS → eTPS 1.78 ✓
  • qwen3.5-9b (baseline): 0.5 raw TPS → eTPS 0.32 ✗ partial

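The 0.8B leads on raw speed by a wide margin, but its answer was only partially correct, which cut its eTPS to about half of its raw TPS (173.1 → 86.57). The runs marked ✓ kept nearly all of theirs. Raw TPS said it won outright. eTPS shows how much of that speed was wasted.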

Hardware: RTX 5060 Laptop, 8GB VRAM. eTPS scores aren't portable across hardware — always report your full setup.

Known limitations (v0.1):

  • Scoring requires human judgment. The line between "needed clarification" and "was factually wrong" isn't always clean. Code generation with objective pass/fail criteria is a cleaner target and the focus of the next benchmark run.
  • One task isn't representative of sustained multi-turn workflows — that's where the metric gets most interesting and where I'm headed next.
  • Easy to game without full system prompt logging. The spec will require it.

These are acknowledged constraints, not hidden flaws.

Full specification coming soon covering methodology, task library, scoring protocol, and reproducibility standards. Before I lock the final weights I'd genuinely like input on two open questions:

How should the penalty differ between a model that confidently states something false versus one that's just vague enough you had to ask a follow-up? And should hardware normalization live in the core formula or be reported separately?

Thoughts welcome.

u/axendo — 13 days ago