u/clawvault


Last Tuesday Anthropic released Dreams for Claude Managed Agents. It's a memory cleanup pipeline: feed it a memory store and up to 100 session transcripts, and you get back a new store with duplicates merged and stale entries replaced. The same week, while building ClawVault (a life-admin agent for busy parents), we shipped a 12-skill self-improvement wire.

We were solving the capture half of the problem: every UI edit a parent makes writes a learning to a per-owner journal. Anthropic was solving the consolidation half: clean up the journal once it gets messy. Reading their docs, I noticed three things they got right that we'd missed.

Memory versions. Every mutation in a Managed Agents memory store creates an immutable version with 30-day retention. There's a redact endpoint for compliance, and optimistic concurrency via SHA-256 preconditions. We have none of that. Our .learnings/ files are bare markdown in a per-owner GCS volume. If a learning leaks PII, we can edit the file, but the previous version is gone. If two writes race during a task, one wins silently. We need versioning.
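A minimal sketch of what versioning plus optimistic concurrency could look like for one of our journals. Everything here (`VersionedMemory`, `content_hash`, `PreconditionFailed`) is hypothetical and is not Anthropic's API; 30-day retention isn't modeled. The idea: a writer must present the SHA-256 of the content it last read, so a racing write fails loudly instead of winning silently.

```python
import hashlib
from dataclasses import dataclass, field


def content_hash(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


class PreconditionFailed(Exception):
    """Raised when the caller's expected hash no longer matches the current content."""


@dataclass
class VersionedMemory:
    # Hypothetical store: each write appends a new version instead of overwriting.
    versions: list[str] = field(default_factory=list)

    def current(self) -> str:
        return self.versions[-1] if self.versions else ""

    def write(self, new_text: str, expected_sha: str) -> str:
        # Optimistic concurrency: the writer must prove it saw the latest content.
        if content_hash(self.current()) != expected_sha:
            raise PreconditionFailed("memory changed since it was read")
        self.versions.append(new_text)
        return content_hash(new_text)

    def redact(self, index: int, placeholder: str = "[REDACTED]") -> None:
        # Compliance escape hatch: scrub one historical version (e.g. leaked PII)
        # while leaving every other version intact.
        self.versions[index] = placeholder
```

Two writers that both read the empty store hold the same hash; the first write succeeds and the second raises `PreconditionFailed`, so it has to re-read and retry.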

The 100KB per-memory cap. Anthropic's docs say to structure memory as many small, focused files, not a few large ones. We don't enforce a cap. Our health-companion.md could grow to 50MB if someone hammered the wire. The cap isn't arbitrary. Forcing small files keeps consolidation tractable and each file small enough to audit.
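A minimal sketch of enforcing a per-file cap on our side, assuming learnings are appended one per line to a markdown file. `append_learning` and `MemoryTooLarge` are hypothetical names, and whether "100KB" means 100,000 or 102,400 bytes is an assumption.

```python
from pathlib import Path

# Assumed cap mirroring the 100KB per-memory limit; 100 * 1024 is an assumption.
MAX_BYTES = 100 * 1024


class MemoryTooLarge(Exception):
    pass


def append_learning(path: Path, entry: str, max_bytes: int = MAX_BYTES) -> int:
    """Append one learning to a .learnings file, refusing to grow it past the cap.

    Returns the new file size in bytes.
    """
    data = entry.rstrip("\n").encode("utf-8") + b"\n"
    current = path.stat().st_size if path.exists() else 0
    if current + len(data) > max_bytes:
        # Fail loudly instead of letting one file balloon; the caller should
        # start a new, more focused file or run consolidation first.
        raise MemoryTooLarge(f"{path.name} would exceed {max_bytes} bytes")
    with path.open("ab") as f:
        f.write(data)
    return current + len(data)
```

Refusing the write, rather than truncating, pushes the agent toward splitting memory into smaller topical files.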

Read-only vs read-write access modes. Memory stores attach to sessions with an explicit access mode. The docs warn about prompt injection writing to memory: a successful injection in one session corrupts every session that reads that store afterward. Our agent has full read-write on every .learnings/ file. We've been lucky. We need to think harder about read-only mounts for shared reference material versus read-write for active learning.
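A minimal sketch of per-session access modes, assuming each session gets a mount object rather than raw file access. `MemoryMount`, `Access`, and `AccessDenied` are hypothetical; the point is that an injected "remember this" can only persist through a read-write mount.

```python
from enum import Enum
from typing import Optional


class Access(Enum):
    READ_ONLY = "read_only"
    READ_WRITE = "read_write"


class AccessDenied(Exception):
    pass


class MemoryMount:
    """Hypothetical session-scoped view of a memory store with an explicit access mode."""

    def __init__(self, store: dict, mode: Access):
        self._store = store
        self.mode = mode

    def read(self, key: str) -> Optional[str]:
        return self._store.get(key)

    def write(self, key: str, value: str) -> None:
        # Writes are rejected unless the mount was attached read-write, so shared
        # reference material mounted read-only can't be corrupted by one session.
        if self.mode is not Access.READ_WRITE:
            raise AccessDenied(f"{key}: mount is {self.mode.value}")
        self._store[key] = value
```

Shared reference material (a school calendar, say) would be mounted `READ_ONLY`; only the owner's active learning journal gets `READ_WRITE`.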

The thing I keep coming back to: Anthropic and I converged on the same hard rule from opposite directions. Their Dreams output is a new memory store, never modifying the input. Our self-improvement skill is append-only with Status: superseded for stale entries. Both of us locked in input-immutable journals before we'd seen each other's work. The pattern is universal. When a model curates a journal, the journal has to stay auditable.
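One way to keep a journal strictly append-only while still marking stale entries: never edit an old entry, append a new one that names what it supersedes, and derive `superseded` at read time. This is a sketch under that assumption; `Learning` and `status_of` are hypothetical and not necessarily how our skill stores the `Status:` field.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)  # frozen: an entry can't be edited after it's written
class Learning:
    id: str
    text: str
    supersedes: Optional[str] = None  # id of the older entry this one replaces


def status_of(journal: list[Learning]) -> dict[str, str]:
    """Derive each entry's status without ever mutating the journal."""
    superseded = {e.supersedes for e in journal if e.supersedes}
    return {e.id: ("superseded" if e.id in superseded else "active") for e in journal}
```

The old "Tuesdays" entry stays readable for audit even after a "moved to Thursdays" entry supersedes it.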

We can't use Dreams directly. Our architecture rule blocks direct Anthropic API calls. Even if it didn't, we don't have Anthropic-side memory_store_id or session_id primitives to pass. So the integration plan is a re-implementation inspired by Dreams: a sibling skill, dream-consolidator, running on a Cloud Run cron, reading our existing .learnings/ files through the OpenClaw gateway we already use, and writing a new consolidated file alongside the raw entries. About 4-5 hours of work.
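A minimal sketch of the input-immutable shape such a consolidator could take. It reads raw `.md` files from a local directory (the GCS volume, the OpenClaw gateway, and the Cloud Run cron are not modeled) and writes a new file alongside them. Exact-match dedupe stands in for the model-driven merge the real skill would do; `consolidate` and `consolidated.md` are hypothetical names.

```python
from pathlib import Path


def consolidate(learnings_dir: Path, out_name: str = "consolidated.md") -> Path:
    """Write a deduplicated copy of every raw learning to a NEW file.

    Raw .md files are only read, never rewritten. A real consolidator would ask
    a model to merge near-duplicates; exact-match dedupe stands in for that here.
    """
    seen: set[str] = set()
    merged: list[str] = []
    for src in sorted(learnings_dir.glob("*.md")):
        if src.name == out_name:
            continue  # never feed a previous output back in as input
        for line in src.read_text(encoding="utf-8").splitlines():
            entry = line.strip()
            key = " ".join(entry.lower().split())  # case/whitespace-insensitive match
            if entry and key not in seen:
                seen.add(key)
                merged.append(entry)
    out = learnings_dir / out_name
    out.write_text("\n".join(merged) + "\n", encoding="utf-8")
    return out
```

Because the raw files are never touched, a bad consolidation run can be thrown away and rerun, which is the same guarantee Dreams gets by emitting a new store.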

But not yet. We just shipped the wire that creates the journals. Until I have a week of production traffic to confirm the redact discipline holds, dreaming is premature. Consolidating a journal full of PII leaks would amplify the leak. The wire goes first. The cleanup pass comes later.

If you're shipping agent products, the lesson from both architectures is the same. You need a capture pipe and a consolidation pass. You need version history. You need access modes. Don't skip the boring infrastructure.

u/clawvault — 8 days ago

Building this in public. Feedback genuinely welcome — especially the parts you think are wrong.

Been reading budget-app posts on this sub and a few others for a while. The same complaints surface over and over. Plaid breaks every Tuesday. Your category corrections get wiped on reconnect. "Auto-cancel" features send templated emails and call it done. RocketMoney pockets 40% of your bill negotiation savings. People aren't quitting these apps because they got bored. They're quitting because the apps fail at the thing they exist to do.

A thought I keep coming back to. The "best" budgeting app isn't the one with the most features. It's the one with the right ones, the ones that actually change behavior. Most apps are either inaccurate, too rigid, or too much work to maintain, and any one of those is enough to kill the habit.

So the build I started has two architectural calls underneath everything. The app does things on your behalf with approval gates instead of just visualizing your problems back at you. And long-term memory: it remembers what you've corrected, what categories you trust, what merchants behave weirdly, across years instead of relearning you every month.

Here's what's in it so far.

Drop a screenshot of your bank app and the transactions get extracted, categorized, and deduped against what's already in your vault. No Plaid. No reconnect-every-Tuesday loop. No waiting for institutional support. The same path works for PDFs, CSVs, and manual entry, so a $40 cash haircut shows up in the same shape as anything else.
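A minimal sketch of deduping newly extracted transactions against the vault, assuming every source (screenshot, PDF, CSV, manual) is normalized into the same record first. The fingerprint rule (date, exact amount, normalized merchant) and all names here are hypothetical.

```python
import re
from dataclasses import dataclass
from datetime import date
from decimal import Decimal


@dataclass(frozen=True)
class Txn:
    posted: date
    amount: Decimal          # negative for money out
    merchant: str
    source: str = "manual"   # "screenshot", "pdf", "csv", "manual": same shape for all


def fingerprint(t: Txn) -> tuple:
    # Lowercase, drop digits/punctuation, collapse spaces, so that
    # "TIM HORTONS #1234" and "Tim Hortons" produce the same key.
    name = re.sub(r"[^a-z ]", "", t.merchant.lower())
    return (t.posted, t.amount, " ".join(name.split()))


def merge_into_vault(vault: list[Txn], incoming: list[Txn]) -> list[Txn]:
    """Return only the incoming transactions not already present in the vault."""
    known = {fingerprint(t) for t in vault}
    new = []
    for t in incoming:
        fp = fingerprint(t)
        if fp not in known:
            known.add(fp)  # also dedupes repeats within the same upload
            new.append(t)
    return new
```

The source is deliberately left out of the fingerprint, so a screenshot of a charge already imported from CSV doesn't double-count. The trade-off is that two genuinely identical purchases on the same day collapse into one, which a real pipeline would need to handle.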

Cancellation that actually cancels. The app logs into the merchant, talks to retention if it has to, takes a screenshot before every irreversible action, and waits for your approval. No "we tried, sorry" email. The button does what the button says.
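A minimal sketch of the approval gate, assuming each step of a cancellation flow is tagged reversible or irreversible. The browser automation is abstracted behind callables; `Step`, `run_with_gates`, `screenshot`, and `approve` are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Step:
    name: str
    irreversible: bool
    run: Callable[[], None]


def run_with_gates(
    steps: list[Step],
    screenshot: Callable[[str], str],
    approve: Callable[[str, str], bool],
) -> list[str]:
    """Execute steps in order, pausing for a screenshot and explicit approval
    before anything irreversible. Returns a log of what happened."""
    log = []
    for step in steps:
        if step.irreversible:
            shot = screenshot(step.name)  # evidence captured BEFORE acting
            if not approve(step.name, shot):
                log.append(f"stopped before {step.name}")
                return log  # the irreversible step never ran
        step.run()
        log.append(f"ran {step.name}")
    return log
```

Logging in runs freely, but "confirm cancellation" only executes after the user has seen the screenshot and said yes.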

Once you correct a category, it stays corrected. Recategorize Tim Hortons once and the app remembers. Most apps relearn you from scratch every month. That's a design choice, and it's the wrong one.
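A minimal sketch of corrections that survive reconnects: a persisted merchant-to-category map consulted before any classifier guess. The JSON file, `CategoryMemory`, and the `guess` callback are hypothetical stand-ins for the real vault and categorizer.

```python
import json
from pathlib import Path
from typing import Callable


class CategoryMemory:
    """User corrections persisted to disk as a merchant -> category map."""

    def __init__(self, path: Path):
        self.path = path
        self.overrides = json.loads(path.read_text()) if path.exists() else {}

    @staticmethod
    def _key(merchant: str) -> str:
        return " ".join(merchant.lower().split())

    def correct(self, merchant: str, category: str) -> None:
        self.overrides[self._key(merchant)] = category
        self.path.write_text(json.dumps(self.overrides, sort_keys=True))

    def categorize(self, merchant: str, guess: Callable[[str], str]) -> str:
        # A past correction always wins over a fresh guess.
        return self.overrides.get(self._key(merchant)) or guess(merchant)
```

After one call to `correct("Tim Hortons", "Coffee")`, a freshly loaded `CategoryMemory` still returns "Coffee", whatever the classifier would have guessed.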

Bill negotiation, rolling out with a different pricing model than the incumbents. The plan is $5 flat if it works, $0 if it doesn't. RocketMoney keeps 40% of whatever they save you. Same negotiation, different math.
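The "different math" in a few lines. This assumes both fees are charged against the same savings figure; the post doesn't say over what period savings are counted.

```python
from decimal import Decimal

FLAT_FEE = Decimal("5")            # $5 if negotiation succeeds, $0 if it doesn't
INCUMBENT_SHARE = Decimal("0.40")  # 40% of whatever is saved


def flat_fee(savings: Decimal) -> Decimal:
    return FLAT_FEE if savings > 0 else Decimal("0")


def percentage_fee(savings: Decimal) -> Decimal:
    return (savings * INCUMBENT_SHARE).quantize(Decimal("0.01"))
```

For $180 saved ($15/mo over a year), the percentage model takes $72.00 and the flat model takes $5. The two models cost the same at $12.50 of savings; above that, the flat fee is cheaper.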

When negotiation won't help because you're already at the floor, the next layer finds a cheaper carrier and any affiliate fee gets split with you. Coming, not shipped.

One nudge a day, not a wall of charts. Every morning, one thing worth doing today. "Three subs unused 60+ days, $42/mo" or "Bell charged $12 more than usual, worth a look?" One decision is a feature. Ten dashboards is noise.
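A minimal sketch of "one nudge a day": generate candidate nudges, then keep only the one with the largest dollar impact. The 60-day rule comes from the example above; the median baseline, the $10 spike threshold, and all function names are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from decimal import Decimal
from statistics import median
from typing import Optional


@dataclass
class Subscription:
    name: str
    monthly: Decimal
    last_used: date


def unused_subs_nudge(subs: list[Subscription], today: date, days: int = 60):
    stale = [s for s in subs if (today - s.last_used).days >= days]
    if not stale:
        return None
    total = sum((s.monthly for s in stale), Decimal("0"))
    return (total, f"{len(stale)} subs unused {days}+ days, ${total}/mo")


def bill_spike_nudge(merchant: str, history: list[Decimal], latest: Decimal,
                     threshold: Decimal = Decimal("10")):
    extra = latest - median(history)  # "usual" = median of past bills (an assumption)
    if extra < threshold:
        return None
    return (extra, f"{merchant} charged ${extra} more than usual, worth a look?")


def todays_nudge(candidates) -> Optional[str]:
    """Keep only the single candidate with the largest dollar impact."""
    live = [c for c in candidates if c is not None]
    return max(live, key=lambda c: c[0])[1] if live else None
```

Ranking by dollar impact is one simple choice; whatever the ranking, the user sees a single line rather than every candidate.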

Voice in and out, optional, off by default. The use case is asking "can I spend $200 on groceries today" while your hands are on the cart. But not everyone wants their balance read aloud at the checkout, so it stays off unless you turn it on.

The architecture call I want to flag because it's load-bearing. Every user gets their own encrypted container, their own graph database, their own secrets namespace. Cross-user data leakage requires multiple independent failures by design. The app can't sell aggregated data because the data structurally isn't aggregated. It costs more to run this way and that's the point.

Happy to answer anything technical or get pushed on what's still missing. Not posting a link because I want the conversation more than the click.

u/clawvault — 17 days ago
