u/hushenApp

A few months ago I noticed something stupid.

I was paying AI agents to forget.

They would read a file, do some work, lose the thread, read it again, run a command, dump half the terminal into the context, then ask for more information that was already there five minutes ago.

And I just kept thinking:

This cannot be the future.

Not because the models are bad. They are often amazing. Sometimes annoyingly amazing.

But the way we feed them context is messy.

We give them too much.
Then not enough.
Then the wrong thing.
Then the same thing again.
Then a giant log file as dessert.

At some point I stopped complaining and started building.

That became LeanCTX.

The first version was basically me trying to stop the bleeding. Cache repeated reads. Compress shell output. Give the model a smaller version of files when a smaller version is enough. Keep the useful parts of context alive across sessions.

Then the project started growing.

People used it.
People broke it.
People complained.
People sent weird edge cases.
People told me when my “optimization” was actually making the agent worse.

That last part was important.

Because it forced me to admit that token savings alone are a bad religion.

A smaller context is not automatically a better context.

If the model needs the full diff, give it the full diff.
If it only needs signatures, don’t send the whole file.
If a log has one useful error, don’t send 10,000 lines of emotional damage.

The point is not minimal context.

The point is useful context.

LeanCTX now has 48k installs and 1.6k GitHub stars, which still feels weird because in my head it is partly a serious infrastructure project and partly a late-night argument I had with my own terminal.

I made it open source because I want people to be able to use it, inspect it, question it, improve it, and build on it.

I don’t want this layer to be locked inside one AI coding tool.

If agents are going to become part of how software is built, then context should become a shared infrastructure layer.

Something that can sit under different tools.
Something that can help agents talk to each other.
Something that can remember what matters.
Something that can reduce waste.
Something that can make AI workflows more efficient and more transparent.

Maybe that sounds too grand for a tool that started because I was annoyed at repeated file reads.

But honestly, a lot of useful infrastructure starts as annoyance.

A log was too noisy.
A build was too slow.
A deploy was too manual.
A model kept rereading the same file like it had short-term memory loss and a corporate credit card.

So yes, LeanCTX saves tokens.

But the bigger thing I care about is this:

Can we build AI systems that waste less?

Less compute.
Less repeated context.
Less noise.
Less blind trust.

More signal.
More reuse.
More transparency.
More infrastructure that everyone can benefit from.

That’s why it’s open source.

Not because I have everything figured out.

Because I don’t.

And that’s exactly why I’d rather build it in the open.

reddit.com
u/hushenApp — 1 day ago

git log costs your agent 624 tokens. It needs 55. Here's a list of the worst offenders

I spent a week logging every shell command my coding agent ran and measuring the token cost of the raw output vs. what the agent actually used.

Most CLI tools were built for humans reading terminals, not for LLMs paying per token.

The worst offenders

| Command | Raw tokens | What the agent needs | After compression |
|---|---|---|---|
| git log | 624 | Last 3 commits + changed files | 55 (-91%) |
| git diff | 2,400+ | Changed lines + file list | ~320 (-87%) |
| npm test (200 passing) | 3,100+ | Pass/fail summary + failures | ~180 (-94%) |
| cargo build (clean) | 1,800+ | Errors/warnings only | ~90 (-95%) |
| docker build | 5,000+ | Final image + errors | ~150 (-97%) |
| ls -la (big directory) | 800+ | File tree | ~120 (-85%) |
| git status | 340 | Staged/unstaged/untracked | ~60 (-82%) |

This adds up fast. A typical 30-min session runs 40-60 shell commands. At an average of 1,500 tokens of raw output per command, that's 60-90K tokens spent on pure CLI noise: verbose build logs, green checkmarks, download progress bars.

Why this matters more than you think

Every token of noisy shell output takes up space in the context window. That's space the agent can't use for reasoning about your actual code. I've seen agents lose track of a multi-step refactoring plan because npm install dumped 8K tokens of dependency resolution into the context mid-task.

What I did about it

I wrote pattern-based compressors for 95+ CLI commands grouped into 34 categories. It's deterministic pattern matching: the same input always produces the same compressed output, in microseconds.

The rules are simple (rough code sketch after the list):

  • Strip progress bars, spinners, download indicators
  • Collapse repeated success lines (✓ test passed x200 → 200/200 passed)
  • Keep all errors and warnings verbatim
  • Preserve structure (file paths, line numbers, exit codes)
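
In sketch form, those rules look something like this. Illustrative Rust only, not the actual per-command pattern tables:

```rust
// Sketch of the four rules above as one generic pass. The real
// compressors are per-command pattern tables; names here are mine.
fn compress_shell_output(raw: &str) -> String {
    let mut kept: Vec<String> = Vec::new();
    let mut passed = 0usize;

    for line in raw.lines() {
        let t = line.trim();
        // Rule 1: strip progress bars, spinners, download indicators.
        if t.is_empty() || t.ends_with('%') || t.contains('█') {
            continue;
        }
        // Rule 2: collapse repeated success lines into one counter.
        if t.starts_with('✓') {
            passed += 1;
            continue;
        }
        // Rules 3 and 4: errors/warnings stay verbatim, and remaining
        // lines pass through untouched (paths, line numbers, exit codes).
        kept.push(line.to_string());
    }
    if passed > 0 {
        kept.push(format!("{passed}/{passed} passed"));
    }
    kept.join("\n")
}
```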

It runs as a transparent shell hook. Your agent runs git log like normal and gets the compressed version back. No workflow change.

What CLI commands burn the most tokens in your workflow?

reddit.com
u/hushenApp — 2 days ago

I built an A2A Context Bus, which helps you make sure every agent uses the same optimized context.

While working on LeanCTX, an open-source “Context OS,” I dove into the question of multi-agent use cases and agent-to-agent interaction. A current problem I see is that if you have multiple agents running on the same project, they all have their own individual context and view of the project.

I experimented a little and came to the conclusion that something like a shared “context bus” would make sense. This would allow you to connect multiple agents to the same context, so they would all have access to the same information.

The next thought was: “How can context be made shareable?” Let’s assume you want to share the context of a project with someone else. Currently, there’s no proper way to do this. Yes, you can share markdown files and project-related information, but you cannot copy and paste the real context into another project or send it via email to someone else.

So I built and tested a function that packages the entire context related to a project, which also enables versioning. It collects all the context information LeanCTX has gathered over time, packages it, and labels it with the relevant metadata.

Now you’re able to share the context with someone else, whether human or agent. The recipient can then import the context into LeanCTX and continue working from exactly the point where you left off.

reddit.com
u/hushenApp — 6 days ago

I keep seeing people ask for bigger and bigger context windows.

And yeah, I get it. It sounds nice. Just throw the whole repo into the model and let it figure things out.

But I’m starting to think that’s not really how good engineering works.

A senior engineer doesn’t understand a codebase by reading every single file. They know what to ignore. They follow signals. They remember the weird parts. They know where the bodies are buried.

AI coding agents don’t really have that yet.

Most of the time we just give them a huge pile of files, logs, prompts and tool outputs, then act surprised when they lose the plot.

I think the next big layer in AI coding is context infrastructure.

Not just more tokens. Better context.

What should the model see? What should be compressed? What should be remembered? What should never be sent in the first place?

I’ve been exploring this while building LeanCTX, but honestly the bigger question interests me more than the tool itself:

Are we actually solving AI coding with bigger windows, or are we just making the pile bigger?

reddit.com
u/hushenApp — 7 days ago

One of the biggest challenges when working with AI agents is the lack of a shared context base.

Each agent operates with its own isolated context.
One agent knows something, another one doesn’t.
Important decisions, changes, and learnings easily get lost between sessions, tools, and workflows.

To solve this, I created a Context Bus layer for LeanCTX.

It allows multiple agents and systems to connect to the same shared context base, so they can work with a common understanding instead of operating in separate silos.

In simple terms:

Instead of every AI agent having its own little memory bubble, they can now access and contribute to a shared context layer.

That makes multi-agent workflows more consistent, more transparent, and much easier to coordinate.

reddit.com
u/hushenApp — 7 days ago

I've been using AI coding agents (Claude Code, Cursor, Copilot) pretty heavily for the last year. One thing that kept bugging me: these models burn through context windows reading the same files repeatedly, getting full verbose build output when they only need the errors, and starting from zero every new chat.

So I built LeanCTX. It's a local MCP server written in Rust that sits between your IDE and the model. The idea is simple: if the model already read a file this session and nothing changed, don't send it again. If git diff outputs 500 lines but only 20 matter, compress it. If the model needs to understand your codebase structure, give it a code graph instead of making it read every file.

Real numbers from daily use: file re-reads go from 2,000 tokens to about 33. Shell command output gets compressed by 80-95% depending on the command. Overall session savings are consistently 60-80%.

It works with basically every AI coding tool out there: Cursor, Claude Code, GitHub Copilot, Windsurf, Codex CLI, Gemini, JetBrains, Cline, about 24 editors total. One install command, one setup command, and it auto-configures for whatever you're using.

The whole thing is open source, Apache licensed, single binary with no dependencies. Currently at about 35,000 installs. If you're paying for API tokens or just running into context limit issues, it might be worth a look.

Not going to pretend it's perfect. I've had users find bugs at 10pm and I've had to rewrite entire subsystems based on feedback. But that's how you build something people actually use.

reddit.com
u/hushenApp — 11 days ago

The problem: I was using Claude Code and Cursor daily, and noticed the models kept reading the same files over and over, getting full verbose git diffs when they only needed the summary, and forgetting everything between sessions. I tracked it for a week and about half my tokens were going to redundant context.

LeanCTX is a local MCP server that fixes this. It sits between your editor and the model. When the model reads a file it already saw this session, LeanCTX returns a tiny cache fingerprint instead of the full content. When it runs a shell command, LeanCTX compresses the output using patterns for 90+ tools like git, docker, npm, cargo, kubectl. When the model needs to understand the codebase, there's a code graph built with tree-sitter so it can ask "what imports this" instead of reading every file.
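
The caching idea in sketch form. This is simplified (the real implementation tracks more state, and the fingerprint format here is made up for illustration):

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::fs;
use std::hash::{Hash, Hasher};

// Simplified session cache: remember a content hash per path.
struct SessionCache {
    seen: HashMap<String, u64>, // path -> content hash from this session
}

impl SessionCache {
    fn read(&mut self, path: &str) -> std::io::Result<String> {
        let content = fs::read_to_string(path)?;
        let mut h = DefaultHasher::new();
        content.hash(&mut h);
        let fp = h.finish();

        match self.seen.insert(path.to_string(), fp) {
            // Same content as last read: return a short fingerprint
            // instead of resending thousands of tokens of file content.
            Some(prev) if prev == fp => Ok(format!("[unchanged {path} {fp:016x}]")),
            // First read, or the file changed: send the full content.
            _ => Ok(content),
        }
    }
}
```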

Setup takes two commands: install with curl or cargo, then run lean-ctx setup, and it configures itself for whatever editor you use. Works with Cursor, Claude Code, Copilot, Windsurf, Codex, Gemini CLI, JetBrains, and about 20 more.

There's also cross-session memory so the model remembers what it learned yesterday, PR context packs that auto-generate relevant context for code reviews, and a live dashboard showing exactly how many tokens you're saving in real time.

Single Rust binary, everything local, nothing cloud. Been using it daily for months and the token savings are a consistent 60-80%.

reddit.com
u/hushenApp — 11 days ago

Every time an agent like Claude, Copilot or Codex works on your code, it re-reads files it already saw, gets full verbose shell output where a compressed summary would do, and loses everything it learned when the session ends. In a typical coding session, I measured 40-60% of context tokens going to redundant content.

I built LeanCTX to sit as a local MCP server between the IDE and the model. The core mechanics: file reads get session-cached so a re-read costs 33 tokens instead of 2,000. Shell output from git, docker, cargo, npm, kubectl and about 90 other tools gets pattern-compressed in real time. A tree-sitter code graph for 18 languages lets the model query "what depends on this function" instead of reading every file to figure it out.

The part I find most interesting from an infrastructure perspective is the adaptive mode selection. LeanCTX has 10 different read modes from full content down to signatures-only or entropy-filtered output. A bandit algorithm learns which mode works best for which file type and size over time, so the system gets more efficient the longer you use it.
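
Roughly, the selection loop works like a classic multi-armed bandit. Here's an epsilon-greedy sketch, ignoring the per-file-type context for brevity; the actual policy and reward signal are more involved than this:

```rust
use rand::Rng; // assumes the rand crate

const MODES: &[&str] = &["full", "summary", "signatures", "entropy_filtered"];

struct ModeBandit {
    pulls: Vec<u32>,
    mean_reward: Vec<f64>, // running mean reward per mode
    epsilon: f64,          // exploration rate
}

impl ModeBandit {
    fn new(epsilon: f64) -> Self {
        Self {
            pulls: vec![0; MODES.len()],
            mean_reward: vec![0.0; MODES.len()],
            epsilon,
        }
    }

    // Mostly pick the best-known mode, sometimes a random one so the
    // statistics keep improving over time.
    fn pick(&self, rng: &mut impl Rng) -> usize {
        if rng.gen::<f64>() < self.epsilon {
            rng.gen_range(0..MODES.len())
        } else {
            (0..MODES.len())
                .max_by(|&a, &b| self.mean_reward[a].total_cmp(&self.mean_reward[b]))
                .unwrap()
        }
    }

    // Reward is "how well the compressed read worked", e.g. tokens saved,
    // penalized when the model had to fall back to a full read.
    fn update(&mut self, mode: usize, reward: f64) {
        self.pulls[mode] += 1;
        let n = self.pulls[mode] as f64;
        self.mean_reward[mode] += (reward - self.mean_reward[mode]) / n;
    }
}
```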

There's also a knowledge graph that persists across sessions with temporal facts and contradiction detection, so the model doesn't re-discover the same things every chat. And SLO monitoring that tracks compression efficiency and warns when savings drop below your threshold.
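
The temporal-fact idea is easiest to show in code. A simplified sketch; the real schema carries more metadata, and what to do on contradiction (supersede, flag, ask) is a policy decision:

```rust
// Illustrative fact shape; field names are mine, not the real schema.
struct Fact {
    subject: String,   // e.g. "auth module"
    predicate: String, // e.g. "uses"
    object: String,    // e.g. "JWT"
    observed_at: u64,  // unix timestamp
}

// A newer fact contradicts an older one when it asserts a different
// object for the same (subject, predicate) pair.
fn contradicts(newer: &Fact, older: &Fact) -> bool {
    newer.subject == older.subject
        && newer.predicate == older.predicate
        && newer.object != older.object
        && newer.observed_at > older.observed_at
}
```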

Curious if anyone else is working on the context efficiency layer. Feels like most tooling focuses on model capability but ignores how much context gets wasted on the transport side.

reddit.com
u/hushenApp — 11 days ago

lean-ctx is a local context runtime written in Rust that caches file reads, compresses shell output and indexes your codebase so your model stops wasting tokens on redundant context.

I recently fixed Pi-specific compatibility issues. Pi's MCP bridge sends array parameters as JSON-encoded strings instead of native arrays, which broke multi-file reads. That's fixed now: lean-ctx detects the format automatically. There's also ctx_call, a meta-tool with a stable schema that works around Pi's static tool registry. You call ctx_call with the tool name and arguments, and it dispatches internally, so you get access to all 49 tools even if Pi only loaded the initial set at startup.
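
The array fix boils down to format auto-detection on the parameter value. A sketch of the idea with serde_json, simplified from the real parameter handling:

```rust
use serde_json::Value; // assumes serde_json

// Accept either a native JSON array or an array that arrived as a
// JSON-encoded string (what Pi's MCP bridge sends).
fn coerce_array(param: &Value) -> Option<Vec<Value>> {
    match param {
        Value::Array(items) => Some(items.clone()),
        Value::String(s) => serde_json::from_str::<Vec<Value>>(s).ok(),
        _ => None,
    }
}
```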

The core: when your model re-reads a file, lean-ctx returns a cache fingerprint (~13 tokens) instead of the full content (often 2,000+). Shell commands get compressed with 90+ patterns covering git, npm, docker, cargo, kubectl output. A tree-sitter code graph for 18 languages lets the model query imports, dependents and blast radius without reading every file. ctx_pack builds compact PR context packs with changed files, related tests and impact summary. ctx_knowledge keeps a persistent knowledge graph across sessions with temporal facts and contradiction detection.

There's a live TUI dashboard showing token savings, cache hits, SLO monitoring and every tool call in real time. Everything local, nothing cloud, single Rust binary.

reddit.com
u/hushenApp — 12 days ago

lean-ctx is a local MCP server written in Rust that I built specifically for AI coding workflows. It sits between your editor and the model and makes sure the model never wastes tokens on stuff it already knows.

When your model reads a file it already saw this session, lean-ctx returns a 33-token fingerprint instead of the full file again. When it runs a shell command, lean-ctx compresses the output using 90+ patterns for git, cargo, npm, docker, kubectl and friends. When it needs to understand your codebase, lean-ctx has a full code graph built with tree-sitter for 18 languages, so the model can ask "what imports this function" instead of reading every file.
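
The query shape behind "what imports this" is a reverse-dependency lookup over the graph. A toy sketch; the real graph is built from tree-sitter ASTs across 18 languages, not a hand-written map:

```rust
use std::collections::HashMap;

// Given an import graph (file -> files it imports), find every file
// that imports the target. Illustrative only.
fn who_imports<'a>(
    imports: &HashMap<&'a str, Vec<&'a str>>,
    target: &str,
) -> Vec<&'a str> {
    imports
        .iter()
        .filter(|(_, deps)| deps.iter().any(|d| *d == target))
        .map(|(file, _)| *file)
        .collect()
}
```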

The server ships as a single binary with 49 MCP tools. ctx_read gives you 10 different read modes from full content down to signatures-only or entropy-filtered. ctx_pack builds PR context packs with changed files, related tests and impact analysis. ctx_search does hybrid BM25 search across your codebase. ctx_knowledge maintains a persistent knowledge graph with temporal facts across sessions so your model remembers what it learned yesterday.

There's a live TUI dashboard that shows every tool call, token savings, cache hits and SLO violations in real time. The server monitors its own compression efficiency and warns you when savings drop below your configured threshold.

reddit.com
u/hushenApp — 12 days ago

Watched my token usage for a week. The pattern was always the same: model reads a file, edits it, reads it again to verify, then reads it AGAIN next turn because it forgot.

Built lean-ctx, an MCP server that caches file reads per session. Second read of the same unchanged file costs 33 tokens instead of 2,000. It also compresses shell output (git diffs, docker logs, cargo builds) with 90+ patterns.

Setup is two commands:

curl -fsSL https://leanctx.com/install.sh | sh
lean-ctx setup

Works with Cursor, Claude Code, Copilot, Windsurf, Codex, JetBrains. Not a wrapper, not a proxy. Local MCP server, nothing leaves your machine.

reddit.com
u/hushenApp — 12 days ago

I've been building LeanCTX — a local-first context runtime for AI coding agents, written in Rust — for the past few months. 49 MCP tools, 18-language tree-sitter AST, 90+ shell compression patterns, a single binary. Here's what actually mattered:

  1. Your best users are the ones who complain. A user told me at 10pm that my uninstaller just nuked his shell config. My instinct was to get defensive. Instead I traced it — and found it was worse than reported. That one message led to rewriting the entire uninstall logic from scratch. Every angry bug report is a gift.
  2. Your favorite metric can lie to you. I built a cache that reduced file reads from 2,000 tokens to 13. Great numbers. Then a user told me: "Models waste more tokens working around stale cache than the cache saves." He was right. The fix wasn't removing caching — it was making invalidation smarter. Your dashboard can look great while the experience is terrible.
  3. Saying no is the hardest part. A new feature would have let me compress all tool output automatically. Massive savings on paper. I designed it, prototyped it, then killed it. Because when compression eats an error message, there's no undo. Protecting quality beats shipping features.
  4. Community is a relationship, not a channel. When someone reports a bug, my first response matters more than the fix. "Will check" buys time but shows I'm listening. Following up shows respect. Shipping the fix shows they matter. My best testers are people who once filed angry reports.
  5. Ship the boring stuff first. Nobody cares about your adaptive entropy-based compression algorithm if the installer breaks their dotfiles. Get the fundamentals right — install, uninstall, doctor, setup — before you get clever.
  6. Focus means killing good ideas. My backlog has 50+ ideas. Each one is good. But spreading across all of them means none become great. Rust helps here — the compiler forces you to finish what you start.

If nobody is complaining yet, you probably don't have enough users. Go find them. And when they complain — listen.

reddit.com
u/hushenApp — 12 days ago