u/tinys-automation26

Something clicked for me recently and I wanted to share because I think a lot of people building with AI agents are hitting the same wall without realising what's causing it.

Your agent has a finite context window. That's its working memory, everything it sees, thinks about, and receives from tools has to fit in there. When it fills up, the agent starts forgetting things, gives worse answers, and eventually just fails the task.

Here's the thing nobody talks about with MCP: every single tool call dumps the full schema, the request, and the entire JSON response back into that working memory. One call can cost 500-2,000 tokens in overhead. The data you actually wanted? Maybe 200 tokens. Run 20-30 of those in a task and you've burned through half your context on protocol plumbing. No wonder the agent starts losing the plot halfway through.

I work at TinyFish (we build web infra for AI agents) and we had a front row seat to this. We shipped an MCP server, it worked fine on small tasks, then watched it completely fall over on anything involving more than a handful of web operations. So we shipped a CLI on the same backend. Same API, same everything. Just a different access pattern.

The difference was kind of embarrassing. 45K tokens overhead on MCP vs 3K on CLI for the same task. Completion rates went from roughly 35% to 90%+.

And it makes total sense when you think about it. Claude Code, Cursor, these tools already live in a terminal. They have bash. They have a filesystem. When your agent runs a CLI command, output goes to a file. The agent reads the file when it needs to. Pipes work. Loops work. Parallel execution works. Intermediate results never touch the context window.

It's basically just the unix philosophy - stdin, stdout, everything is a file - applied to a problem nobody expected it to solve. Your agent's context window is bounded working memory, and unix was built to keep intermediate data out of bounded memory. The fit is almost too clean.

Anyway, not saying MCP is useless. It's great for tool discovery and there are real cases for it in multi-agent setups. But if you're building tools that agents will use for actual work, especially anything involving multiple operations, seriously consider shipping a CLI alongside it.

Unix philosophy is the correct execution model for AI agents