u/No_Iron1885

Your startups will FAIL bc ur stuck on “agents” (I will not promote)

Most current startups will be obsolete in two years. Companies like Anthropic and OpenAI will keep rapidly shipping products and features, meaning companies building individual tools (think: RAG, memory, browser automation) will fail.

imo, the money is going to be in building "Agent Adjacent" services.

essentially, providing the things the big guys can't. Think company-specific services like:

-Giving their agent an exact understanding of how their business operates.
-Making sure their agent can use whatever SaaS it needs accurately and reliably.

There is no future in providing agents to companies. My approach is to assume they will already have an agent and simply need one service that does everything else.

if you disagree please lmk bc I genuinely want to hear your opinion

u/No_Iron1885 — 3 days ago
▲ 2 r/mcp

"Update #engbackend's topic." One API call. The easiest category in the benchmark. The agent knew exactly what to do. But Slack's official MCP server doesn't expose conversations.setTopic, so it apologized and told me to try doing it manually.

I assumed this was a one-off gap. It was not.

Disclosure: I built one of the two servers being tested. I run Hintas, and the Hintas MCP is the one being compared against Slack's official MCP. Every prompt, every transcript, and every grading criterion is in the repo. Draw your own conclusions from the raw data.

The test. Two identical Slack workspaces, seeded with the same users, channels, messages, threads, reactions, pins, DMs, files, and permissions. One runs Slack's official MCP. The other runs mine. 48 prompts go against both — reads, writes, searches, channel management, multi-step workflows, edge cases. Difficulty from L1 (one API call) to L4 (five or more coordinated calls).

Each Claude Code session only has access to the MCP under test. No shell, no web, no file I/O. Succeed or fail on the MCP alone. A separate Claude session grades every run. Workspace resets between prompts since most tasks are destructive.
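The loop described above can be sketched roughly like this. All function names and stub bodies here are my assumptions for illustration, not the actual harness code:

```typescript
// Rough sketch of the benchmark loop: reset, run, grade, record.
// Names (resetWorkspace, runAgentSession, gradeRun) are assumptions.

type Level = 1 | 2 | 3 | 4;

interface Prompt { id: string; text: string; level: Level; }
interface Result { id: string; success: boolean; toolFailures: number; tokens: number; }

// Stubs so the sketch runs standalone.
async function resetWorkspace(): Promise<void> {
  // Re-seed users, channels, messages, threads, reactions, pins, DMs, files.
}
async function runAgentSession(p: Prompt): Promise<{ transcript: string; tokens: number }> {
  // One Claude Code session with only the MCP under test: no shell, web, or file I/O.
  return { transcript: `done: ${p.text}`, tokens: 100 };
}
async function gradeRun(transcript: string): Promise<{ success: boolean; toolFailures: number }> {
  // A separate Claude session grades each transcript against the criteria.
  return { success: transcript.startsWith("done"), toolFailures: 0 };
}

async function runBenchmark(prompts: Prompt[]): Promise<Result[]> {
  const results: Result[] = [];
  for (const p of prompts) {
    await resetWorkspace(); // most tasks are destructive, so reset first
    const { transcript, tokens } = await runAgentSession(p);
    const grade = await gradeRun(transcript);
    results.push({ id: p.id, tokens, ...grade });
  }
  return results;
}
```

The point of the reset-per-prompt structure is that every run sees identical workspace state, so a failure can only come from the model or the MCP, not from leftover side effects.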

[48 prompts · same model · same workspace state]

Slack → 23% success / 3 tool failures / 4,132 tokens
Hintas → 77% success / 0 tool failures / 11,684 tokens

23% vs 77%. Same model on both sides.

On token usage: the official server is lighter because it's failing faster. When the right tool doesn't exist, the agent bails out early.

The per-prompt breakdown is where it gets bad.

Most failures on the official side aren't model errors. They're capability gaps. The agent correctly identifies what needs to happen, then discovers the MCP doesn't expose the method. Ex:

"React to the latest message in #marketing." Agent found the message, got the timestamp, reported it had no tool for reactions. FAIL.

"Unarchive #old-playtest-2025, post a welcome-back, invite two users." No conversations.unarchive, no conversations.invite. FAIL.

These aren't hard tasks. They're bread-and-butter Slack operations that any workspace admin does weekly. The agent gets the approach right every time — it just can't execute.
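Concretely, closing these gaps means exposing more of the Web API as tools. Here's a sketch of what two of the missing tool definitions could look like; the schemas are my illustration, with parameter names following the Slack Web API methods they'd wrap:

```typescript
// Illustrative MCP-style tool definitions for two Slack Web API methods the
// official server lacks, per the failures above. Schemas are a sketch, not
// the official server's format.
const missingTools = [
  {
    name: "conversations_setTopic",
    description: "Set a channel's topic (wraps Slack's conversations.setTopic).",
    inputSchema: {
      type: "object",
      properties: {
        channel: { type: "string", description: "Channel ID, e.g. C0123456789" },
        topic: { type: "string", description: "New topic text" },
      },
      required: ["channel", "topic"],
    },
  },
  {
    name: "reactions_add",
    description: "Add an emoji reaction to a message (wraps Slack's reactions.add).",
    inputSchema: {
      type: "object",
      properties: {
        channel: { type: "string", description: "Channel containing the message" },
        timestamp: { type: "string", description: "ts of the target message" },
        name: { type: "string", description: "Emoji name without colons" },
      },
      required: ["channel", "timestamp", "name"],
    },
  },
];
```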

The official MCP covers reading channels, reading messages, searching, sending messages, and looking up profiles. That's maybe 40% of what you'd actually want an agent to do in Slack.

The Hintas failures tell a different story. One prompt failed because the agent used the wrong email address. Another hit a missing OAuth scope.

The agent had every tool. It just used them wrong. That's a debugging problem. On the official MCP side, the agent literally cannot proceed because no tool exists. Those are different problems with different fixes.

No model upgrade fixes a missing tool. And "prompt engineering" won't fix it either. The Slack Web API has hundreds of methods; the official MCP exposes a fraction of them.

"Spin up an incident war room — create the channel, set the topic, invite the on-call team, post a kickoff message." Four operations. The agent planned all four correctly. The official MCP couldn't do any of them.

The 54-point gap is just coverage.

u/No_Iron1885 — 9 days ago
▲ 17 r/MCPservers+2 crossposts

Ok so we all know most well-known SaaS companies have an MCP server by now, whether official or unofficial. I assumed that if a company ships an official MCP, it would be built with best practices in mind. I was completely wrong.

The Slack MCP doesn't expose nearly enough endpoints, and what it does expose has to be loaded into the agent's context every time. There's a newer pattern called code mode: expose a search tool the agent uses to find the exact tools required for a multi-step task, plus an execute tool where it writes custom TypeScript, chaining API calls together, in a secured sandbox.
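To make the pattern concrete, here's a minimal sketch of the two tools. The catalog, keyword matching, and "sandbox" below are stand-ins for illustration, not the Hintas implementation; a real server would search the full API surface and execute scripts in an isolated runtime:

```typescript
// Minimal sketch of the code-mode pattern: a search tool over an API
// catalog plus an execute tool that runs an agent-written script.

const apiCatalog = [
  { method: "conversations.create", summary: "Create a channel" },
  { method: "conversations.setTopic", summary: "Set a channel topic" },
  { method: "conversations.invite", summary: "Invite users to a channel" },
  { method: "chat.postMessage", summary: "Post a message" },
  { method: "reactions.add", summary: "Add an emoji reaction" },
];

// search_tools: return only the methods relevant to the task, so the full
// API surface never has to sit in the agent's context.
function searchTools(query: string) {
  const terms = query.toLowerCase().split(/\s+/);
  return apiCatalog.filter((e) =>
    terms.some((t) => e.method.toLowerCase().includes(t) || e.summary.toLowerCase().includes(t)),
  );
}

// execute: run an agent-written script against a constrained call() API.
// A real server would run this in an isolated sandbox; here we just invoke
// the function and log each chained call against stubbed responses.
type SlackCall = (method: string, args: Record<string, unknown>) => Promise<unknown>;

async function execute(script: (call: SlackCall) => Promise<unknown>) {
  const log: string[] = [];
  const call: SlackCall = async (method, args) => {
    log.push(method);                  // record the chain of API calls
    return { ok: true, method, args }; // stubbed Slack response
  };
  const result = await script(call);
  return { result, log };
}
```

With this shape, the war-room prompt from the earlier post becomes one script: create the channel, set the topic, invite the team, post the kickoff — four chained calls in a single execute round trip instead of four separate tool-call round trips.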

I did this in a few hours, benchmarked it against Slack's official MCP, and IT FUCKING OUTPERFORMS IT. Unless I'm clearly missing something, why don't all these massive companies take the time to make these small improvements to their MCPs, which would boost efficiency and accuracy by 3x+?

The benchmark link is in the comments

u/No_Iron1885 — 10 days ago