r/mcp

Web scraping for LLMs was driving us insane, so we built our own Search API with native MCP support
▲ 8 r/mcp+1 crossposts

Hey r/SideProject 👋

My team and I build AI agents, and web search has been our biggest pain point for the last six months.

The standard developer workflow right now is kind of awful: You hit a search API, get back links, write a scraper, deal with captchas and blocking, then end up feeding your LLM a giant pile of HTML full of cookie banners, menus, and random junk. The model gets confused and your token usage explodes.

So we decided to build something specifically for RAG pipelines and AI agents: Search Router (https://search-router.com)

A few things we focused on:

  • Speed: P99 latency under 800ms. Agents respond fast and users don’t sit around waiting.
  • MCP-ready: native support for Model Context Protocol. You can plug our config directly into Claude Desktop and let it run searches through the tool without burning Anthropic limits.
  • Clean JSON output: structured responses that are actually pleasant to work with programmatically.

What we shipped recently:

Added the Retrieved Context for LLM endpoint - instead of giving you the whole site or short snippets, our API returns a structured JSON with extracted relevant context. This heavily reduces the need for manual HTML cleanup and saves LLM tokens.

We want your feedback:

The project just launched and is still very early. So that you can properly break it on your pet projects without worrying about limits, we made the free tier effectively unlimited during the launch period.

You just sign up (no card required) and get 2000 requests. Once you hit the limit, go to the dashboard and hit the "refill" button to get more free test credits.

Would love bug reports, edge cases, feature requests, or honestly just hearing where the product sucks right now!

u/ummitluyum — 1 hour ago
▲ 8 r/mcp+1 crossposts

Rook: Notes app for code. Claude, Cursor, and Gemini can save directly to it

Hey everyone 👋

I built Rook because I couldn't find a place for my code notes. I was used to Apple Notes: it’s fast and minimal, but it doesn't support code blocks.

Why not use something that exists? I tried:

VS Code. It works for markdown, but I always needed a preview extension to see md files rendered, which felt clunky. And I wanted notes to live outside of any specific codebase, not tied to a repo. Something small, open on the side of my desktop.

Obsidian. Didn't feel right at all. Not designed for the kind of simplicity I was looking for.

Bear, Craft, Notion. Too clunky. Not as minimal or fast as I wanted.

Dedicated snippet apps. Opposite problem. Great for code, no place for the notes around it.

On top of that, coding with AI has multiplied my work and I found myself lacking a dedicated place to capture the ongoing logic.

So I built Rook. It’s a free, local and native Mac app made for code notes:

  • markdown support
  • syntax highlighted code blocks (17+ languages)
  • 5 simple themes
  • everything stays on your Mac
  • optional MCP support so AI tools can write directly into it

I now just say “save this to Rook” to Claude instead of copy-pasting around.

Rook is live on Product Hunt today 🚀
Would genuinely appreciate any support or feedback: https://www.producthunt.com/products/rook-4

I’m also doing a lifetime discount for the first 100 people who sign up today: https://userook.app

u/mimoo01 — 2 hours ago
▲ 10 r/mcp

Anthropic's new mcp tunnel architecture: the agent never holds the credential

Reading through the 19th May Claude managed agents update. The mcp tunnel update piqued my interest.

Apparently, the setup will be that a small gateway runs inside your network. It opens one outbound mTLS connection to anthropic. The agent reaches private mcp servers through that tunnel. No inbound firewall rules. No public endpoint. The mcp server inside your perimeter holds the credentials. The agent never sees them.

A normal managed agents deployment carries the tokens in the runtime. A long-lived oauth bearer for salesforce. A pat for github. A service account key for the warehouse. All sitting in the agent's context, where prompt injection, tool poisoning, or a supply chain hit can lift them.

With tunnels the credentials move to the perimeter. The agent makes a tool call, the call goes through the tunnel encrypted with a cert the customer issued, and a local mcp server with proper scoping turns it into an authenticated request. A prompt-injected agent has no token to steal. The blast radius now stops at whatever each individual mcp server allows.
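
A minimal sketch of that split, assuming a hypothetical in-perimeter tool handler. The names here (`ToolCall`, `PerimeterToolServer`, `GITHUB_PAT`) are made up for illustration and are not Anthropic's gateway API. The only point is that the call crossing the tunnel carries a tool name and arguments, while the token is attached on the inside.

```python
from dataclasses import dataclass, field


@dataclass
class ToolCall:
    """What the agent sends through the tunnel: no credential fields at all."""
    tool: str
    arguments: dict


@dataclass
class PerimeterToolServer:
    """Hypothetical MCP-style server running inside the customer network."""
    secrets: dict                      # e.g. loaded from a local vault
    allowed_tools: set = field(default_factory=set)

    def handle(self, call: ToolCall) -> dict:
        # Scoping happens here, on the customer's side of the tunnel.
        if call.tool not in self.allowed_tools:
            return {"ok": False, "error": f"tool {call.tool!r} not allowed"}
        # The credential is attached only to the outgoing downstream request.
        request = {
            "tool": call.tool,
            "arguments": call.arguments,
            "headers": {"Authorization": f"Bearer {self.secrets['GITHUB_PAT']}"},
        }
        self._send_downstream(request)
        # The result returned to the agent never includes the headers.
        return {"ok": True, "result": f"ran {call.tool}"}

    def _send_downstream(self, request: dict) -> None:
        # Placeholder for the real authenticated API call.
        self.last_request = request


server = PerimeterToolServer(
    secrets={"GITHUB_PAT": "placeholder-token"},
    allowed_tools={"list_issues"},
)
reply = server.handle(ToolCall("list_issues", {"repo": "example/repo"}))
denied = server.handle(ToolCall("delete_repo", {"repo": "example/repo"}))
```

Even if the agent is prompt-injected, the most it can do in this sketch is ask for tools in `allowed_tools`; the token itself never appears in anything the agent receives.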

Worth comparing to what OpenAI did in April. Their agents sdk update lets you move both the harness and the compute to your side. You can run the whole stack yourself. Anthropic chose not to. The agent loop stays on their infra. Only tool execution and mcp connectivity move out.

You don't own the loop. You own the boundary. Whether that trade lands for you depends on how much you trust anthropic to run the loop and how much vendor lock-in you can stomach.

A few caveats before anyone wires this up in prod:

  • Research preview, not ga. Suites and key rotation cadence are not in the public docs yet.
  • The orchestration plane runs on anthropic. If they have a bad day your agents have a bad day, and there is no failover path because the loop is not something you can stand up yourself.
  • Credentials still exist. They moved from the agent context to an mcp server you operate. That server still needs proper scoping, audit logging, and least-privilege downstream tokens. No architecture trick fixes that part.

For anyone running mcp servers in production: Does the split land in the right place for you, or would you rather own the whole loop the way openai's sdk lets you?

I put together a longer breakdown that sheds more light on the new announcement.

u/Ok-Constant6488 — 2 hours ago
▲ 7 r/mcp+2 crossposts

I built an AI agent runtime in Go that compiles and tests generated code before delivering it: 35 files, 156 tests, zero dependencies

I've been building ARK (AI Runtime Kernel) for the past 10 months. It's an open-source runtime that sits between your AI agent and the LLM, governing every decision the model makes.

The core idea: models shouldn't control the system. The runtime should.

What it does:

When you ask ARK to write Go code, it doesn't just pass the prompt to GPT and hand you back whatever comes out. The runtime classifies the task, optimizes the prompt, generates the code, then runs a 6-phase verification pipeline before you see anything:

├─ Step 1: ✓ Reasoning verified (confidence: 70%)
│  🧪 Verification: tested (score: 100%)
│  ✅ Compiled        ← go build
│  ✅ Executed         ← go run
│  ✅ Tests passed     ← auto-generated tests
│  ✅ Lint clean       ← go vet

If the code fails compilation, ARK feeds the compiler error back to the model, forces a stronger model, and retries. If it still fails after 2 attempts, it refuses to deliver broken code. It never claims success for code that doesn't compile.
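
The retry behavior can be sketched roughly like this. It is an illustration in Python, not ARK's Go code: Python's built-in `compile()` stands in for `go build`, the model names and `fake_generate` stub are hypothetical, and "2 attempts" is read here as two attempts in total.

```python
from typing import Callable, Optional

MODELS = ["cheap-model", "strong-model"]  # hypothetical model names
MAX_ATTEMPTS = 2                          # assumption: two attempts in total


def check_compiles(source: str) -> Optional[str]:
    """Return None if the source compiles, else the compiler error text.

    Python's built-in compile() stands in for `go build` here.
    """
    try:
        compile(source, "<generated>", "exec")
        return None
    except SyntaxError as exc:
        return f"{exc.msg} (line {exc.lineno})"


def generate_verified(task: str, generate: Callable[[str, str], str]) -> Optional[str]:
    """Generate code, retrying with the compiler error and a stronger model.

    Returns the verified source, or None if every attempt fails to compile.
    """
    prompt = task
    for attempt in range(MAX_ATTEMPTS):
        # Escalate to the stronger model after the first failure.
        model = MODELS[min(attempt, len(MODELS) - 1)]
        source = generate(model, prompt)
        error = check_compiles(source)
        if error is None:
            return source
        # Feed the compiler error back into the next prompt.
        prompt = f"{task}\n\nPrevious attempt failed to compile: {error}"
    return None  # refuse to deliver broken code


def fake_generate(model: str, prompt: str) -> str:
    """Stub model: the cheap model emits broken code, the strong one fixes it."""
    if model == "cheap-model":
        return "def add(a, b:\n    return a + b\n"
    return "def add(a, b):\n    return a + b\n"


result = generate_verified("write add(a, b)", fake_generate)
```

Returning `None` instead of the last broken attempt is what makes "never claims success" enforceable by the caller.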

The Go-specific stuff that might interest this community:

The entire runtime is pure Go, zero external dependencies (just stdlib). 35 files, ~16,000 lines, 156 tests, race detector clean. Some things I'm proud of:

  • Weighted tool ranking with 6 signals (relevance, success rate, Bayesian confidence, cost, latency, memory bonus) — all computed in microseconds
  • Context engine that reduces tool schema tokens from 60K to ~93 (99.9% reduction) by only loading relevant tools
  • Per-step model routing: cheap model (gpt-4o-mini) handles tool calls, strong model (gpt-4o) handles reasoning. Cuts costs 80-90%
  • Cognitive Governor that verifies every output with calibrated confidence scores
  • Auto-fix for common model errors in generated Go code (orphan braces, missing error handling) — detects both tab and space indentation
  • Event emitter that writes JSONL for a separate Python memory layer to ingest
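
To make the weighted ranking idea concrete, here is a small sketch in Python rather than Go. The weights, the `ToolStats` fields, and the smoothed estimate used for "Bayesian confidence" are all assumptions for illustration; the post does not say how ARK actually combines its six signals.

```python
from dataclasses import dataclass

# Hypothetical weights; the post does not say how ARK weights its signals.
WEIGHTS = {
    "relevance": 0.35,
    "success_rate": 0.20,
    "confidence": 0.15,
    "cost": 0.10,
    "latency": 0.10,
    "memory_bonus": 0.10,
}


@dataclass
class ToolStats:
    name: str
    relevance: float      # 0..1, match between task and tool description
    successes: int        # past successful calls
    failures: int         # past failed calls
    cost: float           # 0..1, normalized (1 = most expensive)
    latency: float        # 0..1, normalized (1 = slowest)
    memory_bonus: float   # 0..1, e.g. tool worked on a similar task before


def score(tool: ToolStats) -> float:
    trials = tool.successes + tool.failures
    success_rate = tool.successes / trials if trials else 0.0
    # Laplace-smoothed estimate as a simple stand-in for "Bayesian confidence".
    confidence = (tool.successes + 1) / (trials + 2)
    signals = {
        "relevance": tool.relevance,
        "success_rate": success_rate,
        "confidence": confidence,
        "cost": 1.0 - tool.cost,        # cheaper is better
        "latency": 1.0 - tool.latency,  # faster is better
        "memory_bonus": tool.memory_bonus,
    }
    return sum(WEIGHTS[key] * value for key, value in signals.items())


def rank(tools: list[ToolStats]) -> list[str]:
    """Return tool names ordered from best to worst score."""
    return [t.name for t in sorted(tools, key=score, reverse=True)]


tools = [
    ToolStats("web_search", 0.9, 40, 10, 0.3, 0.5, 0.0),
    ToolStats("read_file", 0.2, 50, 0, 0.0, 0.1, 0.0),
]
ranking = rank(tools)
```

With these made-up numbers the more relevant tool wins even though the other one is cheaper and more reliable, which is the kind of trade-off a weighted sum makes explicit.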

Cost: A typical task costs $0.002-$0.005. Not $0.05.

Example output:

go run ./cmd/ark run agent.yaml --task "write a function in Go that reads CSV"

✅ Task completed successfully
Steps: 1 | Tokens: 637 | Time: 5.6s | Cost: $0.002

The generated code compiles, runs, and passes auto-generated tests before you see it.

GitHub: github.com/atripati/ark

I'm a CS undergrad at DePaul in Chicago building this solo. Applied to YC S26 with it. Happy to answer questions about the architecture, the verification pipeline, or why I chose Go for this.

u/Aromatic-Ad-6711 — 8 hours ago
▲ 22 r/mcp

Understanding How MCP Works Internally with LLMs and MCP Clients

Hello Experts,

I have recently started learning the MCP (Model Context Protocol) concept. I created a simple MCP server and connected it with Claude Desktop as the MCP client.

I want to understand how the complete flow works internally, especially how the LLM understands when it should use an MCP server.

For example:

  • If a user writes a prompt in natural language in Claude Desktop chat, what are the exact backend steps that happen?
  • How does the LLM understand the context of the prompt? Does the LLM understand it by itself, or does it use the tool docstrings/descriptions provided by the MCP server? What actually happens internally?
  • How does it decide that a specific MCP server/tool should be used (for example, an internet/search MCP server)?
  • How does the MCP client expose the available tools, prompts, and resources to the LLM?
  • How is the context maintained during the conversation?

I want to understand the complete end-to-end architecture and internal workflow in detail.

Another thing I noticed is that in most MCP examples, only tools are commonly used. I do not clearly understand:

  • How resources are managed
  • How prompts are managed
  • How the MCP client/LLM becomes aware of these resources and prompts
  • When resources/prompts are preferred over tools

If anyone can explain the detailed architecture or share learning resources/examples, it would really help me.

Thanks in advance!

u/19khushboo — 9 hours ago
▲ 142 r/mcp+14 crossposts

Glia – Local-first shared memory layer (SQLite-vec + FTS5 + Offline Knowledge Graph)

Hey everyone,

I wanted to share a project I've been working on called Glia. It is a 100% offline, local-first RAG and memory layer designed to connect your AI web chats (Claude, ChatGPT, DeepSeek) with your local developer tools (Claude Code, Cursor, Windsurf) using a unified local database.

I wanted something lightweight that did not require pulling heavy Docker containers or subscribing to third-party memory APIs. I settled on a Node.js + SQLite architecture running sqlite-vec (for 768-dim float32 embeddings) alongside SQLite FTS5 for hybrid search, powered completely by local Ollama instances.

We just launched a live website that outlines the details and demonstrates the features in action:

Technical Stack & Features:

  • Hybrid Search Retrieval: SQLite-vec (using nomic-embed-text locally) + FTS5 keyword prefix matching (porter stemmer).
  • Surgical Sentence-level Trimming: Chunks are sliced into sentences. When a prompt is intercepted, only the exact matching sentences are pulled out of the vector store instead of the whole paragraph. It cuts LLM prompt bloat by ~90-95% in my benchmarks.
  • Knowledge Graph Extraction: An offline task queue uses a local LLM (llama3.1:8b via Ollama) to extract entity triples (subject-relation-object). These are stored in a SQLite facts table (or Neo4j if you run the full Docker compose profile) and fused with the vector retrieval score.
  • HyDE (Hypothetical Document Embeddings): Queries are pre-processed to generate a hypothetical answer, which is embedded together with the original query to bridge semantic gaps.
  • Concurrency: Running SQLite in WAL (Write-Ahead Logging) mode allows the browser extension dashboard and active MCP sessions to read/write concurrently without locking.
  • PII Redaction: Aggressive scrubbing of JWTs, API keys, emails, and IPs in the extension before data is saved.
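
As a rough illustration of the sentence-level trimming idea, here is a sketch in Python (Glia itself is Node.js). Token overlap stands in for embedding similarity, and the function names and `min_overlap` threshold are hypothetical, not taken from the Glia codebase.

```python
import re


def split_sentences(chunk: str) -> list[str]:
    """Naive sentence splitter: break after ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", chunk) if s.strip()]


def tokens(text: str) -> set[str]:
    """Lowercased alphanumeric tokens of the text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def trim_chunk(chunk: str, query: str, min_overlap: int = 2) -> list[str]:
    """Keep only sentences sharing at least `min_overlap` tokens with the query.

    Token overlap is a stand-in for the embedding similarity a real
    vector store would use.
    """
    query_tokens = tokens(query)
    return [
        sentence
        for sentence in split_sentences(chunk)
        if len(tokens(sentence) & query_tokens) >= min_overlap
    ]


chunk = (
    "The deploy script lives in tools/deploy.sh. "
    "We had pizza at the offsite. "
    "Run the deploy script with the staging flag first."
)
kept = trim_chunk(chunk, "how do I run the deploy script?")
```

Here the unrelated middle sentence is dropped and only the two deploy-related sentences would be injected into the prompt.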

The extension works on Claude.ai, ChatGPT, DeepSeek, Gemini, Grok, and Mistral. The MCP server runs out of the same backend database for your terminal agent or Cursor.

You can set it up with a single command: npx glia-ai-setup

Glia is completely open-source (MIT). If you like the local-first approach or want to contribute to the SQLite vector pipeline, PRs are very welcome, and a star on GitHub helps the project get discovered!

I would appreciate any feedback on the SQLite hybrid search scaling, the scoring fusion algorithm (RAG pipeline details are in RAG_PIPELINE.md), or local graph extraction performance!

u/Better-Platypus-3420 — 16 hours ago
▲ 8 r/mcp

Chrome Prompt API makes remote MCPs more important

Today the Chrome team announced general availability of the Chrome Prompt API. Although it is meant for talking to the current page, you can also connect CORS-enabled remote MCPs and WebMCP. Remote MCPs are becoming more critical for products that let users interact with prompts in their interfaces while visiting pages. It is a big unlock for MCP adoption. The model is Gemini Nano; at this point it is good enough to do basic things.

More details: https://developer.chrome.com/docs/ai/prompt-api

u/hasmcp — 8 hours ago
▲ 1 r/mcp

What’s the worst disaster your autonomous AI agent has caused so far?

Hey everyone,

We’ve all seen the polished staging demos of autonomous agents and MCP servers interacting with production databases, cloud infrastructures, and Git repositories.

But out in the wild, things get unpredictable pretty fast. A hallucinated argument, a subtle prompt injection in a read file, or an unhandled edge-case response can quickly turn an assistant into a rogue process.

To kick things off, the absolute worst thing that happened on my watch was a rogue agent nuking my complete git repository out of existence because it misread a merge conflict. My heart skipped a beat, and I spent the next few hours praying to the reflog gods to get my code back.

I'm really curious to gather some raw, real-world examples from the community: What is the worst, funniest, or most expensive disaster an autonomous agent has caused on your watch? Did an agent accidentally drop a production table? Spam a client with thousands of automated emails? Trigger a cloud auto-scaling loop that burned through your budget?

How did it happen, and how did you clean up the mess? Share your best "the agent went rogue" stories below!

u/overlord_sid85 — 15 hours ago
▲ 28 r/mcp+1 crossposts

MCP Apps Framework : We just released Skybridge v1 🎉

Hi Reddit,

Over the last few weeks, my team and I at Alpic have been working on a complete revamp of the Skybridge framework to make it as smooth and easy to get started with as possible.

As you may know, Skybridge is an open-source framework we built to help developers get started with MCP apps. It’s a thin layer on top of the official TypeScript SDK that provides the wiring and tooling needed specifically for apps.

With this v1 release, we’ve introduced:

  • New DevTools with a UI designed specifically for MCP app development
  • An integrated tunnel that can be started with a single click directly from the DevTools
  • Shareable chat URLs to test or showcase your MCP app with a real LLM
  • An audit feature to ensure your app and metadata comply with store requirements before submission (which can save a lot of time, since app reviews can be lengthy!)

We also stabilized the API with a simplified design and are proud to offer strong tool-to-component type safety.

It’s now also possible to deploy Skybridge outside of Alpic. While Alpic was designed specifically for MCP app hosting, we understand that some users may prefer hosting on different stacks for their own reasons.

Hope you enjoy it!

github.com/alpic-ai/skybridge

u/harijoe_ — 22 hours ago
▲ 6 r/mcp

how many MCPs do you use daily?

I was having this conversation with some colleagues, and we hadn't realized that we use the same 2 or 3 MCP servers every day, and that they are always the ones we add when a new project is started. For example, the Supabase and Linear MCP servers.

So I'm curious: how many MCPs do you guys use every day, and which ones do you add to every new project?

u/0xKoller — 1 day ago
▲ 88 r/mcp

used to think MCP was just tool calling. now i get it.

Like OpenAI already had tools, Anthropic had tools, Gemini had tools. Didn’t really get why another spec was needed.

Then I hit this at work while wiring the same internal tools across different models and apps. Slack, GitHub, SQL, internal search, Notion etc all had different wrappers and formats depending on where they were being used. At some point I realized half the work was just making everything look consistent.

That’s when MCP finally clicked for me. The value isn’t really “tool calling.” It’s convenience and standardization.

Now I’m seeing the same thing happen one layer higher in infra too. Bifrost, LiteLLM, Kong AI Gateway and similar stuff all seem to be solving the same underlying problem: too many providers, too many SDKs, too many integrations, too many moving parts.

None of this stuff is technically impossible to build in-house. But after a point you realize unified interfaces are just easier to live with.

▲ 4 r/mcp+1 crossposts

Which knowledge bases are you connecting as an MCP?

I'm looking for the best MCP knowledge base connectors as I use multiple AI tools and need something that I can plug into whichever AI tool I want.

I have heard about obsidian but what other options are out there?

Show me your best tools/setups?

▲ 2 r/mcp+2 crossposts

I "accidentally" turned my e2e tests into MCP tools

Hey guys!

I've been pimping playwright for a while - chasing my obsession of building a tool that lets me create e2e tests quickly while enforcing best practices like proper use of fixtures, semantic POM etc.
I'm pretty far already - UI-based e2e test recording works, giving me proper test steps, POM, UI and API tests - but my current project at work gave me an idea that sent me on a side quest.

tldr;
Check the video:
- I record our dashboard creation flow using my tool in Cursor
- Cursor writes POM, fixtures, e2e test, WebMCP tool definition, wiring
- I ask the AI-Assistant to create a new Dashboard for me
- The assistant creates the dashboard using the newly recorded flow

I've been working on creating our in-app AI assistant during my day job. One of our main goals is helping our users with onboarding: explaining to them how certain features work and where they can find stuff on the UI.
I wanted to take it a step further, since imo showing is better than telling. Certain UI Assistant libraries (we're using CopilotKit) allow calling FE tools and MCPs. My idea was to expose our main user flow as FE tools to our assistant, so it can do things on the user's behalf - or show them when prompted.

I modified my tool to not only generate POM and e2e tests, but also FE tool and MCP definitions from the same, single source of truth.

So now from one recording, I'm able to generate:
- A single flow.spec.ts file that can execute the same flow using 3 modes:
  - ui-based e2e test
  - API e2e test
  - FE tool test (via WebMCP bridge)
- WebMCP tools for any AI assistant use (claude, codex etc)
- Wiring WebMCP tools into our in-app CopilotKit assistant

It's still super early, but I've always been fascinated by the idea of having a single source of truth for features, exposing them to the world through different interfaces (UI, API, MCP, whatever you want).

Next things I probably want to do:
- define API-based WebMCP tools using the same approach, so the user can choose if they want the UI showcase or the fast track.
- Zoom out a little, and consider what this means from a security perspective :D

What's your opinion? Have you tried something similar on your own?
Is this something you would find useful or exciting, either from the testing or user-facing /UX perspective?

u/TranslatorRude4917 — 20 hours ago
▲ 2 r/mcp

How are people threat-modeling local agents with tool access?

For people running local agents via MCP — how are you thinking about threat modeling for tool access?

Traditional security assumes a human is in the loop. Agents break that assumption by taking actions autonomously. Looking to understand:

  • What risks are you actually tracking?
  • What hardening controls have you implemented?
  • What's missing from current threat intel on agent security?

Built a reference covering OpenClaw/Claw agent risks, hardening options, and evidence. Looking for technical feedback on what's missing or oversimplified.

armorerlabs.com/threat-intel

u/Conscious_Chapter_93 — 21 hours ago
▲ 1 r/mcp+1 crossposts

How I built CloudOps Assistant — a Slack bot that analyzes cloud infrastructure through conversation

I was tired of bouncing across 5–6 AWS consoles for routine ops on my own infra, so I tried wiring an AWS MCP server straight into a Slack bot. "Just an LLM with tools" — easy, right?

It broke in three ways that are probably pretty common once MCP leaves a single-developer setup.

  1. Single-session design. The MCP server is built around one credential set per process. As soon as the bot needs to handle more than one identity — multiple users, or even one person juggling several AWS accounts and roles — you're either leaking permissions or serializing everything behind a single credential.

  2. Slack's response window vs. real analysis time. Useful queries ("which ECS service drove the cost spike this week?") take 20–60s and multiple tool calls. Slack times out long before the LLM is done.

  3. One-shot tool calls aren't enough. Almost every useful query was a chain: list resources → filter → fetch metrics → correlate. The model needs to loop until it decides it has the answer, not stop after the first tool returns.

So I rewired it.

- Per-identity MCP proxy. Each identity gets an isolated subprocess where its STS AssumeRole credentials are injected. Pooled, not one-per-request, so cold starts don't kill UX.

- SQS between Slack and the worker. Slack ack returns immediately; the worker processes async and posts back into the thread. Timeouts stop being a thing.

- Agent loop, not single tool call. The LLM keeps calling tools (Cost Explorer → CloudWatch → tag lookups → IAM) until it claims it's done. Bounded by max-iterations and a budget.
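
The bounded agent loop from the last point might look roughly like this. It is a sketch, not the bot's actual code: `next_step` stands in for the LLM call, the tool functions are stubs rather than real AWS calls, and the limits and per-call cost are made-up numbers.

```python
from typing import Callable

MAX_ITERATIONS = 8      # hypothetical limits
BUDGET_USD = 0.50


def run_agent(
    question: str,
    next_step: Callable[[list], dict],
    tools: dict[str, Callable[[dict], str]],
    cost_per_call: float = 0.05,
) -> dict:
    """Loop until the model says it is done or a bound is hit.

    `next_step` stands in for the LLM call. It receives the transcript and
    returns either {"tool": name, "args": {...}} or {"answer": text}.
    """
    transcript = [("user", question)]
    spent = 0.0
    for iteration in range(1, MAX_ITERATIONS + 1):
        # Stop before making a call that would exceed the budget.
        if spent + cost_per_call > BUDGET_USD:
            return {"status": "budget_exceeded", "iterations": iteration - 1}
        step = next_step(transcript)
        spent += cost_per_call
        if "answer" in step:
            return {"status": "done", "answer": step["answer"], "iterations": iteration}
        # Run the requested tool and append its output for the next turn.
        output = tools[step["tool"]](step["args"])
        transcript.append((step["tool"], output))
    return {"status": "max_iterations", "iterations": MAX_ITERATIONS}


def scripted_model(transcript: list) -> dict:
    """Stub LLM: cost lookup, then metrics, then an answer."""
    plan = [
        {"tool": "cost_explorer", "args": {"days": 7}},
        {"tool": "cloudwatch", "args": {"service": "checkout"}},
        {"answer": "checkout drove the spike"},
    ]
    return plan[len(transcript) - 1]


tools = {
    "cost_explorer": lambda args: "ECS cost up 40%",
    "cloudwatch": lambda args: "checkout CPU at 95%",
}
result = run_agent("which ECS service drove the cost spike?", scripted_model, tools)
```

Both bounds return a status instead of raising, so the worker can still post a "gave up after N steps" message back into the Slack thread.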

Cost spike investigations, "find anything publicly exposed", and "what caused yesterday's RDS CPU spike" are all answerable from Slack now, without opening a console.

Honestly the LLM was the easy part. The interesting work was the permission boundary and execution flow around it.

Curious how others have handled credential isolation when putting LLM agents in front of cloud infra — a proxy-per-identity feels heavy but I haven't found a cleaner pattern.

u/basejb — 1 day ago
▲ 109 r/mcp+2 crossposts

I usually have two or more Claude Code sessions open at once. One in the backend repo, one in the frontend. Half the time I'd be in the frontend asking "wait, what shape did the user object end up as?", then alt-tab, ask the backend session, copy the answer, alt-tab back, paste.

The other Claude was right there. It already knew. I was the bottleneck.

So I wrote a plugin called Relay. In the frontend window I just say:

▎ask the backend session what the user object looks like

The backend session sees the question between turns, answers it, and the reply pops up in my frontend session as a notification. No window switching. No copy-paste. Works for broadcasts too, like "ask everyone what they're working on", and the replies trickle in one at a time.

The mechanism is simpler than it sounds. Claude Code shipped a channels capability a while back that lets MCP servers push messages into a session between turns. Relay piggybacks on that. Each session runs a tiny MCP server, a single hub daemon on your machine routes between them over a unix socket, and inbound asks land as channel notifications so Claude reacts to them naturally on its next turn. First session you start spawns the hub. It self-exits about 5 min after the last session disconnects. Same machine only, no auth, nothing leaves your box.
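
To show just the routing part, here is a toy in-memory version in Python. It is not Relay's code: the unix socket, the per-session MCP servers, and the channel notifications are all left out, and the `Hub` class and its method names are hypothetical.

```python
import time
from collections import defaultdict, deque

IDLE_EXIT_SECONDS = 5 * 60  # the hub exits about 5 min after the last disconnect


class Hub:
    """In-memory stand-in for the hub daemon; a real one would use a unix socket."""

    def __init__(self, clock=time.monotonic):
        self.clock = clock
        self.inboxes = defaultdict(deque)   # session name -> pending notifications
        self.sessions = set()
        self.last_disconnect = None

    def connect(self, name: str) -> None:
        self.sessions.add(name)
        self.last_disconnect = None

    def disconnect(self, name: str) -> None:
        self.sessions.discard(name)
        if not self.sessions:
            self.last_disconnect = self.clock()

    def ask(self, sender: str, target: str, question: str) -> None:
        # "everyone" broadcasts to every other connected session.
        targets = self.sessions - {sender} if target == "everyone" else {target}
        for name in targets:
            self.inboxes[name].append({"type": "ask", "from": sender, "text": question})

    def reply(self, sender: str, target: str, answer: str) -> None:
        self.inboxes[target].append({"type": "reply", "from": sender, "text": answer})

    def drain(self, name: str) -> list:
        """Called between turns: hand the session its queued notifications."""
        inbox = self.inboxes[name]
        messages = list(inbox)
        inbox.clear()
        return messages

    def should_exit(self) -> bool:
        if self.sessions or self.last_disconnect is None:
            return False
        return self.clock() - self.last_disconnect >= IDLE_EXIT_SECONDS


hub = Hub()
hub.connect("frontend")
hub.connect("backend")
hub.ask("frontend", "backend", "what does the user object look like?")
pending = hub.drain("backend")
hub.reply("backend", "frontend", "{id, email, roles[]}")
answer = hub.drain("frontend")
```

Queuing per session and draining between turns is what lets the receiving session pick up an ask on its next turn instead of being interrupted mid-response.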

I know there are other "make Claudes coordinate" projects. Most of them are orchestration frameworks where one boss Claude bosses worker Claudes around. This isn't that. It's just messaging between sessions you already have open, doing whatever you already had them doing. Closer to slack-for-your-claudes than to a swarm runner.

Repo with install steps: https://github.com/innestic/claude-relay (MIT)

It's day-one open source so the rough edges are real. If you run multi-session workflows already, what's the dumb friction you keep hitting? That's what I want to fix next.

u/vildanbina — 2 days ago
▲ 2 r/mcp

Has anyone here actually used Descope MCP?

Seeing MCP everywhere lately and Descope seems to be pushing hard into the auth/integration layer around it. Has anyone here actually tried Descope MCP yet?

Want to know if it genuinely simplifies tool integrations for LLM apps or just adds another abstraction layer on top of APIs

▲ 1 r/mcp

I built an open-source MCP server to stop manually logging Tempo time every day

I built a small MCP server for Jira Tempo because I was tired of manually logging my time every day.

Not a huge problem, just one of those small repetitive tasks that slowly becomes annoying:

open Jira -> find the issue -> open Tempo -> add the worklog -> write the description -> repeat tomorrow.

I use agents a lot in my workflow, so I wanted to make this part more natural. Instead of logging time manually, I can now finish my work session and ask the agent to log the time for me.

The server currently supports:

- retrieving worklogs

- creating worklogs

- bulk creating worklogs

- editing/deleting worklogs

- finding missing worklog days

- basic worklog analytics
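
For a sense of what "finding missing worklog days" involves, here is a small sketch in Python. It is not taken from the repo: the function name is hypothetical, and it assumes a Monday-to-Friday week with no holiday calendar.

```python
from datetime import date, timedelta


def missing_worklog_days(logged: set[date], start: date, end: date) -> list[date]:
    """Return weekdays in [start, end] that have no worklog.

    Assumes a Monday-to-Friday work week and ignores holidays.
    """
    missing = []
    day = start
    while day <= end:
        if day.weekday() < 5 and day not in logged:
            missing.append(day)
        day += timedelta(days=1)
    return missing


# 2024-01-01 is a Monday; worklogs exist for Mon, Tue and Thu.
logged = {date(2024, 1, 1), date(2024, 1, 2), date(2024, 1, 4)}
gaps = missing_worklog_days(logged, date(2024, 1, 1), date(2024, 1, 7))
```

An agent can then take a list like `gaps` and offer to bulk-create worklogs for exactly those days.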

It works with Claude, Codex, Cursor and any other MCP-compatible clients.

I made it open-source because I figured other people using Jira + Tempo might have the same annoying workflow. I'm also curious what features would make this more useful for real teams.

Repo:

https://github.com/ivelin-web/tempo-mcp-server

Would appreciate feedback, ideas, or contributions.

u/IvelinDev — 1 day ago
▲ 3 r/mcp

Handling multiple MCP servers and multiple models together

To everyone who has connected to multiple MCP servers and multiple models from different providers: how do you guys maintain the infrastructure while keeping the tokens in check? I use an OSS LLM gateway for this and it seems to work fine. I am curious to know if there are other or better ways people are doing this. Share your infra in the comments.

u/clairedoesdata — 1 day ago