u/Narrow_Cartoonist937

Hi everyone,

Working with LLMs on modern C++ codebases usually hits a wall very quickly: context windows get flooded with massive files, and most standard indexers still struggle with C++20 Module partitions and imports.

We currently run this workflow in live development on a large commercial project with over 7,000 source files, most of which use C++20 modules.

We managed to establish a highly performant workflow using the Codex App on the desktop, combined with VS MCP, IDAP MCP, and a dedicated lightweight tool we created to bridge the C++ gap: mcp-cpp-project-indexer.

The Problem We Solved

Standard file-dumping or naive regex indexing either sends thousands of lines of irrelevant code to the LLM (costly and slow) or completely loses track of C++20 module dependencies.

Instead of trying to replace a full compiler/LSP (like clangd) or performing heavy semantic analysis, our indexer acts purely as a stream- and token-based locator. It maps out files, symbols, and module structures, providing the LLM with exact line references (startLine/endLine).
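
To make the "locator" idea concrete, here is a minimal sketch of what such a scan could look like. It is not the actual code of mcp-cpp-project-indexer: the `Location` record, both function names, and the simplified line-oriented patterns are assumptions for illustration. It only handles module and import declarations that fit on one line, and it does not skip comments or string literals.

```python
import re
from dataclasses import dataclass

# Hypothetical record type; the real tool's output schema is not shown in the post.
@dataclass
class Location:
    kind: str        # "module" or "import"
    name: str        # e.g. "app.core:parser", ":lexer", "<vector>"
    startLine: int   # 1-based, inclusive
    endLine: int     # 1-based, inclusive

# Simplified patterns for single-line declarations (an assumption of this sketch).
_MODULE = re.compile(r'^\s*(?:export\s+)?module\s+([\w.]+(?::[\w.]+)?)\s*;')
_IMPORT = re.compile(
    r'^\s*(?:export\s+)?import\s+(:?[\w.]+(?::[\w.]+)?|<[^>]+>|"[^"]+")\s*;'
)

def locate_module_lines(source: str) -> list[Location]:
    """Return module/import declarations with their line numbers."""
    found = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for kind, pattern in (("module", _MODULE), ("import", _IMPORT)):
            match = pattern.match(line)
            if match:
                found.append(Location(kind, match.group(1), lineno, lineno))
                break
    return found

def extract_fragment(source: str, start_line: int, end_line: int) -> str:
    """Return only the requested 1-based, inclusive line range."""
    lines = source.splitlines()
    return "\n".join(lines[start_line - 1:end_line])
```

With records like these, a client can ask for lines 3 to 4 of a file instead of the whole file, which is the point of returning startLine/endLine rather than file contents.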

The Setup & Results

The Stack: Codex App + VS MCP + IDAP MCP + mcp-cpp-project-indexer.
Token Reduction: The LLM only requests and reads the exact code fragments it actually needs. This reduces the text sent to the LLM by up to 86%.
Performance: The indexer is written in Python and includes a file watcher mode that computes hashes incrementally, so the index stays up to date in real time during active development without hammering the CPU.
Intelligence: Codex/ChatGPT confirmed that the context routing works flawlessly even at this 7,000-file scale.
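
For readers wondering what "calculates hashes incrementally" can mean in practice, here is one possible approach as a rough sketch. It is an assumption, not the tool's actual implementation: the function names, the cache layout (path mapped to a metadata stamp and a SHA-256 digest), and the suffix list are all hypothetical. A file is re-hashed only when its modification time or size has changed, and it is reported only when its content hash actually differs.

```python
import hashlib
from pathlib import Path

def file_digest(path: Path) -> str:
    """SHA-256 of a file, read in chunks to keep memory use flat."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()

def changed_files(root: Path, cache: dict,
                  suffixes=(".cpp", ".cppm", ".ixx", ".h", ".hpp")) -> list:
    """Return paths whose content changed since the last call.

    `cache` maps path -> ((mtime_ns, size), digest) and is updated in place.
    """
    changed = []
    seen = set()
    for path in sorted(root.rglob("*")):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        key = str(path)
        seen.add(key)
        stat = path.stat()
        stamp = (stat.st_mtime_ns, stat.st_size)
        entry = cache.get(key)
        if entry is not None and entry[0] == stamp:
            continue  # metadata unchanged: skip the hash entirely
        digest = file_digest(path)
        if entry is None or entry[1] != digest:
            changed.append(key)  # new file or different content
        cache[key] = (stamp, digest)
    # Forget files that no longer exist so the cache does not grow stale.
    for key in set(cache) - seen:
        del cache[key]
    return changed
```

A watcher loop would call `changed_files` on each file system event (or on a timer) and re-index only the returned paths.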

Why share this?

When we started, we couldn't find a lightweight, production-ready way to make Claude/GPT understand a massive C++20 module graph without spending a fortune on API tokens or waiting ages for context processing. This setup showed us that the Model Context Protocol (MCP) is ready for large enterprise codebases, provided the tools are decoupled correctly.

The project is fully open-source. If you are struggling with C++ context limits or modules in your AI workflow, feel free to check it out, spin it up, or contribute:

👉 GitHub: github.com

I’m happy to answer any questions about how we configured the MCP synergy or how the incremental indexing handles the C++20 module tree!

Managing context limits in large C++20 Module codebases with MCP (Case Study & Tool)