u/Connect_Bee_3661

r/LangChain

Mapped LangChain Core as a dependency graph: 180 modules, 650 edges.

Three findings:

  1. The messages module has a 70% blast radius. Change it and 126 of 180 modules break — directly or transitively. Every callback, every agent, every retriever traces back to it. Nothing in the documentation flags this.

  2. runnables.base requires 147 other modules as prerequisites — 82% of the codebase. A coding agent dispatched to modify it without that map is guessing.

  3. Exactly 7 modules are safe to modify with zero downstream risk. Seven. Out of 180.

The practical problem: a coding agent using RAG to navigate LangChain will grep for context, retrieve similar-looking docs, and make a structurally wrong change. The blast radius is invisible to similarity search. It's only visible to graph traversal.
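
The traversal itself is a few lines once the graph exists. A minimal sketch with networkx (toy edges and made-up module names; the real graph is parsed out of the LangChain Core source):

```python
# Toy sketch of the traversal. Edge (a, b) means
# "module a imports (depends on) module b".
import networkx as nx

deps = nx.DiGraph()
deps.add_edges_from([
    ("runnables.base", "messages"),
    ("agents", "runnables.base"),
    ("retrievers", "messages"),
    ("callbacks", "messages"),
])

def blast_radius(graph: nx.DiGraph, module: str) -> set:
    """Everything that breaks, directly or transitively, if `module` changes."""
    return nx.ancestors(graph, module)  # nodes with a path *to* `module`

def prerequisites(graph: nx.DiGraph, module: str) -> set:
    """Everything `module` depends on, i.e. what you must understand first."""
    return nx.descendants(graph, module)

def safe_modules(graph: nx.DiGraph) -> list:
    """Modules with zero downstream risk: nothing imports them."""
    return [n for n in graph if graph.in_degree(n) == 0]

print(blast_radius(deps, "messages"))  # {'runnables.base', 'agents', 'retrievers', 'callbacks'}
print(safe_modules(deps))              # ['agents', 'retrievers', 'callbacks']
```

`blast_radius` is exactly what similarity search can't compute: a reverse-reachability set has no textual resemblance to the module you're changing.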

This is the difference between retrieval and spatial intelligence. RAG finds text that looks relevant. A knowledge graph tells you what actually breaks.

Same approach works on any structured domain — GLP-1 pharmacology, ICD-10 classification, payer formularies. The domain doesn't matter. The structure does.

Built the CKG from the LangChain Core source. Dataset is live. Links in first comment.

u/Connect_Bee_3661 — 13 days ago

If you're running local models, token count is everything. I benchmarked three retrieval architectures specifically to measure that:

**RAG (FAISS):** 2,982 tokens/query — F1 = 0.123

**GraphRAG (Microsoft):** 3,450 tokens/query — F1 = 0.120

**CKG (pre-structured domain graph):** 269 tokens/query — F1 = 0.471

Same questions, same model, same eval. The pre-structured graph uses 11× fewer tokens and gets 4× better answers.

**Why it works for local inference:**

Instead of retrieving chunks at query time (which inflates context with noise), a Compact Knowledge Graph pre-encodes the domain as a traversable DAG. The model gets exactly what it needs — structure, not similarity scores.
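
As a rough sketch of what "structure, not similarity scores" means in practice (illustration only, not the ckg-mcp API; the adjacency-list format and relation names are invented):

```python
# Serialize the subgraph around the queried entity as compact triples
# instead of retrieving text chunks.
def graph_context(graph: dict, entity: str, depth: int = 2) -> str:
    """Collect (subject -relation-> object) lines within `depth` hops of `entity`."""
    triples, frontier, seen = [], {entity}, {entity}
    for _ in range(depth):
        nxt = set()
        for node in frontier:
            for rel, neighbor in graph.get(node, []):
                triples.append(f"{node} -{rel}-> {neighbor}")
                if neighbor not in seen:
                    seen.add(neighbor)
                    nxt.add(neighbor)
        frontier = nxt
    return "\n".join(triples)  # hundreds of tokens, not thousands

kg = {
    "semaglutide": [("class", "GLP-1 agonist"), ("trialed_in", "NCT03548935")],
    "GLP-1 agonist": [("treats", "type 2 diabetes")],
}
print(graph_context(kg, "semaglutide"))
```

The `depth` parameter is the hop bound from the next point: raising it grows the context along real edges instead of adding more look-alike chunks.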

**The hop-depth finding matters:**

CKG F1 improves with query complexity: 0.374 at hop=1 → 0.772 at hop=5. RAG peaks at hop=2 and degrades. For multi-step reasoning (prerequisites, dependency chains, "what depends on X"), pre-structure wins by a wider margin the harder the question.
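
A toy illustration of why (names invented): a hop-4 answer is a chain whose individual links rarely co-occur in any single chunk, so similarity search can't assemble it, while one traversal walks it directly.

```python
# Each module needs the next one in the chain.
prereq = {"E": "D", "D": "C", "C": "B", "B": "A"}

def prerequisite_chain(module: str) -> list:
    """Follow prerequisite edges to the root; each step is one hop."""
    chain = []
    while module in prereq:
        module = prereq[module]
        chain.append(module)
    return chain

print(prerequisite_chain("E"))  # ['D', 'C', 'B', 'A'], one hop per step
```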

**Practical test — GLP-1 pharma domain:**

Built from the ClinicalTrials.gov API in a single session, no expert curation. F1 = 0.530. The structure was already in the data; the graph just makes it traversable.
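
The ingestion step can be about this simple (a sketch assuming the ClinicalTrials.gov v2 REST API; field names follow its documented JSON layout, so verify against the current docs):

```python
# Pull trials for one intervention and emit (subject, relation, object)
# triples for the CKG.
import requests

resp = requests.get(
    "https://clinicaltrials.gov/api/v2/studies",
    params={"query.intr": "semaglutide", "pageSize": 50},
    timeout=30,
)
edges = []
for study in resp.json()["studies"]:
    proto = study["protocolSection"]
    nct = proto["identificationModule"]["nctId"]
    for cond in proto.get("conditionsModule", {}).get("conditions", []):
        edges.append((nct, "studies_condition", cond))
    for iv in proto.get("armsInterventionsModule", {}).get("interventions", []):
        edges.append((nct, "tests_intervention", iv["name"]))

print(f"{len(edges)} edges from one API call")
```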

**Works with any LLM** (not Claude-specific). There's an MCP server if you want plug-and-play:

`pip install ckg-mcp`

Full benchmark + paper + reproducible code:

https://github.com/Yarmoluk/ckg-benchmark

Dataset (all 45 domain CSVs + query JSONL, CC-BY-4.0):

https://huggingface.co/datasets/danyarm/ckg-benchmark

Live demo (query CKG vs. RAG side by side, see token count + F1):

https://huggingface.co/spaces/danyarm/ckg-demo

u/Connect_Bee_3661 — 13 days ago

I benchmarked three retrieval architectures across 45 domains and 7,928 queries:

- RAG (FAISS + Claude): F1 = 0.123, 2,982 tokens/query

- GraphRAG (Microsoft): F1 = 0.120, 3,450 tokens/query

- CKG (pre-structured DAG): F1 = 0.471, 269 tokens/query

The key finding: CKG F1 improves continuously with hop depth (0.374 → 0.772 at hop=5). RAG plateaus and degrades past hop=2. For multi-hop structural queries — prerequisites, dependency chains, category aggregation — pre-structure dominates.
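
For anyone reimplementing the eval, standard set-overlap F1 per query looks like this (a sketch; I'm assuming answers are compared as sets of normalized entity strings, and the repo's exact matching rules may differ):

```python
def f1(predicted: set, gold: set) -> float:
    """Harmonic mean of precision and recall over answer sets."""
    tp = len(predicted & gold)
    if tp == 0:
        return 0.0
    precision, recall = tp / len(predicted), tp / len(gold)
    return 2 * precision * recall / (precision + recall)

print(f1({"semaglutide", "liraglutide"}, {"semaglutide", "tirzepatide"}))  # 0.5
```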

Track 2 (GLP-1/pharma domain built from the ClinicalTrials.gov API in one session, no expert curation): F1 = 0.530. Structure is the signal, not curation effort.

Live demo: huggingface.co/spaces/danyarm/ckg-demo

Full benchmark + paper: github.com/Yarmoluk/ckg-benchmark

u/Connect_Bee_3661 — 17 days ago