u/Independent-Flow3408

▲ 6 r/codex

Reducing LLM context from ~80K tokens to ~2K without embeddings or vector DBs

I’ve been experimenting with a problem I kept hitting when using LLMs on real codebases:

Even with good prompts, large repos don’t fit into context, so models:

  • miss important files
  • reason over incomplete information
  • require multiple retries

Approach I explored

Instead of embeddings or RAG, I tried something simpler:

  1. Extract only structural signals:

    • functions
    • classes
    • routes
  2. Build a lightweight index (no external dependencies)

  3. Rank files per query using:

    • token overlap
    • structural signals
    • basic heuristics (recency, dependencies)
  4. Emit a small “context layer” (~2K tokens instead of ~80K)
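For the curious, here's roughly what steps 1–3 look like as a toy Python sketch (stdlib `ast` for the structural extraction, plain set-overlap scoring for ranking; file names and the query are made up, and the real ranking also folds in recency and dependency heuristics):

```python
import ast
import re

def extract_signals(source):
    """Step 1: pull function/class names out of a (Python) file."""
    tree = ast.parse(source)
    return [n.name for n in ast.walk(tree)
            if isinstance(n, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef))]

def tokenize(text):
    # split snake_case identifiers and lowercase everything
    return set(re.findall(r"[a-z0-9]+", text.replace("_", " ").lower()))

def rank_files(index, query, top_k=5):
    """Step 3: score each file by token overlap with the query."""
    q = tokenize(query)
    scored = sorted(((len(q & tokenize(" ".join(sigs))), path)
                     for path, sigs in index.items()), reverse=True)
    return [path for score, path in scored[:top_k] if score > 0]

# Step 2: the "lightweight index" is just {path: structural signals}
index = {
    "auth.py": extract_signals("def login_user(token): ...\nclass AuthToken: ..."),
    "db.py":   extract_signals("def connect_db(url): ..."),
}
print(rank_files(index, "where is user login handled?"))  # ['auth.py']
```

Step 4 is then just emitting the top-ranked signals (not the full files) as the prompt's context block.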


Observations

Across multiple repos:

  • context size dropped ~97%
  • relevant files appeared in top-5 ~70–80% of the time
  • number of retries per task dropped noticeably

The biggest takeaway:

> Structured context mattered more than model size in many cases.


Interesting constraint

I deliberately avoided:

  • embeddings
  • vector DBs
  • external services

Everything runs locally with simple parsing + ranking.


Open questions

  • How far can heuristic ranking go before embeddings become necessary?
  • Has anyone tried hybrid approaches (structure + embeddings)?
  • What’s the best way to verify that answers are grounded in provided context?

reddit.com
u/Independent-Flow3408 — 2 days ago

Open source AI context engine built in Amsterdam — looking for Dutch dev feedback

Anyone else in Amsterdam working on AI tooling at ING, Booking, Adyen, or similar?

I've been building open-source dev tools here for the past six months while working in fintech, specifically around LLM context compression for large codebases.

The Amsterdam tech scene feels underrepresented in AI tooling compared to what's coming out of London or Berlin. Curious if there are others here building in this space, or meetups worth knowing about.

Happy to share what I've built and learned if there's interest; always more useful in person or in a thread than a cold pitch.

reddit.com
u/Independent-Flow3408 — 2 days ago

TF-IDF over code signatures hits 80% hit@5 retrieval — no vectors, no embeddings. Tested on 18 repos.

Been experimenting with context compression for local models. Wanted to test how far pure heuristic retrieval can go before you actually need vectors.

Method: extract only function signatures + class shapes from source files, run TF-IDF over them against the query.
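A bare-bones version of that method, with hypothetical file names and textbook TF-IDF weighting (the exact scoring the tool uses may differ):

```python
import math
import re
from collections import Counter

def tokenize(text):
    # split snake_case and lowercase, so "login_user(token)" -> login, user, token
    return re.findall(r"[a-z0-9]+", text.replace("_", " ").lower())

def tfidf_rank(docs, query, top_k=5):
    """Rank docs (path -> signature text) by the summed TF-IDF weight
    of the query terms each doc contains."""
    tf = {path: Counter(tokenize(text)) for path, text in docs.items()}
    df = Counter(term for counts in tf.values() for term in counts)
    n = len(docs)
    def score(path):
        counts, total = tf[path], sum(tf[path].values())
        return sum((counts[t] / total) * math.log(n / df[t])
                   for t in tokenize(query) if t in counts)
    return sorted(docs, key=score, reverse=True)[:top_k]

docs = {
    "auth.py": "def login_user(token) class AuthToken",
    "db.py":   "def connect_db(url) class Pool",
    "ui.py":   "def render_login_page() class LoginForm",
}
print(tfidf_rank(docs, "user login token validation", top_k=2))  # ['auth.py', 'ui.py']
```

The identifier splitting is doing a lot of the work here: exact term matches against signature tokens are what make the heuristic competitive.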

Results across 18 repos, 90 tasks:

  • 80% hit@5 vs 13.6% random baseline
  • 98.1% token reduction (avg 80K → 1.5K)
  • Zero dependencies, works fully offline

Takeaway: code identifiers are already the compressed representation. Embedding them actually loses information — exact match over signatures keeps it.

Anyone else tried lightweight retrieval before reaching for RAG? Curious where the ceiling actually is.

[tool I used if relevant: github.com/manojmallick/sigmap]

reddit.com
u/Independent-Flow3408 — 4 days ago

▲ 0 r/google (+1 crosspost)

I built an open-source context layer for coding agents that lets me ask, validate, judge groundedness, and locally learn which files matter

I kept running into the same problem when using LLMs on real codebases:

  • large repos → context overflows
  • wrong files get picked
  • multiple retries just to get something usable

Even with good models, it felt like:

> the model is guessing because it can’t actually see the system

So I built something to fix that.


Instead of sending raw code, it:

  • extracts only structure (functions, classes, routes)
  • reduces ~80K tokens → ~2K
  • ranks relevant files before each query

Basically a context layer before the LLM.


Results (from running across 18 repos / 90 tasks):

  • retrieval hit@5: 13.6% → ~79%
  • prompts per task: 2.84 → 1.69
  • task success proxy: ~10% → ~52%
  • token reduction: ~97%

What changed in practice

Before:

  • wrong files in context
  • hallucinated logic
  • lots of retries

After:

  • right files show up immediately
  • fewer prompts
  • answers are more grounded in actual code

What’s interesting (unexpected insight)

Structured context mattered more than model size.

In many cases, smaller models + good context beat larger models + raw code.


New in latest version

Trying to move beyond just “better context”:

  • ask → builds query-specific context
  • validate → checks coverage before trusting output
  • judge → checks if answer is supported by context
  • local learning (weights per file)

Would love feedback on:

  1. Does this approach actually solve the “wrong context” problem for you?
  2. What would you want beyond retrieval (verification? patch checking?)
  3. Is this better than embeddings/RAG setups you’ve used?

Repo: https://github.com/manojmallick/sigmap

Docs: https://manojmallick.github.io/sigmap/

u/Independent-Flow3408 — 5 days ago