u/Independent-Flow3408

▲ 6 r/codex

Reducing LLM context from ~80K tokens to ~2K without embeddings or vector DBs

I’ve been experimenting with a problem I kept hitting when using LLMs on real codebases:

Even with good prompts, large repos don’t fit into context, so models:

  • miss important files
  • reason over incomplete information
  • require multiple retries

Approach I explored

Instead of embeddings or RAG, I tried something simpler:

  1. Extract only structural signals:

    • functions
    • classes
    • routes
  2. Build a lightweight index (no external dependencies)

  3. Rank files per query using:

    • token overlap
    • structural signals
    • basic heuristics (recency, dependencies)
  4. Emit a small “context layer” (~2K tokens instead of ~80K)
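For the curious, here's roughly what steps 1–3 look like as a toy Python sketch (stdlib `ast` for the structural extraction, plain set-overlap scoring for ranking; file names and the query are made up, and the real ranking also folds in recency and dependency heuristics):

```python
import ast
import re

def extract_signals(source):
    """Step 1: pull function/class names out of a (Python) file."""
    tree = ast.parse(source)
    return [n.name for n in ast.walk(tree)
            if isinstance(n, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef))]

def tokenize(text):
    # split snake_case identifiers and lowercase everything
    return set(re.findall(r"[a-z0-9]+", text.replace("_", " ").lower()))

def rank_files(index, query, top_k=5):
    """Step 3: score each file by token overlap with the query."""
    q = tokenize(query)
    scored = sorted(((len(q & tokenize(" ".join(sigs))), path)
                     for path, sigs in index.items()), reverse=True)
    return [path for score, path in scored[:top_k] if score > 0]

# Step 2: the "lightweight index" is just {path: structural signals}
index = {
    "auth.py": extract_signals("def login_user(token): ...\nclass AuthToken: ..."),
    "db.py":   extract_signals("def connect_db(url): ..."),
}
print(rank_files(index, "where is user login handled?"))  # ['auth.py']
```

Step 4 is then just emitting the top-ranked signals (not the full files) as the prompt's context block.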


Observations

Across multiple repos:

  • context size dropped ~97%
  • relevant files appeared in top-5 ~70–80% of the time
  • number of retries per task dropped noticeably

The biggest takeaway:

> Structured context mattered more than model size in many cases.


Interesting constraint

I deliberately avoided:

  • embeddings
  • vector DBs
  • external services

Everything runs locally with simple parsing + ranking.


Open questions

  • How far can heuristic ranking go before embeddings become necessary?
  • Has anyone tried hybrid approaches (structure + embeddings)?
  • What’s the best way to verify that answers are grounded in provided context?

reddit.com
u/Independent-Flow3408 — 2 days ago

Open source AI context engine built in Amsterdam — looking for Dutch dev feedback

Anyone else in Amsterdam working on AI tooling at ING, Booking, Adyen, or similar?

I've been building open-source dev tools here for the past six months while working in fintech, specifically around LLM context compression for large codebases.

The Amsterdam tech scene feels underrepresented in AI tooling compared to what's coming out of London or Berlin. Curious if there are others here building in this space, or meetups worth knowing about.

Happy to share what I've built and learned if there's interest; always more useful in person or in a thread than a cold pitch.

reddit.com
u/Independent-Flow3408 — 2 days ago

TF-IDF over code signatures hits 80% hit@5 retrieval — no vectors, no embeddings. Tested on 18 repos.

Been experimenting with context compression for local models. Wanted to test how far pure heuristic retrieval can go before you actually need vectors.

Method: extract only function signatures + class shapes from source files, run TF-IDF over them against the query.
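A bare-bones version of that method, with hypothetical file names and textbook TF-IDF weighting (the exact scoring the tool uses may differ):

```python
import math
import re
from collections import Counter

def tokenize(text):
    # split snake_case and lowercase, so "login_user(token)" -> login, user, token
    return re.findall(r"[a-z0-9]+", text.replace("_", " ").lower())

def tfidf_rank(docs, query, top_k=5):
    """Rank docs (path -> signature text) by the summed TF-IDF weight
    of the query terms each doc contains."""
    tf = {path: Counter(tokenize(text)) for path, text in docs.items()}
    df = Counter(term for counts in tf.values() for term in counts)
    n = len(docs)
    def score(path):
        counts, total = tf[path], sum(tf[path].values())
        return sum((counts[t] / total) * math.log(n / df[t])
                   for t in tokenize(query) if t in counts)
    return sorted(docs, key=score, reverse=True)[:top_k]

docs = {
    "auth.py": "def login_user(token) class AuthToken",
    "db.py":   "def connect_db(url) class Pool",
    "ui.py":   "def render_login_page() class LoginForm",
}
print(tfidf_rank(docs, "user login token validation", top_k=2))  # ['auth.py', 'ui.py']
```

The identifier splitting is doing a lot of the work here: exact term matches against signature tokens are what make the heuristic competitive.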

Results across 18 repos, 90 tasks:

  • 80% hit@5 vs 13.6% random baseline
  • 98.1% token reduction (avg 80K → 1.5K)
  • Zero dependencies, works fully offline

Takeaway: code identifiers are already the compressed representation. Embedding them actually loses information — exact match over signatures keeps it.

Anyone else tried lightweight retrieval before reaching for RAG? Curious where the ceiling actually is.

[tool I used if relevant: github.com/manojmallick/sigmap]

reddit.com
u/Independent-Flow3408 — 4 days ago

▲ 0 r/google (+1 crosspost)

I built an open-source context layer for coding agents that lets me ask, validate, judge groundedness, and locally learn which files matter

I kept running into the same problem when using LLMs on real codebases:

  • large repos → context overflows
  • wrong files get picked
  • multiple retries just to get something usable

Even with good models, it felt like:

> the model is guessing because it can’t actually see the system

So I built something to fix that.


Instead of sending raw code, it:

  • extracts only structure (functions, classes, routes)
  • reduces ~80K tokens → ~2K
  • ranks relevant files before each query

Basically a context layer before the LLM.


Results (from running across 18 repos / 90 tasks):

  • retrieval hit@5: 13.6% → ~79%
  • prompts per task: 2.84 → 1.69
  • task success proxy: ~10% → ~52%
  • token reduction: ~97%

What changed in practice

Before:

  • wrong files in context
  • hallucinated logic
  • lots of retries

After:

  • right files show up immediately
  • fewer prompts
  • answers are more grounded in actual code

What’s interesting (unexpected insight)

Structured context mattered more than model size.

In many cases, smaller models + good context beat larger models + raw code.


New in latest version

Trying to move beyond just “better context”:

  • ask → builds query-specific context
  • validate → checks coverage before trusting output
  • judge → checks if answer is supported by context
  • local learning (weights per file)

Would love feedback on:

  1. Does this approach actually solve the “wrong context” problem for you?
  2. What would you want beyond retrieval (verification? patch checking?)
  3. Is this better than embeddings/RAG setups you’ve used?

Repo: https://github.com/manojmallick/sigmap

Docs: https://manojmallick.github.io/sigmap/

u/Independent-Flow3408 — 5 days ago