Here's a decision framework for RAG vs. context/indexing. I've noticed this question comes up often, and most teams default to RAG when they shouldn't, or the other way around.
1. Is the agent the only consumer?
If humans are querying the same corpus at scale, you need RAG. Vector search at the chunk level is the right pattern for "let me find the doc that explains X" use cases. If only your agent reads the data, you have more flexibility.
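To make "vector search at the chunk level" concrete, here's a minimal sketch. The bag-of-words `embed` is a stand-in for a real embedding model, and the chunks are made up for illustration:

```python
import math
from collections import Counter

def embed(text):
    # Toy bag-of-words "embedding"; a real system would call an embedding model.
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

chunks = [
    "Refunds are processed within 5 business days.",
    "Our office is closed on public holidays.",
    "Passwords must be rotated every 90 days.",
]
index = [(c, embed(c)) for c in chunks]

def search(query, k=1):
    # Rank chunks by similarity to the query: the "find the doc that
    # explains X" pattern that RAG is good at.
    qv = embed(query)
    return sorted(index, key=lambda item: cosine(qv, item[1]), reverse=True)[:k]

top = search("how long do refunds take")[0][0]
```

This is the pattern that works well when humans (or many consumers) ask lookup-style questions against a mostly static corpus.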
2. Does the data change?
Static docs (manuals, policies, papers, completed reports, etc.) work fine with RAG. But dynamic data, like CRM notes, live threads, basically anything edited daily, breaks the embed-and-fetch pattern.
Re-embedding nightly leaves you with stale data between syncs, and re-embedding everything on every change gets expensive, so if your data changes often, you want event-driven indexing.
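A minimal sketch of what event-driven indexing means: each change event re-embeds only the document that changed, instead of batch-re-embedding the corpus nightly. The event shape, `embed` stand-in, and doc IDs are all hypothetical:

```python
index = {}  # doc_id -> (version, embedding)

def embed(text):
    # Stand-in for a real embedding call.
    return [float(len(text))]

def on_change(event):
    # event: {"doc_id": ..., "version": ..., "text": ...}
    current = index.get(event["doc_id"])
    if current and current[0] >= event["version"]:
        return  # out-of-order or duplicate event; keep the newer embedding
    index[event["doc_id"]] = (event["version"], embed(event["text"]))

on_change({"doc_id": "crm-42", "version": 1, "text": "Call scheduled Monday"})
on_change({"doc_id": "crm-42", "version": 2, "text": "Call moved to Friday"})
on_change({"doc_id": "crm-42", "version": 1, "text": "Call scheduled Monday"})
```

The version check matters because change feeds often deliver events out of order; without it you can overwrite fresh embeddings with stale ones.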
3. Do answers span sources?
If the answer to a question lives entirely inside one doc, RAG is fine. But if the answer spans, say, email, docs, and Slack, chunk similarity won't bridge that. Basically, cross-source questions need a graph, or a system that links sources at ingest.
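One sketch of "linking sources at ingest": build a tiny graph mapping shared entities to every item that mentions them, across sources. Here the only entity type is an email address, and the sources and IDs are invented for illustration:

```python
import re
from collections import defaultdict

graph = defaultdict(set)  # entity -> {(source, doc_id)}

def ingest(source, doc_id, text):
    # Extract entities at ingest time and link this item to each one.
    for entity in re.findall(r"[\w.]+@[\w.]+\w", text):
        graph[entity].add((source, doc_id))

ingest("email", "msg-1", "From alice@acme.com: contract attached")
ingest("slack", "thread-9", "alice@acme.com confirmed the renewal")
ingest("docs", "doc-3", "Renewal terms, owner alice@acme.com")

# A cross-source question about Alice now pulls from all three systems,
# which chunk similarity alone wouldn't connect.
related = graph["alice@acme.com"]
```

Real systems link on many entity types (people, accounts, ticket IDs), but the principle is the same: the join happens at ingest, not at query time.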
4. Is the output schema important?
If you're returning text for a human to read, raw chunks work. But if you're feeding the output to another system (CRM, dashboard, wherever), the agent needs typed fields, and you're best off with schema-bound output. RAG plus prompt engineering gets you maybe 80% of the way there, with hallucinated keys and dropped fields on the rest. For production systems that need reliability, you want extraction enforced server-side.
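A minimal sketch of server-side enforcement: validate the model's output against an expected schema and reject anything with hallucinated keys, missing fields, or wrong types, rather than trusting the prompt. The `REQUIRED` schema and record fields are hypothetical:

```python
REQUIRED = {"name": str, "stage": str, "amount": float}

def validate(record):
    # Reject keys the schema doesn't define (hallucinated fields).
    extra = set(record) - set(REQUIRED)
    if extra:
        raise ValueError(f"unexpected keys: {extra}")
    out = {}
    for field, typ in REQUIRED.items():
        if field not in record:
            raise ValueError(f"missing field: {field}")
        if not isinstance(record[field], typ):
            raise ValueError(f"bad type for {field}")
        out[field] = record[field]
    return out

ok = validate({"name": "Acme", "stage": "closed-won", "amount": 12000.0})
```

In practice you'd reach for a schema library (e.g. Pydantic or JSON Schema) or a provider's structured-output mode, but the point is the same: the check lives server-side, not in the prompt.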
5. Do permissions vary by user?
Multi-tenant RAG is a lot trickier than single-user, and service-account indexing means the LLM sees chunks the asking user shouldn't. You need permissions enforced at query time, fetched live from the source, not baked into the index.
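A sketch of the query-time pattern: retrieval runs against the full index, then results are filtered through a live permission check before anything reaches the model. The `ACL` dict stands in for a call to the source system's permission API; users and doc IDs are invented:

```python
# Stand-in for a live ACL lookup against the source system.
ACL = {"doc-a": {"alice"}, "doc-b": {"alice", "bob"}}

def can_read(user, doc_id):
    return user in ACL.get(doc_id, set())

def retrieve(user, candidates):
    # candidates: doc_ids ranked by similarity from an unrestricted index.
    # Filter at query time so each user only ever sees their own docs.
    return [d for d in candidates if can_read(user, d)]

visible = retrieve("bob", ["doc-a", "doc-b"])
```

The key property: when Alice's access is revoked at the source, the very next query reflects it, with no re-index required.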
Basically: if you answered yes to most of these, you want a context engine, not a RAG pipeline. If most were no, RAG is the right tool; don't over-engineer.