Looking for suggestions/validation on a filter-first + RAG-fallback architecture for product recommendations.
Current challenge:
- We have a product recommendation/search system where precision matters more than recall.
The client's expectation is:
- ~95% of queries should resolve through deterministic/filter-based retrieval
- Only ~5% should go through RAG/semantic reasoning
Reason:
- Product catalog is limited
- Pure RAG/vector search gives decent recall but poor precision
- An earlier implementation used an LLM (Claude) to generate filters directly from prompts, with a confidence-score threshold of > 90, but hallucinated filters still caused poor SQL retrieval quality.
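To make the failure mode concrete, here is a minimal sketch using an in-memory SQLite table. The table, columns, and values are hypothetical and are not our real schema. It only shows how a filter value that the model invented, and that does not exist in the catalog, silently returns nothing even when the model reports high confidence.

```python
import sqlite3

# Hypothetical toy catalog; table and column names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (name TEXT, category TEXT, color TEXT)")
conn.executemany(
    "INSERT INTO products VALUES (?, ?, ?)",
    [
        ("Trail Runner", "shoes", "navy"),
        ("City Sneaker", "shoes", "white"),
    ],
)

ALLOWED_COLUMNS = {"category", "color"}  # guard against invented column names


def run_filters(filters: dict[str, str]) -> list[str]:
    """Apply equality filters as a parameterized WHERE clause."""
    unknown = set(filters) - ALLOWED_COLUMNS
    if unknown:
        raise ValueError(f"unknown filter columns: {sorted(unknown)}")
    where = " AND ".join(f"{col} = ?" for col in filters)
    sql = "SELECT name FROM products"
    if where:
        sql += f" WHERE {where}"
    sql += " ORDER BY name"
    return [row[0] for row in conn.execute(sql, list(filters.values()))]


# A filter value that exists in the catalog vocabulary.
grounded = run_filters({"category": "shoes", "color": "navy"})
# A plausible-sounding value the catalog never uses.
hallucinated = run_filters({"category": "shoes", "color": "dark blue"})

print(grounded)
print(hallucinated)
```

The second call is valid SQL and raises no error, which is why a confidence score alone did not catch it.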
What I implemented:
Instead of relying on prompt-only filter extraction, I embedded the metadata with Cohere embeddings and stored the vectors in PGVector.
Each metadata entry is aligned with: category, subcategory, and normalized attributes/tags.
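As a rough sketch of what one such entry could look like, assuming a simple lowercase/whitespace normalization and a flat text format for the embedding input. The class, the `embedding_text` format, the table name, and the `dim` parameter are all hypothetical; the vector dimension has to match whichever Cohere model is used.

```python
from dataclasses import dataclass


def normalize_tag(raw: str) -> str:
    """Lowercase, trim, and collapse internal whitespace so variants compare equal."""
    return " ".join(raw.lower().split())


@dataclass(frozen=True)
class MetadataEntry:
    category: str
    subcategory: str
    attributes: tuple[str, ...] = ()

    @classmethod
    def from_raw(cls, category: str, subcategory: str, tags: list[str]) -> "MetadataEntry":
        # Deduplicate after normalization and sort for a stable embedding text.
        normalized = sorted({normalize_tag(t) for t in tags if t.strip()})
        return cls(normalize_tag(category), normalize_tag(subcategory), tuple(normalized))

    def embedding_text(self) -> str:
        """Text that would be sent to the embedding model (format is an assumption)."""
        attrs = ", ".join(self.attributes)
        return f"category: {self.category} | subcategory: {self.subcategory} | attributes: {attrs}"


def create_table_sql(dim: int) -> str:
    """DDL for a hypothetical pgvector table; `dim` must equal the embedding size."""
    return (
        "CREATE TABLE metadata_entries ("
        "id bigserial PRIMARY KEY, "
        "category text NOT NULL, "
        "subcategory text NOT NULL, "
        "attributes text[] NOT NULL, "
        f"embedding vector({dim}))"
    )


entry = MetadataEntry.from_raw(
    "Footwear", " Running Shoes ", ["Waterproof", "waterproof ", "Wide  Fit"]
)
print(entry.embedding_text())
```

Normalizing before embedding means that "Waterproof" and "waterproof " collapse into one tag, so the same attribute is not stored under several spellings.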
Retrieval flow:
- Vector similarity retrieval
- Hybrid reranking for better precision + recall
- Retrieved metadata candidates are then used to construct filters for SQL/product retrieval
- RAG is used only as a fallback when filter confidence is low or query intent is ambiguous
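The flow above can be sketched roughly as follows. This is a simplified illustration, not the production code. The vector scores are assumed to come from the vector store already. The lexical-overlap score, the `alpha` weight, and the `min_score`/`min_margin` thresholds are hypothetical stand-ins for the real hybrid reranker and confidence logic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    category: str
    subcategory: str
    attributes: tuple[str, ...]
    vector_score: float  # similarity from the vector store, assumed to lie in [0, 1]


def lexical_score(query: str, cand: Candidate) -> float:
    """Fraction of the candidate's terms that appear verbatim in the query.

    Only single-token terms can match, because the query is split on whitespace.
    """
    query_terms = set(query.lower().split())
    cand_terms = {cand.category, cand.subcategory, *cand.attributes}
    return len(query_terms & cand_terms) / len(cand_terms)


def hybrid_rerank(query: str, candidates: list[Candidate], alpha: float = 0.6):
    """Blend vector and lexical scores; return (score, candidate) pairs, best first."""
    scored = [
        (alpha * c.vector_score + (1 - alpha) * lexical_score(query, c), c)
        for c in candidates
    ]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


def route(query: str, candidates: list[Candidate],
          min_score: float = 0.7, min_margin: float = 0.1) -> dict:
    """Build SQL filters from the top candidate, or fall back to RAG."""
    ranked = hybrid_rerank(query, candidates)
    if not ranked:
        return {"route": "rag", "filters": None}
    top_score, top = ranked[0]
    runner_up = ranked[1][0] if len(ranked) > 1 else 0.0
    confident = top_score >= min_score                # low confidence -> RAG
    unambiguous = top_score - runner_up >= min_margin  # near tie -> ambiguous intent
    if confident and unambiguous:
        filters = {
            "category": top.category,
            "subcategory": top.subcategory,
            "attributes": list(top.attributes),
        }
        return {"route": "filter", "filters": filters}
    return {"route": "rag", "filters": None}


clear = route("waterproof running shoes", [
    Candidate("shoes", "running", ("waterproof",), 0.90),
    Candidate("shoes", "hiking", ("waterproof",), 0.70),
])
vague = route("something comfy for weekends", [
    Candidate("shoes", "running", ("waterproof",), 0.55),
    Candidate("apparel", "loungewear", ("cotton",), 0.52),
])
print(clear["route"], vague["route"])
```

Using the gap between the top two candidates as an ambiguity signal is one option among several. It keeps the routing decision deterministic and lets both thresholds be tuned against the 95/5 target.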
Observed improvements:
- Better filter consistency
- Reduced hallucinated attributes
- Better precision compared to prompt-only extraction
- More controllable retrieval pipeline
Questions:
- Is this generally the right architecture direction for enterprise product recommendations/search?
- Any better approaches for:
  - metadata normalization
  - filter confidence scoring
  - query-to-filter mapping
  - reducing semantic drift?
- Would knowledge graphs/taxonomy mapping help more than embeddings here?
- How do teams usually decide when to invoke RAG vs deterministic retrieval?
I would appreciate suggestions from people working on enterprise search, RAG systems, recommendation engines, or e-commerce/medical retrieval pipelines.