u/Tron_tx6

Current challenge:
- We have a product recommendation/search system where precision matters more than recall.

Client expectation is:
- ~95% of queries should resolve through deterministic/filter-based retrieval
- Only ~5% should go through RAG/semantic reasoning

Reason:
- Product catalog is limited
- Pure RAG/vector search gives decent recall but poor precision
- An earlier implementation used an LLM (Claude) to generate filters directly from prompts, keeping only filters with a confidence score > 90, but hallucinated filters still caused poor SQL retrieval quality.

What I implemented:

- Instead of relying on prompt-only filter extraction, I converted the metadata into embeddings.
- The metadata is stored in PGVector using Cohere embeddings.
- Each metadata entry is aligned with: category, subcategory, and normalized attributes/tags.

Retrieval flow:
- Vector similarity retrieval
- Hybrid reranking for better precision and recall
- Retrieved metadata candidates are then used to construct filters for SQL/product retrieval.
- RAG is used only as a fallback when filter confidence is low or query intent is ambiguous.
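
To make the flow above concrete, here is a minimal sketch of the routing step in Python. It is not my production code: the toy 2-d vectors stand in for the Cohere embeddings, the `lexical_overlap` term stands in for whatever the hybrid reranker really does, and the `alpha` weight, the 0.75 `threshold`, the `products` table, and the `%s` placeholder style are all assumptions made for illustration.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass
class MetadataEntry:
    category: str
    subcategory: str
    tags: tuple[str, ...]
    embedding: tuple[float, ...]  # stand-in for a stored embedding vector


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def lexical_overlap(query, entry):
    """Fraction of the entry's terms that appear literally in the query."""
    terms = {entry.category, entry.subcategory, *entry.tags}
    words = set(query.lower().split())
    return len(terms & words) / len(terms)


def route(query, query_embedding, entries, alpha=0.5, threshold=0.75):
    """Return ("filter", sql, params), or ("rag", None, None) as fallback."""
    if not entries:
        return ("rag", None, None)
    # Hybrid score: weighted mix of vector similarity and lexical match.
    scored = [
        (alpha * cosine(query_embedding, e.embedding)
         + (1 - alpha) * lexical_overlap(query, e), e)
        for e in entries
    ]
    score, best = max(scored, key=lambda pair: pair[0])
    if score < threshold:
        # Low filter confidence: hand the query to the RAG path.
        return ("rag", None, None)
    # Filter values come only from stored metadata, never from free text,
    # so a hallucinated attribute cannot reach the SQL layer.
    sql = "SELECT * FROM products WHERE category = %s AND subcategory = %s"
    return ("filter", sql, (best.category, best.subcategory))
```

The point of the sketch is the design choice rather than the scoring formula: the SQL filter is built from a retrieved metadata row, so the worst case is a wrong but valid filter, and anything below the threshold goes to RAG instead.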

Observed improvements:
- Better filter consistency
- Reduced hallucinated attributes
- Better precision compared to prompt-only extraction
- More controllable retrieval pipeline

Questions:

- Is this generally the right architecture direction for enterprise product recommendations/search?
- Any better approaches for:
  - metadata normalization
  - filter confidence scoring
  - query-to-filter mapping
  - reducing semantic drift?
- Would knowledge graphs/taxonomy mapping help more than embeddings here?
- How do teams usually decide when to invoke RAG vs. deterministic retrieval?

Would appreciate suggestions from people working on enterprise search, RAG systems, recommendation engines, e-commerce, or medical retrieval pipelines.

Need suggestions/validation on a Filter-first + RAG fallback architecture for Product Recommendations.