u/Hot_Cheetah_8984

GPU Recommendation

We’re a small municipality (10-15 employees) and want to build a fully on-prem RAG system for internal documents and regulations. Expected load: at most 3-4 concurrent text queries. We have strong data privacy requirements, so cloud services are not an option.

Questions:

What GPU is realistically needed? (e.g. single RTX 4090/5090, A6000, or more?)
Recommended model size? (7B–13B vs 32B/70B quantized)
Any experiences with similar small on-prem setups?
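
For context on the model-size question, here is the rough weight-memory arithmetic as a minimal sketch. It assumes exactly 4 bits per weight (real quantization formats usually land somewhat higher) and counts weights only, so KV cache for the 3-4 concurrent queries, the embedding model, and runtime overhead all come on top. The helper name `weight_gb` is just illustrative; the VRAM figures are the published capacities of the cards named above.

```python
# Weight-only memory estimate. Ignores KV cache, activations, and runtime overhead.
def weight_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight memory in GB (1 GB = 1e9 bytes)."""
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9

# VRAM in GB for the cards mentioned in the question.
GPUS = {"RTX 4090": 24, "RTX 5090": 32, "RTX A6000": 48}
MODELS_B = [7, 13, 32, 70]  # parameter counts in billions
BITS = 4  # assumption: ~4-bit quantization

for size in MODELS_B:
    need = weight_gb(size, BITS)
    # A card is listed only if the weights alone fit; real headroom must be larger.
    fits = [name for name, vram in GPUS.items() if need < vram]
    print(f"{size}B @ {BITS}-bit: ~{need:.1f} GB weights; weights alone fit on: {fits}")
```

By this estimate a 32B model at 4-bit needs about 16 GB for weights and a 70B model about 35 GB, which is why the choice between a 24 GB, 32 GB, or 48 GB card depends so heavily on the target model size.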

Looking for responsive answers at that load without overspending on hardware.

Thanks!

reddit.com

u/Hot_Cheetah_8984 — 1 day ago