u/Hot_Cheetah_8984

▲ 4 r/LLM

GPU Recommendation

We’re a small municipality (10-15 employees) looking to build a fully on-prem RAG system for internal documents and regulations. Expected load: at most 3-4 concurrent text queries. We have strong data privacy requirements, so cloud services are not an option.

Questions:

  • What GPU is realistically needed? (e.g. single RTX 4090/5090, A6000, or more?)
  • Recommended model size? (7B–13B vs 32B/70B quantized)
  • Any experiences with similar small on-prem setups?
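
For context on the model-size question, here is a rough back-of-envelope sketch of the sizing arithmetic, not a benchmark. It only estimates memory for the model weights (parameters × bits per weight ÷ 8) and pads that with a flat overhead factor for KV cache and runtime buffers. The 20% overhead, the 4-bit quantization level, and the helper names `weight_vram_gb` and `fits` are assumptions made for illustration; real usage depends on context length, concurrency, and the inference engine.

```python
# Rough VRAM sizing sketch. All numbers are estimates, not measurements.

# Published VRAM capacities of the cards mentioned in the question, in GB.
GPU_VRAM_GB = {"RTX 4090": 24, "RTX 5090": 32, "RTX A6000": 48}


def weight_vram_gb(params_billion: float, bits_per_weight: float,
                   overhead: float = 0.2) -> float:
    """Estimate VRAM in GB (10^9 bytes) for a model's weights.

    overhead is an ASSUMED flat allowance for KV cache, activations,
    and runtime buffers; 0.2 is a placeholder, not a measured value.
    """
    weight_bytes = params_billion * 1e9 * bits_per_weight / 8
    return weight_bytes * (1 + overhead) / 1e9


def fits(required_gb: float) -> list[str]:
    """Return the cards from GPU_VRAM_GB whose VRAM covers the estimate."""
    return [name for name, gb in GPU_VRAM_GB.items() if gb >= required_gb]


# Compare the model sizes from the question at FP16 and at 4-bit.
for size in (7, 13, 32, 70):
    for bits in (16, 4):
        need = weight_vram_gb(size, bits)
        cards = ", ".join(fits(need)) or "none of the single cards listed"
        print(f"{size}B @ {bits}-bit: ~{need:.1f} GB -> {cards}")
```

With 3-4 concurrent queries, the KV cache grows with each active request and its context length, so the flat overhead above is likely the weakest part of the estimate.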

Looking for good speed without overkill.

Thanks!

u/Hot_Cheetah_8984 — 1 day ago