r/LLM
GPU Recommendation
We’re a small municipality (10–15 employees) and want to build a fully on-prem RAG system for internal documents and regulations. Expected load: at most 3–4 concurrent text queries. Data privacy requirements are strict, so no cloud.
Questions:
- What GPU is realistically needed? (e.g. single RTX 4090/5090, A6000, or more?)
- Recommended model size? (7B–13B vs 32B/70B quantized)
- Any experiences with similar small on-prem setups?
Looking for good speed without overkill.
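To frame the sizing question, here is a rough back-of-envelope sketch of how much VRAM the weights alone would need for the model sizes above. It is only an estimate under stated assumptions: the function names are made up for illustration, the 20% headroom for KV cache and activations is a guess rather than a measurement, and real 4-bit quantization formats usually cost somewhat more than 4 bits per weight. The GPU memory figures are the nominal 24/32/48 GB of the cards mentioned, treated as GiB for simplicity.

```python
# Rough VRAM estimate for model weights (illustrative sketch, not a benchmark).

def weight_gib(params_billion: float, bits_per_weight: float) -> float:
    """Approximate memory for the model weights alone, in GiB."""
    return params_billion * 1e9 * bits_per_weight / 8 / 2**30


def fits(params_billion: float, bits_per_weight: float,
         vram_gib: float, headroom: float = 0.2) -> bool:
    """True if the weights fit while leaving `headroom` (a fraction of VRAM)
    free for KV cache and activations. The 0.2 default is an assumption."""
    return weight_gib(params_billion, bits_per_weight) <= vram_gib * (1 - headroom)


# Nominal memory of the cards named in the questions.
GPUS = {"RTX 4090": 24, "RTX 5090": 32, "RTX A6000": 48}

# (parameters in billions, bits per weight) for the options being compared.
MODELS = {"7B fp16": (7, 16), "13B fp16": (13, 16),
          "32B 4-bit": (32, 4), "70B 4-bit": (70, 4)}

if __name__ == "__main__":
    for model, (params, bits) in MODELS.items():
        need = weight_gib(params, bits)
        ok = [gpu for gpu, vram in GPUS.items() if fits(params, bits, vram)]
        print(f"{model}: ~{need:.1f} GiB weights, fits on: {ok or 'none of these'}")
```

This ignores context length and concurrency, both of which increase KV cache use, so 3–4 simultaneous queries with long retrieved passages would need more headroom than the default here.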
Thanks!
u/Hot_Cheetah_8984 — 1 day ago