u/Educational_Rope_523

Hey guys, I’m looking for honest opinions before I fully commit to this workstation setup.
I’m planning to build a serious local AI / BlackBox-style workstation with these specs:

AMD Ryzen 9 9950X3D2
192GB DDR5 RAM
NVIDIA RTX PRO 6000 Blackwell (96GB GDDR7 ECC VRAM)
4TB Samsung 990 Pro NVMe SSD
Windows 11 Pro
Single GPU setup for now…

Main use case would be local LLM work, RAG/vector databases, document analysis, coding agents, local AI assistants, inference, and experimenting with heavier agentic workflows. The main reason I’m looking at the RTX PRO 6000 Blackwell is the 96GB of VRAM. I understand this is probably overkill for basic local models, but I’m specifically interested in running larger models, especially in the 70B/80B range, with enough VRAM headroom to avoid constantly compromising on quantization, context size, or performance.
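
For reference, here is the rough back-of-envelope math behind the headroom reasoning above. It is only a sketch, not a benchmark: the layer count, KV head count, and head dimension are assumptions based on a Llama-3-70B-style architecture with grouped-query attention, the bits-per-weight figures are approximations for common quant formats, and it ignores activations, runtime overhead, and anything else sharing the GPU. All function names are just mine for illustration.

```python
# Back-of-envelope VRAM estimate: quantized weights + fp16 KV cache.
# Architecture defaults are ASSUMPTIONS (Llama-3-70B-style GQA), not measurements.

GIB = 1024 ** 3


def weight_bytes(n_params: float, bits_per_weight: float) -> float:
    """Memory for the weights alone at a given average bits per weight."""
    return n_params * bits_per_weight / 8


def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   context_tokens: int, bytes_per_elem: int = 2) -> int:
    """K and V tensors across all layers for a single sequence."""
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem * context_tokens


def estimate_gib(n_params: float, bits_per_weight: float, context_tokens: int,
                 n_layers: int = 80, n_kv_heads: int = 8, head_dim: int = 128) -> float:
    """Weights + KV cache in GiB; excludes activations and runtime overhead."""
    total = weight_bytes(n_params, bits_per_weight) + kv_cache_bytes(
        n_layers, n_kv_heads, head_dim, context_tokens
    )
    return total / GIB


if __name__ == "__main__":
    # Approximate average bits per weight for a few quantization classes (assumed).
    quants = [("4-bit class (~4.5 bpw)", 4.5),
              ("8-bit class (~8.5 bpw)", 8.5),
              ("fp16/bf16 (16 bpw)", 16.0)]
    for n_params in (70e9, 80e9):
        for label, bpw in quants:
            gib = estimate_gib(n_params, bpw, context_tokens=32_768)
            print(f"{n_params / 1e9:.0f}B, {label}, 32k ctx: ~{gib:.1f} GiB")
```

If those assumptions are roughly right, the numbers to compare against are the card's 96GB minus whatever the runtime and desktop reserve, so please correct me if the real-world overhead is much larger than this suggests.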

My questions:

Is a single RTX PRO 6000 Blackwell 96GB a realistic high-end choice for local 70B/80B inference?
Would this setup comfortably run an 80B model at a usable quantization with decent context?
Would 192GB system RAM be enough for RAG/vector DB/document workflows alongside the model?
Would you recommend llama.cpp, vLLM, Ollama, LM Studio, or something else for this kind of machine?
What are the biggest bottlenecks or failure modes I’m probably underestimating?
Is this a smart “buy once, cry once” setup, or would you approach it differently?

I know cloud GPUs may still make more sense for some workloads, but the goal here is local control, privacy, always-available inference, and building a long-term local AI workstation.
Appreciate any honest thoughts, especially from people running 70B/80B models locally.

Honest opinion on single RTX PRO 6000 Blackwell 96GB workstation for local 80B LLM / agentic workflows