Upgrade from dual 5060 Ti: DGX Spark? Strix Halo? Other?
Hey gang! I currently have a system running dual 5060 Ti cards with 16GB each, for a total of 32GB. I've been running Qwen 3.6 35B (Q5) on llama.cpp with TurboQuant set to 4-bit and maxed context, averaging in the mid-20s for output tokens per second, all hooked up to Hermes. So far I am very impressed with the quality, and the speed is more than enough for the “set it and forget it” tasks I send Hermes on.
I want to be able to run larger models and/or less quantized versions for better quality. I also want to support more parallelized workflows and have multiple users (4) tapping into the same back end with their own Hermes instances, so I'm looking to add something to my setup to make that expansion possible. Right now I have a budget of about $4k, so I could get a DGX Spark, a Strix Halo, or possibly swap out one 5060 Ti for a Blackwell 5000 Pro (48GB). Apple seems to have dropped off, with only 96GB for the Mac Studio M3 Ultra at $4k these days. Or am I missing something, and that is still a “good deal” compared to the other options?
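For reference, here is the back-of-the-envelope math I'm using to size things up. It's only a rough sketch: the function names are my own, the bits-per-weight and headroom values are placeholder assumptions, and it only counts model weights, so KV cache for four concurrent users and runtime overhead have to come out of whatever headroom is left.

```python
def weight_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights alone, in GB (10^9 bytes).

    params_billion * 1e9 weights * bits_per_weight / 8 bits-per-byte / 1e9
    simplifies to the expression below.
    """
    return params_billion * bits_per_weight / 8


def fits(params_billion: float, bits_per_weight: float,
         memory_gb: float, headroom_gb: float = 0.0) -> bool:
    """True if the weights plus a reserved headroom fit in memory_gb.

    headroom_gb is a hypothetical allowance for KV cache, activations
    and runtime overhead; pick it to match your context length and
    number of concurrent users.
    """
    return weight_gb(params_billion, bits_per_weight) + headroom_gb <= memory_gb


if __name__ == "__main__":
    # Illustrative values only; real quant formats average slightly
    # more bits per weight than their nominal number.
    for label, memory_gb in [("current 2x16GB", 32), ("one card swapped for 48GB", 64)]:
        for bits in (5, 8):
            ok = fits(35, bits, memory_gb, headroom_gb=8)
            print(f"{label}: 35B at {bits} bits/weight "
                  f"needs ~{weight_gb(35, bits):.1f} GB for weights, fits={ok}")
```

By this math the weights scale linearly with bits per weight, so going from a 5-bit to an 8-bit quant of the same model is where most of the extra memory would go.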
From what I have read, the DGX Spark might be a great fit: I want to have more parallel tasks going on, I am not afraid of Linux, and I believe it will be about 2x faster than my dual 5060 Ti setup. The Strix Halo seems to be the most “flexible” of these options, in that you can give up on AI and just use it as a PC, but I guess you could say the same thing about the Macs. While I did mention the Blackwell 5000, its price seems rather steep for such a small RAM bump.
What are the collective’s thoughts?