
Debating buying an M5 MBP with 128GB, I spent the day on Vast.ai renting GPUs to try out some models. First, Qwen 2.5 Coder 32B was pretty OK, but Qwen 3.6 27B felt superior and more thoughtful. It's not Claude, maybe Haiku on a good day, but a 27B model that runs locally and thinks is pretty dang good for mid-level work, without having to worry about usage limits while grinding. Aider pointed at a local Ollama endpoint may be something I need to spend more time on. I'm so spoiled by Claude Code and Codex that I have to learn the open source landscape better.
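If you want to try the Aider + Ollama combo, the setup is roughly the following (going from Aider's docs rather than my exact session; the model tag is just an example, use whatever you've pulled):

    # Pull a model; Ollama usually already runs as a service on port 11434
    ollama pull qwen2.5-coder:32b

    # Point Aider at the local Ollama API and pick the model.
    # The ollama_chat/ prefix tells Aider to use Ollama's chat endpoint.
    export OLLAMA_API_BASE=http://127.0.0.1:11434
    aider --model ollama_chat/qwen2.5-coder:32b

One gotcha: Ollama defaults to a small context window, which cripples Aider on real repos, so you'll likely want to raise num_ctx for the model.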
I also tested Llama 3.3 70B abliterated, Hermes 3 70B uncensored, and Qwen 2.5 72B abliterated through SillyTavern. Hermes won. They were all pretty decent; I mostly struggled with getting the SillyTavern character configured right, and with more time they probably would have all worked great. I definitely felt the vibe more with Hermes and fought it much less.
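For anyone else wrestling with the character setup: the character config is a card you can import, and a minimal one looks roughly like this (field names from the chara_card_v2 spec; the contents here are placeholders, not what I actually used):

    {
      "spec": "chara_card_v2",
      "spec_version": "2.0",
      "data": {
        "name": "Lab Partner",
        "description": "A dry, capable assistant for stress-testing local models.",
        "personality": "direct, curious, hard to rattle",
        "scenario": "Helping the user put 70B models through their paces.",
        "first_mes": "Ready when you are. What are we breaking today?",
        "mes_example": ""
      }
    }

In my experience the description and first_mes fields seem to matter more than the rest; a weak first message sets the tone for the whole chat.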
Hope that's interesting to anyone else spelunking around this territory.
One tip if you're testing on Vast.ai: use the PyTorch NGC image when you rent, since the CUDA toolkit comes pre-installed. If you use a base Ubuntu image, you'll hit a wall trying to compile llama.cpp and waste an hour figuring out why nvcc isn't found.
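With the NGC image, the CUDA build of llama.cpp is roughly this (flag names per the current llama.cpp README; older checkouts used -DLLAMA_CUBLAS=on instead of GGML_CUDA):

    git clone https://github.com/ggerganov/llama.cpp
    cd llama.cpp

    # -DGGML_CUDA=ON is the part that needs nvcc; this is the step
    # that falls over on a bare Ubuntu image without the CUDA toolkit
    cmake -B build -DGGML_CUDA=ON
    cmake --build build --config Release -j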
Cheers!