What Works for Coding on an M5 with 24GB of Unified RAM?
I am new to oMLX and relatively new to local LLMs. I have been trying to get Qwen 3.5, Qwen 3, or Gemma 4 running on my M5 with 24GB of unified RAM using oMLX. I have tested a number of models from mlx-community in the 13–15 GB size range. To date, they all blow up with an out-of-memory (OOM) error a few minutes after starting a task.
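My rough understanding of why a model that loads fine still dies a few minutes in: the KV cache grows with the context on top of the weights, and macOS reportedly caps GPU-wired memory at roughly 75% of unified RAM (about 18 GB on a 24 GB machine). Here is my back-of-the-envelope arithmetic; all the shapes and numbers are assumptions for illustration, not measured values for any specific model:

```python
# Rough memory arithmetic for a ~14 GB quantized model on a 24 GB Mac.
# All figures below are illustrative assumptions, not measurements.
weights_gb = 14.0  # size of the quantized model file actually loaded

# KV cache per token ≈ 2 (K and V) * n_layers * n_kv_heads * head_dim * bytes.
# Hypothetical transformer shape with an fp16 cache:
n_layers, n_kv_heads, head_dim, bytes_per = 40, 8, 128, 2
kv_per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per  # bytes/token

context = 32_768  # a long coding-agent context
kv_gb = kv_per_token * context / 1e9
total_gb = weights_gb + kv_gb

print(round(weights_gb, 1), round(kv_gb, 1), round(total_gb, 1))
```

Under those assumptions the total creeps past the ~18 GB GPU-wired limit once the context fills up, which would match the "fine at first, OOM a few minutes into a task" pattern I am seeing. If that reasoning is wrong, corrections are welcome.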
I would appreciate hearing what you have working for coding on a Mac with 24GB of RAM.
Is oMLX the best way to run these? I've been trying (hoping might be the better word) to find a model with TurboQuant that can handle run-of-the-mill dev tasks, so I can minimize what I spend on the larger hosted models.
Thank you in advance! lbe