M4 Max 36GB, oMLX - model recommendation?
I have tried a few models, but can't find one that's both useful and runs well on my machine.
Gemma4:27b has good speed (outputting up to 40 t/s) and is awesome for text-based stuff, but it is sloppy AF when it comes to tool calling and agent work. With this setup I have a nice chatbot, but it is not very useful as an agent helper.
I tried Qwen3.6-27B-4bit, which runs so unbearably slowly (1.5 t/s) that I can't even test whether it works better with Hermes.
I tried GLM-4.7-Flash-MLX-6bit, and it seems to have okay speed and can at least call the tools reliably, but it seems to crash the oMLX server frequently, so it's not very usable.
I know that 36GB of memory is not a lot for local LLMs, but is there a good sweet-spot model for accuracy, tool calling, and speed?