mlx-serve vs. LM Studio on Apple Silicon: ~40% faster in my benchmarks (w/ MTP/PLD)
Benchmarked mlx-serve against LM Studio on Apple Silicon today: roughly +40% faster overall, depending on the workload, when using the new Gemma 4 drafter for MTP (multi-token prediction) and PLD (prompt lookup decoding) in other models.
The gap is widest on echo/repetitive tasks like agentic code editing, where speculative decoding really kicks in (+122% on Gemma 4 E2B echo), and more modest on free-form generation (~+20%). Both servers were serving the same MLX weights over HTTP, so it's a pretty apples-to-apples comparison.
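For anyone wondering why echo-heavy workloads benefit so much: PLD drafts tokens by matching the tail of the generated text against earlier context and copying whatever followed the match, so when the model is echoing a file back with small edits, long drafts get accepted almost for free. Here's a rough sketch of the idea in Python; this is not mlx-serve's actual code, and the function name and parameters are made up for illustration:

```python
def pld_draft(tokens: list[int], ngram_size: int = 3, max_draft: int = 10) -> list[int]:
    """Propose draft tokens via prompt lookup: find the most recent earlier
    occurrence of the last `ngram_size` tokens in the context and copy the
    tokens that followed it."""
    if len(tokens) < ngram_size + 1:
        return []
    tail = tokens[-ngram_size:]
    # Scan backwards so we prefer the most recent match.
    for start in range(len(tokens) - ngram_size - 1, -1, -1):
        if tokens[start:start + ngram_size] == tail:
            match_end = start + ngram_size
            return tokens[match_end:match_end + max_draft]
    return []  # no match: fall back to ordinary one-token decoding
```

The drafted tokens still get verified by the real model in a single forward pass, so you only pay for what's accepted; on repetitive output most of the copied span survives verification, which lines up with the outsized echo numbers above.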
It's a native Zig server, so there's no Python in the serving stack, and it exposes OpenAI- and Anthropic-compatible APIs if that matters to your setup. Posting in case anyone else is trying to squeeze more out of their M-series chip.
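Since the API is OpenAI-compatible, any standard client should work once the server is up. A minimal example (the port, endpoint path, and model id below are assumptions, check mlx-serve's docs for the actual defaults):

```python
from openai import OpenAI

# Assumed local endpoint; mlx-serve's real default port may differ.
client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")

resp = client.chat.completions.create(
    model="gemma-4-e2b",  # hypothetical model id, use whatever you loaded
    messages=[{"role": "user", "content": "Refactor this function to use a list comprehension."}],
)
print(resp.choices[0].message.content)
```

(Yes, that's a Python client talking to the no-Python server; the Zig claim is about the serving stack, not what you call it with.)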