
Qwen3.6 MTP Unsloth GGUFs now 1.8x faster!
Qwen3.6 MTP Unsloth GGUFs now run **1.8x faster, up from 1.4x just two days ago!** This is due to llama.cpp adding --spec-draft-p-min 0.75!
The args have also changed from --spec-type mtp to --spec-type draft-mtp
Also increase --spec-draft-n-max from 2 to 6
We also released Qwen3.5-0.8B, 2B, 4B, 9B MTP GGUFs! We'll be providing more soon!
If you find that the updated branch has a perf regression, set --spec-draft-p-min to 0.0 to get the old behavior. We've also provided a plot of the old branch (red) vs the new branch (blue / green).
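For intuition on why p-min can change speed, here's a toy Python sketch. It assumes --spec-draft-p-min behaves like llama.cpp's existing --draft-p-min, i.e. drafting stops as soon as the draft's top-token probability falls below the threshold. That's my assumption, not something confirmed for the new flag, and the probabilities and the `draft_length` helper below are made up for illustration.

```python
def draft_length(top_probs, n_max=6, p_min=0.75):
    """Count how many draft tokens would be proposed in one round.

    top_probs: hypothetical top-token probabilities from the draft head,
               one per drafting step (illustrative values, not real data).
    Assumed rule: stop drafting once a probability drops below p_min,
    and never draft more than n_max tokens.
    """
    count = 0
    for p in top_probs[:n_max]:
        if p < p_min:
            break  # low-confidence token: stop drafting early
        count += 1
    return count


probs = [0.9, 0.8, 0.6, 0.95]  # made-up confidences for four draft steps

print(draft_length(probs, n_max=6, p_min=0.75))  # stops at the 0.6 step
print(draft_length(probs, n_max=6, p_min=0.0))   # threshold off: drafts all steps
print(draft_length(probs, n_max=2, p_min=0.0))   # capped by n_max instead
```

Under that assumption, a p-min of 0.0 never cuts a draft short, which would match the "old behavior" described above.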
You can also combine 2 speculative decoding algos: add ngram via --spec-type ngram-mod,draft-mtp. The perf isn't optimized yet, so I'll do more benchmarks to find better numbers. See https://github.com/ggml-org/llama.cpp/pull/22673
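To keep the flag changes straight, here's a small Python sketch that assembles a launch command from the args mentioned in this post. The `build_mtp_command` helper and the model filename are hypothetical, and I'm assuming you launch through llama-server with -m for the model path; swap in whatever binary and path you actually use.

```python
import shlex


def build_mtp_command(model_path, n_max=6, p_min=0.75, use_ngram=False,
                      binary="llama-server"):
    """Build an argv list using the flags named in this post.

    model_path and binary are placeholders (assumptions), not official values.
    """
    # New spec type name; optionally stack ngram in front of it.
    spec_type = "ngram-mod,draft-mtp" if use_ngram else "draft-mtp"
    return [
        binary,
        "-m", model_path,
        "--spec-type", spec_type,
        "--spec-draft-n-max", str(n_max),
        "--spec-draft-p-min", str(p_min),
    ]


# Settings from this post (hypothetical model filename).
print(shlex.join(build_mtp_command("Qwen3.6-MTP.gguf")))

# Fallback to the old behavior if the new branch regresses for you.
print(shlex.join(build_mtp_command("Qwen3.6-MTP.gguf", p_min=0.0)))

# Stack ngram with MTP (perf not yet optimized).
print(shlex.join(build_mtp_command("Qwen3.6-MTP.gguf", use_ngram=True)))
```

The list can be passed straight to `subprocess.run` if you'd rather launch from Python than copy the printed line into a shell.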
Guide for MTP: https://unsloth.ai/docs/models/qwen3.6#mtp-guide