u/yoracale

▲ 71 r/unsloth

4-bit Qwen3.6 MTP GGUF cited 70+ websites with one prompt!

4-bit Qwen3.6 MTP GGUF managed to search 70+ sites from a single prompt.

Try this locally with Unsloth Studio on 20GB RAM.

Unsloth now supports automatic MTP + speculative decoding for supported models. Unsloth also now auto-selects the best MTP settings for your specific device (Mac, CPU, GPU etc.)

We also fixed many bugs and issues including tokens/s not showing up correctly and MTP not being applied properly.

GitHub: https://github.com/unslothai/unsloth

u/yoracale — 19 hours ago
▲ 600 r/unsloth+1 crossposts

Run Qwen3.6 MTP GGUFs locally!

Hey guys, Qwen3.6 can run ~1.4–2.2× faster with no accuracy change due to MTP. You can run this locally on just 18GB RAM, VRAM or unified memory.

The Qwen3.6 Unsloth GGUFs are now out of experimental mode, llama.cpp has merged many PRs, and MTP is now properly supported in Unsloth. MTP is now ready!

Please use the latest Unsloth `v0.1.41-beta`, not `v0.1.405-beta`, which is older. In Studio, we automatically set all the params for you depending on your specific hardware so you get near-best results (you can still change them).

Qwen3.6-27B MTP can run at 160 tokens/s. Qwen3.6-35B-A3B MTP GGUF reaches 240 tokens/s. We also uploaded MTP GGUFs for Qwen3.5!

27B MTP GGUF: https://huggingface.co/unsloth/Qwen3.6-27B-MTP-GGUF
35B-A3B MTP GGUF: https://huggingface.co/unsloth/Qwen3.6-35B-A3B-MTP-GGUF

Guide: https://unsloth.ai/docs/models/qwen3.6#mtp-guide

Thank you! We've got lots of releases this week as well.

u/yoracale — 2 days ago
▲ 145 r/unsloth

Run Qwen3.6 MTP GGUFs in Unsloth Studio!

Hey guys, Qwen3.6 MTP GGUFs now work in Unsloth Studio: https://github.com/unslothai/unsloth

Just update Unsloth Studio or do a fresh install.

MacOS, Linux, WSL:

curl -fsSL https://unsloth.ai/install.sh | sh

Windows PowerShell:

irm https://unsloth.ai/install.ps1 | iex

As always huge thanks to llama.cpp and devs for making this possible.

We'll be doing a new pypi release with lots of new updates tomorrow! Lots!!!

u/yoracale — 3 days ago
▲ 309 r/unsloth

Qwen3.6 MTP Unsloth Experimental GGUFs

Hey guys, some of you may have seen our Qwen3.6 MTP GGUFs. MTP (Multi Token Prediction) speculative decoding enables models like Qwen3.6 to have ~1.4-2x faster generation with no change in accuracy. This gives Qwen3.6 27B and 35B-A3B a >1.4x speed-up over the original baseline, which is especially useful for local models.

Qwen3.6 27B can now do 140 tokens / s generation and Qwen3.6 35B-A3B 220 tokens / s generation! See MTP Benchmarks for more details.

Regarding draft tokens, we found 2 to be the best. The acceptance rate defs drops as you add more draft tokens, so it's probs best in general to stick with 2. For coding, maybe 3 will work fine since more tokens probs get accepted.
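To get a feel for why extra draft tokens give diminishing returns, here is a rough back-of-the-envelope sketch. It is not how Unsloth or llama.cpp pick their settings; it uses the usual simplified speculative decoding model, which assumes every draft token is accepted independently with the same probability `alpha`. The function name and the `alpha` values are hypothetical, chosen only for illustration.

```python
def expected_tokens_per_step(alpha: float, k: int) -> float:
    """Expected tokens emitted per verification pass with k draft tokens.

    Simplifying assumption: each draft token is accepted independently
    with probability alpha. The pass always yields at least one token,
    and the i-th draft token only counts if all earlier ones were
    accepted, so the expectation is 1 + alpha + alpha^2 + ... + alpha^k.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    if k < 0:
        raise ValueError("k must be non-negative")
    if alpha == 1.0:
        return float(k + 1)
    # Closed form of the geometric sum above.
    return (1.0 - alpha ** (k + 1)) / (1.0 - alpha)


if __name__ == "__main__":
    # Hypothetical acceptance rates, not measured values.
    for alpha in (0.5, 0.7, 0.9):
        row = [round(expected_tokens_per_step(alpha, k), 2) for k in range(1, 5)]
        print(f"alpha={alpha}: tokens/step for k=1..4 -> {row}")
```

Under this model the k-th draft token adds only `alpha ** k` expected tokens, while each extra draft token still costs drafting time. It also ignores that the real acceptance rate tends to fall for later draft positions, which pushes the sweet spot even lower. A higher `alpha` (for example on predictable text like code) makes a third draft token more worthwhile.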

You must use the specific llama.cpp PR branch which we give instructions for in our guide below. Unsloth Studio will support it once the PR is merged.

We're now uploading MTP quants for Qwen3.5 smaller models. Thank you!

u/yoracale — 7 days ago
▲ 469 r/unsloth

Unsloth joins PyTorch Ecosystem!

Hey guys, we're super excited to announce that Unsloth has officially joined the PyTorch Ecosystem! 🔥🦥

In case you didn't know, Unsloth is an open-source project that makes training & running models more accurate and faster with less compute. Our mission is to make local AI accessible to everyone. Unsloth will remain as an independent open-source project, separate from the PyTorch Foundation.

Blog: https://unsloth.ai/blog/pytorch

GitHub: https://github.com/unslothai/unsloth

Thanks to all of you for making this possible! 💕

u/yoracale — 9 days ago
▲ 33 r/unsloth

Hey guys, we fixed bugs in Unsloth where chat history was not being shown (existing chat history is not lost) and attachments were not attaching correctly. It was a visual, render-only bug. So please update to the latest version of Unsloth: https://unsloth.ai/docs/new/studio/install

Latest version: v0.1.39-beta

Use 2026.5.2, or update directly by running `curl -fsSL https://unsloth.ai/install.sh | sh` or `unsloth studio update`.

Thanks so much!

u/yoracale — 14 days ago
▲ 231 r/unsloth+1 crossposts

Hey guys, you can now run open LLMs in Claude Code, Codex and OpenClaw via Unsloth's API inference endpoint and we made lots of tutorials for it!

Use Gemma 4 and Qwen3.6 GGUFs for local agentic coding on 24GB RAM.

Run with self-healing tool calls, code execution, web search via the Unsloth API endpoint and llama.cpp.

Guide: https://unsloth.ai/docs/basics/api

Unsloth makes it easy to deploy a fast API inference endpoint that provides:

Please update Unsloth to leverage this new update and let us know if you have any feedback. Thank you!!

u/tomByrer — 13 days ago
▲ 64 r/unsloth

Hey guys we recently released new NVFP4 quants for Qwen3.6 which you can run via vLLM, SGLang etc for fast inference on supported GPUs like RTX 50 series.

The NVFP4 quants are calibrated on the Hugging Face UltraChat dataset with sequences up to 16K context length and an approximately 2M-token calibration budget.

27b: https://huggingface.co/unsloth/Qwen3.6-27B-NVFP4

35b-a3b: https://huggingface.co/unsloth/Qwen3.6-35B-A3B-NVFP4

Please note there may be improvements in the near future for NVFP4 quants. Let us know if you have any suggestions, thanks. 🙏

u/yoracale — 15 days ago
▲ 138 r/unsloth

Hey guys, Google updated their official chat template for the Gemma 4 models a few days ago and has asked all providers to update their quants / implementations etc.

MLX, safetensor, GGUF, all formats are affected.

So these new uploads are just using the new chat template edits by Google. According to Google, this should make tool-calling more stable.

You don't need to redownload the models; you can just copy and paste Google's new official chat template. Or, if you don't know how to, you can just redownload the GGUFs.

A reminder to not use CUDA 13.2 until it is fixed.

Models / GGUFs: https://huggingface.co/collections/unsloth/gemma-4
Guide: https://unsloth.ai/docs/models/gemma-4

Thank you guys and have a lovely week!

u/yoracale — 16 days ago
▲ 187 r/unsloth

Hey guys, we worked with Mistral to fix Mistral Medium 3.5 inference affecting some implementations, and released updated GGUFs with the fix (NOT related to Unsloth or quants). Mistral 3.5 now works properly in transformers AND llama.cpp.

The issue was caused by a YaRN parsing quirk affecting some implementations. Changing `mscale_all_dim` from 1 to 0 resolved it. We also fixed mmproj file generation.

Mistral has pushed our fixes to their official repo. The YaRN scaling multiplier is now applied correctly, which fixes the model forgetting previous conversations.

Thanks a lot to everyone who first reported the issue to us after using the GGUFs.

Guide: https://unsloth.ai/docs/models/mistral-3.5

GGUFs: https://huggingface.co/unsloth/Mistral-Medium-3.5-128B-GGUF

Have a lovely weekend guys!

u/yoracale — 18 days ago
▲ 49 r/unsloth

Hey guys, just letting you all know that Unsloth does NOT use PyTorch Lightning anywhere in the package, so Unsloth is not affected and you do not need to worry about the recent compromise. Thank you. 🙏

You can ask any questions here or on GitHub: https://github.com/unslothai/unsloth/issues/5236

u/yoracale — 20 days ago
▲ 376 r/unsloth

Mistral releases Mistral Medium 3.5, a new vision reasoning model. 🔥

Mistral-Medium-3.5-128B offers performance that is highly competitive with models 5x its size.

We’re working with Mistral on the llama.cpp GGUF implementation. Testing shows that the issue occurs regardless of who converted the model to GGUF or how: the model initially responds correctly but stops working properly over long context.

Mistral has now labeled GGUF support as a WIP (work in progress). The issue appears most likely to be with the current GGUF parser. Will update once resolved.

u/yoracale — 21 days ago
▲ 354 r/unsloth

Hey guys, Unsloth just became one of the top 10 most followed organizations on Hugging Face! 🤗🦥

Thanks so much as always for all the support and we couldn't have done this without you guys! :)

Our Hugging Face profile: https://huggingface.co/unsloth

u/yoracale — 23 days ago
▲ 713 r/unsloth

DeepSeek releases DeepSeek-V4, their latest SOTA open models. There are two models:

  • DeepSeek-V4-Pro: 1.6T params / 49B active
  • DeepSeek-V4-Flash: 284B params / 13B active.
  • DeepSeek-V4-Pro rivals Claude-Opus-4.6-Max, GPT-5.4-xHigh.
  • They support 1M context length, thinking and set new records for Codeforces.

Tech Report: https://huggingface.co/deepseek-ai/DeepSeek-V4-Pro/blob/main/DeepSeek_V4.pdf

Hugging Face: https://huggingface.co/collections/deepseek-ai/deepseek-v4

u/yoracale — 26 days ago
▲ 547 r/unsloth

Hey guys we showcase the power of 2-bit Qwen3.6-27B and Unsloth Studio!

2-bit Qwen3.6-27B GGUF made 26 tool calls, triaged 15 GitHub issues, executed code, fixed, tested + reproed our repo’s 3 latest issues. 🔥

We've now added a Preserve thinking toggle! P.S. give Unsloth Studio a try or update it, as we added maaaany new features and introduced a whole new look!

Try it yourself via Unsloth Studio: https://github.com/unslothai/unsloth

u/yoracale — 27 days ago