u/Beamsters

Models and Quants quality test results - the chessboard svg (Qwen3.6 27B/35B-A3B/Zaya1)

Following up on this post, I ran several more tests to cover more models and quants.

https://www.reddit.com/r/LocalLLaMA/comments/1t53dhp/quality_comparison_between_qwen_36_27b/

Qwen3.6 35B-A3B MLX oQ4. 2 extra pawns. (oMLX - local)

Qwen3.6 35B-A3B MLX oQ4's output is almost perfect: title, last-move label, and row/column marks are all there. The two cursors (red triangles), one marking the move's start point and the other its end point, are a bit confusing at first glance, though. And the board has 2 extra pawns.
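Defects like the extra pawns are easy to catch automatically instead of by eye. A minimal sketch of such a checker, assuming (hypothetically) the model renders pieces as `<text>` elements containing Unicode chess glyphs — adjust the extraction to however your model actually draws pieces:

```python
# Hypothetical checker for generated chessboard SVGs: counts pawn glyphs,
# assuming pieces appear as <text> elements with Unicode chess characters.
import re

PAWNS = {"\u2659": "white pawn", "\u265F": "black pawn"}

def count_pawns(svg: str) -> dict:
    """Count pawn glyphs in the SVG markup (a starting board has 8 of each)."""
    texts = re.findall(r"<text[^>]*>([^<]*)</text>", svg)
    counts = {name: 0 for name in PAWNS.values()}
    for t in texts:
        for glyph, name in PAWNS.items():
            counts[name] += t.count(glyph)
    return counts

# Tiny demo: a board fragment with 9 white pawns (one too many).
svg = "".join(f'<text x="{i}" y="6">\u2659</text>' for i in range(9))
print(count_pawns(svg))  # {'white pawn': 9, 'black pawn': 0}
```

The same pattern extends to every piece type, so a model's board can be scored with one script per run instead of squinting at renders.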

ZAYA1 8B - Perfect but without a-h, 1-8 row/column mark (Zaya Cloud)

ZAYA1 8B is open weight. I tried running it locally with MLX-LM via this PR, but no luck: the 8-bit model kept reasoning in a loop without ever producing an SVG. I don't think the local inference engine is ready yet, since the model needs the RSA technique to perform. So I posted the result from Zaya Cloud's playground, assuming it serves the FP16 version. If a local inference engine can ever reproduce this answer, we will have a VERY promising model to run on our tiny computers: the whole run of the 8-bit quant took less than 12GB of memory on my machine.

Qwen3.6 27B MLX oQ6. Very good (oMLX - local) no row/no column marks

The MLX-oQ 6-bit quant of 27B delivered a correct, good-looking answer, but I had no luck pushing it down to 3.5 bits.

Qwen3.6 27B MLX oQ3.5e. Not so good. (oMLX - local)

HY3 Preview 295B A21B - Perfect, but no lines and no row/column marks. (Open Router)

HY3's 295B is not gonna cut it on my machine, so this result is from the cloud.

Now we're entering weird territory: the thousands of derivatives floating around on Hugging Face. I'll be using ones from Jackrong, OrionLLM and DavidAU, since all of them published some kind of benchmark and promise good results.

GRM 2.6 Plus Q4K_M - OrionLLM's derivative of Qwen3.6 27B - correct, and it looks really good.

GRM 2.6 Plus Q3K_M - OrionLLM's derivative of Qwen3.6 27B - 3 bits was not gonna cut it.

qwen3.6-27b-neo-code-di-imatrix-max@iq4_nl - This 4 bits quant is good.

qwen3.6-27b-neo-code-di-imatrix-max@q5k_s - However, its 5-bit counterpart was totally wrong.

Which goes to show that a higher-bit quant won't always outperform a lower-bit one.

Qwopus 35B-A3B-v1, Jackrong's Q4K_S - the board is wrong, and the words "game ended" came out of nowhere.

GRM 2.6 Opus Q3K_M (3-bit): correct, but the visuals were degraded. The smallest 27B quant that somehow works.

u/Beamsters — 2 days ago
r/unsloth

The new 27B NVFP4 KLD?

Hi, appreciate your work. I noticed the new NVFP4 that was uploaded this week, which claims GSM8K/MMLU-Pro scores comparable to the original. Can we have the KLD as well? The last one you published (MLX-NVFP4) was pretty terrible compared to the normal 4-bit quant. It's pretty confusing that one is close to the original while the other is worse than a normal 4-bit quant - thank you!
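For anyone unfamiliar with the KLD metric being requested: it measures how far a quant's next-token distribution drifts from the original model's, token by token. A toy sketch with made-up logits (real measurements compare full vocabulary-sized logit vectors over a test corpus):

```python
# Toy illustration of the KLD quant-quality metric: KL divergence between
# the reference model's next-token distribution and a quant's.
import math

def softmax(logits):
    """Convert raw logits to a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def kl_div(p, q):
    """KL(P || Q) in nats; 0 means the quant matches the reference exactly."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

orig = softmax([2.0, 1.0, 0.1])    # reference (e.g. BF16) distribution
good_q = softmax([1.9, 1.1, 0.1])  # a quant that tracks it closely
bad_q = softmax([0.1, 1.0, 2.0])   # a quant that drifts badly

print(kl_div(orig, good_q) < kl_div(orig, bad_q))  # True
```

A low mean KLD is a much more direct quality signal than a benchmark score, which is why it's worth asking for alongside GSM8K/MMLU-Pro.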

https://preview.redd.it/y2a20uwbxe0h1.png?width=1123&format=png&auto=webp&s=98529d5cb3db2f86c8ec92ce169965f67de1a1d5

u/Beamsters — 4 days ago

MIT license and fully open source. MiMo-V2.5-Pro is just 3 points behind Opus 4.7 max, and the normal V2.5 is only a step behind SOTA. Both also score 75% and 68% non-hallucination rates respectively. Best intelligence/hallucination model yet.

V2.5 in FP8 is about 316GB, so you *might* be able to run a tight 3-bit quant on a 128GB M5 Max.
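The napkin math behind that "might": FP8 is roughly 1 byte per parameter, so requantizing the same weights to 3 bits scales the footprint by 3/8, leaving a thin margin on a 128GB machine before KV cache and overhead eat the rest.

```python
# Back-of-envelope weight-size estimate: FP8 is ~1 byte/param, so a 3-bit
# quant of the same weights needs roughly bits/8 of the FP8 size.
fp8_gb = 316          # reported FP8 checkpoint size
bits = 3              # target quant width
quant_gb = fp8_gb * bits / 8
print(round(quant_gb, 1))  # 118.5 -> under 128GB, but barely
```

That ~9.5GB of headroom has to cover the KV cache, activations, and the OS, which is why it's a "tight" fit rather than a comfortable one.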

From Gemma to Qwen3.6 to Kimi2.6 to Deepseek v4 to MiMo2.5, this is probably the best April yet.

https://preview.redd.it/fvurbt2ekuxg1.png?width=1076&format=png&auto=webp&s=a62fa83e39d723a7e31c505e516f18074c90a186

https://preview.redd.it/s1vygazekuxg1.png?width=2093&format=png&auto=webp&s=51924f7a0bca951190395ee0d12405f6f1dc7089

u/Beamsters — 17 days ago