
5 open-source models that fit in 32GB at Q4 quantization, all available via Ollama or LM Studio. Tested on both NVIDIA (RTX 5090) and Apple Silicon (M4 Max).
The lineup:
ollama run qwen3:32b          # winner for general use
ollama run qwen2.5-coder:32b  # matches GPT-4o on HumanEval, runs offline
ollama run deepseek-r1:32b    # best for chain-of-thought reasoning
ollama run gemma3:27b         # Google's open weights, native vision
ollama run mistral-small:24b  # fastest tokens/sec, best for agent loops
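If you want to sanity-check that a given model actually fits at Q4 on your machine, loading it once and then checking ollama ps is enough — it reports the resident size and whether the model is fully on GPU. A minimal check (the prompt is just a throwaway):

ollama run qwen3:32b "hello"   # load the model once with a one-shot prompt
ollama ps                      # shows resident size and the CPU/GPU split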
Which model wins which workload: https://vist.ly/426dd
Two practical notes:
- On M4 Max → use the MLX path for ~30% faster inference than the default Ollama backend (rough sketch after these notes)
- On RTX 5090 → a straight ollama run is fine, no tuning needed for any of these
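For the MLX path, one option is the mlx-lm package instead of Ollama's backend (LM Studio also ships an MLX runtime on Apple Silicon). A minimal sketch — the exact model repo name below is an assumption, check the mlx-community page on Hugging Face for the current 4-bit quant:

pip install mlx-lm
# model repo name is an assumption; swap in whichever 4-bit MLX quant you actually use
mlx_lm.generate --model mlx-community/Qwen2.5-Coder-32B-Instruct-4bit \
  --prompt "Write a binary search in Python" --max-tokens 256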
I understand the M4 Max can run far more powerful models than these, but even with multiple tabs, apps, Cursor, Claude Code, etc. open, all of these ran perfectly.
Curious what everyone here defaults to — is anyone running multiple models in parallel for different tasks, or am I just being indecisive?
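For what it's worth, Ollama can keep more than one model resident at a time. A rough sketch of what I mean by parallel models for different tasks, using the OLLAMA_MAX_LOADED_MODELS setting and the REST API — the prompts are just placeholders:

# allow two models to stay loaded at once (set before starting the server)
OLLAMA_MAX_LOADED_MODELS=2 ollama serve

# hit each model for a different task via the REST API
curl http://localhost:11434/api/generate -d '{"model": "qwen2.5-coder:32b", "prompt": "Refactor this function", "stream": false}'
curl http://localhost:11434/api/generate -d '{"model": "qwen3:32b", "prompt": "Summarize these notes", "stream": false}'

Whether both actually fit depends on total memory, so on a 32GB box this is more realistic with two smaller models than with two of the 32B quants.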