r/LocalLLM
Hey everyone! Looking for recommendations on the best local models for my rig, and also need help speeding things up.
My specs:
• RTX 3060 12GB VRAM
• i5-12600K
• 16GB DDR4 3600MHz
Problem first: I'm running Gemma 3 27B on Ollama and it's super slow. The model is too big to fit fully in 12GB of VRAM, so it's spilling into system RAM. Would upgrading to 32GB of RAM help, or is the bottleneck the VRAM itself? Is there a better quant I should be using, or should I just drop to the 12B version? (Rough memory math below.)
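For context, here's the back-of-envelope math I've been working from (my own assumptions: roughly 4.7 to 8.5 bits per weight depending on the GGUF quant, and it ignores KV cache and runtime overhead, so treat the numbers as ballpark only):

```python
def weight_gib(params_b: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights in GiB."""
    return params_b * 1e9 * bits_per_weight / 8 / 2**30

# Assumed average bits/weight for common GGUF quants (rough figures).
quants = {"Q4_K_M": 4.7, "Q5_K_M": 5.5, "Q8_0": 8.5}

for model, params in [("Gemma 3 27B", 27), ("Gemma 3 12B", 12)]:
    for q, bpw in quants.items():
        gib = weight_gib(params, bpw)
        fits = "fits" if gib < 12 else "does NOT fit"
        print(f"{model} {q}: ~{gib:.1f} GiB weights -> {fits} in 12 GiB VRAM "
              f"(before KV cache and overhead)")
```

By that math the 27B spills well past 12GB at any common quant, while the 12B should leave headroom for the KV cache.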
Also looking for general recommendations:
• Best models that fit fully in 12GB VRAM?
• Good options for coding and creative writing?
• Is Mixtral 8x7B or Llama 3.1 70B worth trying with CPU offloading?
Current setup: Ollama + openclow + Claude Code on Ubuntu.
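For reference, this is the quick check I've been using to see how much of a loaded model actually lands on the GPU (a minimal sketch against Ollama's local API; the /api/ps endpoint and the size/size_vram field names are my understanding and may differ between Ollama versions):

```python
import json
import urllib.request

# Ask the local Ollama server which models are loaded and how much of
# each one is resident in VRAM vs. system RAM.
with urllib.request.urlopen("http://localhost:11434/api/ps") as resp:
    data = json.load(resp)

for m in data.get("models", []):
    total = m.get("size", 0)
    vram = m.get("size_vram", 0)
    pct = 100 * vram / total if total else 0
    print(f"{m.get('name')}: {vram / 2**30:.1f} GiB of "
          f"{total / 2**30:.1f} GiB in VRAM ({pct:.0f}% on GPU)")
```

Anything well under 100% on GPU means layers are running on the CPU, which would line up with the slowdown I'm seeing.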
u/Competitive_Teach564 — 16 days ago