u/yoracale

You can now run Google Gemma 4 locally! (5GB RAM min.)
🔥 Hot ▲ 382 r/LocalLLM

You can now run Google Gemma 4 locally! (5GB RAM min.)

Hey guys! Google just released their new open-source model family: Gemma 4.

The four models have thinking and multimodal capabilities. There's two small ones: E2B and E4B, and two large ones: 26B-A4B and 31B. Gemma 4 is strong at reasoning, coding, tool use, long-context and agentic workflows.

The 31B model is the smartest but 26B-A4B is much faster due to it's MoE arch. E2B and E4B are great for phones and laptops.

To run the models locally (laptop, Mac, desktop etc), we at Unsloth converted these models so it can fit on your device. You can now run and train the Gemma 4 models via Unsloth Studio: https://github.com/unslothai/unsloth

Recommended setups:

  • E2B / E4B: 10+ tokens/s in near-full precision with ~6GB RAM / unified mem. 4-bit variants can run on 4-5GB RAM.
  • 26B-A4B: 30+ tokens/s in near-full precision with ~30GB RAM / unified mem. 4-bit works on 16GB RAM.
  • 31B: 15+ tokens/s in near-full precision with ~35GB RAM.

No is GPU required, especially for the smaller models, but having one will increase inference speeds (~80 tokens/s). With an RTX 5090 you can get 140 tokens/s throughput which is way faster than ChatGPT.
Even if you don't meet the requirements, you can still run the models (e.g. 3GB CPU), but inference will be much slower. Link to Gemma 4 GGUFs to run.

Example of Gemma 4-26B-4AB running

You can run or train Gemma 4 via Unsloth Studio:

We've now made installation take only 1-2mins:

macOS, Linux, WSL:

curl -fsSL https://unsloth.ai/install.sh | sh

Windows:

irm https://unsloth.ai/install.ps1 | iex
  • The Unsloth Studio Desktop app is coming very soon (this month).
  • Tool-calling is now 50-80% more accurate and inference is 10-20% faster

We recommend reading our step-by-step guide which covers everything: https://unsloth.ai/docs/models/gemma-4

Thanks so much once again for reading!

reddit.com
u/yoracale — 1 day ago
You can now run Google's Gemma 4 model on your local device! (6GB RAM)
🔥 Hot ▲ 551 r/selfhosted

You can now run Google's Gemma 4 model on your local device! (6GB RAM)

Hello everyone! Google just released their new open-source model family: Gemma 4. This means you can now run a ChatGPT like model at home.

There are four models and they all have thinking and multimodal capabilities. There's two small ones: E2B and E4B, and two large ones: 26B-A4B and 31B. The 31B model is the smartest but 26B-A4B is much faster due to it's MoE arch. E2B and E4B are great for phones and laptops.

To run the models locally (laptop, Mac, desktop etc), we at Unsloth converted these models so it can fit on your device. You can now run and train the Gemma 4 models via Unsloth Studio: https://github.com/unslothai/unsloth

Recommended setups:

  • E2B / E4B: 10+ tokens/s in near-full precision with ~6GB RAM / unified mem. 4-bit variants can run on 4-5GB RAM.
  • 26B-A4B: 30+ tokens/s in near-full precision with ~30GB RAM / unified mem. 4-bit works on 16GB RAM.
  • 31B: 15+ tokens/s in near-full precision with ~35GB RAM.

No is GPU required, especially for the smaller models, but having one will increase inference speeds (~80 tokens/s). With an RTX 5090 you can get 140 tokens/s throughput which is way faster than ChatGPT.
Even if you don't meet the requirements, you can still run the models (e.g. 3GB CPU), but inference will be much slower. Link to Gemma 4 GGUFs to run.

It's recommend to use our iMatrix-quantized GGUFs instead of standard quants. They’re calibrated on coding and conversational datasets, which greatly improves accuracy over standard model quantization. See our Dynamic 2.0 GGUF article for details and benchmarks: https://unsloth.ai/docs/basics/unsloth-dynamic-2.0-ggufs

Example of Gemma 4-26B-4AB running

You can run or train Gemma 4 via Unsloth Studio:

We've now made installation take only 1-2mins:

macOS, Linux, WSL:

curl -fsSL https://unsloth.ai/install.sh | sh

Windows:

irm https://unsloth.ai/install.ps1 | iex
  • The Unsloth Studio Desktop app is coming very soon (this month).
  • Tool-calling is now 50-80% more accurate and inference is 10-20% faster

We recommend reading our step-by-step guide which covers everything: https://unsloth.ai/docs/models/gemma-4

Thanks so much once again for reading and let me know if you have any questions.

reddit.com
u/yoracale — 2 days ago