
Tested Gemma 4 as a local coding agent on M5 Pro. It failed. Then I found what actually works.
I spent a few hours testing Gemma 4 locally as a coding assistant on my MacBook Pro M5 Pro (48GB). Here's what actually happened.
Google just released Gemma 4 under Apache 2.0. I pulled the 26B MoE model via Ollama (17GB download). Direct chat through `ollama run gemma4:26b` was fast. Text generation, code snippets, explanations, all snappy. The model runs great on consumer hardware.
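If you want to reproduce the baseline, the chat setup is just two Ollama commands (the model tag here assumes whatever name Ollama publishes; check `ollama list` on their registry if `gemma4:26b` doesn't resolve):

```shell
# Download the model weights (~17GB) into the local Ollama store.
ollama pull gemma4:26b

# Start an interactive chat session in the terminal.
ollama run gemma4:26b
```

Ollama also serves a local HTTP API on port 11434, which is what the agent tools below talk to.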
Then I tried using it as an actual coding agent.
I tested it through Claude Code, OpenAI Codex, Continue.dev (VS Code extension), and Pi (open source agent CLI by Mario Zechner). With Gemma 4 (both 26B and E4B), every single one was either unusable or broken.
Claude Code and Codex: A simple "what is my app about" was still spinning after 5 minutes. I had to kill it. The problem is these tools send massive system prompts, file contents, tool definitions, and planning context before the model even starts generating. Datacenter GPUs handle that easily. Your laptop does not.
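Back-of-the-envelope math shows why it just spins. The numbers below are illustrative assumptions, not benchmarks: agent harnesses routinely front-load tens of thousands of tokens of context, and local prefill throughput on a laptop for a model this size might plausibly sit in the low hundreds of tokens per second.

```python
# Rough time-to-first-token estimate for an agent-sized prompt.
# Both numbers are assumptions for illustration, not measurements.

prompt_tokens = 30_000      # system prompt + tool definitions + file contents
prefill_tok_per_s = 150     # assumed local prompt-processing throughput

ttft_seconds = prompt_tokens / prefill_tok_per_s
print(f"time to first token: ~{ttft_seconds:.0f}s")  # ~200s, over 3 minutes
```

Before a single output token appears, and before any of the multiple round trips an agent makes, you're already minutes in. A datacenter GPU prefilling at thousands of tokens per second makes the same prompt feel instant.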
Continue.dev: Chat worked fine but agent mode couldn't create files. Kept throwing "Could not resolve filepath" errors.
Pi + Gemma 4: Same issue. The model was too slow and couldn't reliably produce the structured tool calls Pi needs to write files and run commands.
At this point I was ready to write the whole thing off. But then I switched models.
Pulled qwen3-coder via Ollama and pointed Pi at it. Night and day. Created files, ran commands, handled multi-step tasks. Actually usable as a local coding assistant. No cloud, no API costs, no sending proprietary code anywhere.
So the issue was never really the agent tools. It was the model. Gemma 4 is a great general-purpose model but it doesn't reliably produce the structured tool-calling output these agents depend on. qwen3-coder is specifically trained for that.
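To make "structured tool-calling output" concrete, here's a minimal sketch of what an agent loop expects back from the model. The schema is modeled loosely on common OpenAI-style tool calls; Pi's actual wire format may differ:

```python
import json

def parse_tool_call(raw: str) -> dict:
    """Return {'name': ..., 'arguments': {...}} or raise ValueError."""
    try:
        call = json.loads(raw)
    except json.JSONDecodeError as e:
        raise ValueError(f"not valid JSON: {e}") from e
    if not isinstance(call, dict) or "name" not in call or "arguments" not in call:
        raise ValueError("missing 'name' or 'arguments'")
    return call

# A coding-tuned model reliably emits the strict form the agent can execute:
good = '{"name": "write_file", "arguments": {"path": "app.py", "content": "print(1)"}}'
print(parse_tool_call(good)["name"])  # write_file

# A general-purpose model often wraps the call in prose, which parses to nothing:
bad = 'Sure! Here is the tool call you asked for: {"name": "write_file", ...}'
try:
    parse_tool_call(bad)
except ValueError:
    print("agent fails: unparseable tool call")
```

One malformed response like the second one and the agent can't create a file or run a command, which matches the "Could not resolve filepath" failures above.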
My setup now:
- Ollama running qwen3-coder (and gemma4:26b for general chat)
- Pi as the agent layer (lightweight, open source, supports Ollama natively)
- Claude Code with Anthropic's cloud models for anything complex
To be clear, this is still experimental. Cloud models are far ahead for anything meaningful. But for simple tasks, scaffolding, or working on code I'd rather keep private, having a local agent that actually works is a nice option.
- Hardware: MacBook Pro M5 Pro, 48GB unified memory, 1TB
- Models tested: gemma4:26b, gemma4:e4b, qwen3-coder
- Tools tested: Claude Code, OpenAI Codex, Continue.dev, Pi
Happy to answer questions if anyone wants to try a similar setup.