▲ 5 r/homeassistant
Gemma4:26b with HA conversation slow TTFT
I really like Gemma4:26b because I can easily push 262k context on a single 3090, but Home Assistant tool calls take about 40 seconds to first token for me. I'm on the newest Ollama Docker tag. Non-tool responses through HA are almost instant, but once the tool definitions get pulled in, the prompt is about 25k tokens and it hits a bottleneck somewhere, taking about 40 seconds to respond. No such issue with qwen3.5:27b, which responds after only a few seconds. I was hoping to get a serious tokens-per-second boost from MoE.
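For anyone who wants to compare numbers, here is a rough Python sketch of how the delay could be split into model load, prompt processing (prefill), and generation. It assumes you have the final JSON response from Ollama's API, which reports `load_duration`, `prompt_eval_count`, `prompt_eval_duration`, `eval_count`, and `eval_duration` (durations in nanoseconds). The helper name `prefill_stats` is made up for illustration, and in the example only the 25k tokens and 40 seconds come from my setup; the other values are placeholders, and it treats the whole 40 seconds as prefill, which may not be true.

```python
def prefill_stats(resp: dict) -> dict:
    """Split an Ollama response's timing fields into load, prefill, and decode.

    `resp` is assumed to be the final (non-streamed) JSON response from Ollama.
    Durations in that response are reported in nanoseconds.
    """
    ns = 1e9
    prompt_tokens = resp.get("prompt_eval_count", 0)
    prefill_s = resp.get("prompt_eval_duration", 0) / ns
    eval_tokens = resp.get("eval_count", 0)
    eval_s = resp.get("eval_duration", 0) / ns
    return {
        "load_s": resp.get("load_duration", 0) / ns,
        "prompt_tokens": prompt_tokens,
        "prefill_s": prefill_s,
        # Guard against division by zero when a field is missing or zero
        "prefill_tok_per_s": prompt_tokens / prefill_s if prefill_s else 0.0,
        "decode_tok_per_s": eval_tokens / eval_s if eval_s else 0.0,
    }


# Illustrative values: 25k prompt tokens and 40 s are from the post above,
# the load and decode figures are placeholders, not measurements.
example = {
    "load_duration": 200_000_000,
    "prompt_eval_count": 25_000,
    "prompt_eval_duration": 40_000_000_000,
    "eval_count": 50,
    "eval_duration": 1_000_000_000,
}
stats = prefill_stats(example)
print(stats)
```

If prefill really accounts for most of the wait, that works out to roughly 625 prompt tokens per second, which would point at prompt processing rather than generation speed as the bottleneck.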
What is everyone else's experience with Gemma 4 on Home Assistant? Any other models you recommend?
u/Sevealin_ — 15 hours ago