Trying a multi-agent setup, need help.
Hi all,
I’m running a local-first agent setup on a Mac mini M4 with 24GB RAM.
My setup:
- Main orchestrator (cloud): GPT-5.4
- Executor (local): Gemma 4 26B
- Coding agent (local): Qwen3.5:9B
- Also tried Qwen3-Coder:30B, but couldn’t get it to reliably finish tasks
Use cases:
- Sales prospecting based on defined criteria
- Lightweight stock / company research
- Small-to-medium coding tasks
- Productivity workflows (summarising notes, generating reviews)
Issues I’m seeing:
- Long runs timing out
- Context getting messy in multi-step loops
- Outputs look plausible but don’t complete tasks
- Coding agent writes code in chat instead of modifying files
- Runs stall or never finish
- Tool use is much less reliable vs cloud models
Also noticed that larger coding models aren't consistently better; sometimes they're less reliable than smaller ones.
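To put a number on the tool-use gap instead of eyeballing it, I've been scoring batches of runs with a tiny helper. It assumes responses shaped like Ollama's /api/chat output (assistant message under `message`, tool calls under `message["tool_calls"]`); the `get_weather` tool here is just a dummy for illustration:

```python
from typing import List, Dict

def tool_call_rate(responses: List[Dict]) -> float:
    """Fraction of chat responses that contain at least one tool call.

    Expects dicts shaped like Ollama's /api/chat response: the assistant
    message lives under response["message"], and tool calls (if any)
    under response["message"]["tool_calls"].
    """
    if not responses:
        return 0.0
    hits = sum(1 for r in responses if r.get("message", {}).get("tool_calls"))
    return hits / len(responses)

# Example: 2 of 3 runs produced a tool call, 1 answered in prose instead.
sample = [
    {"message": {"tool_calls": [{"function": {"name": "get_weather"}}]}},
    {"message": {"content": "It is sunny."}},  # prose instead of a tool call
    {"message": {"tool_calls": [{"function": {"name": "get_weather"}}]}},
]
rate = tool_call_rate(sample)
```

Running the same prompt 20-odd times per model and comparing rates made the local-vs-cloud difference much easier to see than single anecdotal runs.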
Trying to understand if this is:
- Model choice issue
- Config / orchestration issue
- Hardware limitation
- Or just a bad use case for local models right now
Questions:
- Which local models are most reliable for these use cases?
- Any config changes that significantly improve:
- reliability
- tool execution
- long-run stability
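On the "context getting messy" problem, the workaround I'm experimenting with is a dumb trimmer that always keeps the system prompt and then fits the most recent turns under a budget. The character count is a cheap stand-in for real token counting, and the budget number is a guess, not anything tuned:

```python
def trim_history(messages, budget_chars=8000):
    """Keep system messages plus as many of the most recent turns as fit
    under a rough character budget (newest turns win).

    messages: list of {"role": ..., "content": ...} dicts, oldest first.
    """
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    kept, used = [], sum(len(m["content"]) for m in system)
    # Walk backwards so the newest turns claim the budget first.
    for m in reversed(rest):
        used += len(m["content"])
        if used > budget_chars:
            break
        kept.append(m)
    return system + list(reversed(kept))

history = [
    {"role": "system", "content": "You are the executor agent."},
    {"role": "user", "content": "step 1 " * 50},
    {"role": "assistant", "content": "result 1 " * 50},
    {"role": "user", "content": "step 2"},
]
trimmed = trim_history(history, budget_chars=400)
```

With a tight budget the old bulky turns get dropped but the system prompt and the latest instruction survive, which already made my multi-step loops noticeably less confused.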
Current config (important bits):
Sub-agents:
- runTimeoutSeconds: 1800
Executor (Peter):
- Model: ollama/gemma4:26b
- thinkingDefault: off
- heartbeat: 0m
Coding agent (Jay):
- Model: ollama/qwen3.5:9b
- thinkingDefault: off
Ollama model registry:
gemma4:26b
- reasoning: false
- contextWindow: 32768
- maxTokens: 16384
qwen3.5:9b
- reasoning: true
- contextWindow: 65536
- maxTokens: 32768
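On the hardware-limitation question, here's the back-of-envelope math I did for the 26B executor on 24GB. The architecture numbers (layers, KV heads, head dim) are pure guesses since I don't know this model's real shape; the point is only that a ~26B model at 4-bit plus a full 32k fp16 KV cache leaves very little headroom:

```python
def kv_cache_gib(n_layers, n_kv_heads, head_dim, ctx_len, bytes_per_val=2):
    """Approximate KV cache size: 2 (K and V) * layers * KV heads *
    head dim * context length * bytes per cached value, in GiB."""
    return 2 * n_layers * n_kv_heads * head_dim * ctx_len * bytes_per_val / 1024**3

# Illustrative numbers only -- assuming 46 layers, 8 KV heads (GQA),
# head_dim 128, fp16 cache, and a 32768-token context window:
cache_gib = kv_cache_gib(n_layers=46, n_kv_heads=8, head_dim=128, ctx_len=32768)

# ~26B params at 4-bit quantisation is roughly 0.5 bytes per parameter:
weights_gib = 26e9 * 0.5 / 1024**3
total_gib = cache_gib + weights_gib
```

With these guessed numbers the weights alone are ~12 GiB and the cache adds several more, so the total is pushing 18 GiB before macOS and everything else, which would explain the stalls under long runs.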
I’m not expecting cloud-level performance, just trying to get local agents stable enough to be genuinely useful.
Would really appreciate advice from anyone running something similar on Apple Silicon.