u/PiqueForPresident

Trying a multi-agent setup, need help.

Hi all,

I’m running a local-first agent setup on a Mac mini M4 with 24GB RAM.

My setup:

  • Main orchestrator (cloud): GPT-5.4
  • Executor (local): Gemma 4 26B
  • Coding agent (local): Qwen3.5:9B
  • Also tried Qwen3-Coder:30B, but couldn’t get it to reliably finish tasks

Use cases:

  • Sales prospecting based on defined criteria
  • Lightweight stock / company research
  • Small-to-medium coding tasks
  • Productivity workflows (summarising notes, generating reviews)

Issues I’m seeing:

  • Long runs timing out
  • Context getting messy in multi-step loops
  • Outputs look plausible but don’t complete tasks
  • Coding agent writes code in chat instead of modifying files
  • Runs stall or never finish
  • Tool use is much less reliable than with cloud models

Also noticed that larger coding models aren’t consistently better — sometimes less reliable than smaller ones.
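Part of why I suspect memory pressure: rough back-of-envelope math for my 24GB machine (bytes-per-weight for ~Q4 quantization and the OS/app overhead are guesses on my part, not measured):

```python
# Rough memory budget for running quantized models locally on a 24 GB machine.
# ~0.55 bytes/weight approximates a Q4-ish quantization; KV cache not included.

def model_footprint_gb(params_b: float, bytes_per_weight: float = 0.55) -> float:
    """Approximate resident size in GB for a Q4-quantized model."""
    return params_b * bytes_per_weight

total_ram_gb = 24
os_and_apps_gb = 6  # macOS + orchestrator + everything else (rough guess)

for params in (9, 26, 30):
    need = model_footprint_gb(params)
    headroom = total_ram_gb - os_and_apps_gb - need
    print(f"{params}B model: ~{need:.1f} GB weights, ~{headroom:.1f} GB headroom")
```

By this estimate a 30B model leaves only ~1.5 GB of headroom before the KV cache even starts growing, which would explain why the bigger coder was less reliable than the smaller one, not more.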

Trying to understand if this is:

  • Model choice issue
  • Config / orchestration issue
  • Hardware limitation
  • Or just a bad use case for local models right now

Questions:

  • Which local models are most reliable for these use cases?
  • Any config changes that significantly improve:
    • reliability
    • tool execution
    • long-run stability

Current config (important bits):

Sub-agents:

  • runTimeoutSeconds: 1800

Executor (Peter):

  • Model: ollama/gemma4:26b
  • thinkingDefault: off
  • heartbeat: 0m

Coding agent (Jay):

  • Model: ollama/qwen3.5:9b
  • thinkingDefault: off

Ollama model registry:

Gemma4:26b

  • reasoning: false
  • contextWindow: 32768
  • maxTokens: 16384

Qwen3.5:9b

  • reasoning: true
  • contextWindow: 65536
  • maxTokens: 32768
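For reference, the bullets above correspond roughly to this config (the YAML layout and key nesting are my paraphrase, not the framework's exact schema; values are as listed):

```yaml
subAgents:
  runTimeoutSeconds: 1800      # 30 min cap per run

agents:
  peter:                       # executor
    model: ollama/gemma4:26b
    thinkingDefault: off
    heartbeat: 0m
  jay:                         # coding agent
    model: ollama/qwen3.5:9b
    thinkingDefault: off

models:
  gemma4:26b:
    reasoning: false
    contextWindow: 32768
    maxTokens: 16384
  qwen3.5:9b:
    reasoning: true
    contextWindow: 65536
    maxTokens: 32768
```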

I’m not expecting cloud-level performance, just trying to get local agents stable enough to be genuinely useful.

Would really appreciate advice from anyone running something similar on Apple Silicon.

reddit.com
u/PiqueForPresident — 3 days ago
