u/PiqueForPresident

Trying a multi-agent setup, need help.

Hi all,

I’m running a local-first agent setup on a Mac mini M4 with 24GB RAM.

My setup:

  • Main orchestrator (cloud): GPT-5.4
  • Executor (local): Gemma 4 26B
  • Coding agent (local): Qwen3.5:9B
  • Also tried Qwen3-Coder:30B, but couldn’t get it to reliably finish tasks

Use cases:

  • Sales prospecting based on defined criteria
  • Lightweight stock / company research
  • Small-to-medium coding tasks
  • Productivity workflows (summarising notes, generating reviews)

Issues I’m seeing:

  • Long runs timing out
  • Context getting messy in multi-step loops
  • Outputs look plausible but don’t complete tasks
  • Coding agent writes code in chat instead of modifying files
  • Runs stall or never finish
  • Tool use is much less reliable than with cloud models

Also noticed that larger coding models aren’t consistently better — sometimes less reliable than smaller ones.
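Part of why I suspect memory pressure: rough back-of-envelope math for my 24GB machine (bytes-per-weight for ~Q4 quantization and the OS/app overhead are guesses on my part, not measured):

```python
# Rough memory budget for running quantized models locally on a 24 GB machine.
# ~0.55 bytes/weight approximates a Q4-ish quantization; KV cache not included.

def model_footprint_gb(params_b: float, bytes_per_weight: float = 0.55) -> float:
    """Approximate resident size in GB for a Q4-quantized model."""
    return params_b * bytes_per_weight

total_ram_gb = 24
os_and_apps_gb = 6  # macOS + orchestrator + everything else (rough guess)

for params in (9, 26, 30):
    need = model_footprint_gb(params)
    headroom = total_ram_gb - os_and_apps_gb - need
    print(f"{params}B model: ~{need:.1f} GB weights, ~{headroom:.1f} GB headroom")
```

By this estimate a 30B model leaves only ~1.5 GB of headroom before the KV cache even starts growing, which would explain why the bigger coder was less reliable than the smaller one, not more.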

Trying to understand if this is:

  • Model choice issue
  • Config / orchestration issue
  • Hardware limitation
  • Or just a bad use case for local models right now

Questions:

  • Which local models are most reliable for these use cases?
  • Any config changes that significantly improve:
    • reliability
    • tool execution
    • long-run stability

Current config (important bits):

Sub-agents:

  • runTimeoutSeconds: 1800

Executor (Peter):

  • Model: ollama/gemma4:26b
  • thinkingDefault: off
  • heartbeat: 0m

Coding agent (Jay):

  • Model: ollama/qwen3.5:9b
  • thinkingDefault: off

Ollama model registry:

Gemma4:26b

  • reasoning: false
  • contextWindow: 32768
  • maxTokens: 16384

Qwen3.5:9b

  • reasoning: true
  • contextWindow: 65536
  • maxTokens: 32768
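For reference, the bullets above correspond roughly to this config (the YAML layout and key nesting are my paraphrase, not the framework's exact schema; values are as listed):

```yaml
subAgents:
  runTimeoutSeconds: 1800      # 30 min cap per run

agents:
  peter:                       # executor
    model: ollama/gemma4:26b
    thinkingDefault: off
    heartbeat: 0m
  jay:                         # coding agent
    model: ollama/qwen3.5:9b
    thinkingDefault: off

models:
  gemma4:26b:
    reasoning: false
    contextWindow: 32768
    maxTokens: 16384
  qwen3.5:9b:
    reasoning: true
    contextWindow: 65536
    maxTokens: 32768
```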

I’m not expecting cloud-level performance, just trying to get local agents stable enough to be genuinely useful.

Would really appreciate advice from anyone running something similar on Apple Silicon.

reddit.com
u/PiqueForPresident — 3 days ago
