Qwen 27B + Hermes Agent: Preserve Thinking On or Off? + Sampling Tips
I’ve been messing around with a local setup and wanted to sanity check a few things with people who’ve gone deeper into this.
Right now I’m running Qwen 3.6 27B (Q5, MTP) on a 3090, hooked up to a Hermes agent setup. Mostly just experimenting, testing tool use and seeing how far I can push it.
One thing I’m not fully clear on:
For agentic workflows (Hermes or similar), do you guys usually keep “preserve thinking” on or off?
I’ve tried both, and it feels like:
- On → better reasoning sometimes, but can get stuck in loops or overthink tool calls
- Off → more direct, but occasionally dumber decisions
Not sure what the general consensus is here.
Also curious what sampling settings people are using for agents specifically. I’m trying to reduce:
- repeated / looping tool calls
- over-calling tools when it’s not needed
- getting stuck in “thinking → tool → thinking → tool” cycles
Would really appreciate if you could share:
- your go-to params (temp, top_p, repeat penalty, etc.)
- any tricks to make tool usage more stable
Still early in testing, so open to completely rethinking the setup if needed. Thanks a lot for your time and advice.