▲ 22 r/Qwen_AI

Apologies if this has been asked before; I couldn't find anything when searching. I'm currently working with Qwen 3.6 35B for local development on a resource-constrained system. I'm finding that Qwen's thinking blocks are often so verbose that they exceed the max_completion_tokens limit (currently set to 8192 out of desperation), causing some requests to fail. I'm running the model via LM Studio. Is there anything that can be done to encourage it to think less?
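One option worth trying (a sketch, not something from this thread): recent Qwen models that emit thinking blocks support a "soft switch" where appending `/no_think` to the user turn suppresses the `<think>` block for that request. The helper and the model identifier `qwen3-35b` below are placeholders, assuming an OpenAI-compatible payload like the one LM Studio's local server accepts:

```python
def build_request(messages, max_tokens=8192, no_think=True):
    """Build an OpenAI-style chat payload, optionally suppressing thinking.

    Appending "/no_think" to the latest user message is the soft switch
    Qwen documents for skipping the thinking block on that turn.
    """
    msgs = [dict(m) for m in messages]  # shallow copy so the input isn't mutated
    if no_think and msgs:
        msgs[-1]["content"] = msgs[-1]["content"].rstrip() + " /no_think"
    return {
        "model": "qwen3-35b",   # placeholder: use whatever id LM Studio shows
        "messages": msgs,
        "max_tokens": max_tokens,
    }

payload = build_request([{"role": "user", "content": "Summarise this diff."}])
# payload["messages"][-1]["content"] now ends with "/no_think"
```

If the thinking is still wanted sometimes, leaving `no_think=False` keeps the message untouched, so the switch can be flipped per request rather than globally.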

u/Ariquitaun — 17 days ago