Qwen3.6 35b context - how much to set?
I am running on a MacBook M5 128GB, so memory allocation isn't an issue, but I'm wondering what setting gets the most out of general agent workflows. I'm not using it for coding projects; for that I use opencode. Though I will ofc use it to create small programs to aid in those workflows.
I had set llama.cpp (I'm ok with non-MLX) to the full --ctx-size 262144 with --parallel 1, but I suspect that's not optimal even though the model supports it.
Should I keep --ctx-size 262144 and bump to --parallel 2, or go lower and set --ctx-size 131072 with --parallel 2?
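For context on the trade-off: as I understand it, llama.cpp's server splits the total --ctx-size evenly across the --parallel slots, so each slot only sees ctx_size / parallel tokens. A quick sketch of the arithmetic for the options above (assuming that even split, which is how I believe llama-server allocates slots):

```python
# Assumption: llama-server gives each parallel slot ctx_size // parallel tokens.
def per_slot_ctx(ctx_size: int, parallel: int) -> int:
    """Approximate context window available to each request slot."""
    return ctx_size // parallel

for ctx, par in [(262144, 1), (262144, 2), (131072, 2)]:
    print(f"--ctx-size {ctx} --parallel {par} -> {per_slot_ctx(ctx, par)} tokens/slot")
```

So --ctx-size 262144 --parallel 2 would give each concurrent request ~128K of context, while --ctx-size 131072 --parallel 2 would cut that to ~64K per request.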
Appreciate any wisdom on this front.