u/chimph

Qwen3.6 35b - how much context to set?

I am running on a MacBook M5 128GB so memory allocation isn't an issue, but I'm wondering what is best to set to get the most out of general agent workflows. I'm not using it for coding projects - for that I use opencode. Though I will of course use it to create small programs to aid in those workflows.

I had set llama.cpp (I'm ok with non-mlx) to the full --ctx-size 262144 with --parallel 1, but I think that's probably not optimal even though the model supports it.

Should I keep --ctx-size 262144 with --parallel 2, or go even lower and set --ctx-size 131072 with --parallel 2?
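One thing worth keeping in mind (flag semantics as I understand them for llama-server, so do verify against your build): --ctx-size is the total KV-cache budget, and it gets split evenly across the --parallel slots, so each slot only sees ctx_size / parallel tokens. A quick sanity check:

```shell
# llama-server divides the total --ctx-size across --parallel slots,
# so each slot's usable context is ctx_size / parallel.
CTX=262144
PAR=2
echo "per-slot context: $((CTX / PAR))"
```

On that reading, --ctx-size 262144 --parallel 2 gives each request 128k of context, while --ctx-size 131072 --parallel 2 gives each only 64k.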

Appreciate any wisdom on this front.

reddit.com
u/chimph — 3 days ago

How to have the model (gemma4) search the web via native tool calling, without specifically choosing the web search toggle within the chat?

Perhaps I'm not understanding what the native tool calling setting actually does, but I have searxng added as an MCP server, and it successfully searches the web when I choose it. I was hoping that with native tool calling enabled, the model would choose to use it by itself when it needs to.
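For what it's worth, the model can generally only decide to call a tool that is advertised in the request itself; a chat UI toggle usually just injects that declaration for you. A minimal sketch of an OpenAI-style request body against an LM Studio-type server (the searxng_search name and its parameter schema here are made up for illustration, not LM Studio's or searxng's actual schema):

```python
import json

# Hypothetical tool declaration: the model can only choose to call a tool
# that appears in the request's "tools" array.
tools = [{
    "type": "function",
    "function": {
        "name": "searxng_search",  # illustrative name, not the real MCP tool name
        "description": "Search the web via a SearXNG instance.",
        "parameters": {
            "type": "object",
            "properties": {
                "query": {"type": "string", "description": "Search terms"},
            },
            "required": ["query"],
        },
    },
}]

payload = {
    "model": "gemma",  # placeholder model id
    "messages": [{"role": "user", "content": "What's in the news today?"}],
    "tools": tools,
    "tool_choice": "auto",  # let the model decide whether to search
}

print(json.dumps(payload, indent=2))
```

If requests hitting the server don't carry a tools array like this, the model has nothing to call, regardless of any native tool calling setting.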

The top result is without selecting the web search toggle (in hopes it would still use it), and for the bottom one I did select web search, which is my usual way of forcing it to use the web.

https://preview.redd.it/1ur1oq4uboxg1.png?width=1130&format=png&auto=webp&s=8ec90a00e253f001f395b8a0489f4038b445b195

I have this model being served via lmstudio server with full 256k context. Not sure if I have to enable anything else?

https://preview.redd.it/nh1hdgz9doxg1.png?width=348&format=png&auto=webp&s=8f3375b565eabd3b4f28bf0d4d329f7c36dc4a52

u/chimph — 6 days ago