u/FroyoEducational4851

▲ 2 r/LocalLLM (+1 crosspost)

I don’t mind the model taking time to respond, but seeing the whole thinking/reasoning process on screen gets distracting really fast.

Is there a clean way to hide it while still letting the model think normally in the background?
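
The workaround I've been sketching is to filter the tags client-side. A rough sketch, assuming a default Ollama server on localhost:11434 and a model that emits literal <think> tags (DeepSeek-R1 / Qwen3 style); newer Ollama builds also expose a native think toggle in the API, but this doesn't depend on it:

```python
# Rough sketch: stream from a local Ollama server but swallow the
# <think>...</think> block so only the final answer hits the terminal.
# Assumptions (mine): default server at localhost:11434, and a model
# that wraps reasoning in literal <think> tags.
import json
import requests

OLLAMA_URL = "http://localhost:11434/api/generate"
MODEL = "qwen3:8b"  # placeholder tag; substitute whatever you run

def stream_without_thinking(prompt: str) -> None:
    resp = requests.post(
        OLLAMA_URL,
        json={"model": MODEL, "prompt": prompt, "stream": True},
        stream=True,
        timeout=600,
    )
    resp.raise_for_status()

    buffer = ""       # text not yet classified as answer vs. reasoning
    in_think = False  # currently inside <think>...</think>?

    for line in resp.iter_lines():
        if not line:
            continue
        chunk = json.loads(line)  # Ollama streams one JSON object per line
        buffer += chunk.get("response", "")

        while True:
            if in_think:
                end = buffer.find("</think>")
                if end == -1:
                    buffer = buffer[-7:]  # keep a tail in case a tag splits
                    break
                buffer = buffer[end + len("</think>"):]
                in_think = False
            else:
                start = buffer.find("<think>")
                if start == -1:
                    safe = buffer[:-6]  # hold back a possible partial tag
                    print(safe, end="", flush=True)
                    buffer = buffer[len(safe):]
                    break
                print(buffer[:start], end="", flush=True)
                buffer = buffer[start + len("<think>"):]
                in_think = True

        if chunk.get("done"):
            if not in_think:
                print(buffer, end="", flush=True)
            break
    print()

if __name__ == "__main__":
    stream_without_thinking("Why is the sky blue?")
```

The model still does its full reasoning pass; you just never render it.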

u/FroyoEducational4851 — 8 days ago
▲ 2 r/ollama

I’ve been testing local models for retrieval-augmented generation (RAG), document querying, and structured outputs, but I keep running into tradeoffs between reasoning quality, context handling, schema reliability, and hardware efficiency.

So far I’ve tried Gemma, Minimax, and a bit of Command-R, and I’m now looking into Qwen and LFM2. Gemma felt solid overall, but schema outputs became inconsistent under heavier workloads. Minimax felt weaker than I expected, though that might’ve been my setup.
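
For the schema part specifically, constrained decoding has been more reliable for me than prompting. A minimal sketch using Ollama's structured outputs (assuming Ollama 0.5+ on the default port; the model tag and schema are placeholders of mine):

```python
# Minimal sketch of constrained decoding via Ollama's structured outputs:
# /api/chat accepts a JSON Schema in the "format" field and the server
# constrains sampling to match it.
import json
import requests

schema = {
    "type": "object",
    "properties": {
        "title": {"type": "string"},
        "tags": {"type": "array", "items": {"type": "string"}},
        "confidence": {"type": "number"},
    },
    "required": ["title", "tags", "confidence"],
}

resp = requests.post(
    "http://localhost:11434/api/chat",
    json={
        "model": "qwen3:8b",  # placeholder; substitute your model
        "messages": [{
            "role": "user",
            "content": "Give a title, topic tags, and a confidence score "
                       "for this note: quarterly sales rose 12% on new "
                       "enterprise contracts.",
        }],
        "format": schema,               # decode against the schema
        "stream": False,
        "options": {"temperature": 0},  # determinism helps schema stability
    },
    timeout=300,
)
resp.raise_for_status()
data = json.loads(resp.json()["message"]["content"])  # schema-shaped JSON
print(data)
```

Since the sampler can't emit off-schema tokens, the under-load inconsistency mostly disappears; the field values can still be wrong, though, so downstream validation is still worth keeping.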

Curious what models people are actually sticking with for serious local workflows.

u/FroyoEducational4851 — 9 days ago

I’m trying to run a local setup for retrieval-augmented generation and some machine learning work.

Curious what models people are actually using right now and how they’re performing.

u/FroyoEducational4851 — 9 days ago
▲ 0 r/ollama

Tried a longer run with Ollama and got:

  • Model: Qwen3-Coder 30B
  • ~40k tokens in ~14 min
  • ~48 tok/sec (pretty stable)

System:

  • RAM ~23GB (almost full)
  • Swap ~1.5–2GB
  • CPU ~200%+, GPU ~70%

Feels solid, but not sure if this is expected or if I’m hitting the ceiling.

Anyone getting better numbers on similar hardware? Any Ollama tweaks worth trying?
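
For comparing numbers precisely, `ollama run --verbose` prints exact rates, and the same stats come back on the final /api/generate object. A small sketch, assuming the default server and a placeholder model tag:

```python
# Quick sketch for exact numbers instead of eyeballed ones: the final
# /api/generate object carries token counts and durations (nanoseconds),
# the same stats `ollama run --verbose` prints.
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "qwen3-coder:30b",  # placeholder; whatever tag you pulled
        "prompt": "Write a quicksort in Python.",
        "stream": False,
    },
    timeout=1800,
)
resp.raise_for_status()
s = resp.json()

NS = 1e9  # durations are reported in nanoseconds
print(f"prompt: {s['prompt_eval_count']} tok @ "
      f"{s['prompt_eval_count'] / (s['prompt_eval_duration'] / NS):.1f} tok/s")
print(f"output: {s['eval_count']} tok @ "
      f"{s['eval_count'] / (s['eval_duration'] / NS):.1f} tok/s")
print(f"load  : {s['load_duration'] / NS:.2f} s")
print(f"total : {s['total_duration'] / NS:.2f} s")
```

Also worth a look: `ollama ps` reports the CPU/GPU split for the loaded model; with RAM nearly full and CPU pinned above 200%, some layers are probably running on the CPU, which would explain the ~70% GPU utilization.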

u/FroyoEducational4851 — 12 days ago