u/Any-Scientist-7612

Looking for real-world advice from people actually using local LLMs daily for knowledge work / executive thinking workflows.

Use case is personal (single-user only), not hosting or serving others.

What I want to build:

- Personal AI advisor / assistant

- Obsidian-integrated RAG

- Book + PDF repository (on an external disk)

- Long-term memory / contextual assistant

- Agentic AI experimentation (hands-on learning)

- Strategic thinking, management consulting-style analysis, writing, synthesis

- Privacy-first local setup

- Picking up coding again (to build application as hobby)
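For the Obsidian RAG piece, retrieval over a vault of markdown notes can be prototyped in a few lines before committing to any framework. This is a minimal stdlib-only sketch using bag-of-words cosine similarity; a real setup would use embeddings and a vector store, and the vault path and chunk size here are placeholders:

```python
# Minimal retrieval sketch over an Obsidian vault (bag-of-words cosine
# similarity, stdlib only). Illustrative only: a production RAG stack
# would swap Counter vectors for embeddings + a vector index.
import math
import pathlib
import re
from collections import Counter

def chunks(vault: str, size: int = 200):
    """Yield (note name, word-chunk) pairs from every .md file in the vault."""
    for p in pathlib.Path(vault).rglob("*.md"):
        words = re.findall(r"\w+", p.read_text(encoding="utf-8").lower())
        for i in range(0, len(words), size):
            yield p.stem, words[i:i + size]

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def retrieve(vault: str, query: str, k: int = 3):
    """Return the top-k (score, note, preview) chunks for a query."""
    q = Counter(re.findall(r"\w+", query.lower()))
    scored = [(cosine(q, Counter(c)), name, " ".join(c)[:80])
              for name, c in chunks(vault)]
    return sorted(scored, reverse=True)[:k]
```

The retrieved chunks would then be pasted into the local model's prompt as context, which is the whole RAG loop in miniature.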

Current shortlist:

- Mac Studio M3 Ultra 28-core CPU / 60-core GPU / 96GB RAM

vs

- Mac Studio M3 Ultra 32-core CPU / 80-core GPU / 96GB RAM

Planned models:

- Qwen2.5 72B mainly (Qwen's largest dense model; there is no 70B)

- likely Q5_K_M quant (maybe Q4_K_M initially)
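The memory question above is easy to sanity-check with arithmetic. This sketch uses approximate llama.cpp bits-per-weight averages for Q4_K_M (~4.85 bpw) and Q5_K_M (~5.68 bpw), and assumes a Qwen2.5-72B-like architecture (80 layers, 8 KV heads, head dim 128) for the KV cache; all of those figures are ballpark assumptions, not measured values:

```python
# Rough memory estimate for a ~72B model on a 96GB unified-memory Mac.
# bpw values are approximate llama.cpp quant averages; layer/head counts
# assume a Qwen2.5-72B-like architecture (assumptions, not specs).

def weights_gb(params_b: float, bits_per_weight: float) -> float:
    """Quantized weight size in GB (1 GB = 1e9 bytes)."""
    return params_b * 1e9 * bits_per_weight / 8 / 1e9

def kv_cache_gb(tokens: int, layers: int = 80, kv_heads: int = 8,
                head_dim: int = 128, bytes_per_elem: int = 2) -> float:
    """fp16 KV cache: K and V stored per layer, per KV head, per token."""
    return 2 * layers * kv_heads * head_dim * bytes_per_elem * tokens / 1e9

q4 = weights_gb(72, 4.85)   # Q4_K_M -> ~43.7 GB
q5 = weights_gb(72, 5.68)   # Q5_K_M -> ~51.1 GB
kv = kv_cache_gb(32_768)    # 32k context -> ~10.7 GB

print(f"Q4_K_M weights: {q4:.1f} GB")
print(f"Q5_K_M weights: {q5:.1f} GB")
print(f"32k KV cache:   {kv:.1f} GB")
print(f"Q5 + 32k KV:    {q5 + kv:.1f} GB")
```

Roughly 62 GB for Q5 plus a 32k fp16 cache, leaving headroom under 96 GB even with macOS reserving part of unified memory for the system, so one active model at a time should fit; running a second large model or very long agent contexts concurrently would be the squeeze.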

A few questions for people actually running similar setups:

  1. Is 96GB realistically enough for Q5 70B + RAG + agent workflows for the next few years, assuming mostly one active model at a time?

  2. Does the jump from 60-core GPU to 80-core GPU materially change the experience in real life, or mostly benchmark numbers?

  3. For nuanced writing / emotionally aware outputs / consulting-style reasoning:

- how noticeable is the jump from Q4_K_M to Q5_K_M?

- does Q5 feel meaningfully more “human” or coherent over long sessions?

  4. If you also use paid ChatGPT / Claude:

- where does local Qwen 70B Q5 still noticeably fall short?

- where does local actually feel better once RAG/personal memory is integrated?

  5. Any regrets going Mac Studio instead of an NVIDIA/CUDA workstation for this type of workflow?

Not looking for benchmark flex or homelab setups — more interested in lived experience from people using local AI as a daily thinking companion / knowledge system.

Thanks in advance.

reddit.com