Looking for real-world advice from people actually using local LLMs daily for knowledge work / executive thinking workflows.
Use case is personal (single-user only), not hosting or serving others.
What I want to build:
- Personal AI advisor / assistant
- Obsidian-integrated RAG
- Book + PDF repository (on an external disk)
- Long-term memory / contextual assistant
- Agentic AI experimentation (hands-on learning)
- Strategic thinking, management consulting-style analysis, writing, synthesis
- Privacy-first local setup
- Picking up coding again (building applications as a hobby)
Current shortlist:
- Mac Studio M3 Ultra 28-core CPU / 60-core GPU / 96GB RAM
vs
- Mac Studio M3 Ultra 32-core CPU / 80-core GPU / 96GB RAM
Planned models:
- Qwen2.5 72B mainly (the ~70B class)
- likely Q5_K_M quant (maybe Q4_K_M initially)
A few questions for people actually running similar setups:
Is 96GB realistically enough for Q5 70B + RAG + agent workflows for the next few years, assuming mostly one active model at a time?
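For context, here's the back-of-envelope math I've been using (the bits-per-weight averages are assumptions on my part; llama.cpp reports the exact figure per GGUF, and it varies a bit by architecture):

```python
# Rough unified-memory budget for a ~70B model on a 96GB Mac Studio.
# Assumed averages: Q5_K_M ~ 5.5 bits/weight, Q4_K_M ~ 4.85 bits/weight.

def weights_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate in-memory size of the quantized weights, in GB."""
    # params_billion * 1e9 params * bits / 8 bits-per-byte / 1e9 bytes-per-GB
    return params_billion * bits_per_weight / 8

q5 = weights_gb(70, 5.5)   # roughly 48 GB
q4 = weights_gb(70, 4.85)  # roughly 42 GB

# KV cache comes on top: assuming an 80-layer model with 8 GQA KV heads
# and head dim 128 at fp16, each token costs 2 * 80 * 8 * 128 * 2 bytes
# (~0.33 MB), so an 8k-token context adds roughly 2.6 GB.
print(f"Q5_K_M weights ~ {q5:.0f} GB, Q4_K_M ~ {q4:.0f} GB")
```

So the weights alone leave maybe 40+ GB of headroom at Q5, before the KV cache, the embedding/RAG stack, and whatever macOS itself holds back from the GPU (by default only part of unified memory is wired for the GPU, though I understand that limit can be raised). That's why I'm asking whether 96GB is comfortable in practice rather than just on paper.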
Does the jump from the 60-core GPU to the 80-core GPU materially change the day-to-day experience, or does it mostly show up in benchmark numbers?
For nuanced writing / emotionally aware outputs / consulting-style reasoning:
- How noticeable is the jump from Q4_K_M to Q5_K_M?
- Does Q5 feel meaningfully more “human” or coherent over long sessions?

If you also use paid ChatGPT / Claude:
- Where does local Qwen 72B at Q5 still noticeably fall short?
- Where does local actually feel better once RAG / personal memory is integrated?

Any regrets going Mac Studio instead of an NVIDIA/CUDA workstation for this type of workflow?
Not looking for benchmark flexing or homelab showcases; I'm more interested in lived experience from people using local AI as a daily thinking companion / knowledge system.
Thanks in advance.