r/LocalLLM
M1 Max 32GB vs M2 Pro 32GB for Local LLM Inference
Hi everyone, I’m looking to pick up a used MacBook for running local LLMs (Ollama, LM Studio, etc.). My budget is around $1000, and I’ve found two main options at this price point:
- M1 Max (10-core CPU, 24/32-core GPU) with 32GB Unified Memory.
- M2 Pro (12-core CPU, 19-core GPU) with 32GB Unified Memory.
My primary use case is daily coding assistance and experimenting with models like DeepSeek-Coder, Qwen 2.5, and Llama 3.
My main concern is tokens per second (t/s). I know the M1 Max has 400 GB/s of memory bandwidth, while the M2 Pro is limited to 200 GB/s. Does this bandwidth difference significantly impact inference speed for 7B–14B models at 4-bit or 8-bit quantization?
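For a rough sanity check, here's the napkin math I've been using: as far as I understand, token generation (decode) on Apple Silicon is mostly memory-bandwidth-bound, since every generated token has to stream the full set of quantized weights from memory. So tokens/sec should be capped at roughly bandwidth divided by model size. The numbers and function below are just my own illustration, not benchmarks:

```python
# Napkin math: decode t/s ceiling ~= memory bandwidth / size of quantized weights.
# Real-world numbers will be noticeably lower (KV cache reads, dequant overhead,
# framework overhead), so treat these as upper bounds only.

def est_tps(bandwidth_gbs: float, params_b: float, bits_per_weight: int) -> float:
    """Rough ceiling on tokens/sec, assuming all weights are read once per token."""
    model_gb = params_b * bits_per_weight / 8  # GB of weights at this quantization
    return bandwidth_gbs / model_gb

for name, bw in [("M1 Max", 400), ("M2 Pro", 200)]:
    for params in (7, 14):
        for bits in (4, 8):
            tps = est_tps(bw, params, bits)
            print(f"{name}: {params}B @ {bits}-bit ~ {tps:.0f} t/s ceiling")
```

By that ceiling the M1 Max should be roughly 2x faster at decode. What I don't have a feel for is prompt processing, which is compute-bound and where the M2 Pro's newer cores might close some of the gap, hence the question.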
Is the M1 Max still the "king" of value here, or does the newer architecture/CPU of the M2 Pro offer any hidden benefits for LLM workflows?
Thanks!
u/Either_Audience_1937 — 4 days ago