u/Basic-Association880

I've tried many versions of LLMs, including some "crack" versions, qwen3.5-claude, and a qwen3.6 config. The 26B and 27B models need about 5 minutes of prefill in omlx and get stuck in tool loops when fixing issues. I'm also trying Hindsight with hermes-llama3.1-8b as its LLM, but it keeps using all the system memory, gets stuck, and makes omlx crash. I want to run two LLMs, one for the main agent and one for Hindsight. Is an A3B model my only choice? I have also set a memory limit in omlx in the hope of running two models, but it does not work when Hindsight is processing memory with the Hermes 8B model. Both models are already the MLX-FP16 versions. Hope I can get some help.
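For context on the memory pressure, here is a rough back-of-the-envelope sketch of the weight memory for the two FP16 models mentioned above. It assumes dense models at 2 bytes per parameter and counts weights only; KV cache, activations, and runtime overhead come on top, so real usage will be higher. The helper name `weight_gb` is made up for illustration and is not part of omlx or MLX.

```python
# Weight-only estimate. Assumption: FP16 stores 2 bytes per parameter.
# KV cache, activations, and runtime overhead are NOT included.
BYTES_PER_PARAM_FP16 = 2.0

def weight_gb(params_billion: float, bytes_per_param: float = BYTES_PER_PARAM_FP16) -> float:
    """Approximate weight size in GB (1 GB = 10^9 bytes) for a dense model."""
    return params_billion * bytes_per_param

main_agent_gb = weight_gb(27)  # 27B main agent model in FP16
hindsight_gb = weight_gb(8)    # hermes-llama3.1-8b in FP16
total_gb = main_agent_gb + hindsight_gb

print(f"main agent: {main_agent_gb:.0f} GB")
print(f"hindsight:  {hindsight_gb:.0f} GB")
print(f"both:       {total_gb:.0f} GB (weights only)")
```

By this estimate the two FP16 models need roughly 70 GB for weights alone before any context is processed. Note that for a mixture-of-experts model such as an A3B variant, the total parameter count (not the roughly 3B active per token) is what has to fit in memory.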

u/Basic-Association880 — 11 days ago