
Specs:
Core i5-14400F
32 GB DDR4 RAM @ 3200 MHz
RTX 4060
Current speeds:
30 tokens/s output (generation)
500 tokens/s prefill (prompt processing)
Command I currently use:
.\llama-server.exe `
-m "H:\model\unsloth\Qwen3.6-35B-A3B-GGUF\Qwen3.6-35B-A3B-UD-Q4_K_XL.gguf" `
--host 0.0.0.0 --port 8080 `
--alias "claude-sonnet-4-5" `
-ngl 999 `
--n-cpu-moe 36 `
-c 65535 `
-b 4096 `
-ub 2048 `
-t 6 `
-tb 10 `
--cont-batching `
--mlock `
-ctk turbo4 -ctv turbo3 `
-fa on `
--jinja `
--warmup `
--perf
Current usage: