Qwen3.6-27B-int4-AutoRound with OpenCode has been a game changer
Last year, I built an AI rig. Glad it was last year; I would not be able to afford the parts at this year's prices.
I recently switched from Ollama in my docker stack to llama-swap, which opened up many more models and allowed for fine-tuning the serving parameters.
I experimented with several models and configurations for local coding. I'm now using OpenCode with Oh-My-OpenAgent. I set up llama-swap to load Lorbus/Qwen3.6-27B-int4-AutoRound on a pair of 3090s joined with NVLink, and OpenCode and Oh-My-OpenAgent point at that config for most things. It has been amazing: I'm getting about 80 tps and can maintain a 262K context, which is great for long coding sessions.
Anyway, thought I'd share the llama-swap configuration and get any suggestions the hive mind might have.
"qwen3.6-27b-vllm-262k":
name: "Qwen 3.6 27B INT4 AutoRound (vLLM — NVLink Pair — 262K ctx)"
description: "Dual-3090 recipe: MTP n=3 + fp8 KV + 262K ctx + vision + tools. ~71/89 TPS"
checkEndpoint: /v1/models
ttl: 0
cmdStop: docker stop vllm-qwen36-27b-262k || true
cmd: |
docker run --rm --init
--name vllm-qwen36-27b-262k
--runtime=nvidia
--gpus '"device=1,2"'
--network ${docker-net}
--shm-size=16g
--ipc=host
-e NCCL_P2P_DISABLE=0
-e NCCL_P2P_LEVEL=NVL
-e NCCL_CUMEM_ENABLE=0
-v /mnt/models/huggingface:/root/.cache/huggingface
-v /mnt/models/vllm-cache:/root/.cache/vllm
-v /opt/ai/vllm-src:/opt/vllm-src:ro
vllm/vllm-openai:latest
--model "Lorbus/Qwen3.6-27B-int4-AutoRound"
--served-model-name "qwen3.6-27b-vllm-262k"
--quantization auto_round
--dtype float16
--tensor-parallel-size 2
--gpu-memory-utilization 0.85
--max-model-len 262144
--max-num-seqs 4
--max-num-batched-tokens 4128
--kv-cache-dtype fp8_e5m2
--enable-chunked-prefill
--enable-prefix-caching
--speculative-config '{"method":"mtp","num_speculative_tokens":3}'
--enable-auto-tool-choice
--tool-call-parser qwen3_coder
--trust-remote-code
--default-chat-template-kwargs '{"enable_thinking": false}'
proxy: "http://vllm-qwen36-27b-262k:8000"