u/Easy-Ride3366

r/unsloth

Looping issue with MTP on Qwen3.6

Hi, I have a looping issue when I try the new MTP branch of llama.cpp.
My config:

[*]
chat-template-kwargs = {"preserve_thinking":true}
reasoning-budget = 4096
reasoning-budget-message = "Reasoning budget reached. Conclude the analysis and provide the final answer."
device = Vulkan1
gpu-layers = all
no-mmproj-offload = 1
batch-size = 2048
ctx-size = 128000
ubatch-size = 512
temp = 0.6
top-p = 0.95
top-k = 20
min-p = 0.00
presence-penalty=0.0
repeat-penalty=1.0
cache-prompt = 1
timeout = 600
reasoning = on
image-min-tokens = 1024
metrics = 1
fit-target = 0
no-mmap = 1
jinja = 1
prio = 3
reasoning = on
no-warmup = 1
parallel = 1
flash-attn = on
port = 8001
threads = 16
threads-batch = 16
cache-type-k = q8_0
cache-type-v = q8_0
kv-unified = true
ctx-checkpoints = 64
checkpoint-every-n-tokens = 2048
cache-ram = 20480
mlock = 1
main-gpu = 1
verbose=1

[Qwen3.6-27B-MTP-UD-Q6_K]
model = C:\Users\user\.cache\huggingface\hub\models--unsloth--Qwen3.6-27B-MTP-GGUF\snapshots\53b097416d6346f849b530e4bc1b5590dfe9d758\Qwen3.6-27B-Q6_K.gguf
mmproj = C:\Users\user\.cache\huggingface\hub\models--unsloth--Qwen3.6-27B-MTP-GGUF\snapshots\53b097416d6346f849b530e4bc1b5590dfe9d758\mmproj-BF16.gguf
cache-type-k = q4_1
cache-type-v = q4_1
spec-type = draft-mtp
spec-draft-n-max = 2
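The `[Qwen3.6-27B-MTP-UD-Q6_K]` section overrides some `[*]` values, and `reasoning = on` appears twice in `[*]`. Assuming the preset is merged like a plain INI file, where a model section's keys replace the global ones (an assumption about how llama-server loads presets, not something confirmed here), this small Python sketch computes the effective settings. Under that assumption the MTP model would run with a `q4_1` KV cache, not `q8_0`. Only a trimmed copy of the relevant keys is used; `effective_settings` is a hypothetical helper.

```python
import configparser

# Trimmed copy of the preset (only keys relevant to the override question).
PRESET = r"""
[*]
reasoning = on
cache-type-k = q8_0
cache-type-v = q8_0
ctx-size = 128000
reasoning = on

[Qwen3.6-27B-MTP-UD-Q6_K]
cache-type-k = q4_1
cache-type-v = q4_1
spec-type = draft-mtp
spec-draft-n-max = 2
"""


def effective_settings(text, model):
    """Merge [*] with a model section; assumption: model keys override [*] keys."""
    # strict=False tolerates the duplicated "reasoning = on" (last value wins);
    # only "=" is a delimiter, and interpolation is off so values are taken literally.
    parser = configparser.ConfigParser(
        strict=False, delimiters=("=",), interpolation=None
    )
    parser.optionxform = str  # keep option names exactly as written
    parser.read_string(text)
    merged = dict(parser["*"])
    merged.update(parser[model])  # model-specific values replace global ones
    return merged


settings = effective_settings(PRESET, "Qwen3.6-27B-MTP-UD-Q6_K")
print(settings["cache-type-k"], settings["cache-type-v"])  # prints q4_1 q4_1
```

Python's `configparser` rejects duplicate keys by default (`strict=True`), which is why `strict=False` is set; llama-server's own preset parser may handle the duplicate differently.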

---------

I can see the LLM looping in the terminal:

[53923] srv update_slots: run slots completed
[53923] que start_loop: waiting for new tasks
[53923] que start_loop: processing new tasks
[53923] que start_loop: processing task, id = 1798
[53923] que start_loop: update slots
[53923] srv update_slots: posting NEXT_RESPONSE
[53923] que post: new task, id = 1799, front = 0
[53923] slot get_n_draft_: id 0 | task 0 | max possible draft: 15217
[53923] slot update_batch: id 0 | task 0 | generate_draft: id=4013, #tokens=20320, #draft=1, pos_next=20320
[53923] srv update_slots: decoding batch, n_tokens = 2
[53923] set_adapters_lora: adapters = 0000000000000000
[53923] adapters_lora_are_same: adapters = 0000000000000000
[53923] set_embeddings: value = 1
[53923] slot update_slots: id 0 | task 0 | restoring speculative checkpoint (pos_min = 20319, pos_max = 20319, size = 748)

[53923] srv update_slots: run slots completed
[53923] que start_loop: waiting for new tasks
[53923] que start_loop: processing new tasks
[53923] que start_loop: processing task, id = 1799
[53923] que start_loop: update slots
[53923] srv update_slots: posting NEXT_RESPONSE
[53923] que post: new task, id = 1800, front = 0
[53923] slot get_n_draft_: id 0 | task 0 | max possible draft: 15217
[53923] slot update_batch: id 0 | task 0 | generate_draft: id=4013, #tokens=20320, #draft=1, pos_next=20320
[53923] srv update_slots: decoding batch, n_tokens = 2
[53923] set_adapters_lora: adapters = 0000000000000000
[53923] adapters_lora_are_same: adapters = 0000000000000000
[53923] set_embeddings: value = 1
[53923] slot update_slots: id 0 | task 0 | restoring speculative checkpoint (pos_min = 20319, pos_max = 20319, size = 748)

[53923] srv update_slots: run slots completed
[53923] que start_loop: waiting for new tasks
[53923] que start_loop: processing new tasks
[53923] que start_loop: processing task, id = 1800
[53923] que start_loop: update slots
[53923] srv update_slots: posting NEXT_RESPONSE
[53923] que post: new task, id = 1801, front = 0
[53923] slot get_n_draft_: id 0 | task 0 | max possible draft: 15217
[53923] slot update_batch: id 0 | task 0 | generate_draft: id=4013, #tokens=20320, #draft=1, pos_next=20320
[53923] srv update_slots: decoding batch, n_tokens = 2
[53923] set_adapters_lora: adapters = 0000000000000000
[53923] adapters_lora_are_same: adapters = 0000000000000000
[53923] set_embeddings: value = 1
[53923] slot update_slots: id 0 | task 0 | restoring speculative checkpoint (pos_min = 20319, pos_max = 20319, size = 748)
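In these log excerpts the task id keeps incrementing (1798, 1799, 1800), yet every `generate_draft` line reports the same `pos_next=20320` and the same speculative checkpoint at position 20319 is restored each time, so generation appears never to advance. A hypothetical Python helper for spotting this stall in a captured log follows. The line format is copied from the excerpts; treating a repeated `pos_next` as "no progress" is an assumption, not documented llama-server behavior.

```python
import re

# Matches the pos_next field of "generate_draft" lines in the excerpts above.
POS_NEXT_RE = re.compile(r"generate_draft: .*pos_next=(\d+)")


def stuck_positions(lines, min_repeats=3):
    """Return each pos_next value seen on min_repeats consecutive generate_draft lines.

    Assumption: in healthy decoding pos_next grows, so a repeated value means a stall.
    """
    stuck = []
    last, run = None, 0
    for line in lines:
        match = POS_NEXT_RE.search(line)
        if match is None:
            continue  # ignore unrelated log lines
        pos = int(match.group(1))
        run = run + 1 if pos == last else 1
        last = pos
        if run == min_repeats:  # report each stuck position only once
            stuck.append(pos)
    return stuck


DRAFT = ("[53923] slot update_batch: id 0 | task 0 | generate_draft: "
         "id=4013, #tokens=20320, #draft=1, pos_next=20320")
RESTORE = ("[53923] slot update_slots: id 0 | task 0 | restoring speculative "
           "checkpoint (pos_min = 20319, pos_max = 20319, size = 748)")

LOG = []
for task_id in (1798, 1799, 1800):
    LOG += [f"[53923] que start_loop: processing task, id = {task_id}", DRAFT, RESTORE]

print(stuck_positions(LOG))  # prints [20320]
```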

----

Does anyone else have this issue? Better yet, does anyone have a solution? It loops until the timeout.

u/Easy-Ride3366 — 6 days ago