
Cooked up a new Qwen3-8B coding model that actually "thinks" before it types (HyperThinkCode-v1)
Hey everyone!
I just dropped a new 4-bit QLoRA fine-tune based on Qwen3-8B under my org, Cyprus. If you're into models that map out their logic before just blindly spitting out scripts, you might want to give this a spin. It's called HyperThinkCode-Qwen3-8B-v1.
Model Link: https://huggingface.co/Andy-ML-And-AI/HyperThinkCode-Qwen3-8B-v1
The Vibe: "Think first, code second"
The main goal here was to force the model to explicitly reason before writing the final code. I used a 30k subset of the Sashvat/HyperThink-X-Nvidia-Opencode-Reasoning-200K dataset and tweaked the chat template so the assistant responds inside a thinking field first. Basically, it talks to itself to figure out the problem, then it gives you the code.
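To get a feel for the format: Qwen3-style chat templates wrap the reasoning in `<think>...</think>` tags before the visible answer, so you can split the two apart at inference time. Here's a minimal sketch of that split (the tag names match Qwen3's defaults; if you customized the template further, adjust accordingly):

```python
def split_thinking(response: str) -> tuple[str, str]:
    """Separate the model's reasoning from its final answer.

    Assumes the reasoning is wrapped in <think>...</think>, as in
    Qwen3's default chat template.
    """
    open_tag, close_tag = "<think>", "</think>"
    start = response.find(open_tag)
    end = response.find(close_tag)
    if start == -1 or end == -1:
        # No reasoning block found; treat the whole thing as the answer.
        return "", response.strip()
    thinking = response[start + len(open_tag):end].strip()
    answer = response[end + len(close_tag):].strip()
    return thinking, answer

demo = "<think>Need a loop over items.</think>for x in items: print(x)"
thought, code = split_thinking(demo)
```

Handy if you only want to surface the final code in your tooling while logging the reasoning separately.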
How I cooked it up:
- Base: Qwen3-8B
- Hardware: Trained on dual Tesla T4s (16GB VRAM each)
- The Method: 4-bit QLoRA via Unsloth. Targeted all linear layers (Attention: q, k, v, o | MLP: gate, up, down) with Rank 16 / Alpha 16.
- Time: Super quick run—just 50 steps (global batch size 8), which took about 1 hour and 17 minutes.
- Context: Capped at 4096 tokens, enough for reasonably complex coding problems without letting VRAM explode.
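For reference, here are the adapter hyperparameters from the list above written out the way you'd pass them to Unsloth's `get_peft_model`. Treat this as a sketch of the setup, not a copy-paste of my exact training script (the module names assume standard Qwen3 layer naming):

```python
# LoRA hyperparameters from the run described above.
# Module names assume the standard Qwen3 projection-layer naming.
lora_kwargs = dict(
    r = 16,           # rank
    lora_alpha = 16,  # alpha == rank, so LoRA scaling alpha/r is 1.0
    target_modules = [
        "q_proj", "k_proj", "v_proj", "o_proj",  # attention
        "gate_proj", "up_proj", "down_proj",     # MLP
    ],
)
# e.g. model = FastLanguageModel.get_peft_model(model, **lora_kwargs)
scaling = lora_kwargs["lora_alpha"] / lora_kwargs["r"]
```

Keeping alpha equal to rank means the adapter updates aren't rescaled relative to the base weights, which is a common default for quick runs.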
Even with just 50 steps, the training loss dropped nicely (0.8177 down to 0.6785). I'm currently running lm-eval benchmarks on HumanEval and GSM8K to see exactly how it stacks up against the base Qwen3-8B.
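If you want to run the same evals yourself, the lm-evaluation-harness CLI call looks roughly like this. Flag names are from memory of the harness and may vary slightly across versions, so double-check against your installed release (recent versions require an explicit confirmation flag before executing HumanEval's generated code):

```shell
# Evaluate the fine-tune on HumanEval and GSM8K with lm-evaluation-harness.
lm_eval --model hf \
  --model_args pretrained=Andy-ML-And-AI/HyperThinkCode-Qwen3-8B-v1,load_in_4bit=True \
  --tasks humaneval,gsm8k \
  --batch_size 4 \
  --confirm_run_unsafe_code
```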
Running it
Since it’s an 8B, it’s super lightweight and easy to daily-drive. If you want to fire it up in Python using Unsloth, here is the quick snippet:
```python
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "Andy-ML-And-AI/HyperThinkCode-Qwen3-8B-v1",
    max_seq_length = 4096,
    load_in_4bit = True,
)
FastLanguageModel.for_inference(model)  # switch Unsloth into inference mode
```
I'd love for you guys to test it out against whatever local coding models you're currently using and let me know if the extra "hyperthinking" layer actually helps with your workflows!