Apologies for the 1345th post glazing QWEN but it has literally been a game-changer for me.
I’m relatively new to running local LLMs, but the appeal of having a private AI assistant for coding and experimentation is significant.
As a software engineer, I’ve also embraced the recent “vibe-coding” trend, mostly using AI to build hobby Android apps in my spare time. Over the past month, I created a web scraper app designed to extract images from ad-heavy websites that are otherwise frustrating to navigate.
The project was entirely AI-generated. I didn’t write the code myself, though I did design the architecture and suggest optimizations, areas where AI still struggles. I’ll admit I haven’t reviewed most of the code; it’s messy, but since it’s purely a personal hobby project, that’s acceptable.
Until recently, I relied on services like Antigravity and Codex, which worked well enough on free plans. However, tighter usage quotas pushed me toward local models.
My hardware is modest by local LLM standards: an AMD 7700 XT, 32GB DDR4 RAM, and a Ryzen 5 5600. I experimented with Gemma 3, Gemma 4, and Qwen 2.5 Coder (mostly Q3/Q4 quantizations under 20GB due to VRAM constraints), using LM Studio as the backend with various frontends like GitHub Copilot, RooCode, Cline, Android Studio AI Chat, and OpenCode.
Unfortunately, none of these models fit my workflow well. They struggled with even minor bug fixes, frequently exhausted context windows, got stuck in reasoning loops, or failed tool calls repeatedly.
Then I tried Qwen 3.6 35B-A3B.
My expectations were low, but I installed the i1-Q4_K_S quant anyway, offloaded all 40 layers to the GPU, configured a 128K context window, enabled flash attention, and used Q8_0 KV-cache quantization.
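For anyone who prefers running llama.cpp directly instead of LM Studio, the same settings map roughly onto these `llama-server` flags. This is a sketch, not my exact setup: the model path is a placeholder, and flag spellings assume a reasonably recent llama.cpp build.

```shell
# -ngl 99: offload all layers to the GPU
# -c 131072: 128K context window
# -fa: enable flash attention
# --cache-type-k / --cache-type-v q8_0: Q8_0-quantized KV cache
llama-server -m ./models/your-i1-q4_k_s.gguf \
  -ngl 99 -c 131072 -fa \
  --cache-type-k q8_0 --cache-type-v q8_0 \
  --port 1234
```

This exposes the same kind of OpenAI-compatible endpoint that LM Studio serves, so frontends can be pointed at it interchangeably.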
For testing, I gave it a practical task: fix the scraper logic for a problematic website. Gemini Flash (in Antigravity) and MiniMax (the free tier in OpenCode) had both failed to solve this issue despite multiple attempts. Using LM Studio as an OpenAI-compatible endpoint with GitHub Copilot in VS Code, I let it run.
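For context, "OpenAI-compatible" just means LM Studio's local server accepts the standard chat-completions request shape, so any client that speaks that schema works. A minimal sketch of building such a request, assuming LM Studio's default port 1234 and a placeholder model id:

```python
import json
import urllib.request

def build_chat_request(model, prompt, base_url="http://localhost:1234/v1"):
    """Build a standard OpenAI-style chat request for a local server."""
    # Same payload schema the hosted OpenAI API uses; LM Studio mirrors it.
    payload = {
        "model": model,  # placeholder id; use whatever your loaded model reports
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }
    return urllib.request.Request(
        base_url + "/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )

# With the server running, send it via urllib.request.urlopen(req).
```

Because the schema is shared, switching frontends (Copilot, RooCode, OpenCode) mostly means pointing them at the same base URL.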
It took about 25 minutes, but it succeeded in one shot: from a single initial prompt, it analyzed the site’s HTML structure, compared it against my Kotlin scraper code, and resolved three critical bugs without a single failed tool call.
That result was impressive enough that I gave it a second challenge: drive an Android emulator to capture real app screenshots, select the best ones against specific criteria, and update the project README with them.
Using RooCode this time, the process took around 45 minutes. I had to teach it some emulator workflow conventions, such as taking screenshots after every action and analyzing them to track app state, but once instructed, it executed flawlessly.
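The convention described above boils down to a simple observe-after-act loop: every emulator action is immediately followed by a screenshot and an analysis step, so the agent always reasons from the current screen state instead of a stale one. A minimal sketch, where all the helpers are hypothetical stand-ins for the real emulator tooling (e.g. adb taps and screencaps):

```python
def run_with_screenshots(actions, perform, capture, analyze):
    """Execute actions one at a time, capturing and analyzing after each.

    perform: carries out one emulator action (hypothetical adb wrapper)
    capture: grabs the current screen as bytes (hypothetical screencap wrapper)
    analyze: interprets the screenshot into an app-state description
    """
    history = []
    for action in actions:
        perform(action)              # act on the emulator
        shot = capture()             # screenshot after EVERY action
        state = analyze(shot)        # track app state from the screenshot
        history.append((action, state))
    return history
```

Once the model internalized this act-then-verify pattern, it stopped losing track of which screen the app was on.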
For the first time, I feel like I have a local model capable of reliably handling most of my coding tasks, while reserving cloud-based premium models for more demanding work.
Qwen has genuinely made local AI coding practical for me.