Hi everyone!
Full disclosure: I’m not a dev, not even a hobbyist one. I’m more of a "tinkerer" who learns by breaking things. I can tweak code a little if I understand what’s written, but I mostly rely on AI to do the heavy lifting. I’m writing this with Gemini’s help because I’m quite confused about the technical side of local LLMs.
My Specs:
- CPU: i7-9700
- RAM: 32 GB
- GPU: RTX 3070 8GB (LHR)
The codebase is about 80k tokens. Currently, I manage everything via Google AI Studio using Gemini 3 Flash Preview: I tell the bot what I want to achieve, it gives me the code, and I check whether it works in Google Apps Script. I'd like to know whether moving this loop local is feasible, but I'm worried about my 8GB of VRAM.
My Questions:
- Which model is "smart" enough to understand my project and write working code without requiring me to be a senior dev to fix it?
- How can I feed the whole 80k-token codebase to the model without manually copy-pasting everything every time? I have Ollama and LM Studio installed, but I'm open to anything (IDE extensions, specific tools, etc.; see the sketch after this list for the kind of thing I mean).
- Is there a setup that is "newbie-friendly" for someone who isn't great at reading code?
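To make the second question concrete, here's a small script Gemini sketched for me. It bundles every script file in my project folder into one big prompt and sends it to Ollama's local API with the context window raised. I can't really vouch for it myself, and the model name, folder path, and question are just placeholders:

```python
import requests
from pathlib import Path

PROJECT_DIR = Path("my-apps-script-project")  # placeholder: wherever the .gs files live
MODEL = "qwen2.5-coder:7b"                    # placeholder: any coding model pulled into Ollama

# Bundle every script file into one prompt, so nothing has to be copy-pasted by hand.
parts = []
for f in sorted(PROJECT_DIR.rglob("*.gs")):
    parts.append(f"// FILE: {f.relative_to(PROJECT_DIR)}\n{f.read_text(encoding='utf-8')}")
codebase = "\n\n".join(parts)

question = "Explain what doPost() in Main.gs does."  # placeholder question

# Ollama's local endpoint; "num_ctx" raises the context window,
# which otherwise defaults to something far too small for 80k tokens.
response = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": MODEL,
        "prompt": f"{codebase}\n\n{question}",
        "stream": False,
        "options": {"num_ctx": 98304},  # ~80k tokens of code plus room for the answer
    },
    timeout=None,
)
print(response.json()["response"])
```

From what I gather, the num_ctx part matters: if you leave Ollama at its default context size, it silently truncates the prompt instead of erroring out, so the model just never sees most of the codebase.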
I do understand that with 8GB of VRAM I can't expect instantaneous answers, and I'd be more than happy with a decent rate: from what I've read, 5-7 t/s (roughly human reading speed) would be perfectly fine for me, as long as the model stays coherent across the full 80k context.
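For what it's worth, Gemini also ran a back-of-the-envelope number for me on why the 80k context seems to be the scary part, not the speed. Treat it as a sketch: the layer/head counts assume a Qwen2.5-7B-style model with grouped-query attention, and I can't verify any of this myself:

```python
# Rough KV-cache sizing for an 80k-token context (fp16 cache, no quantization).
# Assumed model shape: Qwen2.5-7B-style, 28 layers, 4 KV heads, head_dim 128.
layers, kv_heads, head_dim = 28, 4, 128
bytes_per_value = 2          # fp16
context = 80_000

# Two tensors (K and V) cached per layer, per token.
bytes_per_token = 2 * layers * kv_heads * head_dim * bytes_per_value
total_gb = bytes_per_token * context / 1024**3
print(f"{bytes_per_token / 1024:.0f} KiB per token -> {total_gb:.1f} GB for 80k tokens")
# ~56 KiB/token -> ~4.3 GB, on top of the model weights themselves.
```

If those numbers are right, a 4-bit 7B model (roughly 4.5 GB of weights) plus the cache already overflows my 8 GB, which I guess is why people keep mentioning KV-cache quantization and partial CPU offload.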