u/FiltroMan

Hi everyone!

Full disclaimer: I’m not a dev, not even a hobbyist one. I’m more of a "tinkerer" who learns by breaking things. I can mess around with code a little if I understand what’s written, but I mostly rely on AI to do the heavy lifting. I’m writing this with Gemini’s help because I’m quite confused about the technical side of local LLMs.

My Specs:

  • CPU: i7-9700
  • RAM: 32 GB
  • GPU: RTX 3070 8GB (LHR)

The codebase is about 80k tokens. Right now I manage everything through Google AI Studio with Gemini 3 Flash Preview: I tell the bot what I want to achieve, and it gives me the code. It’s a "talk to the bot -> get code -> see if it works in Google Apps Script" loop. I’d like to know whether moving this workflow locally is feasible, but I’m worried about my 8GB of VRAM.

  1. Which model is "smart" enough to understand my project and write working code without requiring me to be a senior dev to fix it?
  2. How can I feed 80k tokens to the AI without manually copy-pasting everything every time? I have Ollama and LM Studio installed, but I'm open to anything (IDE extensions, specific tools, etc.); there's a sketch of the kind of thing I mean right after this list.
  3. Is there a setup that is "newbie-friendly" for someone who isn't great at reading code?
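To make question 2 concrete, here's the kind of script I have in mind (Gemini sketched it for me, so treat the specifics as assumptions: the folder path, file extensions, and model tag are all placeholders, and it assumes Ollama is running on its default port). It walks the project folder, glues every file into one prompt, and posts it to Ollama's local REST API with the context window raised above the small default:

```python
import json
import pathlib
import urllib.request

# Assumptions: Ollama is running locally on its default port, and a coding
# model like "qwen2.5-coder:7b" has been pulled beforehand
# (`ollama pull qwen2.5-coder:7b`). PROJECT_DIR is a hypothetical path.
PROJECT_DIR = pathlib.Path("./my-apps-script-project")
EXTENSIONS = {".gs", ".js", ".html", ".json"}
OLLAMA_URL = "http://localhost:11434/api/generate"

def build_context(root: pathlib.Path) -> str:
    """Concatenate every source file into one labeled blob."""
    parts = []
    for path in sorted(root.rglob("*")):
        if path.is_file() and path.suffix in EXTENSIONS:
            parts.append(
                f"// FILE: {path.relative_to(root)}\n"
                + path.read_text(encoding="utf-8")
            )
    return "\n\n".join(parts)

question = "Explain what doGet() does and suggest improvements."
payload = {
    "model": "qwen2.5-coder:7b",
    "prompt": build_context(PROJECT_DIR) + "\n\n" + question,
    "stream": False,
    # num_ctx must be raised well past Ollama's default (4096 tokens),
    # or most of the codebase silently falls out of the window.
    "options": {"num_ctx": 98304},
}
req = urllib.request.Request(
    OLLAMA_URL,
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["response"])
```

If I understand it right, the whole point is the `num_ctx` option: without raising it, most of the codebase would get silently dropped from the context.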

I do understand that with 8GB of VRAM I can't expect instantaneous answers, and I'd be more than happy with a decent rate: from what I've read, 5-7 t/s (about human typing speed) would be perfectly fine for me, as long as the model stays coherent over the full 80k context.
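To sanity-check that worry, I had Gemini walk me through some rough KV-cache math. The architecture numbers below are assumptions for a generic 7B-class model with grouped-query attention, not any specific release, so take the result as an order-of-magnitude estimate:

```python
# Back-of-envelope KV-cache size for a hypothetical 7B-class model
# using grouped-query attention (real architectures vary).
n_layers   = 28      # transformer blocks
n_kv_heads = 4       # KV heads under GQA (far fewer than query heads)
head_dim   = 128     # dimension per attention head
bytes_fp16 = 2       # fp16/bf16 cache; a q8_0 cache would halve this
context    = 80_000  # the target context length

# Keys + values, per layer, per token:
kv_bytes_per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_fp16
kv_cache_gib = kv_bytes_per_token * context / 2**30
print(f"{kv_bytes_per_token} B/token -> {kv_cache_gib:.1f} GiB KV cache at 80k")
# ~4.3 GiB just for the cache, before the ~4-5 GB of 4-bit weights load.
```

If that math is anywhere near right, the cache alone eats over half the 8GB before the model weights even load, which I guess is why people talk about quantizing the KV cache or offloading layers to system RAM.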
