u/Clear-Ask6409

Best local LLM for long‑form RP with complex plot and 120–150k context

Hi everyone!

About a year ago I discovered SillyTavern. Back then it wasn’t too hard to find a free proxy for Gemini Pro, but now it’s a real pain. I think it’s time for me to dive into local LLMs – I want a calm, stable RP experience without constantly hunting for API keys on random forums.

My hardware:

- RTX 4070 Ti Super (16 GB VRAM)

- Ryzen 5 9600X

- 64 GB DDR5-6000

I know this isn’t ideal for serious models, so I’d really appreciate hearing about real‑world experiences from other people.

The main issue:

My lorebook is ~25k tokens, plus a ~3k character card. Even after brutally trimming everything non‑essential, I’ll still be left with ~18–20k (lorebook) + ~2.1k (character + first message).

I’m looking for a model that can comfortably handle 120–150k context on my hardware without degradation. Why so much? Because I play very long storylines spanning multiple “chats”. Each previous chat gets summarised, and that summary replaces the first message in the next chat. This way the whole story continues for 1.2–1.5 million tokens on average.
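For concreteness, here’s roughly how the per-chat budget works out. These are round numbers I picked for illustration (a typical “128k” window, my trimmed lorebook at its upper bound, and an assumed reserve for the model’s reply), not anything exact:

```python
CONTEXT = 131_072         # a typical "128k" context window
LOREBOOK = 20_000         # trimmed lorebook, upper bound from above
CARD_AND_SUMMARY = 2_100  # character card + first message (or previous-chat summary)
RESPONSE_RESERVE = 1_024  # assumed room left for the model's reply

# Tokens left for actual chat history within one chat:
history_budget = CONTEXT - LOREBOOK - CARD_AND_SUMMARY - RESPONSE_RESERVE
print(history_budget)  # 107948
```

So each chat gets roughly 108k tokens of history before I summarise it and roll the summary into the next chat’s first message.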

Any recommendations? Which models would you suggest for such a large context and complex plots? How well do they perform on 16 GB VRAM + 64 GB system RAM? I’m open to quantized versions, offloading, or any tricks you’ve found useful.
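Part of why I’m asking: a quick back-of-envelope on KV-cache size suggests long contexts get expensive fast. The parameters below are assumptions for a hypothetical ~24B GQA model (40 layers, 8 KV heads, head dim 128) – check your actual model’s config before trusting the numbers:

```python
def kv_cache_gib(ctx_tokens: int,
                 n_layers: int = 40,      # assumed layer count
                 n_kv_heads: int = 8,     # assumed KV heads (GQA)
                 head_dim: int = 128,     # assumed head dimension
                 bytes_per_elem: int = 2  # fp16; ~1 for an 8-bit KV cache
                 ) -> float:
    """Rough KV-cache size in GiB; the factor 2 covers K and V at every layer."""
    total_bytes = 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem * ctx_tokens
    return total_bytes / 2**30

print(kv_cache_gib(131_072))                     # fp16 cache at 128k -> 20.0 GiB
print(kv_cache_gib(131_072, bytes_per_elem=1))   # 8-bit cache at 128k -> 10.0 GiB
```

If those assumptions are anywhere near right, the cache alone at 128k would exceed my 16 GB of VRAM at fp16, which is why I expect I’ll need a quantized KV cache and/or offloading to system RAM.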

Thanks a lot!

reddit.com
u/Clear-Ask6409 — 24 hours ago