u/MrAddams_LibraLogic

If you run multiple models in the same session, whether that's a coding LLM, a reasoning LLM, or different ComfyUI checkpoints depending on what you're generating, you already know the problem. Every swap loads gigabytes off disk. Fast NVMe makes it bearable. SATA or spinning rust makes it genuinely painful. And Windows can evict those file cache pages whenever something else needs memory, so you can't count on the OS keeping them warm for you.

I wrote a Windows app called EWE (Extended Weights Exchanger) that addresses this directly. You add your models to a "warm map," set a RAM budget, and EWE pins the weights using Windows memory APIs so they can't be evicted. The next time any application loads that model, it reads from RAM instead of going back to disk. On my setup, swaps that were taking 60-90 seconds now take under 5 seconds.
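To make the "warm map plus RAM budget" idea concrete, here is a rough Python sketch of the budgeting step only. It is not EWE's actual code and it does not do any pinning. The names (`Model`, `plan_warm_map`), the model files, and the sizes are all made up for illustration, and the first-come greedy policy is just an assumption about how a budget could be applied.

```python
from dataclasses import dataclass


@dataclass
class Model:
    name: str       # hypothetical file name
    size_gb: float  # hypothetical on-disk size of the weights


def plan_warm_map(models, budget_gb):
    """Walk the list in priority order and keep every model that still fits."""
    pinned, used = [], 0.0
    for m in models:
        if used + m.size_gb <= budget_gb:
            pinned.append(m.name)
            used += m.size_gb
    return pinned, used


# Illustrative warm map: three models, 16 GB of RAM set aside for them.
models = [
    Model("coder-7b.gguf", 4.5),
    Model("reasoner-14b.gguf", 9.0),
    Model("sdxl-checkpoint.safetensors", 6.5),
]
pinned, used = plan_warm_map(models, budget_gb=16.0)
print(pinned, used)
```

With these made-up numbers the first two models fit and the third is skipped, which is the "you need enough RAM for what you want warm" trade-off in miniature.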

https://preview.redd.it/q6t7o1nqr42h1.png?width=900&format=png&auto=webp&s=bf4eae93cbb1254fb759a28410db9004d2b4d691

It's not magic - you need enough system RAM to hold what you want to keep warm. But if you have spare RAM sitting idle while you work, this is a pretty direct use for it.

The app is at https://accord-gpu.com/ewe/ if you want to look at what it does. I'm currently collecting free early-access sign-ups and beta enrollments for the products I'm building. EWE is going to be a one-time purchase (no subscription), and I want to get real users on it before setting the price.

A few things I'm genuinely curious about from this community:

I wrote this for Ollama and ComfyUI specifically on my box. It reads the Ollama blob manifests and loads .gguf, .safetensors, .ckpt and .pth files so far. What other model formats should it support, and what other applications should I be checking against for compatibility?
Is this a workflow pain you actually have, or do most people just absorb the downtime between model uses?
Is there an obvious feature I'm missing?
What would a fair one-time price for a perpetual license look like for something like this?
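For anyone answering the format question: matching by file extension can be sketched in a few lines of Python. This is only an illustration, not how EWE is implemented. `find_model_files` and `MODEL_EXTENSIONS` are hypothetical names, and Ollama blobs are left out on purpose since those go through the manifests rather than an extension match.

```python
import tempfile
from pathlib import Path

# The four formats listed above; extend this set to try other formats.
MODEL_EXTENSIONS = {".gguf", ".safetensors", ".ckpt", ".pth"}


def find_model_files(root):
    """Return sorted (path, size_in_bytes) pairs for files with a known extension."""
    found = []
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix.lower() in MODEL_EXTENSIONS:
            found.append((path, path.stat().st_size))
    return sorted(found)


# Tiny self-contained demo using a throwaway directory with dummy files.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "a.gguf").write_bytes(b"x" * 10)
    (Path(tmp) / "B.SAFETENSORS").write_bytes(b"x" * 20)
    (Path(tmp) / "notes.txt").write_bytes(b"y")
    results = find_model_files(tmp)
    names = sorted(p.name for p, _ in results)
    total_bytes = sum(size for _, size in results)

print(names, total_bytes)
```

If you use a format that would slip past a check like this, that's exactly the kind of thing I'd like to hear about.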

Honest feedback is more useful than encouragement here. If this solves a problem you don't actually have, I'd rather know now.

I built a Windows app that pins your model weights in RAM so you stop waiting for disk loads on every model swap - looking for feedback