Newbie question: keeping models in RAM (as storage) instead of fully flushing them
So I'm a basic Ollama user: I just install the app, use Open WebUI, and that's it. My question is this: I'm thinking of using DeepSeek R1 as the planning model and Qwen 3.6 35B (the Unsloth one) for coding in Cline in VS Code. Since I have just a 5090 and 128 GB of system RAM, instead of constantly unloading the model completely and reading it back from the SSD, I thought maybe I could use my RAM as the storage: keep the models there and load/unload them from/to RAM instead?
I am not asking about using RAM instead of VRAM (Ollama already does that automatically); that is not my question. I'm just asking whether it's possible to make Ollama keep the UNUSED model in RAM, and how much of a speedup that would give me compared to an NVMe SSD with 3-4 GB/s read speed. Are we talking about a few seconds that can be ignored, or would it matter?
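
To put rough numbers on what I mean, here is a back-of-envelope sketch in Python. Only the 3-4 GB/s NVMe figure comes from my setup; the 20 GB model size and the 20 GB/s RAM-side rate are placeholder assumptions, not measurements, and the estimate ignores any load overhead besides raw read speed.

```python
def load_seconds(model_gb: float, read_gb_per_s: float) -> float:
    """Estimate the time to stream a model file at a given read speed."""
    if read_gb_per_s <= 0:
        raise ValueError("read speed must be positive")
    return model_gb / read_gb_per_s


# Assumption: size of one quantized model file in GB (placeholder, not measured).
MODEL_GB = 20.0
# From the post: NVMe sequential read speed range in GB/s.
NVME_GB_PER_S = (3.0, 4.0)
# Assumption: effective rate when the file is already sitting in system RAM.
RAM_GB_PER_S = 20.0

if __name__ == "__main__":
    for speed in NVME_GB_PER_S:
        print(f"NVMe at {speed} GB/s: {load_seconds(MODEL_GB, speed):.1f} s")
    print(f"RAM at {RAM_GB_PER_S} GB/s: {load_seconds(MODEL_GB, RAM_GB_PER_S):.1f} s")
```

With those made-up numbers, each model swap would cost roughly 5 to 7 seconds from the NVMe versus about 1 second from RAM, so what I'm really asking is whether the real-world gap looks anything like that.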