
SoftMig – software GPU slicing for SLURM (no hardware MIG needed, works on any CUDA 12+ GPU)
We built this at the University of Alberta because we had a pile of L40S, A40, and other GPUs that SLURM couldn't meaningfully slice. Hardware MIG only covers a handful of models, requires draining nodes to reconfigure, and locks you into rigid layouts. Result: full 48GB cards going out for jobs that needed 12GB. Classic HPC waste.
SoftMig is a SLURM-native software slicing layer, a fork of HAMi-core adapted for cluster environments. It enforces per-job memory ceilings and compute throttling via LD_PRELOAD, with SLURM prolog/epilog hooks managing the job lifecycle.
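For anyone unfamiliar with the LD_PRELOAD trick: the preloaded library interposes on the allocation calls a job makes and refuses anything past the job's ceiling. Here's a minimal sketch of the idea in plain C — it guards a stand-in allocator rather than the CUDA driver API, and the `SOFTMIG_MEM_LIMIT` env var name is illustrative, not SoftMig's actual interface:

```c
#include <stdlib.h>
#include <stdio.h>

static size_t limit_bytes = 0;   /* 0 = not yet initialised */
static size_t used_bytes  = 0;

/* Read the ceiling once, from an env var the prolog would set.
   SOFTMIG_MEM_LIMIT is a hypothetical name for illustration. */
static void init_limit(void) {
    if (limit_bytes == 0) {
        const char *s = getenv("SOFTMIG_MEM_LIMIT");
        limit_bytes = s ? (size_t)strtoull(s, NULL, 10) : (size_t)-1;
    }
}

/* Stand-in for an intercepted allocation call; the real hook would wrap
   a CUDA allocation and return an out-of-memory error code instead. */
int guarded_alloc(void **ptr, size_t bytes) {
    init_limit();
    if (used_bytes + bytes > limit_bytes) {
        *ptr = NULL;
        return -1;               /* over the job's ceiling: refuse */
    }
    used_bytes += bytes;
    *ptr = malloc(bytes);
    return 0;
}
```

The real interposer has to track frees and per-process accounting too, but the shape is the same: the job never sees more memory than its slice.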
A 48GB L40S becomes:
- 1 full GPU
- 2 × 24GB half-slices
- 4 × 12GB quarter-slices
- ...or whatever layout your site defines
Change layouts through SLURM policy. No node drain, no reboot.
A few things it does that hardware MIG can't:
- Mix slice sizes on the same GPU (e.g. a half + two quarters on one card)
- No lost capacity — hardware MIG burns memory to its own infrastructure; SoftMig slices the full pool
- Compute is sliced too, not just memory — SM access is throttled proportionally per job
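On the compute side, the general technique in HAMi-style hooks is rate-limiting: before each kernel launch the preloaded library checks a budget that refills in proportion to the job's share, and makes the launch wait if the budget is empty. A token-bucket sketch (illustrative only — names and numbers are not SoftMig's actual implementation):

```c
typedef struct {
    double share;        /* fraction of the GPU this job may use, e.g. 0.5 */
    double tokens;       /* launch budget accumulated so far */
    double max_tokens;   /* burst cap */
} throttle_t;

/* Called once per elapsed tick: refill the budget in proportion to share. */
void throttle_tick(throttle_t *t) {
    t->tokens += t->share;
    if (t->tokens > t->max_tokens)
        t->tokens = t->max_tokens;
}

/* Called before each kernel launch: 1 = go ahead, 0 = caller must wait. */
int throttle_try_launch(throttle_t *t) {
    if (t->tokens >= 1.0) {
        t->tokens -= 1.0;
        return 1;
    }
    return 0;
}
```

A half-slice job (`share = 0.5`) earns launch tokens at half the rate of a full-GPU job, so over time its SM occupancy averages out to roughly its share.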
Heads up on build/install: The docs are written for Digital Research Alliance of Canada / Compute Canada cluster environments, so if you're deploying elsewhere you may need to adapt things. Claude Code or Cursor work well for navigating the compilation and integration steps if you're not in that ecosystem.
MIT licensed. GitHub: https://github.com/ualberta-rcg/softmig
Happy to answer questions — we've been running v1 in production on Vulcan and v2 is now in testing.