u/Georgiou1226

r/HPC (+1 crosspost)

Hey all, I'm a data scientist by background, not an HPC sysadmin. I recently got a research allocation on MareNostrum V to run 50 OpenFOAM CFD simulations for an aerodynamics ML pipeline and wrote up the experience for people making the same transition.

The things that got me: the airgap is obvious in theory, but the first time a job dies at 2am because of a missing library, it hits differently. Also, the bottleneck ended up being egress, not compute: pulling output tensors back over scp took longer than the simulations themselves. And I wasted a lot of time throwing too many cores at CFD cases before Amdahl's Law became very real, very fast.
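For anyone who hasn't run into it yet: Amdahl's Law says that if a fraction p of a run parallelises perfectly and the rest stays serial, the best possible speedup on N cores is S(N) = 1 / ((1 − p) + p/N). That speedup can never exceed 1/(1 − p), however many cores you add. Below is a minimal Python sketch of that arithmetic. The 95% parallel fraction and the core counts are hypothetical values chosen for illustration. They are not measurements from the MareNostrum V runs.

```python
def amdahl_speedup(parallel_fraction: float, n_cores: int) -> float:
    """Best-case speedup on n_cores when only `parallel_fraction` of the
    single-core runtime can be split across cores (Amdahl's Law)."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if n_cores < 1:
        raise ValueError("n_cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_cores)


def parallel_efficiency(parallel_fraction: float, n_cores: int) -> float:
    """Speedup per core: 1.0 means every added core pulls its full weight."""
    return amdahl_speedup(parallel_fraction, n_cores) / n_cores


if __name__ == "__main__":
    p = 0.95  # hypothetical: 95% of the runtime parallelises, 5% stays serial
    for n in (1, 8, 32, 128, 512):  # arbitrary core counts for illustration
        s = amdahl_speedup(p, n)
        e = parallel_efficiency(p, n)
        print(f"{n:4d} cores: speedup {s:5.1f}x, efficiency {e:6.1%}")
    # As n_cores grows, p / n_cores -> 0, so speedup approaches 1 / (1 - p).
    print(f"ceiling with unlimited cores: {1 / (1 - p):.0f}x")
```

With p = 0.95 the ceiling is 20x. Quadrupling from 128 to 512 cores only lifts the speedup from about 17.4x to about 19.3x, while per-core efficiency drops below 4%. Real OpenFOAM cases also pay inter-process communication costs that Amdahl's Law ignores, so measured scaling usually flattens even earlier than this idealised curve.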

Full writeup with actual job scripts here if anyone's curious: https://towardsdatascience.com/what-it-actually-takes-to-run-code-on-200me-supercomputer/

Happy to answer questions from others coming from AWS/cloud who are figuring out the transition.

u/Georgiou1226 — 17 days ago