r/CUDA
I'm starting a Master's focused on AI systems, GPU computing, and HPC, and I'm trying to better understand where the genuinely important problems are in AI infrastructure and inference engineering.
My background so far has mostly been applied ML systems work:
- production LLM serving with vLLM (rough sketch below)
- real-time ASR → LLM → TTS pipelines
- LoRA fine-tuning/merging
- latency-sensitive voice agents
- benchmarking and pipeline optimization
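For concreteness, the vLLM serving side looks roughly like this. A minimal sketch: the model id, sampling settings, and prompts are placeholders, not my production setup.

```python
# Minimal vLLM offline-batch sketch; model id and sampling settings
# are placeholders for illustration, not a production config.
from vllm import LLM, SamplingParams

llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")  # any HF model id works
params = SamplingParams(temperature=0.7, max_tokens=128)

prompts = [
    "Summarize continuous batching in one sentence.",
    "Why does KV-cache size dominate serving memory?",
]

# vLLM handles batching internally (continuous batching + PagedAttention);
# how well the GPU stays utilized depends on how requests actually arrive,
# which is exactly where the systems problems below come from.
for out in llm.generate(prompts, params):
    print(out.outputs[0].text)
```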
Over time I realized many of the hardest problems weren't really model problems, but systems problems:
- GPU underutilization
- irregular batching
- memory movement
- scheduling/backpressure
- latency propagation through pipelines (toy sketch after this list)
- inference efficiency under real-time constraints
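To make the backpressure/latency point concrete, here's the shape of the problem in toy form: three stages joined by bounded queues, where the slowest stage makes every queue upstream of it fill and end-to-end latency grow. A sketch with invented stage delays, not my actual pipeline:

```python
# Toy ASR -> LLM -> TTS pipeline: three stages connected by bounded
# queues. Stage delays are made-up numbers purely for illustration.
import asyncio
import time

async def stage(name, delay, inq, outq):
    while True:
        t0, item = await inq.get()
        await asyncio.sleep(delay)       # stand-in for real compute
        if outq is not None:
            await outq.put((t0, item))   # blocks when full -> backpressure
        else:
            print(f"{name}: item {item} end-to-end {time.perf_counter() - t0:.2f}s")
        inq.task_done()

async def main():
    q1, q2, q3 = (asyncio.Queue(maxsize=2) for _ in range(3))
    tasks = [
        asyncio.create_task(stage("asr", 0.05, q1, q2)),
        asyncio.create_task(stage("llm", 0.20, q2, q3)),  # bottleneck stage
        asyncio.create_task(stage("tts", 0.05, q3, None)),
    ]
    for i in range(10):                  # arrivals faster than the LLM stage
        await q1.put((time.perf_counter(), i))
        await asyncio.sleep(0.05)
    await q1.join(); await q2.join(); await q3.join()
    for t in tasks:
        t.cancel()
    await asyncio.gather(*tasks, return_exceptions=True)

asyncio.run(main())
```

Run it and the printed end-to-end latency climbs with each item: the queues in front of the 0.20s stage fill up, and every later request inherits the wait.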
Long term, I'm much more interested in the systems/infrastructure side of AI than in pure modeling. Things like:
- inference runtimes
- GPU systems
- CUDA/Triton (toy kernel below)
- compiler/runtime optimization
- distributed inference
- memory efficiency
- scheduling for irregular workloads
- HPC for AI workloads
- AI-assisted systems optimization
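On the CUDA/Triton point, the level of abstraction I mean is roughly this; it's just the standard toy vector-add from the Triton tutorial, nothing novel, and it needs a CUDA GPU plus triton/torch installed:

```python
# Standard toy Triton vector-add, to show the abstraction level I mean.
import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements          # guard the ragged last block
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

def add(x, y):
    out = torch.empty_like(x)
    n = x.numel()
    grid = (triton.cdiv(n, 1024),)
    add_kernel[grid](x, y, out, n, BLOCK_SIZE=1024)
    return out

x = torch.randn(10_000, device="cuda")
y = torch.randn(10_000, device="cuda")
assert torch.allclose(add(x, y), x + y)
```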
Right now I’m trying to figure out what problems in this space are:
- genuinely important
- underexplored
- likely to matter over the next 5 years
- realistic for a strong Master’s thesis
A few questions I’d really love practitioner/researcher opinions on:
- What problems in AI infrastructure or inference engineering still feel painfully unsolved in practice?
- What research directions seem overhyped vs genuinely valuable?
- Which areas do you think will matter most over the next few years: kernel optimization, distributed inference, memory systems, compiler/runtime work, scheduling, networking, etc.?
- What’s a technically deep but realistic Master’s-level research problem in this space?
Happy to hear “your framing is wrong because X” too — that’s probably the most useful feedback I can get right now.
u/Quirky-Guide-762 — 7 days ago