u/Crystallover1991

Stopped trying to "fix" everything at once and just focused on one thing. Three months later I actually feel different.

For years my approach to personal development was the same: write down a list of 10 things about myself that needed improving, then burn out after two weeks because I was trying to change my whole life at once.

This time I chose to work on only one thing: going to sleep at the same time every night. That's it. No new diets. No exercise. No journaling.

Just that one simple thing for three months straight.

But somehow, just changing my sleep habits ended up changing other things without me even trying. I'm less irritable than before. I make better decisions. I don't reach for my phone at the first sign of boredom.

Not saying sleep is magic. Just saying I've finally realized that trying to change everything at once is nothing but an elegant form of changing nothing.

u/Crystallover1991 — 1 day ago
r/statistics · ▲ 58

[D] feels like we abandoned proper joint probability modeling just because next-token prediction is easier to compute

Been thinking about the probabilistic foundations of the current ML meta and it feels kinda... backwards? We have this massive industry-wide fixation on autoregressive models right now, where we're just hammering conditional probabilities P(x_t | x_{<t}) to death.

But mathematically, if you want to capture the actual underlying distribution of complex, structured data, building a joint probability model makes way more sense. I was going over some literature on EBMs recently and it reminded me how elegant it is to model the unnormalized density directly: you define a scalar energy function E(x), and lower energy simply equals higher probability, since p(x) ∝ exp(-E(x)). It maps so beautifully onto actual statistical mechanics and thermodynamics.
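To make the energy-to-probability mapping concrete, here's a minimal sketch in a 1-D toy setting (the double-well energy function is my own illustrative choice, not from any particular paper). In one dimension the partition function Z is trivially computable by numerical integration, which is exactly what stops working in high dimensions:

```python
import numpy as np

# Toy 1-D energy-based model with a double-well energy.
# Lower energy <=> higher probability: p(x) = exp(-E(x)) / Z.
def energy(x):
    return (x**2 - 1.0)**2  # energy minima at x = -1 and x = +1

xs = np.linspace(-3.0, 3.0, 601)
dx = xs[1] - xs[0]

unnorm = np.exp(-energy(xs))   # unnormalized density exp(-E(x))
Z = unnorm.sum() * dx          # partition function: tractable ONLY in 1-D
p = unnorm / Z                 # normalized density

# The two energy minima are exactly the two density modes.
print(xs[np.argmax(p)])        # one of the two modes, x ≈ ±1
```

The punchline is the `Z = unnorm.sum() * dx` line: in 1-D it's a cheap Riemann sum, but for images or text the analogous integral runs over an astronomically large space, which is where the whole tractability argument for autoregressive factorization comes from.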

Obviously the partition function is a nightmare to compute in practice, and MCMC sampling is notoriously painful to scale compared to just running a forward pass through a transformer. But it honestly feels like we threw our hands up and accepted greedy left-to-right sampling purely because it's easier to parallelize on current GPU architectures. Statistically speaking, it's such a brittle way to model global structure.
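For what it's worth, the standard workaround is that sampling (as opposed to evaluating likelihoods) never needs Z at all: unadjusted Langevin dynamics only uses the gradient of the energy. A minimal self-contained sketch, again on a toy double-well energy of my own choosing (step size and chain length are arbitrary illustration values):

```python
import numpy as np

# Unadjusted Langevin dynamics for p(x) ∝ exp(-E(x)),
# with E(x) = (x^2 - 1)^2. Note the partition function never
# appears -- only the energy gradient does.
def energy_grad(x):
    return 4.0 * x * (x**2 - 1.0)

rng = np.random.default_rng(0)
step = 0.01
x = rng.standard_normal(5000)  # a batch of independent chains

for _ in range(2000):
    noise = rng.standard_normal(x.shape)
    # Langevin update: gradient descent on the energy plus injected noise
    x = x - step * energy_grad(x) + np.sqrt(2.0 * step) * noise

# Samples concentrate around the two energy minima at x = ±1.
print(np.mean(np.abs(x)))
```

This is also why scaling hurts: each sample costs thousands of sequential gradient steps, versus one parallelizable forward pass per token for an autoregressive model, which is more or less the trade the field took.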

Is anyone here actually doing research or applied stats with non-autoregressive probabilistic models lately? Or did the whole field just permanently capitulate to the genAI hype?

u/Crystallover1991 — 3 days ago