u/__sharpsresearch__

There is "art in machine learning"

A lot of people talk about ML like there is one correct way to build a model.

Everyone treats log loss, Brier score, and backtesting as God. This is my attempt to challenge that a bit.

These Substacks are a free read; I have no intention of making money from them, touting, or grifting. I write them mostly for myself, since I find it helps me think. But selfishly, I also learn a lot from writing them and from people who push back respectfully, not to mention the connections and DMs they bring.

Thought some of you might enjoy it.

https://open.substack.com/pub/thequantativegambler/p/the-art-of-player-strength-models

At a high level, I'm trying to communicate that two models can have the same MAE, R², or whatever metric you judge them on, yet be completely different models in practice depending on communication and other objectives.

u/__sharpsresearch__ — 6 days ago
r/algobetting

I’ve been working on a sports Elo variant I call Rolling Reset Elo.

Basic argument: classic Elo is good for some things. Not team sports.

Classic Elo has infinite memory. Every game ever played still contributes to the current rating. That makes sense for chess, where you are tracking one person over a long period of time. It breaks down when you are tracking NBA teams where rosters, coaches, injuries, roles, and usage patterns change constantly.
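
For reference, the infinite memory falls straight out of the standard update rule, which every game applies on top of all previous ones (standard formulation, with K as the step size):

```latex
% Expected score for team A against team B, then the post-game update.
% S_A is the actual result (1 = win, 0 = loss).
E_A = \frac{1}{1 + 10^{(R_B - R_A)/400}}
\qquad
R_A' = R_A + K\,(S_A - E_A)
```

Since R_A' always builds on R_A, every past game stays baked into the current rating; it only gets down-weighted, never dropped.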

Most public sports Elo systems solve this with some version of regression to the mean. I think that is mostly BS. You drag every team back toward 1500 on a calendar schedule and call it uncertainty. But uncertainty does not show up once a year on the same day for every team. It shows up after trades, injuries, coaching changes, and teams randomly breaking.

Rolling Reset Elo fixes this structurally.

For each target date, define a lookback window. Reset every team to the same baseline. Replay only the games inside that window. Store the ratings as the pregame feature for that date. Then move the window forward and do it again.

No seasonal regression hack. No stale franchise history. No hidden computed state.
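
A minimal sketch of the replay step, assuming integer dates and (date, home, away, home_won) tuples; the K and baseline values are illustrative, not tuned:

```python
from collections import defaultdict

BASELINE = 1500.0  # common reset point for every team (illustrative)
K = 20.0           # update step size (illustrative)

def expected(r_a, r_b):
    # Standard Elo expected score for the first team
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))

def rolling_reset_elo(games, target_date, window_days):
    """Ratings for one target date: reset everyone to BASELINE, then
    replay only games with target_date - window_days <= date < target_date.
    `games` is an iterable of (date, home, away, home_won) tuples with
    dates as plain integers (e.g. day ordinals) to keep the sketch small."""
    ratings = defaultdict(lambda: BASELINE)
    start = target_date - window_days
    for date, home, away, home_won in sorted(games):
        if not (start <= date < target_date):
            continue  # outside the lookback window: ignored entirely
        e_home = expected(ratings[home], ratings[away])
        delta = K * ((1.0 if home_won else 0.0) - e_home)
        ratings[home] += delta
        ratings[away] -= delta
    return dict(ratings)
```

Calling this once per target date (then sliding the window forward) gives the stored pregame feature; nothing outside the window can leak in, because those games are never replayed.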

The bigger payoff is running multiple windows at the same time: elo_30, elo_65, elo_365, etc. The ratios between them become features. If short-term Elo is ripping above long-term Elo, something changed. If it collapses below, something broke.
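
The feature step reduces to dividing the per-window ratings. A small sketch, taking the short- and long-window rating dicts as already computed for a target date (window names like elo_30 are from the post; the team abbreviations and ratings below are made up):

```python
def window_ratio_features(elo_short, elo_long, baseline=1500.0):
    """Ratio of short-window to long-window Elo per team.
    > 1.0: the team is running hot relative to its longer history;
    < 1.0: something recently broke."""
    return {
        team: elo_short[team] / elo_long.get(team, baseline)
        for team in elo_short
    }

# Hypothetical pregame ratings for one target date
elo_30 = {"BOS": 1580.0, "DET": 1430.0}
elo_365 = {"BOS": 1540.0, "DET": 1510.0}
feats = window_ratio_features(elo_30, elo_365)
# BOS ratio > 1.0 (short-term ripping above long-term),
# DET ratio < 1.0 (collapsed relative to its year-long form)
```

The ratio is scale-free, so it can go straight into a downstream model without worrying about what the raw Elo numbers mean across windows.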

substack link to detailed post

u/__sharpsresearch__ — 13 days ago