NFL WR Rookie Model - Looking for Feedback/Critique
I’ve been working on a Python/XGBoost model that tries to project which rookie wide receivers are most likely to produce at least one top-12 fantasy WR season within their first four NFL seasons. The goal is not to create a perfect ranking system, but to build a structured prospect model that combines NFL draft capital, college production, PFF receiving data, athletic testing, competition/context adjustments, and historical fantasy outcomes.
The model trains on historical drafted WR classes from 2014 onward, using NFL fantasy production to label whether each player eventually produced a top-12 WR season within his first four years. I also added top-24/top-36 outcome tracking and season-count columns, so the sheet can separate “ever hit top-12” from “how many top-12/top-24/top-36 seasons did this player actually produce.”
Data used
The model currently uses:
- NFL draft data: draft year, round/pick, team, college, player IDs
- NFL weekly fantasy production: used to calculate WR season ranks and first-four-year outcomes
- PFF college WR production: yards, receptions, targets, touchdowns, routes, YPRR, receiving grades, route grades, drop rate, yards after catch, target share, dominator-style metrics, etc.
- PFF context splits: man/zone performance, slot/screen/concept usage, and receiving production by depth of target
- Combine/athletic data: height, weight, forty, vertical, broad jump, cone, shuttle, bench, derived speed/burst/size metrics
- Competition/context adjustments: conference/team strength, competition-adjusted production, low-competition risk, screen dependency, downfield production, and trajectory flags
- Historical validation: leave-one-draft-year-out backtesting and walk-forward testing
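To make the "derived speed/burst/size metrics" bullet concrete, here is a minimal sketch. Speed Score is the standard Barnwell formula (weight × 200 / forty⁴); the burst and size definitions below are common conventions, not necessarily the exact versions used in this model.

```python
# Derived athletic metrics. Speed Score is the standard formula;
# burst_score and bmi are common-convention placeholders, not
# necessarily the author's exact definitions.
def speed_score(weight_lbs: float, forty: float) -> float:
    """Size-adjusted speed: heavier players get credit for the same forty."""
    return weight_lbs * 200.0 / forty ** 4

def burst_score(vertical_in: float, broad_in: float) -> float:
    """Simple additive burst proxy: vertical (inches) + broad jump (inches)."""
    return vertical_in + broad_in

def bmi(weight_lbs: float, height_in: float) -> float:
    """Body mass index from imperial units."""
    return 703.0 * weight_lbs / height_in ** 2
```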
Main target
The primary label is:
Did this WR produce at least one top-12 fantasy WR season within his first four NFL seasons?
I also track:
- top24_first4
- top36_first4
- top12_seasons_first4
- top24_seasons_first4
- top36_seasons_first4
- best WR rank in first four seasons
- best PPR season in first four seasons
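The labels above can all be derived from season-level fantasy totals. A minimal sketch, assuming a table with one row per player-season (the column names `player_id`, `draft_year`, `season`, `ppr_points` are placeholders, not the actual schema):

```python
# Sketch: deriving the outcome labels from season-level PPR data.
# Column names are assumptions, not the author's actual schema.
import pandas as pd

def label_first_four(seasons: pd.DataFrame) -> pd.DataFrame:
    """seasons: one row per player-season with total PPR points."""
    df = seasons.copy()
    # Positional rank within each NFL season (1 = WR1 overall).
    # Rank before filtering so it reflects the whole league year.
    df["wr_rank"] = df.groupby("season")["ppr_points"].rank(
        ascending=False, method="min"
    )
    # Keep only each player's first four NFL seasons.
    df = df[df["season"] - df["draft_year"] <= 3]
    g = df.groupby("player_id")
    out = pd.DataFrame({
        "top12_first4": g["wr_rank"].min() <= 12,
        "top12_seasons_first4": g["wr_rank"].apply(lambda r: (r <= 12).sum()),
        "top24_seasons_first4": g["wr_rank"].apply(lambda r: (r <= 24).sum()),
        "best_rank_first4": g["wr_rank"].min(),
        "best_ppr_first4": g["ppr_points"].max(),
    })
    return out.reset_index()
```

The same pattern extends to the top-36 columns by changing the threshold.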
Model approach
The main model is an XGBoost classifier. It uses a reduced/cleaned feature set to avoid overloading the model with too many duplicate raw and competition-adjusted versions of the same stat. The model also uses scale_pos_weight because top-12 WR hits are relatively rare.
I run two main validation views:
- Leave-one-year-out backtest: the model trains on every completed draft class except the test year, then scores that held-out class.
- Walk-forward historical test: the model scores each class using only information from prior classes. For example, the 2023 class is scored using training data through 2022. This is meant to test whether the model would have identified players like Puka Nacua before knowing their NFL breakout.
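The two validation loops can be sketched as below. The `fit_predict` callback and column names are placeholders for whatever the real pipeline does; the structure of the splits is the point.

```python
# Sketch of the two validation views. fit_predict(X_tr, y_tr, X_te) -> probs
# is a placeholder for the real training/scoring pipeline.
import pandas as pd

def leave_one_year_out(df, feature_cols, fit_predict):
    """For each draft year, train on every other class, score the held-out one."""
    scores = {}
    for year in sorted(df["draft_year"].unique()):
        tr = df[df["draft_year"] != year]
        te = df[df["draft_year"] == year]
        scores[year] = fit_predict(tr[feature_cols], tr["top12_first4"], te[feature_cols])
    return scores

def walk_forward(df, feature_cols, fit_predict, start_year):
    """Score each class using only classes drafted strictly before it."""
    scores = {}
    for year in sorted(df["draft_year"].unique()):
        if year < start_year:
            continue  # not enough history before this point
        tr = df[df["draft_year"] < year]
        te = df[df["draft_year"] == year]
        scores[year] = fit_predict(tr[feature_cols], tr["top12_first4"], te[feature_cols])
    return scores
```

The key difference: leave-one-year-out uses future classes in training (useful for overall skill estimates), while walk-forward never does (the honest "would I have known at the time" test).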
In the latest walk-forward test, Puka was flagged as a “late-pick outlier profile — model likes more than NFL did,” which is exactly the type of signal I wanted the model to surface. It does not mean the model viewed him as safe; it means his underlying receiving/context profile was stronger than a typical late-round WR.
Final rookie score
The final rookie score is a blend, not just the raw XGBoost probability. It combines:
- model probability of a top-12 outcome
- prospect grade
- contextual production profile
- calibration/priors
- draft-capital-vs-model interpretation flags
I removed the extra standalone draft-capital weight from the final score because draft capital is already represented inside both the model and prospect grade. Draft capital is still included as a feature and displayed in the output, but I wanted to avoid triple-counting it.
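An illustrative blend only: the weights and component names below are assumptions, not the actual formula. The point it demonstrates is structural: draft capital influences the probability and the prospect grade, so it gets no additional standalone term.

```python
# Illustrative blend. The 0.45/0.35/0.20 weights and the component names
# are placeholders, not the author's actual formula. Note there is no
# standalone draft-capital term: it already lives inside top12_prob and
# prospect_grade.
def final_rookie_score(top12_prob: float, prospect_grade: float,
                       context_score: float, prior: float = 0.0) -> float:
    # Scale the probability to 0-100 so all components share a range.
    return round(
        0.45 * (top12_prob * 100)
        + 0.35 * prospect_grade
        + 0.20 * context_score
        + prior,
        2,
    )
```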
Recent model checks
The most recent run used 295 eligible training rows and 34 top-12 hits. The leave-one-year-out backtest produced an overall average precision around 0.70 and ROC AUC around 0.89. The model is not perfect, but the historical validation has been useful for identifying where the model is too aggressive, too draft-capital-heavy, or too reliant on certain production signals.
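The reported backtest metrics can be computed from the pooled out-of-fold predictions with standard scikit-learn calls (the function names are real sklearn API; the input arrays here are dummies):

```python
# Computing the backtest summary metrics from pooled out-of-fold
# predictions. These sklearn functions are the standard choices for
# average precision and ROC AUC.
from sklearn.metrics import average_precision_score, roc_auc_score

def backtest_metrics(y_true, y_prob) -> dict:
    return {
        "average_precision": average_precision_score(y_true, y_prob),
        "roc_auc": roc_auc_score(y_true, y_prob),
    }
```

With ~11.5% positives (34/295), average precision is the more informative of the two, since a no-skill model scores ~0.115 on AP but 0.5 on AUC.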
2023 walk-forward example
The walk-forward test for 2023 ranked the class like this:
| Rank | Player | Pick | Walk-forward score | Walk-forward probability | Actual best WR rank so far | Interpretation |
|---|---|---|---|---|---|---|
| 1 | Jaxon Smith-Njigba | 20 | 73.11 | 0.729 | 2 | Model and draft capital agree |
| 2 | Puka Nacua | 177 | 44.44 | 0.507 | 1 | Late-pick outlier profile |
| 3 | Jordan Addison | 23 | 43.28 | 0.006 | 24 | Model and draft capital agree |
| 4 | Zay Flowers | 22 | 40.07 | 0.012 | 8 | Model and draft capital agree |
| 5 | Quentin Johnston | 21 | 38.94 | 0.055 | 37 | Model and draft capital agree |
This was encouraging because Puka's ranking was not a product of hindsight. The model trained only on data through 2022 and still flagged him as a late-pick profile stronger than his draft capital implied.
Current 2026 model output
Here are the current 2026 WR rankings by final_rookie_score:
| Rank | Player | School | Draft pick | Final rookie score | Top-12 probability | Prospect grade |
|---|---|---|---|---|---|---|
| 1 | Jordyn Tyson | Arizona St. | 8 | 54.53 | 0.240 | 79.51 |
| 2 | Carnell Tate | Ohio St. | 4 | 53.15 | 0.411 | 62.98 |
| 3 | Makai Lemon | USC | 20 | 52.00 | 0.180 | 79.82 |
| 4 | KC Concepcion | Texas A&M | 24 | 46.23 | 0.140 | 72.60 |
| 5 | Elijah Sarratt | Indiana | 115 | 43.31 | 0.169 | 64.92 |
| 6 | Omar Cooper | Indiana | 30 | 42.61 | 0.140 | 66.02 |
| 7 | Denzel Boston | Washington | 39 | 40.39 | 0.090 | 66.07 |
| 8 | De'Zhaun Stribling | Mississippi | 33 | 36.30 | 0.090 | 58.64 |
| 9 | Germie Bernard | Alabama | 47 | 34.97 | 0.090 | 56.22 |
| 10 | Chris Brazzell II | Tennessee | 83 | 34.84 | 0.040 | 60.06 |
What I’m looking for feedback on
I’m especially looking for critiques on:
- Whether top-12 within first four seasons is the right primary label
- Whether top-24/top-36 or best WR rank should be modeled separately
- Whether the competition adjustment is too aggressive or not aggressive enough
- Whether draft capital is still being over-weighted
- Whether the model is overfitting to certain PFF production/context metrics
- Whether late-pick outlier profiles like Puka should be handled differently
- Whether the final rookie score should be more probability-based or more prospect-grade-based
- Whether I should add another model type for comparison, such as logistic regression, random forest, or a regression model for best WR rank
I know this is not perfect, and I’m not trying to claim it is predictive gospel. I’m mainly trying to build a transparent, testable framework for WR prospect analysis and would appreciate any feedback on the methodology, assumptions, feature engineering, and validation approach.