NFL WR Rookie Model - Looking for Feedback/Critique
I’ve been working on a Python/XGBoost model that tries to project which rookie wide receivers are most likely to produce at least one top-12 fantasy WR season within their first four NFL seasons. The goal is not to create a perfect ranking system, but to build a structured prospect model that combines NFL draft capital, college production, PFF receiving data, athletic testing, competition/context adjustments, and historical fantasy outcomes.
The model trains on historical drafted WR classes from 2014 onward, using NFL fantasy production to label whether each player eventually produced a top-12 WR season within his first four years. I also added top-24/top-36 outcome tracking and season-count columns, so the sheet can separate “ever hit top-12” from “how many top-12/top-24/top-36 seasons did this player actually produce.”
Data used
The model currently uses:
- NFL draft data: draft year, round/pick, team, college, player IDs
- NFL weekly fantasy production: used to calculate WR season ranks and first-four-year outcomes
- PFF college WR production: yards, receptions, targets, touchdowns, routes, YPRR, receiving grades, route grades, drop rate, yards after catch, target share, dominator-style metrics, etc.
- PFF context splits: man/zone performance, slot/screen/concept usage, and receiving production by depth of target
- Combine/athletic data: height, weight, forty, vertical, broad jump, cone, shuttle, bench, derived speed/burst/size metrics
- Competition/context adjustments: conference/team strength, competition-adjusted production, low-competition risk, screen dependency, downfield production, and trajectory flags
- Historical validation: leave-one-draft-year-out backtesting and walk-forward testing
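To make the "derived speed/burst/size metrics" bullet concrete, here is a minimal sketch. Speed Score is the standard Barnwell formula (weight × 200 / forty⁴); the burst and size definitions below are common conventions, not necessarily the exact versions used in this model.

```python
# Derived athletic metrics. Speed Score is the standard formula;
# burst_score and bmi are common-convention placeholders, not
# necessarily the author's exact definitions.
def speed_score(weight_lbs: float, forty: float) -> float:
    """Size-adjusted speed: heavier players get credit for the same forty."""
    return weight_lbs * 200.0 / forty ** 4

def burst_score(vertical_in: float, broad_in: float) -> float:
    """Simple additive burst proxy: vertical (inches) + broad jump (inches)."""
    return vertical_in + broad_in

def bmi(weight_lbs: float, height_in: float) -> float:
    """Body mass index from imperial units."""
    return 703.0 * weight_lbs / height_in ** 2
```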
Main target
The primary label is:
Did this WR produce at least one top-12 fantasy WR season within his first four NFL seasons?
I also track:
- top24_first4
- top36_first4
- top12_seasons_first4
- top24_seasons_first4
- top36_seasons_first4
- best WR rank in first four seasons
- best PPR season in first four seasons
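The labels above can all be derived from season-level fantasy totals. A minimal sketch, assuming a table with one row per player-season (the column names `player_id`, `draft_year`, `season`, `ppr_points` are placeholders, not the actual schema):

```python
# Sketch: deriving the outcome labels from season-level PPR data.
# Column names are assumptions, not the author's actual schema.
import pandas as pd

def label_first_four(seasons: pd.DataFrame) -> pd.DataFrame:
    """seasons: one row per player-season with total PPR points."""
    df = seasons.copy()
    # Positional rank within each NFL season (1 = WR1 overall).
    # Rank before filtering so it reflects the whole league year.
    df["wr_rank"] = df.groupby("season")["ppr_points"].rank(
        ascending=False, method="min"
    )
    # Keep only each player's first four NFL seasons.
    df = df[df["season"] - df["draft_year"] <= 3]
    g = df.groupby("player_id")
    out = pd.DataFrame({
        "top12_first4": g["wr_rank"].min() <= 12,
        "top12_seasons_first4": g["wr_rank"].apply(lambda r: (r <= 12).sum()),
        "top24_seasons_first4": g["wr_rank"].apply(lambda r: (r <= 24).sum()),
        "best_rank_first4": g["wr_rank"].min(),
        "best_ppr_first4": g["ppr_points"].max(),
    })
    return out.reset_index()
```

The same pattern extends to the top-36 columns by changing the threshold.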
Model approach
The main model is an XGBoost classifier. It uses a reduced/cleaned feature set to avoid overloading the model with too many duplicate raw and competition-adjusted versions of the same stat. The model also uses scale_pos_weight because top-12 WR hits are relatively rare.
I run two main validation views:
- Leave-one-year-out backtest: the model trains on every completed draft class except the test year, then scores that held-out class.
- Walk-forward historical test: the model scores each class using only information from prior classes. For example, the 2023 class is scored using training data through 2022. This is meant to test whether the model would have identified players like Puka Nacua before knowing their NFL breakout.
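The two validation loops can be sketched as below. The `fit_predict` callback and column names are placeholders for whatever the real pipeline does; the structure of the splits is the point.

```python
# Sketch of the two validation views. fit_predict(X_tr, y_tr, X_te) -> probs
# is a placeholder for the real training/scoring pipeline.
import pandas as pd

def leave_one_year_out(df, feature_cols, fit_predict):
    """For each draft year, train on every other class, score the held-out one."""
    scores = {}
    for year in sorted(df["draft_year"].unique()):
        tr = df[df["draft_year"] != year]
        te = df[df["draft_year"] == year]
        scores[year] = fit_predict(tr[feature_cols], tr["top12_first4"], te[feature_cols])
    return scores

def walk_forward(df, feature_cols, fit_predict, start_year):
    """Score each class using only classes drafted strictly before it."""
    scores = {}
    for year in sorted(df["draft_year"].unique()):
        if year < start_year:
            continue  # not enough history before this point
        tr = df[df["draft_year"] < year]
        te = df[df["draft_year"] == year]
        scores[year] = fit_predict(tr[feature_cols], tr["top12_first4"], te[feature_cols])
    return scores
```

The key difference: leave-one-year-out uses future classes in training (useful for overall skill estimates), while walk-forward never does (the honest "would I have known at the time" test).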
In the latest walk-forward test, Puka was flagged as a “late-pick outlier profile — model likes more than NFL did,” which is exactly the type of signal I wanted the model to surface. It does not mean the model viewed him as safe; it means his underlying receiving/context profile was stronger than a typical late-round WR.
Final rookie score
The final rookie score is a blend, not just the raw XGBoost probability. It combines:
- model probability of a top-12 outcome
- prospect grade
- contextual production profile
- calibration/priors
- draft-capital-vs-model interpretation flags
I removed the extra standalone draft-capital weight from the final score because draft capital is already represented inside both the model and prospect grade. Draft capital is still included as a feature and displayed in the output, but I wanted to avoid triple-counting it.
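An illustrative blend only: the weights and component names below are assumptions, not the actual formula. The point it demonstrates is structural: draft capital influences the probability and the prospect grade, so it gets no additional standalone term.

```python
# Illustrative blend. The 0.45/0.35/0.20 weights and the component names
# are placeholders, not the author's actual formula. Note there is no
# standalone draft-capital term: it already lives inside top12_prob and
# prospect_grade.
def final_rookie_score(top12_prob: float, prospect_grade: float,
                       context_score: float, prior: float = 0.0) -> float:
    # Scale the probability to 0-100 so all components share a range.
    return round(
        0.45 * (top12_prob * 100)
        + 0.35 * prospect_grade
        + 0.20 * context_score
        + prior,
        2,
    )
```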
Recent model checks
The most recent run used 295 eligible training rows and 34 top-12 hits. The leave-one-year-out backtest produced an overall average precision around 0.70 and ROC AUC around 0.89. The model is not perfect, but the historical validation has been useful for identifying where the model is too aggressive, too draft-capital-heavy, or too reliant on certain production signals.
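The reported backtest metrics can be computed from the pooled out-of-fold predictions with standard scikit-learn calls (the function names are real sklearn API; the input arrays here are dummies):

```python
# Computing the backtest summary metrics from pooled out-of-fold
# predictions. These sklearn functions are the standard choices for
# average precision and ROC AUC.
from sklearn.metrics import average_precision_score, roc_auc_score

def backtest_metrics(y_true, y_prob) -> dict:
    return {
        "average_precision": average_precision_score(y_true, y_prob),
        "roc_auc": roc_auc_score(y_true, y_prob),
    }
```

With ~11.5% positives (34/295), average precision is the more informative of the two, since a no-skill model scores ~0.115 on AP but 0.5 on AUC.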
2023 walk-forward example
The walk-forward test for 2023 ranked the class like this:
| Rank | Player | Pick | Walk-forward score | Walk-forward probability | Actual best WR rank so far | Interpretation |
|---|---|---|---|---|---|---|
| 1 | Jaxon Smith-Njigba | 20 | 73.11 | 0.729 | 2 | Model and draft capital agree |
| 2 | Puka Nacua | 177 | 44.44 | 0.507 | 1 | Late-pick outlier profile |
| 3 | Jordan Addison | 23 | 43.28 | 0.006 | 24 | Model and draft capital agree |
| 4 | Zay Flowers | 22 | 40.07 | 0.012 | 8 | Model and draft capital agree |
| 5 | Quentin Johnston | 21 | 38.94 | 0.055 | 37 | Model and draft capital agree |
This was encouraging because Puka's ranking was not a product of hindsight. The model trained only on data through 2022 and still flagged him as a late-pick profile stronger than his draft capital implied.
Current 2026 model output
Here are the current 2026 WR rankings by final_rookie_score:
| Rank | Player | School | Draft pick | Final rookie score | Top-12 probability | Prospect grade |
|---|---|---|---|---|---|---|
| 1 | Jordyn Tyson | Arizona St. | 8 | 54.53 | 0.240 | 79.51 |
| 2 | Carnell Tate | Ohio St. | 4 | 53.15 | 0.411 | 62.98 |
| 3 | Makai Lemon | USC | 20 | 52.00 | 0.180 | 79.82 |
| 4 | KC Concepcion | Texas A&M | 24 | 46.23 | 0.140 | 72.60 |
| 5 | Elijah Sarratt | Indiana | 115 | 43.31 | 0.169 | 64.92 |
| 6 | Omar Cooper | Indiana | 30 | 42.61 | 0.140 | 66.02 |
| 7 | Denzel Boston | Washington | 39 | 40.39 | 0.090 | 66.07 |
| 8 | De'Zhaun Stribling | Mississippi | 33 | 36.30 | 0.090 | 58.64 |
| 9 | Germie Bernard | Alabama | 47 | 34.97 | 0.090 | 56.22 |
| 10 | Chris Brazzell II | Tennessee | 83 | 34.84 | 0.040 | 60.06 |
What I’m looking for feedback on
I’m especially looking for critiques on:
- Whether top-12 within first four seasons is the right primary label
- Whether top-24/top-36 or best WR rank should be modeled separately
- Whether the competition adjustment is too aggressive or not aggressive enough
- Whether draft capital is still being over-weighted
- Whether the model is overfitting to certain PFF production/context metrics
- Whether late-pick outlier profiles like Puka should be handled differently
- Whether the final rookie score should be more probability-based or more prospect-grade-based
- Whether I should add another model type for comparison, such as logistic regression, random forest, or a regression model for best WR rank
I know this is not perfect, and I’m not trying to claim it is predictive gospel. I’m mainly trying to build a transparent, testable framework for WR prospect analysis and would appreciate any feedback on the methodology, assumptions, feature engineering, and validation approach.