u/Big-Sandwich6046

I trained XGBoost on 461K CSP + covered call trades across 28 tickers to see what a model learns about wheel strike selection — here's what it found (and what the 30-delta rule already gets right)

TL;DR: I've been running the wheel for about 20 years and doing ML for about 20 years. Finally built the project I'd been putting off — a model that scores individual CSPs and covered calls for "will this hit 50% profit before expiry?" across 28 tickers, from SPY/QQQ/AAPL through the volatile stuff most of us actually wheel now (COIN, MSTR, HOOD, OKLO, PLTR, MARA, RIOT, SOFI). Trained on 461K trades, 2020-2026. The punchline: on SPY/QQQ the standard wheel rules (30-delta, close at 50%, roll at 21 DTE) are hard to beat. On the volatile names, the rules you inherited from SPY-land actively hurt you, and the model learned why. Sharing because strike selection is where most wheel operators (myself included) under-think, and this experiment made me change how I size and select.

Repo (MIT, nothing for sale): https://github.com/caradhras36/options-ml-scoring

Not a product. Not a service. No Discord, no newsletter, no course. Posting because the wheel community is where I've learned the most over the years and this is me putting something back.

Why this matters for wheel operators specifically

The wheel works. "Sell 30-delta CSP, close at 50%, if assigned sell 30-delta covered call, close at 50%, repeat" is a disciplined playbook that beats most of what retail does.

But the moment you extend the wheel past SPY/QQQ into volatile single names — which is where most of the premium is, and also where most of us have gotten blown up — the heuristics stop behaving the same way. A 30-delta on SPY and a 30-delta on COIN are not the same trade, and every experienced wheel operator knows this intuitively. The question I wanted to answer with this project: can a model formalize that intuition, and does doing so actually help?

Short answer: yes, but less than the headline backtest numbers suggest, and mostly by forcing discipline that an honest wheel operator would already apply.

Setup

For every CSP or covered call meeting:

  • 21-60 DTE
  • |Delta| 0.10-0.40 (normal wheel band)
  • Mid > $0.05

…score whether it will hit 50% of max profit before expiry. Five models on the same ~30 features:

  1. hit_50pct — binary (primary)
  2. max_profit — how much you realistically clear
  3. days_to_50 — how fast it cooks
  4. expected_value — dollar EV
  5. outcome_category — full win / partial win / breakeven / loss

Data from Polygon EOD chains, 28 tickers, Jan 2020 - Mar 2026. Greeks computed via Black-Scholes and cross-checked against OptionsDX to 0.4% delta error.
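For anyone who wants to reproduce the candidate filter on their own chain data, here is a minimal sketch, not the repo's code. It assumes a plain Black-Scholes delta with no dividends, and the function names (bs_delta, in_wheel_band) are made up for illustration.

```python
import math

def bs_delta(spot, strike, t_years, vol, rate=0.0, kind="put"):
    """Black-Scholes delta for a European option, no dividends (assumption)."""
    d1 = (math.log(spot / strike) + (rate + 0.5 * vol ** 2) * t_years) / (
        vol * math.sqrt(t_years)
    )
    call_delta = 0.5 * (1.0 + math.erf(d1 / math.sqrt(2.0)))  # N(d1)
    return call_delta if kind == "call" else call_delta - 1.0

def in_wheel_band(dte, delta, mid):
    """Setup filter from the post: 21-60 DTE, |delta| 0.10-0.40, mid > $0.05."""
    return 21 <= dte <= 60 and 0.10 <= abs(delta) <= 0.40 and mid > 0.05

# Hypothetical example: 45-DTE put, 10% OTM, 30% vol, $0.80 mid
delta = bs_delta(spot=100.0, strike=90.0, t_years=45 / 365, vol=0.30, kind="put")
print(round(delta, 3), in_wheel_band(45, delta, 0.80))
```

The filter takes |delta| so the same check works for puts (negative delta) and covered calls (positive delta).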

The question I care about most for wheel operators: given a ticker and a chain, which strike should I actually sell this week?

What the model learned (wheel-specific takeaways)

Three findings that changed how I wheel:

1. Ticker-specific delta bands

If you apply "sell 30-delta CSP" uniformly across 28 tickers, the hit-rate distribution looks like this:

Ticker | 50%-profit hit rate at ~30-delta CSP
SPY    | 89%
QQQ    | 87%
AAPL   | 84%
NVDA   | 80%
PLTR   | 74%
COIN   | 71%
OKLO   | 68%
MSTR   | 67%

The SPY rule is a SPY rule. On OKLO/MSTR/COIN, 30-delta is meaningfully more likely to go against you than the community heuristic implies. The model learned to shade lower delta (15-25) on the volatile names and hold normal 25-35 on the mega-caps. My manual rule now: on any name with ATM IV > 50%, shift the target delta band down by ~10 points. This by itself — no ML required — probably captures a meaningful chunk of what the model found.
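The manual rule above fits in a few lines. This is a sketch of the napkin version only, not what the model does internally; the function name and defaults are illustrative, with the 25-35 base band, the 50% IV cutoff, and the ~10-point shift taken from the paragraph above.

```python
def target_delta_band(atm_iv, base_band=(0.25, 0.35), iv_cutoff=0.50, shift=0.10):
    """Return the (low, high) target |delta| band for a given ATM IV.

    atm_iv is a decimal (0.85 means 85% IV). Names with IV above the
    cutoff get the whole band shifted down by `shift`.
    """
    low, high = base_band
    if atm_iv > iv_cutoff:
        return (round(low - shift, 2), round(high - shift, 2))
    return (low, high)

print(target_delta_band(0.18))  # mega-cap style IV -> (0.25, 0.35)
print(target_delta_band(0.85))  # volatile name     -> (0.15, 0.25)
```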

2. rv_iv_ratio is the feature I wish I'd tracked for 20 years

The single most useful engineered feature in the model is rv_20d / iv_atm — 20-day realized volatility divided by ATM implied. When it's low, you're selling rich premium relative to what the underlying is actually doing. When it's high, you're selling cheap premium into a stock that's been moving.

Every wheel operator does a version of this in their head ("IV looks juicy") but I'd never actually normalized it ticker-by-ticker. The model treats rv/iv on SPY and rv/iv on COIN as the same signal, which is exactly what you want — it's a relative richness signal.

Practical wheel rule: if rv_iv_ratio > 1.2 (realized exceeding implied), skip the open. Wait for IV to catch up or for realized to cool. Not a model requirement — a rule you can apply from any options data source.
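A minimal sketch of that check, assuming realized vol is the annualized standard deviation of the last 20 daily log returns (252 trading days). The exact convention the repo uses may differ, and the function names here are illustrative.

```python
import math

def realized_vol(closes, window=20, trading_days=252):
    """Annualized stdev of the last `window` daily log returns (assumed convention)."""
    recent = closes[-(window + 1):]
    if len(recent) < window + 1:
        raise ValueError("need at least window + 1 closing prices")
    rets = [math.log(b / a) for a, b in zip(recent, recent[1:])]
    mean = sum(rets) / len(rets)
    var = sum((r - mean) ** 2 for r in rets) / (len(rets) - 1)
    return math.sqrt(var * trading_days)

def rv_iv_ratio(closes, iv_atm):
    """20-day realized vol divided by ATM implied vol (both as decimals)."""
    return realized_vol(closes) / iv_atm

def skip_open(closes, iv_atm, cutoff=1.2):
    """True when realized is running well above implied, per the rule above."""
    return rv_iv_ratio(closes, iv_atm) > cutoff

# Hypothetical closes: a stock whipping 2% up and down every day
closes = [100.0 if i % 2 == 0 else 102.0 for i in range(21)]
print(round(rv_iv_ratio(closes, iv_atm=0.20), 2), skip_open(closes, iv_atm=0.20))
```

Any source of daily closes plus an ATM IV quote is enough to run this; no model required.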

3. The 50%-profit label is actually the right wheel target

I was nervous the "hit 50% profit before expiry" label would be weird for wheelers who hold through assignment. Turns out it maps well. For CSPs that end up assigned, the 50%-profit label is rarely hit (the position gets assigned at a loss or at close-to-max — that's what assignment is). The model learned to score low-probability-of-50% trades as "avoid" even when premium looked attractive, which is basically the wheel operator's "do I want to own this at this strike" gut check, formalized.

The model is not a replacement for "pick tickers you're willing to own." It's a filter on top of that.

Results — and the part where I beg you to discount the dollar number

Holdout backtest: Jan 2025 - Mar 2026 (15 months the model never saw):

Metric          | Model (threshold 0.85) | "Sell everything in the band" baseline
Trades          | 193,608                | 285,379
Hit rate        | 99.7%                  | 78.2%
Avg P&L / trade | $404                   | $95
Precision lift  | +11pp                  |

Five reasons the +$400/trade is fiction-adjacent:

  1. Mid-price fills. Every backtest trade fills at the bid-ask midpoint. In a real wheel account selling on names like MSTR or COIN, you're giving up 10-15% of the credit to spread. That alone knocks $30-60 off the average per-trade figure.
  2. No capital / margin / concentration constraints. 193K trades over 15 months is ~500/day. No wheel account has that capital. The realistic question is "among the N trades I can actually put on today, does the model's top-N beat the heuristic's top-N?" — and I haven't answered that yet.
  3. annualized_return is the model's top feature. SHAP analysis shows the single most important input is premium-per-day-per-capital. That's technically known at trade entry so it's not strict leakage, but it means a meaningful chunk of the model's "edge" is just "avoid thin-premium trades" — which is a rule you can write on a napkin. I'm retraining without it to see what survives.
  4. 15 profitable months = a favorable regime. The backtest window was mostly benign for premium sellers. I have no data on what this does in a 2008-style crisis or a 2001-style low-vol grind where premiums compress.
  5. No assignment/wheel path modeled. The label is hit-50% or not. It doesn't follow the CSP-into-assignment-into-covered-call cycle that actually defines the wheel. A version that does is on the roadmap but isn't built yet.
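To make point 3 concrete, here is roughly what a premium-per-day-per-capital feature and the napkin rule look like. This is a sketch under assumptions: the formula (premium over cash-secured collateral, scaled to 365 days) is one common definition and may not match the repo's annualized_return exactly, and the 10% floor is a hypothetical threshold, not something from the model.

```python
def annualized_return(premium, strike, dte, multiplier=100):
    """Premium as a fraction of cash-secured collateral, scaled to a year.

    Assumed definition: collateral for a CSP is strike * multiplier.
    """
    collateral = strike * multiplier
    return (premium * multiplier / collateral) * (365.0 / dte)

def passes_thin_premium_rule(premium, strike, dte, min_annualized=0.10):
    """Napkin rule: skip trades whose annualized premium is below a floor.

    min_annualized is a hypothetical cutoff chosen for illustration.
    """
    return annualized_return(premium, strike, dte) >= min_annualized

# Hypothetical quotes on a $130 strike put
print(round(annualized_return(1.30, 130.0, 30), 4))   # fat enough
print(passes_thin_premium_rule(0.20, 130.0, 45))      # thin premium -> False
```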

What actually changed in my own wheel

The only thing that matters for wheel operators is "did this make your own book better," so here's the honest account:

  • I stopped selling CSPs at the same delta on COIN/MSTR/OKLO that I sell on SPY. This was the biggest behavioral change, and it doesn't actually require the model — the finding is "volatile names need lower delta," and once you know that, you can apply it manually.
  • I added rv_iv_ratio as a manual gut check before opening any position. No model required — just a 20-day realized vol calc.
  • I do not use the model as a go/no-go signal. I use it as a confirmation check. Model agrees with my intuition + delta + rv/iv → I size up. Model disagrees with my intuition → I shrink size or skip. Never the only input.
  • I'm more skeptical of high-premium CSPs on volatile names, not less. The SHAP analysis caught several cases where the model was rewarding high-premium trades that were actually bad (high notional ≠ good trade), and that made me audit my own real-money history. I found two OKLO trades from 2025 that fit the "bad trade that looked good because premium was fat" pattern. Costly lesson, but it's the kind of lesson a model-output review can surface.

The deeper insight — and the one I'd push you to steal regardless of whether you ever touch the model — is that most of the time, the wheel rules are right, and the wheel rules are wrong in a specific, identifiable direction on volatile tickers. The rules were designed on SPY. If you're wheeling anything with ATM IV above 50%, derate your delta band.

If you want to use this on your own watchlist

Repo has a Jupyter notebook (notebooks/example_inference.ipynb) that walks through scoring a single trade end-to-end — I used an NVDA $130 PUT as the example. You feed it a ticker + chain snapshot, it returns all 5 predictions plus a SHAP breakdown showing which features pushed the score up and down for that specific trade.

That's the fastest path for a wheel operator who wants to actually use this — not to retrain, just to score your own watchlist each week. Everything is MIT-licensed. The 28-ticker pre-trained model is in the repo via Git LFS.

https://github.com/caradhras36/options-ml-scoring

What I'd actually want feedback on

  1. Is 50% profit the right label for wheel operators? "Close at 50% OR 21 DTE" is more realistic for most of us. I'll probably relabel in v8. Anyone tried both?
  2. Does rv_iv_ratio > 1.2 = skip match your experience? I think it's generalizable but I've only tested it on these 28 tickers.
  3. For wheel-specific backtesting, should I model the full CSP→assignment→CC cycle as the label, instead of per-trade hit-50%? That's a bigger project but probably the right one.
  4. What's your honest per-ticker delta band on volatile names? I moved to 15-25 on COIN/MSTR/OKLO, 25-35 on SPY/QQQ. Curious what other experienced wheel operators do.

I'll be in the thread for the next few hours. Brutal feedback welcome — especially on the mid-price backtest and the annualized_return leakage concern, which are the two things I'm least confident about.

u/Big-Sandwich6046 — 1 day ago

How I rolled my NBIS 175C May 15 when Delta hit 0.46. Strike room over premium credit for roll.

NBIS roll breakdown — why I chose strike room over premium credit

Wanted to share this one because it's the less-obvious version of a roll decision and I think it's worth walking through.

The original trade

Last week I sold the NBIS May 15 2026 $175 call for $3.70 ($370/contract). US-Iran tension had IV pumped, the premium was juicy, starting Delta was 0.20-0.25 — clean OTM setup, I felt safe.

Then the de-escalation headlines hit, AI names ripped, and NBIS ran hard. My 175 went almost ATM. Delta jumped from 0.20-0.25 to 0.46 today. I passed right through my ideal roll zone (0.37-0.43) without noticing.

Checking my options

My rule is "roll while you still have choices." I missed the ideal window but I was still one step away from the cliff, so I ran the checklist instead of panicking.

Same expiration (May 15): Every higher strike required a net debit. My rule is never pay a debit to roll, so the same expiration was out.

Go out in time (June 18): This is where it got interesting. Two real choices:

  • $190 strike, June: Net credit ~$2-3 per share (~$200-300/contract). Strike only $15 higher than the original strike.
  • $200 strike, June: Closing the old 175 cost me ~$13. New 200 sold for ~$13. Roll was net ~zero — maybe a few cents debit at mid, small credit with a good fill. But the strike moved a full $25 higher.

The question I always ask

"If I didn't already have this position, which one would I open fresh today?"

Answer was the 200. Here's why:

At the 190 strike, Delta stays around 0.41 — I'm still a breath away from the cliff. NBIS has momentum, one more rip and I'm right back in the same spot. At the 200 strike, Delta drops to 0.28. That's breathing room for a month.

And on the "never pay a debit" rule — technically the 200 roll was net ~zero, not a real debit. With a patient limit order I could even get a small credit. So I wasn't paying to stay in a bad idea; I was buying $25 of strike room for effectively free.

What I actually did

Closed the May 15 $175 call for ~$13 ($1,300). Opened the June 18 $200 call for ~$13 ($1,300). Roll net ~$0. The original $370 premium stays in my pocket.

What the roll got me

  1. 34 extra days. Theta is back on my side.
  2. Break-even moved way up. Was $175 + $3.70 = $178.70 (NBIS had already blown through that — I was underwater on paper). Now it's $200 + $3.70 = $203.70. NBIS has to rally another ~10% from here by June to actually hurt me.
  3. Delta from 0.46 back to the high 0.20s. Off the cliff.

The alternatives I rejected

  • Just close it and eat the loss: Would have been -$9.30 per share realized (-$930/contract). Game over, nothing to show for the original $370.
  • Don't roll at all: If NBIS keeps running, Delta goes to 0.70+, buyback becomes $18-22, and finding a credit roll gets impossible. That's how you end up assigned on a name you didn't want to sell.
  • June 190 for the credit: Would've pocketed another ~$270 in premium. Tempting. But break-even sits at $196.40 and NBIS is already showing it can run through those levels. I'd be at the same table in two weeks.
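The arithmetic behind the roll and the rejected alternatives can be checked with a short sketch. All prices are per share, using the approximate figures from the post; the $15.70 credit for the June 190 is back-solved from the ~$270 net credit and is an assumption, as are the function and field names.

```python
from datetime import date

def roll_summary(orig_credit, close_cost, new_credit, new_strike,
                 old_expiry, new_expiry):
    """Net credit of the roll, new break-even on the short call, and days added.

    Break-even here follows the post's convention: strike plus all
    credit collected so far (original premium plus the roll's net).
    """
    roll_net = new_credit - close_cost
    total_credit = orig_credit + roll_net
    return {
        "roll_net": round(roll_net, 2),
        "break_even": round(new_strike + total_credit, 2),
        "extra_days": (new_expiry - old_expiry).days,
    }

may, june = date(2026, 5, 15), date(2026, 6, 18)

# Chosen roll: buy back the 175 for ~13, sell the June 200 for ~13
print(roll_summary(3.70, 13.00, 13.00, 200.0, may, june))
# -> {'roll_net': 0.0, 'break_even': 203.7, 'extra_days': 34}

# Rejected roll: June 190 for ~2.70 net credit (15.70 is an assumed fill)
print(roll_summary(3.70, 13.00, 15.70, 190.0, may, june))
# -> {'roll_net': 2.7, 'break_even': 196.4, 'extra_days': 34}

# Closing outright: original credit minus buyback cost, per share
print(round(3.70 - 13.00, 2))  # -> -9.3, i.e. about -$930 per contract
```

Seen side by side, the 190 roll adds $2.70 of credit but leaves the break-even $7.30 lower than the 200 roll does.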

The trade-off I actually made between the 190 and 200 strikes

I bought strike room, not premium. Gave up ~$270 of potential credit to buy $25 of strike space. For NBIS to hurt me now, it has to rally another ~10% in five weeks. That's a trade I'll take.

The general takeaway — when momentum is real, credit maximization is the wrong target. Strike room is the right target. The goal of a roll isn't to collect the most premium today; it's to put yourself in a position where you're not rolling again in a week.

Happy to answer questions on the mechanics.

u/Big-Sandwich6046 — 9 days ago