u/Jookster1

▲ 1 r/algobetting+1 crossposts

Pushed a redesign + new evaluation views to my free WTA prediction site

Quick update for anyone tracking this. Just shipped a layout overhaul and a few new model-evaluation views on datadrivenpicks.club.

The model itself: a point-level Markov chain (p_in / p1w_serve / p2w_serve per player per match) with surface and recent-form adjustments. v3 launched a couple of weeks ago and stacks an XGB winner-blend on top, plus separate XGB regressors for game margin and total games, trained on ~19k WTA + Challenger matches.
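
For anyone new to point-level models, here is a minimal sketch of the textbook first step of such a chain: turning a server's point-win probability into the probability of holding one service game. It assumes points are independent and identically distributed, and the name `hold_probability` is made up for illustration. The site's actual chain (with p_in and the surface/form adjustments) is not shown in this post and may well differ.

```python
def hold_probability(p: float) -> float:
    """Probability the server wins a game, given point-win probability p on serve.

    Assumes i.i.d. points (an assumption of this sketch, not a claim about the site's model).
    """
    q = 1.0 - p
    # Win before deuce: server takes 4 points while the returner takes 0, 1, or 2.
    before_deuce = p**4 * (1 + 4 * q + 10 * q**2)
    # Reach 3-3 in C(6,3) = 20 ways, then win two points in a row
    # before the returner does: p^2 / (p^2 + q^2).
    from_deuce = 20 * p**3 * q**3 * (p**2 / (p**2 + q**2))
    return before_deuce + from_deuce


print(round(hold_probability(0.60), 3))  # 0.736
```

The same recursion extends from games to sets and matches, which is where small per-point edges compound into large match-level probabilities.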

What's new on the site:

  • Calibration scatter — predicted probability vs actual win rate per confidence bucket, with a 45° reference line and color coding by deviation from it
  • Surface breakdown — ML accuracy + MOV / Total error standard deviation across hard, clay, and grass
  • Brier and Log Loss as headline quality metrics next to the existing accuracy KPIs
  • Bracket simulator over the active draws — Monte Carlo over the full bracket, championship probability per player, updates round by round
  • Bidirectional game-margin chart per match — cumulative line on top pane, exact-margin PMF bars on the bottom, single SVG
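
The bracket simulator idea can be sketched in a few lines. This is a toy version under stated assumptions: a single-elimination draw whose size is a power of two, and a hypothetical `win_prob(a, b)` callable standing in for whatever the real model outputs. It is not the site's implementation.

```python
import random
from collections import Counter


def simulate_bracket(players, win_prob, n_sims=10_000, seed=0):
    """Estimate championship probability per player by Monte Carlo.

    players: list in draw order, length a power of two.
    win_prob(a, b): hypothetical callable returning P(a beats b).
    """
    rng = random.Random(seed)
    titles = Counter()
    for _ in range(n_sims):
        alive = list(players)
        while len(alive) > 1:
            # Adjacent pairs in draw order meet; winners advance to the next round.
            alive = [
                a if rng.random() < win_prob(a, b) else b
                for a, b in zip(alive[0::2], alive[1::2])
            ]
        titles[alive[0]] += 1
    return {p: titles[p] / n_sims for p in players}


# Made-up strengths, only to show the call shape.
strength = {"A": 3.0, "B": 1.0, "C": 2.0, "D": 1.0}
probs = simulate_bracket(
    list(strength), lambda a, b: strength[a] / (strength[a] + strength[b])
)
print(probs)
```

A round-by-round update then amounts to re-running the simulation with only the players still in the draw.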

Current track record (ML market, match-level dedupe to favored side):

  • Accuracy: 60.4% across 632 settled matches
  • Brier: 0.232 (random binary = 0.25)
  • Log Loss: 0.655 (random = 0.693)
  • Surface σ: Hard 4.61 / Clay 4.78 / overall 4.75 — pretty consistent across surfaces
  • Calibration mostly clean, but the 60-70% confidence bucket runs a few percentage points hot (mid-bucket overconfidence — leftover from v2's anti-calibration that v3 only partially solved)
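
For reference, the "random" baselines quoted above (0.25 and 0.693) fall straight out of the metric definitions when every prediction is 50%. A small self-contained sketch, with illustrative function names and toy outcomes rather than the site's data:

```python
import math


def brier_score(probs, outcomes):
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)


def log_loss(probs, outcomes, eps=1e-15):
    """Mean negative log-likelihood; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for p, y in zip(probs, outcomes):
        p = min(max(p, eps), 1 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(probs)


# A forecaster that always says 50% scores the same whatever the outcomes are.
outcomes = [1, 0, 1, 1, 0]
coin_flip = [0.5] * len(outcomes)
print(brier_score(coin_flip, outcomes))         # 0.25
print(round(log_loss(coin_flip, outcomes), 3))  # 0.693
```

Both metrics are "lower is better", so 0.232 and 0.655 sit modestly below the coin-flip baseline.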

Free, no paywall, no monetization. Performance page has everything public.

Site: https://datadrivenpicks.club Daily edges on Twitter: https://twitter.com/Gay4WTA

Open to feedback — anything that looks miscalibrated or over-engineered, I want to hear it.

reddit.com
u/Jookster1 — 2 days ago
▲ 3 r/algobetting+1 crossposts

I've been working on a forecasting project for women's hockey — currently 565 games across the PWHL (full play-by-play), IIHF Women's World Championship (box scores OCR'd from official PDFs across 7 tournaments since 2017), and Olympic women's tournaments (2018, 2022, 2026 via Wikipedia tables).

Per-league forecasting performance, time-based 80/20 split:

  • International (Worlds + Olympics): 86.5% accuracy on 52 held-out games, log-loss 0.35, Brier score 0.115. Well-calibrated in high-confidence bins (96%-predicted bucket → 100% actual; 76%-predicted → 75% actual)
  • PWHL: 56.9% accuracy on 58 held-out games — parity is real and team-level features (rest, recent form, goalie save%) aren't sufficient to break much past coin flip with current sample size
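
The bucket comparison quoted above (predicted vs actual per confidence bin) can be reproduced with a short helper. This is a sketch assuming equal-width probability bins; the post does not say how the bins were actually chosen, and `calibration_table` is a made-up name.

```python
from collections import defaultdict


def calibration_table(probs, outcomes, n_bins=10):
    """Group predictions into equal-width probability bins.

    Returns (mean predicted, actual win rate, count) per non-empty bin,
    ordered from the lowest bin to the highest.
    """
    bins = defaultdict(list)
    for p, y in zip(probs, outcomes):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the top bin
        bins[idx].append((p, y))
    rows = []
    for idx in sorted(bins):
        pairs = bins[idx]
        mean_pred = sum(p for p, _ in pairs) / len(pairs)
        win_rate = sum(y for _, y in pairs) / len(pairs)
        rows.append((mean_pred, win_rate, len(pairs)))
    return rows


# Toy inputs, not the held-out games from the post.
for row in calibration_table([0.95, 0.97, 0.75, 0.77], [1, 1, 1, 0]):
    print(row)
```

Keeping the count in each row matters here: with 52 held-out games spread over several bins, each bin holds only a handful of games, so a few points of deviation per bin is well within noise.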

Approach is league-specific Elo (tuned via grid search — international K=100/divisor=150, PWHL K=50/divisor=400 since the leagues have very different talent stratification) plus a Poisson goal-totals layer for over/under analysis.
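
To make the K/divisor pairs concrete, here is a minimal Elo sketch. It assumes "divisor" means the scale constant in the logistic expectation (400 in standard Elo), and it leaves out home advantage and any margin-of-victory multiplier, neither of which is described in the post. Function names are illustrative.

```python
def expected_score(rating_a: float, rating_b: float, divisor: float) -> float:
    """Logistic Elo expectation; a smaller divisor makes the curve steeper."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / divisor))


def update(rating_a, rating_b, a_won: bool, k: float, divisor: float):
    """Return both ratings after one game (zero-sum update)."""
    actual = 1.0 if a_won else 0.0
    delta = k * (actual - expected_score(rating_a, rating_b, divisor))
    return rating_a + delta, rating_b - delta


# The same 100-point gap under the two tuned settings.
print(round(expected_score(1600, 1500, divisor=150), 3))  # 0.823 (international)
print(round(expected_score(1600, 1500, divisor=400), 3))  # 0.64 (PWHL)
```

Under this reading, the international setting turns a given rating gap into a much more lopsided forecast, which fits a competition with a steep talent gradient, while the PWHL setting keeps forecasts closer to 50%.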

What I'm missing: any historical odds data for these competitions. Specifically:

  • Opening and closing moneylines and totals for PWHL games (2024 onward)
  • Same for IIHF Women's Worlds 2017–2025
  • Same for the three Olympic tournaments

Sources I've explored that don't work: the most popular sports-data API has zero women's hockey. Flashscore only retains the last live line, which isn't a true closing snapshot. OddsPortal has some pages but the coverage for these specific leagues is patchy.

If anyone has worked with women's hockey odds data, has an archive from a previous project, or knows of a less-mainstream source that covers these competitions, I'd be very grateful for pointers. Happy to share the dataset and methodology back.

u/Jookster1 — 14 days ago

I've scraped + modeled the entire bettable universe of women's hockey: PWHL, IIHF Women's Worlds, and Olympic women's tournaments. Some quick numbers on what's built so far:

Dataset (565 games):

  • PWHL: 311 games (Jan 2024 → present), full play-by-play — every shot, hit, faceoff, on-ice +/-
  • IIHF Women's Worlds: 202 games across 7 tournaments (2017, 2019, 2021–2025), box scores OCR'd from official IIHF PDFs (the cid-encoded fonts were a fun wall to break through)
  • Olympics: 71 games (2018, 2022, 2026) — pulled from Wikipedia since IIHF Hydra had only schedule placeholders

Model performance, chronological 80/20 split, no leakage:

| Bucket | Test n | Accuracy | Log-loss | Brier |
|---|---|---|---|---|
| INTL (Worlds + Olympics) | 52 | 86.5% | 0.349 | 0.115 |
| PWHL | 58 | 56.9% | 0.658 | 0.234 |
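
A chronological 80/20 split like the one used for these numbers can be sketched as follows. The dict keys (`date`, `id`) are hypothetical; the real dataset schema is not shown in the post.

```python
from datetime import date


def chronological_split(games, train_frac=0.8):
    """Sort by date and cut once, so no test game is earlier than a training game."""
    ordered = sorted(games, key=lambda g: g["date"])
    cut = int(len(ordered) * train_frac)
    return ordered[:cut], ordered[cut:]


# Ten toy games on consecutive days, deliberately out of order.
games = [{"id": d, "date": date(2024, 1, d)} for d in range(10, 0, -1)]
train, test = chronological_split(games)
print(len(train), len(test))  # 8 2
```

The split only rules out one kind of leakage. Rolling features such as Elo, recent form, or save% also have to be computed from games strictly before each game's date, otherwise the test set still sees the future.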

The international gap (USA/CAN ~250 Elo above everyone else) is huge and the model captures it — calibration tight in the high-confidence bins (96% predicted → 100% actual, 76% → 75%, 5% → 0%). PWHL is essentially coin-flip-by-design — added rest, recent form, and goalie save% as features and they have the right sign in the coefficients but can't break the parity ceiling with only 245 training games.

I tuned Elo K + slope per league via grid search (INTL K=100, divisor=150; PWHL K=50, divisor=400; 538-style approach). I also have a Poisson goal-totals model that's well-calibrated for PWHL (34% predicted ≈ 29% actual on over 5.5) but systematically over-predicts totals for INTL. That's interesting in itself: if books price tournament totals anywhere near the model's number, unders on those games would be mispriced in bettors' favor, though I can't check that without odds data.
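
The over/under probability from a Poisson totals layer can be sketched like this. It assumes each side's goals are independent Poisson with hypothetical per-team rates `lam_home` and `lam_away`, and it ignores overtime and shootout rules. How the actual model derives its rates is not described in the post.

```python
import math


def prob_over(lam_home: float, lam_away: float, line: float) -> float:
    """P(total goals > line) when each side's goals are independent Poisson."""
    lam = lam_home + lam_away  # the sum of independent Poissons is Poisson
    max_under = math.floor(line)  # largest total that stays at or under the line
    p_under = sum(
        math.exp(-lam) * lam**k / math.factorial(k) for k in range(max_under + 1)
    )
    return 1.0 - p_under


print(round(prob_over(2.5, 2.5, 5.5), 3))  # 0.384
```

Comparing these probabilities against realized over rates per bucket is what exposes a systematic over-prediction like the INTL one.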

What I don't have: ANY odds data. I subbed to The Odds API and learned the hard way that they have zero women's hockey coverage (NHL/AHL/Liiga/SHL only — no PWHL, no IIHF Worlds, no Olympics).

The ask:

Looking for opening + closing lines on women's hockey games — moneyline, totals, ideally puck line if available. Specifically:

  • PWHL games (2024–present)
  • IIHF Women's Worlds (2017+)
  • Olympics (2018, 2022, 2026)

Anything helps:

  • Free or paid sources I haven't found?
  • Historical odds archives (a big CSV from anyone's old project would be incredible)
  • Specific sportsbooks that posted lines and might still have archives?
  • API recommendations beyond OddsJam / Pinnacle / The Odds API?

Flashscore has odds but only saves the last live line, not the actual closing line, which limits its value for CLV analysis. OddsPortal has historical odds, but I'm unsure of their coverage for women's hockey specifically.
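
For context on why a true closing snapshot matters, here is one common way to express closing line value (CLV): the change in implied probability between the price you bet and the closing price. This is a sketch using raw implied probabilities with the vig left in; other definitions exist, and the odds below are hypothetical.

```python
def implied_prob(american_odds: int) -> float:
    """Implied probability of an American moneyline, vig included."""
    if american_odds < 0:
        return -american_odds / (-american_odds + 100)
    return 100 / (american_odds + 100)


def clv(bet_odds: int, closing_odds: int) -> float:
    """CLV in probability points: positive means the bet beat the close."""
    return implied_prob(closing_odds) - implied_prob(bet_odds)


# Hypothetical prices: bet taken at +120, market closed at +105.
print(round(clv(120, 105), 4))  # 0.0333
```

If the stored "closing" price is really the last in-play line, `closing_odds` reflects the game state rather than the pre-game market, and the CLV number stops meaning anything.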

Happy to share the model + dataset back once I get this last piece. Specifically can produce:

  • Calibrated win-probability and total-goal predictions for any future game
  • Backtest results showing which lines historically had value
  • The tuned Elo ratings (USA 1861, CAN 1728, then a 250-point cliff — Switzerland actually has the third-highest rating right now after their 2026 Olympic bronze)

Thanks in advance to anyone who can point me in the right direction.

u/Jookster1 — 14 days ago
▲ 3 r/algobetting+2 crossposts

I've been working on a women's basketball prediction model — built it for one league by reverse-engineering an existing public model, now trying to apply the same approach to other leagues. WCBA is on my list and I've hit a wall on data.

What I have already:

  • Fixtures + final scores for 5 seasons (~1,300 games) via Sofascore's API.
  • Team-level box scores (FG/3P/FT, rebounds, fouls, lead-run advanced) for the current 25-26 season, ~100% coverage.

What I need:

  • Per-game team box scores for 21-22, 22-23, 23-24, 24-25 — Sofascore's event/{id}/statistics endpoint mostly 404s on these older games (only 12-31% coverage).
  • Per-game player box scores for any of those seasons would be ideal — basic counting + shooting splits, no need for advanced metrics.

What I've tried:

  • Sofascore (no player-level data for WCBA at all)
  • Asia-Basket box scores (numbers are obfuscated with character substitution; team-stats summary pages are clean for current season but historical are paywalled / roster-only)
  • Flashscore feeds (couldn't crack the URL pattern despite knowing the tournament_id)
  • Federation sites (cba.net.cn returns 405; cbaleague.com is empty)

Anyone have a CSV dump, a working scraper, or a tip on a source I'm missing? Even a partial historical export from a previous project would help. Use case is non-commercial.

Happy to share what I've already pulled if useful in trade.

Thanks!

u/Jookster1 — 18 days ago