r/algobetting

I built a predictive model for football match stats (shots, corners, fouls) across 20,000 matches. The strongest predictor ended up being ELO from chess. [OC]
▲ 64 r/algobetting+2 crossposts

For the past few months I've been working on a personal project: a predictive model for per-match football statistics. Not the final score, but the behaviors: how many shots each team will take, corners, fouls, cards. The dataset covers around 20,000 matches across five seasons and the top 5 European leagues.

I started with hundreds of variables: rolling shot averages, foul rates, corner frequencies, home/away splits, opponent profiles. Everything you'd expect. The first results were decent, but the model was essentially regressing toward each team's historical mean without any real understanding of match context. It could see that Team A averages 14 shots and Team B averages 11, but it had no concept of the gap between the two sides. It didn't know that tonight Team A is so much stronger they'll pin Team B in their own half for 70 minutes and probably end up with 19 shots while Team B scrapes together 6.

Historical averages are built against opponents of all quality levels. They encode nothing about the specific match being played, and that contextual read is exactly what every football fan processes automatically before kick-off. The hard part is giving a model a number for something so intuitive.

I ended up turning to chess. ELO ratings were invented in the 1960s by Arpad Elo to classify players more precisely than tournament standings alone. Beat someone stronger and your score rises significantly; lose to someone weaker and it drops. It updates after every game, with the only inputs being the result and the relative strength of the two players — no performance quality, no expected goals, just who won and against whom.

I built an ELO system for all clubs across the top 5 leagues, initialized from external sources and updated match by match through five seasons. When I added the ELO gap between the two teams as a predictor, things shifted immediately.
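The per-match update described above can be sketched in a few lines. The K-factor, draw handling, and example numbers here are my assumptions, not the author's exact choices:

```python
def elo_update(r_home, r_away, outcome, k=20.0):
    """outcome: 1.0 home win, 0.5 draw, 0.0 away win.
    Only the result and the relative strength go in: no performance stats."""
    expected_home = 1.0 / (1.0 + 10 ** ((r_away - r_home) / 400.0))
    delta = k * (outcome - expected_home)
    return r_home + delta, r_away - delta

# An upset moves ratings more than an expected result:
h1, a1 = elo_update(1600, 1400, 0.0)  # big underdog wins away
h2, a2 = elo_update(1600, 1400, 1.0)  # favourite wins as expected
```

The update is zero-sum (what one side gains the other loses), which is what keeps the league-wide ratings anchored over time.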

Bivariate Spearman correlation with shots:

Predictor              Correlation
ELO gap                0.377
Rolling shot average   0.273

The chess number outperformed every football-specific variable in the model. And when you break it down by bucket, it's obvious why:

ELO gap                   Avg shots
< −200 (much weaker)      9.2
−200 to −100              10.5
−100 to −50               11.0
±50 (balanced)            12.8
+50 to +100               13.0
+100 to +200              14.4
> +200 (much stronger)    17.4

Global average: 12.7 shots

From 9.2 to 17.4 driven entirely by the strength gap — and no rolling average captures it, because rolling averages don't know who those shots were taken against. A team that faced three weak sides in a row will have inflated numbers; the ELO gap adjusts for that automatically.
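The bucket table is just a group-by on the pregame gap. A minimal sketch, with bucket edges mirroring the table and made-up data:

```python
from collections import defaultdict

def gap_bucket(gap):
    # Bucket edges mirror the table above (gap from the team's perspective).
    edges = [(-200, "< -200"), (-100, "-200 to -100"), (-50, "-100 to -50"),
             (50, "+/-50"), (100, "+50 to +100"), (200, "+100 to +200")]
    for edge, label in edges:
        if gap < edge:
            return label
    return "> +200"

def avg_shots_by_bucket(rows):
    """rows: iterable of (elo_gap, shots) pairs, one per team per match."""
    totals = defaultdict(lambda: [0.0, 0])
    for gap, shots in rows:
        acc = totals[gap_bucket(gap)]
        acc[0] += shots
        acc[1] += 1
    return {bucket: s / n for bucket, (s, n) in totals.items()}
```

Running this over per-team-per-match rows reproduces the kind of monotone staircase shown above.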

200 variables, five seasons of data, five leagues, and the most important feature had nothing to do with football.

Happy to get into the methodology or the initialization choices in the comments.

u/Agalex97 — 1 day ago
▲ 0 r/algobetting+1 crossposts

Would I be considered a sharp?

Attached are my betting results tracked by Pikkit. I started betting on sports seriously, with the intention of making real profit, when I turned 18 in May 2025, and I was wondering whether I'd be considered a sharp, and if not, what I can do to become one.

I know my sample size is relatively small (a bit under 350 bets placed), but I was curious.

For context, I mainly bet on UFC and boxing (like 95% of my bets). In 2025 I was mainly betting through DFS apps like Underdog, PrizePicks, Sleeper, etc., but those apps seem to have become more efficient with their UFC and boxing lines, so in 2026 I mainly just bet moneylines on apps like Kalshi.

Finally, I want to become a quant trader at an HFT firm and was wondering how marketable this would be for that, assuming I scale it up more, and if so, how I should word it.

u/Initial-Web4015 — 3 days ago

"Knowing a sport" absolutely matters for profitable pre-game betting & CLV.

There is a myth floating around that "knowing a sport" is irrelevant to being a profitable bettor. That's true for top-down strategies like arbitrage, but for any kind of originating where you are truly beating the market bottom-up (like getting CLV on liquid main lines at Pinnacle for NFL/soccer/NBA), it absolutely matters. The best bettors who get consistent CLV on main lines in major leagues have a way to price things (models, etc.), but they also rely heavily on discretionary skills: understanding injuries, market dynamics, precedents, and so on. Frankly, a person who just uses a model for something like NBA pre-game has zero chance against a person using a model plus qualitative skills, even if they have the best model in the world. And lastly, when I say "knowledge" I mean more than knowing the sport or trying to predict a winner; it includes market dynamics, patterns, how books work, injuries, intuition, sentiment around injuries, etc.

u/Calm_Set5522 — 4 hours ago

What is the best websocket odds provider?

So I'm looking to get live odds for major leagues from Pinnacle/FD. My question: are there any reliable third-party odds providers that deliver via websocket, so you're at most milliseconds behind the actual Pinnacle/FD odds? I don't mind paying even a few thousand a month, but there is so much garbage out there it's hard to tell which providers are actually legit.

Does anyone know?

u/Calm_Set5522 — 2 days ago

Using Pinnacle Live Markets

Hey, I'm currently working on an EV betting website, with a plan to expand the product to include real-time markets, especially focused on soccer, though I also want to include basketball and tennis.

I have already deployed an application to monitor and gather odds movements, place bets, and simulate profit with Kelly sizing. As a sharp reference book, I use Pinnacle,

but I find many of their lines to be stale, throwing false EV signals, which, with more data, I can obviously filter out later by inspecting the data.

The question is: what do you guys use as a sharp reference platform for live markets? Maybe the Betfair exchange? Is there any other good sharp reference market with good live coverage?

u/IllustriousGrade7691 — 2 days ago

Should I pivot from arbitrage to value bets?

I've been running an automated arbitrage strategy on live sports for the past few weeks and it has been profitable overall (avg ROI 2.8% per bet).

However, I've noticed that a few times the algo has had problems filling both sides of the arb, resulting in bigger losses.

I did some analysis on the past betting data and noticed that I would have made at least 1.5x the profits by only placing bets on the soft book (ROI ~22% from those +EV bets). Though I should mention that the sample size is quite small (31 bets), which could affect the results.

So my question is, should I pivot to doing value bets by using the sharp book as true odds and only placing bets on the soft book? Also, I would appreciate any additional advice since I don't have any previous experience of doing automated value bets.

Thanks in advance.

EDIT: "at least 1.5x the profits" is incorrect, the analysis suggests the strategy would have made closer to 3x the profit.
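For anyone weighing the same pivot, the basic value-bet check (devig the sharp book, compare to the soft price) can be sketched as below. The multiplicative devig and the example numbers are illustrative, not a recommendation:

```python
def implied_prob(decimal_odds):
    return 1.0 / decimal_odds

def devig_two_way(sharp_a, sharp_b):
    # Strip the sharp book's margin proportionally to get "true" probabilities.
    pa, pb = implied_prob(sharp_a), implied_prob(sharp_b)
    total = pa + pb  # > 1 because of the vig
    return pa / total, pb / total

def expected_value(true_p, soft_odds):
    # Expected profit per unit staked at the soft book's price.
    return true_p * (soft_odds - 1.0) - (1.0 - true_p)

# Example: sharp book quotes 1.95/1.95, soft book hangs 2.10 on side A.
p_a, p_b = devig_two_way(1.95, 1.95)
ev = expected_value(p_a, 2.10)  # 0.5 * 1.10 - 0.5 = +5% per unit
```

The multiplicative devig is the simplest option; other margin models (additive, power) exist and change the numbers slightly at longer odds.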

u/Lelleri1331 — 2 days ago

I built a stacking ensemble for football Over/Under markets across 8,200 bets. ELO gap turned out to be the strongest single predictor. [OC]

Been working on this for about a year. Here's what actually moved the needle.

**The model:**

Stacking ensemble — XGBoost + LightGBM + Random Forest as base learners, Logistic Regression as meta-learner. Isotonic calibration on top. Threshold auto-tuned per market on validation set.

**~165 features per match:**

- ELO ratings with K=30 and +100 home advantage modifier
- Form: last 10 home games and away games tracked separately
- xG luck factor (actual goals vs expected goals delta)
- Rest days, H2H records, league position, referee tendency
- League baseline stats per market

**Why ELO ended up on top:**

Same finding as the chess ELO post from earlier today — historical rolling averages don't capture opponent quality. A team that faced three weak sides in a row has inflated shot numbers. ELO adjusts for that automatically.

In our feature importance output, ELO gap ranks #1 across goals markets. Especially dominant for Over 0.5 — mismatched games (ELO gap >200) almost never finish 0-0.

**Backtest methodology:**

Time-based 80/20 split — no data leakage. Trained on seasons up to cutoff, tested on what came after. 12 European leagues, 11 betting markets.

**Results on 8,200 bets:**

| Market            | Hit rate | n     |
|-------------------|----------|-------|
| Over 0.5 goals    | 93.5%    | 1,134 |
| Corners over 12.5 | 78.0%    | 1,134 |
| Over 1.5 goals    | 77.8%    | 1,096 |
| BTTS              | 66.2%    | 337   |
| High-conf overall | 85.9%    | 1,588 |

High-confidence = model probability ≥ 0.70 across all three base learners simultaneously.

**What I learned:**

1. Market selection beats model complexity. Over 0.5 hits 93.5% not because the model is smart — it's because only ~6% of top European matches finish 0-0. The model just identifies those 6%.
2. Stacking beats any single model by 8-12% consistently. The meta-learner learns when to trust XGBoost over LightGBM and vice versa depending on the market.
3. Isotonic calibration is underrated. Raw probabilities from tree models are poorly calibrated. After isotonic calibration the reliability diagram tightened significantly — matters a lot for threshold selection.
4. Correct score and first goalscorer have too much irreducible variance. Dropped them early. Focused on high base-rate markets.

Happy to discuss feature engineering or calibration approach in the comments. Also tracking picks publicly since May 3 if anyone wants to see live results vs backtest baseline.

u/Old-Friendship-8013 — 1 day ago

Sportsbook provider

Yo everyone,

currently building a sportsbook app/platform and I'm looking for good sportsbook API/provider recommendations.
Main things I'm searching for:
- live odds
- pre-match odds
- in-play odds
- all markets possible
- historical odds/data
- scores/results for settlement when a match finishes
- low-delay real-time updates
- plus a ball-tracking API
Mostly football for now, but other sports (all sports) are welcome too.

I'm trying to find something with decent/startup-friendly pricing, because most providers I contacted charge crazy prices.

Already checked stuff like Goalserve, TheOddsAPI etc., but I wanna hear real experiences from people actually running sportsbooks/apps.

Which providers are worth it and stable long-term?

u/Admirable_Piccolo_92 — 4 days ago

Fast, affordable API with live info about a game?

I'm looking for a new data source that can quickly and accurately tell me about what is happening during a game that I can use for some automated projects.

For example, if Aaron Judge hits a home run to give the Yankees a 1-0 lead, I want Aaron Judge's stats, the score of the game, the inning, the number of outs, etc. to be sent to me through the API.

I know Sportradar and Genius are the top 2 options, but they are very expensive and also don't sell to everyone even if you're willing to pay. So I'm looking for options that are more widely available, preferably cheaper, but still relatively fast and accurate. I would obviously expect some tradeoffs if it is being sold cheaper, but just trying to find some options that do what I'm looking for.

u/alexkem — 5 days ago

Real-time Pinnacle odds via WebSocket

I have live Pinnacle odds via WebSocket with no delay across 25+ leagues.

Covers the main sports (soccer, NFL, NBA, NHL, tennis, esports), lower-tier regional leagues, and a bunch more. Pre-match and live.

Let me know if anyone is interested

u/talinator1616 — 2 days ago
▲ 48 r/algobetting+1 crossposts

I'm a data analyst by day. About 18 months ago I got tired of losing on props by going with my gut, so I started treating it like a work problem. Built a Postgres database that ingests box scores via the NBA stats API, PrizePicks lines from a scraper I wrote, and rotation data from a combo of the NBA's hustle stats endpoint and pbp stats. Everything is timestamped and versioned so I can re-run any historical window.

The dataset: 412 regular season games from Nov 2024 through April 2025, plus the same window for the 2023-24 season for validation. Every starter and 6th man. Points, rebounds, assists, 3PM, and steals+blocks. That's roughly 4,800 player-game rows per season.

Here's what held up across both seasons.

Edge 1: High-usage guards on back-to-back unders (PTS and AST)

I defined "high-usage" as >26% usage rate per Cleaning the Glass. Then I filtered for guards playing their 2nd game in 2 nights where they played >30 min the night before.

2023-24 season: 87 qualifying player-games. Under hit on points at 58.6%. Under hit on assists at 61.2%. Average line on points was 22.4, average actual was 19.1. That's a -3.3 delta.

2024-25 season: 91 qualifying player-games. Under on points: 56.0%. Under on assists: 59.3%. Average line 22.8, average actual 20.0. Delta: -2.8.

The edge compressed slightly year over year but stayed significant. For context, a 57% hit rate at -110 sits about 4.6 points above the 52.4% break-even, which flat-staking works out to roughly 8-9% ROI per bet. Over a season with maybe 2-3 of these spots per week, that's ~60 bets. At 1 unit each, you're looking at around +5 units in expectation. Not life-changing, but it's free money if you're disciplined.
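For reference, the break-even and flat-stake ROI arithmetic at -110 can be checked in a few lines:

```python
def breakeven_prob(american):
    """Win probability needed to break even at a given American price."""
    if american < 0:
        return -american / (-american + 100.0)
    return 100.0 / (american + 100.0)

def flat_roi(hit_rate, american):
    """Expected profit per 1-unit flat stake."""
    win = 100.0 / -american if american < 0 else american / 100.0
    return hit_rate * win - (1.0 - hit_rate)

be = breakeven_prob(-110)   # ~0.524
roi = flat_roi(0.57, -110)  # ~0.088, i.e. roughly 8.8% per unit staked
```

Note the distinction: the edge over break-even at 57% is ~4.6 percentage points, while the per-bet flat-stake ROI is higher because a win at -110 pays 0.909 units.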

The mechanism is pretty obvious when you think about it: these guys are running the offense, carrying the ball up, taking the tough shots. On night 2 after 32+ minutes of that, the legs go first. Shot velocity drops. They settle. Assists dry up because they're not driving and kicking as hard. The books shade maybe 0.5 points from the normal line but the real performance hit is 2-3x that.

Specific example: Ja Morant, Dec 14 2024 (2nd night of B2B after 34 min vs IND). Line was 24.5 points. He put up 16 on 6-of-17 shooting with 4 assists (line was 7.5). Under both by a mile. This pattern repeated for Shai, Fox, Maxey, Brunson. The only guys who seemed immune were LeBron (he's a freak) and occasionally Luka (who will literally shoot his way into volume regardless of fatigue, but his efficiency tanks).

Edge 2: Rest-advantage overs for big men (REB only)

This one surprised me. I expected rest advantage to matter more for guards given the running, but the rebounding edge for well-rested bigs was actually cleaner.

Filter: Centers and PFs with >24 min/g, coming off 2+ days rest, facing a team on a B2B. Rebounds line only.

2023-24: 104 qualifying games. Over hit 54.8%. Average line 9.2, average actual 10.1. Delta +0.9.

2024-25: 98 qualifying games. Over hit 56.1%. Average line 9.4, average actual 10.4. Delta +1.0.

Why this works: When the opponent is on a B2B, their guards are slower getting back in transition, their bigs are slower to box out, and there are more live-ball rebounds available in general because shooting percentages drop on B2Bs too. The well-rested big feasts on the chaos. It's not that he's playing better, it's that the environment creates more available rebounds.

I watched this play out in real time with Domantas Sabonis on March 3, 2025. Kings had 2 days rest. Hawks were on a B2B. Sabonis line was 11.5 rebounds. He grabbed 19. Wasn't even close. The Hawks bigs looked like they were moving in sand.

Edge 3: The 0.5 point line move signal

I tracked every prop line from open to close for the 2024-25 season using 15-minute snapshots. When a player prop line moved 0.5 points or more from open to game-time close, the direction of the move correlated with the result at 59.3% across 1,240 qualifying moves.

That number is absurd if you think about what it means. The books are adjusting because sharp money came in, and that sharp money is right almost 60% of the time. If you could just ride the coattails of line moves that size, you'd have a 7% edge at -110 without doing any analysis of your own.

The problem: detecting the move requires checking the line multiple times between open and close. I automated it. If you can't automate it, set a reminder to check PrizePicks and DraftKings at open and then again 90 minutes before tip. If the line moved 0.5+, ride it. If it didn't, pass.

One important caveat: this edge is stronger on totals and spreads than on player props specifically. On player props the sample is smaller and the noise is higher. But the direction holds.
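The ride-the-move rule is trivial to encode once you have the open and close snapshots. Threshold and labels here are illustrative:

```python
def line_move_signal(open_line, close_line, threshold=0.5):
    """Return the side to ride when a prop line moved materially
    from open to close, or None to pass."""
    move = close_line - open_line
    if move >= threshold:
        return "over"   # book raised the number into the money that arrived
    if move <= -threshold:
        return "under"
    return None

line_move_signal(24.5, 25.5)   # "over"
line_move_signal(24.5, 24.75)  # None: move too small, pass
```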

What doesn't work (despite what you've heard):

Home/away splits: I ran a paired t-test on every starter's home vs away performance. Out of 143 qualifying players, 21 had a statistically significant difference (p < 0.05). That's 14.7%. Almost exactly what you'd expect by random chance at a 0.05 threshold. The "home court advantage" for individual player props is largely a myth.

"Trending" overs/unders: A player going over 4 out of 5 games has zero predictive value for game 6. I checked. The over rate for players coming off 4+ overs in their last 5 was 51.2%. That's coin flip territory. Recency bias is the single most expensive cognitive error in prop betting.

I'm happy to share the SQL queries or the schema if anyone wants to replicate this.

u/Fancy-Tadpole-2448 — 8 days ago
▲ 12 r/algobetting+1 crossposts

I’ve been working on a sports Elo variant I call Rolling Reset Elo.

Basic argument: classic Elo is good for some things. Not team sports.

Classic Elo has infinite memory. Every game ever played still contributes to the current rating. That makes sense for chess, where you are tracking one person over a long period of time. It breaks down when you are tracking NBA teams where rosters, coaches, injuries, roles, and usage patterns change constantly.

Most public sports Elo systems solve this with some version of regression to the mean. I think that is mostly BS. You drag every team back toward 1500 on a calendar schedule and call it uncertainty. But uncertainty does not show up once a year on the same day for every team. It shows up after trades, injuries, coaching changes, and teams randomly breaking.

A 'Rolling Reset Elo' fixes it structurally.

For each target date, define a lookback window. Reset every team to the same baseline. Replay only the games inside that window. Store the ratings as the pregame feature for that date. Then move the window forward and do it again.

No seasonal regression hack. No stale franchise history. No hidden computed state.

The bigger payoff is running multiple windows at the same time: elo_30, elo_65, elo_365, etc. The ratios between them become features. If short-term Elo is ripping above long-term Elo, something changed. If it collapses below, something broke.
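A minimal sketch of the windowed replay, assuming a plain win/loss Elo update (the baseline, K-factor, and data are illustrative; draws and margin-of-victory are ignored here):

```python
from collections import defaultdict

def rolling_elo(games, target_date, window_days, base=1500.0, k=20.0):
    """Reset every team to the same baseline, replay only the games inside
    [target_date - window_days, target_date), return pregame ratings."""
    ratings = defaultdict(lambda: base)
    for day, home, away, home_won in games:  # games sorted by day
        if day < target_date - window_days or day >= target_date:
            continue
        exp_home = 1.0 / (1.0 + 10 ** ((ratings[away] - ratings[home]) / 400.0))
        delta = k * ((1.0 if home_won else 0.0) - exp_home)
        ratings[home] += delta
        ratings[away] -= delta
    return dict(ratings)

# Run several windows at once; the ratios between them become features.
games = [(1, "A", "B", True), (2, "A", "C", True), (40, "B", "A", True)]
elo_30 = rolling_elo(games, target_date=41, window_days=30)    # only day 40 counts
elo_365 = rolling_elo(games, target_date=41, window_days=365)  # all three games
```

Replaying from scratch per target date is O(windows × games), which is cheap at NBA scale and removes any hidden computed state, exactly the property argued for above.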

substack link to detailed post

u/__sharpsresearch__ — 12 days ago
▲ 1 r/algobetting+1 crossposts

Pushed a redesign + new evaluation views to my free WTA prediction site

Quick update for anyone tracking this. Just shipped a layout overhaul and a few new model-evaluation views on datadrivenpicks.club.

The model itself: point-level Markov chain (p_in / p1w_serve / p2w_serve per player per match) with surface and recent-form adjustments. v3 launched a couple weeks ago and stacks an XGB winner-blend on top + separate XGB regressors for game margin and total games, trained on ~19k WTA + Challenger matches.
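For anyone curious, the point-level building block of such a Markov model (the probability the server holds a game, given a per-point serve-win probability) has a standard closed form. This sketch omits the site's surface and form adjustments:

```python
def p_hold(p):
    """Probability the server holds a game, given per-point serve-win prob p.
    Enumerate wins to love/15/30, then the deuce loop in closed form."""
    q = 1.0 - p
    deuce_win = p * p / (p * p + q * q)         # win from deuce
    return (p**4 * (1 + 4 * q + 10 * q * q)     # win to love, 15, 30
            + 20 * p**3 * q**3 * deuce_win)     # reach deuce, then win it

# A modest serve edge compounds heavily at the game level:
p_hold(0.60)  # ~0.74
p_hold(0.65)  # ~0.83
```

Game probabilities compose the same way into set and match probabilities, which is where the p1w_serve/p2w_serve estimates ultimately feed.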

What's new on the site:

  • Calibration scatter — predicted probability vs actual win rate per confidence bucket, with a 45° reference line and color coding by deviation from it
  • Surface breakdown — ML accuracy + MOV / Total error standard deviation across hard, clay, and grass
  • Brier and Log Loss as headline quality metrics next to the existing accuracy KPIs
  • Bracket simulator over the active draws — Monte Carlo over the full bracket, championship probability per player, updates round by round
  • Bidirectional game-margin chart per match — cumulative line on top pane, exact-margin PMF bars on the bottom, single SVG

Current track record (ML market, match-level dedupe to favored side):

  • Accuracy: 60.4% across 632 settled matches
  • Brier: 0.232 (random binary = 0.25)
  • Log Loss: 0.655 (random = 0.693)
  • Surface σ: Hard 4.61 / Clay 4.78 / overall 4.75 — pretty consistent across surfaces
  • Calibration mostly clean, but the 60-70% confidence bucket runs a few percentage points hot (mid-bucket overconfidence — leftover from v2's anti-calibration that v3 only partially solved)

Free, no paywall, no monetization. Performance page has everything public.

Site: https://datadrivenpicks.club
Daily edges on Twitter: https://twitter.com/Gay4WTA

Open to feedback — anything that looks miscalibrated or over-engineered, I want to hear it.

u/Jookster1 — 2 days ago

Early Payout Offer. Is anyone else exploiting the 2UP?

I'm Brazilian and I've been testing a method based on exploiting the early payout bonus offered by bookmakers, the well-known 2UP promotion.

The logic is simple: I pick matches with a high probability of goals and unpredictable teams, then bet on home win, draw, and away win simultaneously. Even if that means absorbing some commission, my losses are capped at a max of $30 per game, which lets me stake between $500 and $1,000 per match. When a double payout hits, I collect the full amount on both sides.

I'm currently on day 8 of testing, with more than 30 bets logged and no account restrictions so far. The accumulated profit is considerable, but the volatility is real; frequent small losses pile up until a double win lands and wipes them out with room to spare.

Does this method stay effective long-term? Is anyone else using something similar?

Metric                         Value
Total Bets                     39
Wins                           4
Losses                         35
Win Rate                       10.3%
Loss Rate                      89.7%
Total Invested (losses only)   $601
Total Won                      $2,315
Total Lost                     $601
Net Profit                     +$1,714
ROI                            +$1,714 / $601 ≈ +285%
u/Upstairs_Sandwich_33 — 5 days ago

New to the sports betting bot world but learning quickly. I see posts every day with a 70% WR at 44 trades and 50% ROI after one month, but those accounts typically disappear and never post updates.

Aside from a spike in luck, what's a realistic long-term WR and ROI baseline for a moneyline sports bot in the main sports (NHL, tennis, golf, NBA, MLB, soccer)?

u/Calm-Landscape9640 — 10 days ago
▲ 1 r/algobetting+1 crossposts

I've been building NBA prop models for years now, and the biggest mistake I see people make isn't bad data. It's treating all data the same. So here are four things you can actually go implement right now that moved my numbers significantly.

First: stop using season averages. They're polluted. What you want is a weighted recency window that decays older games. Think of it like this: a game from 6 weeks ago should not count the same as last night. Use an exponential decay over the last 15 to 25 games and your signal gets way cleaner immediately. It's not hard to code. Just weight each game by how recent it is and let the older stuff fade.
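The decayed recency window can be sketched in a few lines (the half-life and the numbers are illustrative):

```python
def decayed_average(games, half_life=8.0):
    """Exponentially weighted average of per-game values, oldest first.
    A game half_life games back counts half as much as the latest one."""
    n = len(games)
    weights = [0.5 ** ((n - 1 - i) / half_life) for i in range(n)]
    return sum(w * v for w, v in zip(weights, games)) / sum(weights)

# A player trending up: the decayed average sits above the season-style mean.
recent = [18, 20, 19, 22, 25, 24, 27, 26]
plain = sum(recent) / len(recent)   # 22.625
weighted = decayed_average(recent)  # tilted toward the recent games
```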

Second: opponent defense needs to be position-specific, not team-wide. Team defensive rating is an aggregate stat and it's useless for props. A team can be bottom 10 against guards but top 5 against bigs, and if you're taking a point guard under against them because their team DRTG looks good, you are using the wrong number. Isolate by position. It takes more setup, but it's probably the single biggest edge I added.

Third: track minutes trends separately from usage trends. Most people merge these, but they tell you different things. Minutes tell you opportunity. Usage tells you what the player does with that opportunity. A guy can see a minutes bump from a rotation change, but if his usage rate stays flat he is just standing in the corner longer. He is not getting more shots. These are two different signals and you need both.

Fourth: add a variance filter. If a player had two crazy outlier games in the last 10, your model probably thinks he's trending up. But if you strip those two outliers, his baseline hasn't moved at all. Run a quick check on whether the recent trend is being carried by outliers or whether it's structural. If it's outliers only, skip the pick. You don't need it.
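One cheap proxy for the outlier check is comparing the mean against the median: a couple of blow-up games drag the mean but barely move the median. This is a swap for the "strip the outliers" version described above, and the threshold and data are illustrative:

```python
import statistics

def outlier_driven(last_games, tol=0.10):
    """Flag a recent 'trend' as outlier-driven when the mean sits
    well above the median of the same window."""
    mu = statistics.mean(last_games)
    med = statistics.median(last_games)
    return (mu - med) > tol * med

spiky = [12, 11, 13, 12, 38, 11, 12, 40, 13, 12]   # two blow-up games
steady = [12, 14, 15, 16, 15, 17, 18, 17, 19, 18]  # genuine climb
outlier_driven(spiky)   # True: skip the pick
outlier_driven(steady)  # False: trend looks structural
```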

None of this is rocket science. It's just that most people stop building their model once it outputs a number and never add the filters that actually separate signal from noise. These four things alone will clean up your board a lot. The rest of what I run is deeper, but this is the stuff anyone can build in a weekend.

u/gideon_foxie — 8 days ago

How hard is it to bet opening lines for main markets on major sports

I’m talking moneylines, spreads, totals for sports like MLB, NFL, CFB, CBB, NBA.

How much time after the lines are released will I have to bet them at opening price?

Is there a good place to monitor to see when opening odds have dropped, or is that something I gotta create myself?

Any help is appreciated, thank you!

u/Alarmed-Error529 — 6 days ago

I built riftcast.gg , a completely transparent ML prediction system for League of Legends Esports - feedback appreciated

Hey everyone. I built https://riftcast.gg/, an ML prediction system for LoL Esports with both training stats visible and historical data tracked (whether model predictions were correct or not).

The setup:

- 3,091 pro matches in the dataset across 272 teams and 43 tournaments (so far), covering all major regions (LCK, LPL, LEC, LCS) and minor regions

- Series-level predictions (pre-match) and game-level predictions (post-draft)

- Three models running in parallel:
  - FastTree (free tier baseline, simplest features)
  - LightGBM with patch/meta-aware features (tracks game duration trends, team performance gaps between recent patches and all-time, format interactions like is_bo5 * elo_diff, etc.)
  - PCA Sweep — runs a 7,000-config hyperparameter search for ~5 hours weekly, PCA-compresses the noisy draft features
- Plus a Consensus prediction combining all three

**Feature engineering:**

The series model uses ~80 features after filtering. Heavy use of:

- Differential features (Blue stat - Red stat) to avoid teaching the model side bias

- Decayed all-time stats + Diff5 rolling windows for recent form

- A custom Elo system with cross-league calibration (this is what handles international events, which only have ~20 games of historical data)

- Hand-crafted composite features (Diff_Composite_EarlyGame, _Combat, _Vision, etc.) to compress correlated signals.

The draft model adds champion-level features: per-lane Overall/Counter/Mastery/Meta scores weighted by Samples confidence, synergy by lane-pair (Top-Jgl, Mid-Jgl, Bot-Sup), and per-lane "LaneEdge" composites.
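The differential-feature idea can be sketched in a couple of lines (names and values here are illustrative, not the site's actual feature set):

```python
def diff_features(blue, red):
    """Differential (Blue - Red) features: the model sees the gap between
    the teams rather than which side the stronger team happened to draw,
    so it can't learn a spurious side bias."""
    return {f"Diff_{k}": blue[k] - red[k] for k in blue.keys() & red.keys()}

diff_features({"gold_at_15": 24500, "elo": 1620},
              {"gold_at_15": 23800, "elo": 1540})
# yields Diff_gold_at_15: 700 and Diff_elo: 80
```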

I have an "Uncertain" tag which excludes results for predictions with less than 55% certainty; it is also shown in the UI for transparency.

Accuracy across the last 2 weekly reports published (75 series, 209 games):

https://preview.redd.it/6c3hogqx6a0h1.png?width=1539&format=png&auto=webp&s=ae7009a43f6e84f41b732dbe2d79a75ceb029da3

I also track each model's performance per league and show it on each upcoming match prediction. For example, Consensus (the aggregate from all models) is yet to make a wrong Series prediction for LCS (17/17 correct) and has a fairly good accuracy for Game (Draft) Predictions as well (28/35 correct)

https://preview.redd.it/wv57dtz08a0h1.png?width=930&format=png&auto=webp&s=e0beebbce44f895459e73fd5c9b2d21ae80fabd3

Where I think it's weak:

- International events (~20 cross-region games in dataset) — Elo helps but cross-region calibration is shaky

- LightGBM volatility week-over-week (76%/70% vs 69%/80%) — patch-aware features may be over-correcting

Any feedback will be much appreciated, thanks!

u/EntertainmentCalm889 — 3 days ago
▲ 2 r/algobetting+1 crossposts

Anyone know where I can get low-latency live tennis stats? Just looking for something simple, like play-by-play points. I've looked around, but all I've found is Sportradar, which is really expensive.

u/No-Original-5312 — 7 days ago