A lot of noise among sports odds data providers
As the founder of my own data API, I want to know:
Which sports odds provider are you currently using, and why?
How much are you paying, and what would make you switch?
Looking to create value in the space
Hey r/parlayapi,
Quick update on a feature that just landed: /v1/odds-drop/{sport_key}, an SSE stream that pushes events only when a tracked price moves by >= a configured threshold. Live in the docs at parlay-api.com/docs (streaming section).
Background, since some of you have asked about this:
We've had the raw odds WebSocket and SSE streams for a while. They push every price change, even tiny ones, and your code maintained the previous-price state to detect actual line moves. That's the right architecture for most use cases, but if you're running an arb / +EV / line-shopping scanner specifically, it means rebuilding that state-tracking layer for every (event, book, side) tuple. Worth a Saturday of work, not exactly fun.
A competitor (pinnodds.com) launched an /odds-drop feature last week with exactly these ergonomics. Good feature. I'd rather ship it than tell our paying customers to write the same plumbing themselves.
So:
GET /v1/odds-drop/basketball_nba?apiKey=YOUR_KEY&threshold=10
Params:
- threshold: minimum American-odds delta to trigger (default 10, so -110 → -120 fires; -110 → -115 doesn't)
- direction: both | toward_favorite | toward_dog (filter to one direction of line movement, useful for sharp-money detection)
- bookmakers, markets, event_id: narrowing filters
- heartbeat_s: 1-30 seconds

Event shape:
{
"type": "odds_drop",
"event_id": "2026-05-12_Lakers_Warriors",
"bookmaker": "pinnacle",
"side": "h2h_home",
"kind": "game",
"prev": -110,
"new": -120,
"delta": -10,
"direction": "toward_favorite",
"home_team": "Los Angeles Lakers",
"away_team": "Golden State Warriors",
"commence_time": "2026-05-12T22:30:00Z",
"last_update": 1747000000123,
"timestamp": 1747000000124
}
For player props the event also carries player, market_key, market, and line.
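For anyone wiring this up, here's a minimal consumer sketch. The URL and field names follow the examples above; the bare-bones SSE line parsing and the helper functions are my own scaffolding, not the documented client:

```python
import json
import urllib.request

def drop_direction(prev, new):
    # More negative (or less positive) American odds = moving toward the favorite.
    return "toward_favorite" if new < prev else "toward_dog"

def is_drop(prev, new, threshold=10):
    # Mirrors the documented default: -110 -> -120 fires, -110 -> -115 doesn't.
    return abs(new - prev) >= threshold

def events(url):
    # Bare-bones SSE reader: yields each `data:` payload as parsed JSON.
    with urllib.request.urlopen(url) as resp:
        for raw in resp:
            line = raw.decode("utf-8").strip()
            if line.startswith("data:"):
                yield json.loads(line[len("data:"):])

if __name__ == "__main__":
    url = ("https://parlay-api.com/v1/odds-drop/basketball_nba"
           "?apiKey=YOUR_KEY&threshold=10")
    for ev in events(url):
        if ev.get("type") == "odds_drop":
            print(ev["bookmaker"], ev["side"], ev["prev"], "->", ev["new"])
```

The point of the endpoint is that `is_drop`-style state tracking happens server-side; the client just reacts to events.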
Tier: Business+ ($40/mo), same gate as our other streams.
Behavior to know about:
- The first event per (event_id, bookmaker, side) is silent. The first time you see a side, we record the current price but don't emit an event. From the next price change onwards, you'll get drops crossing the threshold. So a freshly-opened stream takes 1-3 seconds to "prime" before drops start landing.
- Side keys: for player props, {market_key}:{player}:over@{line} and {market_key}:{player}:under@{line}. For game lines: h2h_home, h2h_away, spread_home@-7.5, total_over@218.5, etc.
- Verification: I stress-tested 10 concurrent connections; all 10 streamed cleanly at ~2 drops/sec/client on active NBA + MLB markets. No errors, no leaked sessions, no memory growth.
Open question for you all: what shape do you actually want this in? Some likely directions:
Drop a comment with what your scanner actually needs. Easier to ship the right feature if you tell me what good looks like.
Jacob
A few people DM'd me asking about latency and access friction in the odds-API space, so I'm just going to put the numbers out publicly with a way to verify them.
I run ParlayAPI. This post will lean toward our numbers because they're the ones I can actually substantiate, but the framework below works against any vendor (TheOddsAPI, OddsJam, SportsDataIO, anyone). Run the same probes against their endpoints and you'll have an apples-to-apples answer.
Dimensions that actually matter, and how to measure them:
1. Self-serve API access vs sales-gated access. Either you can sign up and start hitting endpoints in under 60 seconds, or you can't. ParlayAPI: yes, $5/mo Starter tier with API access from minute one. Some competitors gate API behind a "contact us" sales chain at any price; that's not API access, that's enterprise sales pretending to be SaaS. Open the pricing page of whoever you're considering. If it says "contact us" instead of a credit-card-required signup, you have your answer.
2. WebSocket push tier required. WebSocket-native real-time odds tend to be locked behind expensive tiers. ParlayAPI: WebSocket available from $20/mo Pro tier. Verify by attempting the same on competitor pricing pages, most are $200-2000/mo or sales-call-required.
3. Per-bookmaker pulse stamping. A common dishonesty in odds APIs is reporting last_update based on the last price-change row stored, not the last time we actually re-verified the price. We surface both: last_update (price-change time) and verified_at (heartbeat time we polled and confirmed the same price), plus an is_current flag if verified in the last 5s. Hit any of our endpoints with ?include=verification to see it live. Verify by checking whether your current vendor distinguishes these. Most don't.
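A quick sketch of consuming that flag client-side, assuming verified_at arrives as epoch milliseconds. The field names come from the description above; the exact response layout is an assumption:

```python
import json
import time
import urllib.request

STALENESS_WINDOW_MS = 5_000  # the 5s is_current window described above

def is_current(verified_at_ms, now_ms=None):
    # True if the price was re-verified within the last 5 seconds.
    now_ms = int(time.time() * 1000) if now_ms is None else now_ms
    return (now_ms - verified_at_ms) <= STALENESS_WINDOW_MS

def fetch_with_verification(sport_key, api_key):
    # Hypothetical request shape; check the docs for the real parameters.
    url = (f"https://parlay-api.com/v1/sports/{sport_key}/odds"
           f"?apiKey={api_key}&include=verification")
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)
```

Computing `is_current` yourself (rather than trusting the flag) also works as a cross-check on any vendor that exposes a verification timestamp.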
4. End-to-end latency, book to your client. The floor here is the bookmaker's own publish rate. Pinnacle publishes game lines at roughly 2s native cadence. Nobody can be faster than what Pinnacle has already pushed. The honest question is how much overhead the vendor adds on top.
Benchmark script you can run against any WebSocket-capable odds API by swapping the URL:
import asyncio, json, time, websockets

URL = "wss://parlay-api.com/ws/odds/basketball_nba?apiKey=YOUR_KEY"

async def main():
    async with websockets.connect(URL) as ws:
        while True:
            msg = json.loads(await ws.recv())
            ts = time.time()
            print(f"{ts:.2f} type={msg.get('type')} count={msg.get('count','-')}")

asyncio.run(main())
You'll see a frame cadence of 1.5-3s on active leagues, which matches the book's native rate. Run the same against any competitor's WebSocket (where they offer one) and compare frame timestamps over 60 seconds. The vendor whose data-bearing frames (count > 0) land closest to the bookmaker's own publish cycle wins.
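To turn those timestamps into a comparable number, here's a small helper that summarizes inter-frame gaps. Collect the `ts` values the script above prints over 60 seconds, then feed them in:

```python
from statistics import median

def frame_cadence(timestamps):
    # Summarize gaps between consecutive receive timestamps (in seconds).
    gaps = [b - a for a, b in zip(timestamps, timestamps[1:])]
    return {"median_s": median(gaps), "min_s": min(gaps), "max_s": max(gaps)}

# e.g. four frames received at t = 0, 2, 4.5, 6 seconds:
print(frame_cadence([0.0, 2.0, 4.5, 6.0]))
```

Compare the median, not the best-case gap; a vendor can look fast on one lucky frame.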
5. Historical archive depth. Not just "we have history" but "how much, how queryable, how cheap to bulk-export." ParlayAPI: 26.8M prop closing rows + 1.39M game-line rows. Bulk historical at a single flat-rate call (/v1/historical/sports/{sport_key}/closing-odds?dateFrom=&dateTo=). One charge per query, not per date in the range. Verify by asking your current vendor what their backfill row count is and whether bulk pulls are per-date or per-call billed.
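Building that bulk query is one function. The endpoint path and the dateFrom/dateTo parameters are from the description above; the apiKey spelling is an assumption carried over from the other examples:

```python
import urllib.parse

def closing_odds_url(sport_key, date_from, date_to, api_key):
    # One flat-rate call covers the whole range, per the pricing note above.
    qs = urllib.parse.urlencode(
        {"dateFrom": date_from, "dateTo": date_to, "apiKey": api_key})
    return (f"https://parlay-api.com/v1/historical/sports/"
            f"{sport_key}/closing-odds?{qs}")

print(closing_odds_url("basketball_nba", "2026-01-01", "2026-01-31", "YOUR_KEY"))
```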
6. Failover transparency. When primary infrastructure has a hiccup, you should be able to tell. Our responses carry X-Failover-Origin: primary|hot headers and the body wrapper changes shape if you're on failover, so a parser can detect and handle gracefully. Most competitors silently degrade to stale data and never tell you. Our position is that you should know.
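Detecting that header in a client is one line; a sketch, with the header name from above and a plain dict-of-headers shape assumed:

```python
def served_by_failover(headers):
    # X-Failover-Origin is primary|hot; an absent header means primary.
    return headers.get("X-Failover-Origin", "primary") == "hot"

# A parser can branch to the thinner failover schema when this returns True.
```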
What we don't lead on:
Raw polling cadence on individual books. Everyone polls at the book's native rate, that's the physical floor. If a vendor claims sub-second end-to-end latency on Pinnacle, ask them to define the measurement boundary because Pinnacle itself publishes at ~2s. We won't out-claim our way past physics.
Number of books listed. We track 26+, some competitors list 50+. Worth noting many of the "extra" books on competitor lists are non-US sportsbooks (though I hear there's edge/money to be made on Canadian sportsbooks so...tbd) or aggregator pass-throughs with stale data; check freshness, not catalog size.
Pages with full details:
If your current vendor doesn't publish numbers you can verify, that's its own answer. Happy to spec specific use cases in the comments: arb scanning, +EV modeling, in-play decision engines, prop tracking.
Lmk your provider's numbers so I can beat them
Jacob
Want to walk through what happened on the platform over the past 2-3 days, what caused it, and what we changed to make sure this specific failure mode can't recur.
TL;DR: We added a hot failover tier to make the platform more resilient. The way it was wired up created the exact kind of outage we were trying to prevent. Customers saw the site flap between healthy and broken depending on which edge node served them, which made it nearly impossible to reproduce from inside the org. Fixed at multiple layers, durably, including monitoring that would have caught this within minutes if it ever recurs.
What you may have seen:
- A bare {"service":"parlay-failover-hot","status":"ready"} body instead of the marketing page or a real API response

If you hit anything like that, you hit this bug.
What was actually happening:
We run a primary origin that handles all customer traffic, plus a hot failover tier that's supposed to step in when the primary is unreachable. The failover serves a thinner response shape so your parser doesn't crash entirely while we recover. That's the intent.
What was actually wired: the failover tier got registered onto the same routing layer as the primary. The routing layer treats multiple registered backends as redundant copies and load-balances traffic across them. So roughly half of your requests hit the real backend with full data. The other half hit the failover stub. The split varied by which network edge served your request, so different people on different ISPs / cities / cellular vs wifi connections were getting different ratios of "broken" to "working."
This is the textbook category of bug that's hardest to catch from inside the org: it works perfectly from where the engineers test, and breaks intermittently from elsewhere. The fact that we caught it at all is mostly thanks to customers running their own telemetry and surfacing the discrepancy.
The compound problem:
Even after we identified and fixed the routing, customers who had received a bad response were stuck on it locally for up to 4 hours because the bad response had been cached at multiple layers (browser HTTP cache, CDN edge cache, intermediate proxy cache). Each layer required a different fix.
What we changed:
- Cache-Control: no-store, must-revalidate, so a poisoned response cannot pin a browser cache for hours. Even if the absolute worst case happens again, customer recovery is measured in seconds, not hours.
- An opt-in body wrapper (?format=wrapped) so customers who want their parser to normalize once across both primary and failover responses can pin to a stable format. Backward-compatible default unchanged for everyone else.
- A /speed page (parlay-api.com/speed) that publishes the actual numbers and the methodology, so anyone can verify infrastructure claims independently.

Shoutouts:
u/bigantny built telemetry on his side that separated "raw event count > 0" from "normalized event count = 0" and caught the failover-shape bifurcation in his own parser before we had any internal signal. That observability shape is exactly what told us to look at routing layers rather than CDN caches, which saved real hours. He's basically an unofficial mod of this sub at this point. The kind of user who makes the product better for everyone else by paying attention.
u/AdMaleficent5772 flagged the outage from his end while we were still chasing symptoms downstream, and stayed in the back-and-forth on features and bugs all week. Apologies for the chaos and genuine thanks for the persistence.
If anyone else saw weird responses over the past few days, please respond here or DM. The internal monitoring catches it now, but customer reports remain the fastest signal.
What we promise going forward:
- X-Failover-Origin header and (optionally) a body wrapper shape via ?format=wrapped, documented in /docs/response-shapes.
- ?include=verification exposes per-event verification timestamps so you can defensively gate your own logic.

Apologies for the chaos. Trying to make the platform more reliable temporarily made it less reliable. The architecture is in better shape now than it was before this happened, and the monitoring is genuinely better. If we have to publish another one of these any time in the near future, I'll be disappointed in myself.
Your Tech Wizard / Infinite Super Genius Sports Betting Data guy,
-Jacob of ParlayAPI
OddsJam support reply to my refund request, verbatim:
>
Posting because anyone considering OddsJam should read that reply twice before signing up, and because the chronology that produced it is worth being public about.
What happened, in order:
What I got back:
Three replies across what looked like three separate threads. First was signed "Randall." Second was also signed "Randall". Third, the one quoted at the top of this post, was signed "James." Tone shifted enough between them that I genuinely cannot tell whether OddsJam support is staffed by multiple humans rotating coverage or whether the signatures are dressing on templated replies. Either reading is unflattering.
The reply from James is the one worth dwelling on. Every clause is doing work:
>
That is a person manufacturing a paper trail against the customer instead of addressing the substance of the complaint. Announcing it out loud is its own moment.
>
Federal consumer protection statute is binding regardless of what a merchant's own terms of service say. Citing your own contract as the answer to a federal-law question is not a defense; it is a tell.
>
"Standard industry practice" is exactly the defense companies have always used for dark patterns. That other merchants engineer cancellation friction does not, in my view, make it acceptable here. That sentence is the indictment, not the defense.
>
Other than the five-plus confirmation gates and retention upsells engineered into the path between me and the cancel button. ROSCA exists specifically because the difference between "technically possible to cancel" and "actually simple to cancel" is the entire problem.
Setting the legal question aside for a minute:
Even granting OddsJam every benefit of the doubt on whether the cancellation flow is technically lawful, my opinion is that it's just bad business.
Concrete contrast from the same week as this happened. I had a yearly Midjourney subscription that auto-renewed on April 20. I didn't realize until May 11, three weeks past the charge, and emailed asking for a refund. Midjourney pushed the refund immediately, reminded me my subscription might still be active in case I wanted to keep it, and included a one-click link to manage it. Total time to resolution: under a day. Total friction: zero. They didn't quote their terms of service at me, didn't route me through a retention gauntlet, didn't manufacture a paper trail to defend themselves against a future dispute.
That is what a subscription business with confidence in its product looks like. A company that believes customers will come back next year does not need to weaponize a five-screen retention gauntlet to keep someone who has already decided to leave.
OddsJam's reply to me was the opposite of that, and it tells you what they think the cost of letting a customer leave gracefully is versus the cost of squeezing one more billing cycle out of them.
Where this is going:
I'm pursuing a chargeback through my card issuer once the original charge settles. Chargeback reason codes around "merchant did not honor cancellation request" and "services not as described" do not require a regulatory citation to succeed; they require documentation that the customer tried, the merchant resisted, and a paper trail exists. The reply above is the paper trail.
Why I'm posting:
If you're building anything in this space with Claude Desktop, Cursor, ChatGPT custom GPTs, or any other MCP-compatible AI client, here's what's now possible.
ParlayAPI ships an MCP server (parlayapi-mcp) that exposes 10 native tools to any MCP host:
- list_sports — every supported sport + league key
- get_odds — live moneyline / spread / total across all books
- get_player_props — player props, filterable by player + market
- find_arbitrage — pre-computed cross-book arbitrage opportunities
- find_positive_ev — pre-computed +EV bets vs no-vig consensus
- compare_books — side-by-side line comparison across every book
- get_prediction_market_prices — Kalshi + Polymarket prices
- get_historical_odds — backtesting against the closing-line archive
- get_archive_coverage — public archive stats (no key needed)
- get_account_usage — authenticated credit usage check

What that solves:
You don't have to write any code to give your AI assistant access to live sports odds. Connect once, and your assistant calls the tools directly when you ask.
Practical example. With the MCP server connected to Claude Desktop, the prompt:
>
Becomes a single function call to find_positive_ev. Claude parses the response, formats the table, done. No Python, no curl, no schema guessing.
Same idea in Cursor while building a model:
>
The IDE calls get_historical_odds and inlines the data in your editor. You spend zero time on the data layer, all your time on the model.
Connect it:
The manifest is at parlay-api.com/mcp/manifest.json. Install instructions and the per-client MCP config (Claude Desktop, Cursor, etc.) are at parlay-api.com/mcp. Free tier is 100K credits / month, no card required, so the agent can sign itself up and start working in one session.
What other betting-workflow tools would you want exposed as native MCP tools? Adding what people actually use is easier than guessing.
Half the people building anything in this space now are doing it through Claude / Cursor / GPT. Saw three "I built this in a weekend" posts last week and all three started with "I asked Claude how to build a +EV scanner and..."
The data layer is the easy part to get right if you pick an API the model actually understands. Most odds APIs were designed for humans reading docs, which means LLMs guess the schema, generate broken curl, and you spend an hour fixing imports.
What works better when you're getting Claude / Cursor to write a betting tool:
1. Pick an API that ships /llms.txt and /llms-full.txt.
ParlayAPI does. The model reads the long-form reference, knows the endpoints, generates working code on the first try. Compare to APIs where the model has to infer the schema from a marketing page.
2. Look for a /cookbook page with drop-in prompts.
ParlayAPI has /cookbook with copy-paste prompts written specifically for Claude / GPT / Cursor. CLV tracker, +EV scanner, arb detector, prediction-market radar, line-movement watcher. Saves the back-and-forth where you describe the problem in natural language and the model writes 200 lines you have to debug.
3. agents.json + MCP when your tool needs to expose itself to other agents.
ParlayAPI ships both. Claude Desktop or Cursor users can connect over MCP and start querying odds without writing any code. The model just gets a tool called get_odds and uses it like any other tool.
4. Free tier without a credit card.
Claude / Cursor will sign up for free tiers as part of the workflow. Anything that requires a card breaks the flow because the model can't enter payment info. ParlayAPI's free tier is 100K credits / month with no card.
Practical example. The prompt:
>
Working code on the first try, because the prompt could land on /cookbook, read the response shapes from /llms-full.txt, and follow the documented pattern.
The data layer is not where AI-coded betting tools fail. They fail at:
Those are model-side problems. Solve those, the data is a free input.
What other APIs in this space are LLMs picking up cleanly? Curious which other tools have built this part well.
If you landed in this sub and aren't sure what we are: short answer, ParlayAPI gives you every major sportsbook's prices in one call.
That's the whole pitch.
What that solves:
You want to bet the Lakers tonight. To find the best price you'd normally check DraftKings, FanDuel, BetMGM, Caesars, BetRivers, and Pinnacle one at a time. With ParlayAPI you check them all at once and take whichever pays best.
Same idea for player props. PrizePicks has LeBron at 26.5 points. Underdog has 26.5 too. Pinnacle has 27. FanDuel has 27.5. You see all of those side by side in one query and pick whatever your model likes.
Who actually uses it:
What it costs:
Free tier is 100,000 calls a month, which covers most hobby projects. If you outgrow that, paid tiers are $5, $20, $40, $100, or $200 a month depending on how much data you pull and how far back the historical archive needs to go.
What it isn't:
Not a betting account. Doesn't place bets. Doesn't tell you what to bet. It just gives you the prices the books are already showing publicly, in one place, with one key.
What's in the bag besides US sportsbooks:
If you want to try it: hit /signup on parlay-api.com, you'll get a key in 30 seconds. The cookbook page has copy-paste examples to get your first useful query running in two minutes.
What was the first useful thing you built or queried when you started using it? Curious what other people in the sub did first.
Not for promos, not for spreading action. For one boring reason: they shop the line.
Same game, same bet, different prices. Books don't coordinate. Here's a real spread from last night's NBA games:
Lakers -3.5
Same bet. Five different prices. If you put $110 down on the Lakers at BetMGM, you'd win $98.21. The exact same bet at Pinnacle wins $102.80.
Per bet that's pocket change. Over 1,000 bets a season at $110 stakes, that's $2,000 to $3,000 you've left on the table just by not checking the other apps. For free.
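The payout math behind those numbers is one function. The $98.21 and $102.80 figures above back out to prices of -112 and -107 respectively; that specific price mapping is inferred from the arithmetic, not quoted from the book screens:

```python
def american_payout(stake, odds):
    # Profit on a winning bet at American odds (negative = favorite pricing).
    if odds < 0:
        return stake * 100 / -odds
    return stake * odds / 100

print(round(american_payout(110, -112), 2))  # 98.21
print(round(american_payout(110, -107), 2))  # 102.8
```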
Casual bettors don't shop. They open one app, place the bet, move on. They're paying the worst available price every time and wondering why their bankroll grinds down even when they hit at a normal rate.
How to actually do it without losing your mind:
The math is boring and that's why it works. Most bettors won't do it because each shopping session feels like winning $2 instead of winning $100. The compound effect is what matters. Shopping every bet for a season is often the difference between "down a little" and "actually broke even".
The obvious counterargument: "What if I get limited at the book that always has the best price?" Soft books (DK, FanDuel especially) do limit winning bettors. The fix is the same as the original advice: spread your action across multiple books, never bet huge on one. If you're flat-staking $50-100 per bet, you fly under the radar at all of them for years.
Anyone here still using just one book? Genuinely curious what's keeping you from spreading out.
The data stack for a working sports betting model is cheaper and simpler than the affiliate-spam guides make it look. Here's the actual breakdown, organized by what you're building.
TL;DR
ParlayAPI free tier covers about 80% of retail use cases for $0. 100K credits per month, 26+ books, live + historical + props + prediction markets in one key. The remaining 20% is sport-specific edge cases (deep box scores, real-time injury news) and you supplement with free open-source tools: nflverse, hoopR, baseballr, pybaseball. Total cost to ship a working model: $0 to $20 per month.
Live multi-book odds (for a +EV scanner)
You need multiple books, fresh data, and a sharp anchor. Pinnacle is the universal sharp; everything else is the soft-side liquidity that lags it. ParlayAPI gives you Pinnacle plus 25+ retail books in one endpoint:
/v1/sports/basketball_nba/odds?regions=us&markets=h2h&bookmakers=pinnacle,draftkings,fanduel
Latency is 1-4s on Pinnacle, 5-10s on the rest. The free tier covers 100K calls per month, enough for 60-second NBA + MLB + NHL coverage all season.
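A quick way to sanity-check whether a polling schedule fits a monthly call budget, assuming one call per sport per poll:

```python
def monthly_calls(sports, interval_s, days=30):
    # Calls consumed by polling each sport once per interval, all month.
    return sports * (days * 24 * 3600) // interval_s

print(monthly_calls(1, 60))  # one sport at 60s cadence
print(monthly_calls(3, 90))  # three sports at 90s cadence
```

Run the numbers for your own sport mix before committing to a cadence; stretching the interval slightly is usually cheaper than upgrading a tier.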
Historical closing lines (for backtesting)
The data shape is just (game_date, sport, home, away, source, close_price). The more books per game, the better.
/v1/historical/sports/{key}/closing-odds returns 7+ books per closing line for NBA / MLB / NFL / NHL games from 2024 forward. For older NBA / NFL data: hoopR and nflverse (R + Python packages, free, well maintained). For soccer back to 2005: football-data.co.uk has CSVs for 22+ leagues, no key needed.
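Normalizing every one of those sources into the row shape above keeps the backtester source-agnostic; a minimal sketch:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ClosingLine:
    # The minimal backtest row shape described above.
    game_date: str     # "YYYY-MM-DD"
    sport: str         # e.g. "basketball_nba"
    home: str
    away: str
    source: str        # bookmaker or archive key
    close_price: int   # American odds

row = ClosingLine("2026-05-12", "basketball_nba", "LAL", "GSW", "pinnacle", -110)
```

Write one adapter per source that emits this shape and the rest of the pipeline never cares where a line came from.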
Player props (the hardest free layer)
Pre-game prop lines exist on the live /v1/sports/{key}/props endpoint across 13+ books (DraftKings, FanDuel, Pinnacle, plus DFS apps PrizePicks / Underdog / Sleeper / Pick6 / Betr / Fliff). Historical prop closing lines (15M+ rows) at /v1/historical/sports/{key}/closing-odds?markets=player_*. Archive starts April 2026 since prop archival is newer than game-line archival.
For player stats to actually feed the model: pybaseball for MLB, hoopR for NBA, nflverse for NFL. All free, all maintained.
Live in-play data
/v1/sports/{key}/live returns events that have already started. /v1/sports/{key}/live/period_markets returns in-game Q1-Q4 / 1H spreads + totals + h2h from Pinnacle / DK / FD / MGM / Caesars. The newer /v1/historical/sports/{key}/period_markets endpoint stores every distinct in-play line state with first_seen_ms / last_seen_ms, so you can replay how a Q3 line moved during last night's game.
Real-time injury / lineup news
The genuinely hard layer. ParlayAPI surfaces lineups and ESPN-derived injury status. For sub-1-minute beat-reporter feeds, RotoWire or Action Network's injury subscription products are the standard. Most retail models don't need this layer if they train on closing lines, since the close already incorporates injury news.
Where to start
Sign up for the ParlayAPI free tier. 100K credits per month is enough to validate any model idea before you pay anything. Once you outgrow free, Starter at $5/mo unlocks 7-day historical depth, Pro at $20 unlocks 30-day, Business at $40 unlocks 90-day, and Scale at $200 unlocks the full 10-year archive.
A working +EV scanner is a weekend project against this stack. The data is not the bottleneck anymore.
FAQ
Where do I get free sports betting odds data?
ParlayAPI free tier (100K credits / month, 26+ books, no credit card required). The Odds API free tier (500 requests / month, polling only). For historical: sportsbookreviewsonline.com, football-data.co.uk, nflverse, hoopR, pybaseball.
What data do I need to build a +EV sports betting scanner?
Multi-book live odds (Pinnacle plus retail books) and a no-vig fair value calculation. That's it. Compare offered prices to Pinnacle's no-vig, flag anything that pays better. Doable in under 100 lines of Python against the ParlayAPI free tier.
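The core of that scanner really is three small functions; here's a sketch of the no-vig comparison for a two-way market:

```python
def implied_prob(odds):
    # American odds -> implied win probability (vig included).
    return -odds / (-odds + 100) if odds < 0 else 100 / (odds + 100)

def no_vig_probs(odds_a, odds_b):
    # Strip the vig from a two-way market by renormalizing.
    pa, pb = implied_prob(odds_a), implied_prob(odds_b)
    total = pa + pb
    return pa / total, pb / total

def ev_per_dollar(offered_odds, fair_prob):
    # Expected profit per $1 staked at offered_odds, given a fair win prob.
    payout = 100 / -offered_odds if offered_odds < 0 else offered_odds / 100
    return fair_prob * payout - (1 - fair_prob)

# Pinnacle -110/-110 -> fair 50/50; a retail book offering +110 is +EV:
fair_home, _ = no_vig_probs(-110, -110)
print(round(ev_per_dollar(110, fair_home), 3))  # 0.05
```

Loop that over every (event, side, book) in the odds response and flag anything above your EV threshold.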
Can I use Excel data for a sports betting model?
For backtesting, yes. Yearly Excel files exist on sportsbookreviewsonline for MLB through 2021. Modern models almost always use a JSON API for the live layer, even if historical comes from CSVs.
What's the difference between game lines and player props for modeling?
Game lines are the moneyline, spread, and total for the team-vs-team result. Player props are individual-player markets like "LeBron over 26.5 points". Different volume profiles, different books, often different APIs. Most retail bettors lean game-lines for cleaner +EV; prop edges are real but harder to size.
How accurate is the data from a sports betting API?
A real aggregator returns the book prices at the moment of poll. ParlayAPI lets you verify any book is flowing right now via /v1/bookmakers/{key}/freshness (free, no auth, returns age in seconds since the last write per backing table). If the latency you're seeing is more than 30s on any API, that's not modeling-grade data.
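Gating your own pipeline on that check is a few lines. The endpoint path is from the FAQ above; the response field names are a guess, so adapt to what the endpoint actually returns:

```python
import json
import urllib.request

MAX_AGE_S = 30  # the modeling-grade freshness threshold mentioned above

def is_fresh(age_seconds):
    # True if the book's backing table was written to recently enough.
    return age_seconds <= MAX_AGE_S

def book_freshness(book_key):
    # Free, no-auth endpoint per the FAQ; parse the age field it returns.
    url = f"https://parlay-api.com/v1/bookmakers/{book_key}/freshness"
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)
```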
Drop your stack in the comments
Always curious how other people in this sub set up their data layer.
Most "how to build a sports betting model" guides skip the boring part: where the data comes from. Then six months later you find out your CSV pull from ESPN drops every postponed game and your "model" is overfitting on selection bias.
This is the actual stack. Every source I have used, what each one is good for, and the gotchas I wish someone had told me before I paid for the wrong tool. Bookmark and share with the next person asking "where do you get NBA data".
TL;DR
A serious sports betting model needs four data layers: live odds across multiple books, historical closing lines, player + team stats, and injury / lineup news. The free options cover three of those well enough to ship a model. The fourth (real-time multi-book odds) is where every paid API fights for your money. Pick the cheapest one that has the books and the latency you need, integrate, and stop overthinking it.
Cost to build a real-money +EV scanner from zero: $0 to $20 per month for the data, plus your time. Anyone telling you it costs more is selling you something.
Why most public guides are useless
The guides that show up on Google when you search "sports betting data" fall into three buckets:
The real answer depends on what you are building. Closing-line backtester? Historical archives only. Real-time +EV scanner? You need live multi-book odds. Player prop model? You need box scores plus prop-specific archives almost no one publishes. Each layer has different sources.
The four data layers every model needs
1. Live odds (multi-book)
The single most expensive and most differentiated layer. You need at least one sharp book (Pinnacle or Circa) plus 4-6 retail books (DraftKings, FanDuel, BetMGM, Caesars, BetRivers, Fanatics). Sharp book gives you the no-vig fair value. Retail books are where the actual +EV bets live (when their slow updates lag the sharp).
Latency matters. A live odds feed that is 30 seconds behind the book is fine for slow markets, useless for in-play.
2. Historical closing lines
Closing line is the wisdom-of-crowds price at game time. Backtesting a model against historical closing lines is the gold standard for measuring whether your edge is real. Two reasons:
Free archives exist for some sports going back decades. Paid archives extend deeper or include more books per game. Pick based on how far back you actually need.
3. Player and team stats
Box scores, advanced stats (eFG%, OPS, EPA, expected goals, etc.), play-by-play. Free for every major sport via official league sites and open-source projects (nflverse, hoopR, cfbfastR, baseballr). Quality is solid; the main task is normalization across years and rule changes.
4. Injury / lineup news
The hardest layer to source cleanly. Real-time injury news moves lines before the books update. Most public APIs surface injury data 1-15 minutes behind Twitter. Paid services exist that monitor team accounts and beat reporters in real time; they are expensive and most are run by one person.
Most retail bettors do not need this layer. If your model is using closing-line training data and projecting to opening-line bets, the closing line already has injury news baked in.
Free data sources (and their actual limits)
The Odds API has a free tier at 500 requests per month. Enough to play with the data shape, not enough to run any real polling. Their free tier was the bar everyone tried to undercut for years.
Sportsbookreviewsonline is the OG historical archive. Free yearly Excel files for MLB through 2021, HTML tables for NBA / NFL / NHL. Patchy after 2022. Most public datasets you find on Kaggle are derivatives of SBR.
football-data.co.uk has soccer closing lines for 22+ leagues going back to 2005. CSVs published Mondays. Free, no key, idempotent imports work great.
nflverse (R + Python packages) has every NFL play-by-play back to 1999, plus pre-game odds for most years. Active maintenance. Free.
hoopR does the same for NBA from 2002 forward. cfbfastR for NCAA football. baseballr for MLB. All free, all maintained, all queryable in Python via pybaseball and equivalents.
ESPN has a public scoreboard API for every major sport. Useful for box scores and final results, not useful for odds. (Their pickcenter only goes back ~2 years and is patchy.)
Kaggle datasets are great for learning. Generally too stale for production models. The dataset's last-updated date matters more than its size.
Paid data APIs ranked
Quick reality check: every paid API has a free tier. Sign up for all of them, hit each /odds endpoint with your sport, measure latency yourself, then decide. Anyone who pays before testing is wasting money.
What paid APIs actually compete on:
ParlayAPI (yes, this sub) covers all four layers with one key. 26+ active books across game lines / props / DFS / prediction markets, with French-licensed books (Betclic, PMU, Unibet, Winamax) for European market work that most US-focused APIs miss. The free tier is 100,000 credits per month, enough to poll NBA every 30 seconds for the entire season. Historical archive: 1.39M+ rows back to 1999 for NFL, 2017+ for NBA, plus 15M+ player prop closing lines from April 2026 forward. Tier table goes free / $5 / $20 / $40 / $100 / $200, with the free tier covering most hobby projects.
Other paid APIs in 2026: The Odds API (the incumbent, ~$30-60/mo for usable polling), OpticOdds (sharp book focus, more expensive), and a handful of newer ones. Test the latency and coverage on free tiers before paying.
Common mistakes when sourcing data for a model
How I would build a data stack from scratch in 2026
If I were starting a real-money +EV scanner today, with zero infrastructure:
- `/v1/historical/sports/{key}/closing-odds` is on the free tier (48-hour depth) and on Starter at $5 (7-day depth). For longer backtests, Business at $40 gets you 90 days and Enterprise at $100 gets a year. Most hobbyists never need more than a season.
- nflverse / hoopR / pybaseball. Free, well-maintained, every major sport. Cache locally; these change rarely.

FAQ
What is the cheapest sports betting data API?
ParlayAPI's free tier (100,000 credits / month) covers most retail use cases at $0. Beyond that, ParlayAPI Starter at $5/mo or The Odds API's lowest paid tier at ~$30/mo are the cheapest options with usable latency. Avoid anything that does not let you test the free tier first.
Is there a free sports betting odds API?
Yes. ParlayAPI free tier (100K credits/mo, 26+ books). The Odds API free tier (500 requests/mo, polling-only). For historical only, sportsbookreviewsonline.com (Excel / HTML files), football-data.co.uk (soccer CSVs), and nflverse / hoopR packages on R and Python.
How do I get historical NBA betting odds?
For 2017 forward, hoopR (R / Python). For 2024 forward with multiple US books per game, ParlayAPI's /v1/historical/sports/basketball_nba/closing-odds endpoint returns 7+ books per closing line. SBR has older NBA seasons in HTML tables but coverage drops after 2022.
Where do I get player prop data for sports betting models?
ParlayAPI's /v1/sports/{sport}/props endpoint returns props from 13+ books and DFS apps including PrizePicks, Underdog, Sleeper, Pick6, Betr, Fliff, Pinnacle, DraftKings, FanDuel. Closing lines for player props specifically: /v1/historical/sports/{sport}/closing-odds?markets=player_*. Coverage starts April 2026 forward (when prop closing-line archival began).
What is CLV in sports betting?
Closing Line Value. The implied-probability difference between the price you got and the closing line of the same market. Positive CLV is the strongest single predictor of long-term betting profit, more reliable than win rate over small samples. Track in implied probability points, not in cents, so it is comparable across odds formats.
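The definition above can be sketched in a few lines. This is a generic illustration of the math, not any SDK's API; the function names are my own.

```python
def implied_prob(american):
    """Convert American odds to implied probability (vig included)."""
    if american < 0:
        return -american / (-american + 100)
    return 100 / (american + 100)

def clv_points(bet_odds, closing_odds):
    """CLV in implied-probability percentage points.
    Positive means you beat the close."""
    return (implied_prob(closing_odds) - implied_prob(bet_odds)) * 100

# You bet -105, the market closed -120: you beat the close by ~3.3 points.
print(round(clv_points(-105, -120), 2))
```

Because it is expressed in probability points, the same number is comparable whether the book quotes American, decimal, or fractional odds.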
What latency do I need for live sports betting?
Depends on the model. Pre-game scanners are fine with 60+ second cadence. In-play models need 5-15 seconds. Steam-chasers and arb scanners need sub-5 seconds. Anything sub-1 second requires direct book feeds, not aggregator APIs.
Can I use a sports betting API for free?
Yes. ParlayAPI's free tier (100,000 credits / month, no credit card required) covers most retail model use cases. Polling NBA + MLB + NHL on a 60-second cadence stays well within budget. Historical archive included with 48-hour query depth on free.
Drop your stack in the comments
Curious what the actual readers here are running. Free tier only? Mix of paid + free? Got a clever combo I should be using? The good ideas in this thread will end up in v2 of this guide.
Big infrastructure week. Catching up the subreddit on what's new.
**Customer-facing:**
- **`/v1/sandbox/*` endpoints** — synthetic data, no auth, IP rate-limited. Test our response shape and timing without paying or even signing up. Useful during off-hours when no live games are running. [docs](https://parlay-api.com/docs#sandbox)
- **`/v1/sports/{sport}/live/source-health`** — per-source freshness diagnostic. Poll it every 30s in your bot to detect when a feed goes stale, so you don't trade on dead data.
- **WNBA play-by-play** — ESPN-sourced, 5-10s end-to-end, same `/v1/sports/basketball_wnba/live/sse` shape as NBA.
- **SSE PBP now includes player names + scores** — earlier the trigger only sent `event_type`. Fixed: `team_or_player_a/b`, `score_a/b`, and the full description all flow through SSE now.
- **Concurrent SSE/WS connection caps per tier** — 1 (free), 3 (starter), 25 (pro), 100 (business), 1,000 (enterprise). Stops abuse, keeps the pipe healthy for everyone.
- **Sub-second WebSocket frame capture** for sportsbook in-play state — DK / FD / Pinnacle / bet365 sources now all run a parallel WS-frame layer that catches push events the REST refetch misses. Verifying parsers against live games this weekend.
- **Pinnacle `period_odds` polling tightened** from 4s to 2s — captures more intermediate values during fast scoring runs.
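If you're consuming any of the SSE endpoints without the SDK, the framing is plain `text/event-stream`. A minimal sketch of a frame parser, assuming standard `data:` lines separated by blank lines (it ignores `event:`/`id:` fields and SSE comments):

```python
import json

def parse_sse(stream_text):
    """Decode raw SSE text into a list of parsed JSON events.
    Frames are `data:` lines terminated by a blank line."""
    events, buf = [], []
    for line in stream_text.splitlines():
        if line.startswith("data:"):
            buf.append(line[5:].strip())
        elif line == "" and buf:
            events.append(json.loads("\n".join(buf)))
            buf = []
    return events

# A frame shaped like the odds_drop events documented above.
raw = 'data: {"type": "odds_drop", "prev": -110, "new": -120, "delta": -10}\n\n'
events = parse_sse(raw)
print(events[0]["delta"])
```

In production use the SDK's async iterators instead; this is just to show there's no magic in the wire format.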
**New documentation:**
- [Streaming docs](/docs/streaming) — unified SSE + WS reference with per-tier caps
- [Webhooks docs](/docs/webhooks) — full reference with HMAC signature verification examples (Python + JavaScript)
- [Migration from The Odds API](/docs/migrate-from-the-odds-api) — drop-in compatibility, savings calculator
- [API versioning policy](/docs/api-versioning) — formal deprecation contract, /v1 stability guarantee
- [vs/the-odds-api](/vs/the-odds-api) — side-by-side with annual savings calculator (15-20x cheaper at most volumes)
- [vs/oddsjam](/vs/oddsjam) — honest take, when to use which
- [vs/sportsdataio](/vs/sportsdataio) — honest take, different buyers
- [/built-with](/built-with) — projects customers are shipping with the API. Want yours featured? DM me.
**SDKs:**
- **JavaScript SDK published** — `npm install parlay-api`. Drop-in compatible with the-odds-api JS clients, with extensions for prediction markets, DFS, PBP, and period markets, plus async iterators for SSE / WS streams. Built-in math helpers (devig, Kelly sizing).
- Python SDK already on PyPI: `pip install parlay-api`
**Internal infra (less interesting but might affect uptime):**
- 3-tier failover Worker probe tightened from 30s to 5s
- Cloudflare edge cache for static pages — marketing site stays up even if M4 origin blips
- Cloudflared tunnel restart Slack alerts (so I notice if it cycles)
- Daily backup verified working (645-702 MB nightly, 3-day rotation)
- Discovery scripts moved to TCC-safe path (was hitting macOS Operation-not-permitted)
- Fraud detection on signup — disposable-email blocklist + 3-signups-per-IP-per-24h cap
**Coming soon:**
- Annual prepay 15% discount (Stripe coupon setup this week)
- Pay-as-you-go tier for occasional / WS-curious users (per-call pricing)
- Slack bot interface for me (so I can interact with the API + CRM from my phone)
- Verified sub-2s state-change PBP across all major US sports (currently flowing on tennis, finalizing DK/FD/Pinnacle SPA capture)
**As always:** drop questions, requests, or bugs below. I read everything. Most user-requested features ship within a week or two if they're scoped reasonably.
---
Quick test:
ParlayAPI passes all three on Starter ($5/mo):
`/v1/odds`

If you're paying $30+/mo for an aggregator that fails any of these tests, you're paying a premium for the wrong tool.
A little behind-the-scenes story you don't always get: I run my own EV scanner. Things I needed and couldn't find on aggregator APIs:
Built ParlayAPI initially as my own internal aggregator with each integration done from scratch. Worked well enough that other people asked for access. So now it's a public API.
I still use it as my own EV scanner backend. $5/mo Starter covers 95% of my usage. The product I sell is the product I bet with.
That's not normal. Most sports data vendors have never placed a bet and built their product based on what enterprise procurement teams ask for. ParlayAPI is built around what I wanted as a bettor.
Side-by-side from each vendor's public pricing page:
| Tier | The Odds API | ParlayAPI |
|---|---|---|
| Free | 500 / mo | 1,000 / mo |
| Entry paid | $30 / mo for 20K (= $1.50/1k) | $5 / mo for 50K (= $0.10/1k) |
| Pro | $59 / mo for 100K (= $0.59/1k) | $30 / mo for 1M (= $0.03/1k) |
15x cheaper at entry. 20x cheaper at Pro. Same major US books at every tier, plus Pinnacle (TOA gates this behind higher tiers), Bovada, Kambi-network books (Unibet, PMU, BetRivers, Hard Rock), Polymarket, Kalshi, every DFS-style book, and play-by-play across NFL / NBA / MLB / NHL / soccer / MMA / tennis.
Migration: change `https://api.the-odds-api.com/v4/...` to `https://api.parlay-api.com/v1/...`. Response shape is identical, drop-in compatible. Done.
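The migration really is a string substitution, assuming the drop-in compatibility claim above holds for the markets you use. A trivial helper:

```python
def migrate_url(url: str) -> str:
    """Swap The Odds API base for the ParlayAPI base; the path,
    query params, and response shape are unchanged per the
    drop-in compatibility claim."""
    return url.replace(
        "https://api.the-odds-api.com/v4",
        "https://api.parlay-api.com/v1",
        1,
    )

print(migrate_url(
    "https://api.the-odds-api.com/v4/sports/basketball_nba/odds?apiKey=KEY"
))
```

In practice you would change a single base-URL constant in your client config rather than rewrite URLs at call time.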
If you're on TOA and your model doesn't need a market we haven't indexed, you're paying middleware tax for no reason. (just DM me and I'll add the market you need)
Every vendor advertises real-time. They mean different things:
For live-betting bots, 1s vs 10s is the difference between profit and loss.
Verify your source's actual cadence: hit an endpoint on a stopwatch during a live game, watch how often the response actually changes. Don't trust marketing copy.
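The stopwatch test can be automated: hash each response body and record how long the payload stays identical between changes. The sketch below runs on pre-collected `(timestamp, payload)` samples so you can see the logic; in real use you would collect them by polling an odds endpoint on a fixed cadence during a live game.

```python
import hashlib

def change_intervals(samples):
    """Given (timestamp_s, payload_bytes) poll samples, return the
    seconds between successive payload changes. Hashing keeps the
    comparison cheap for large response bodies."""
    intervals, last_hash, last_change_t = [], None, None
    for t, payload in samples:
        h = hashlib.sha256(payload).hexdigest()
        if h != last_hash:
            if last_change_t is not None:
                intervals.append(t - last_change_t)
            last_hash, last_change_t = h, t
    return intervals

# Simulated 1s polling; the feed actually updates at t=0, 4, and 8.
samples = [(t, b"A" if t < 4 else b"B" if t < 8 else b"C") for t in range(10)]
print(change_intervals(samples))  # [4, 4]
```

If a vendor advertises "real-time" but your intervals cluster at 30-60s, you now have the number to hold against their marketing copy.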
Common mistake: you have no historical odds for some date range, so you generate "synthetic odds" from win percentages or Elo. Sounds reasonable. It will lie to you.
Synthetic odds are smooth. Real odds are not. Real lines move on injury news, weather, sharp money, public hype. They have variance, errors, arbitrage windows. Backtesting on synthetic odds tells you how your strategy performs in a fictional universe where lines are always perfectly informed.
Specific failure modes:
What to use instead:
If your synthetic-odds backtest claims 1000% ROI over 5 years, it's wrong. Always.
Pulled this from each vendor's public pricing page (May 2026). Verify before relying on it.
The Odds API:
ParlayAPI:
At the entry paid tier, that's 15x cheaper per call. At Pro, 20x cheaper.
What you get for the lower price:
The price gap isn't because we're new and cheap-by-necessity. It's because we built direct integrations instead of layering on top of someone else's aggregator. Different cost structure for us, same data for you.
If you're paying $30+/month for a sports odds API and not getting prediction markets, DFS books, or play-by-play, you're paying middleware tax. There's a cheaper option that does strictly more.
Most people prompt AI like they're searching Google. "Write me a script to do X." Then they're surprised the output is mediocre.
Models are roleplay engines. They write better when they think they're a competent expert.
Compare:
Bad: "Write me a Python script to detect arbitrage between sportsbooks."
Better: "You are an infinite super-genius quantitative trader who loves sports betting and has built arbitrage scanners for a decade. Assume the reader has equal or greater capability. Write a Python arbitrage detector."
The first one gives you a Stack Overflow answer. The second one handles edge cases you didn't think to ask about (max bet limits, line cancellations, two-way devigging) because the model thinks it's writing for a peer.
We use this exact pattern internally. Plug in the domain ("infinite super-genius bettor who loves algorithmic betting") and output quality jumps.
It's free, it works, most people don't do it.
Positive EV betting is the math version of "this bet pays more than its true odds suggest." If a coin flip is 50/50 and someone offers +110 on heads, that's +EV: you lose half the time but win 110/100 the rest, averaging a 5% edge.
The hard part is figuring out the true odds. Sportsbooks don't post them. You either model the game yourself or use a sharp book's price as a proxy. Pinnacle is the standard proxy because they take sharp action and adjust aggressively.
Workflow:
Math:
ParlayAPI gives you both books in one call so writing this scanner is mostly format-shuffling. Free tier (1,000 credits/mo) handles a small EV scanner across NBA/MLB/NFL/NHL during their seasons.
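The workflow and math above can be sketched end to end: devig the sharp book's two-way prices to estimate true probability, then price the soft book's line against it. This is a generic illustration (multiplicative devig; other devig methods exist), not SDK code.

```python
def american_to_decimal(a):
    """American odds to decimal (payout multiple including stake)."""
    return 1 + (a / 100 if a > 0 else 100 / -a)

def devig_two_way(odds_a, odds_b):
    """Estimate true probabilities from a sharp book's two-way prices
    by normalizing the implied probabilities (multiplicative devig)."""
    pa = 1 / american_to_decimal(odds_a)
    pb = 1 / american_to_decimal(odds_b)
    total = pa + pb  # > 1; the excess is the vig
    return pa / total, pb / total

def ev_pct(true_prob, soft_odds):
    """Expected value per unit staked at the soft book's price."""
    return true_prob * american_to_decimal(soft_odds) - 1

# Pinnacle posts -110 / -110 -> fair 50/50. A soft book hangs +110 on one side.
p_home, p_away = devig_two_way(-110, -110)
print(f"{ev_pct(p_home, 110):+.1%}")  # +5.0%
```

That +5.0% is exactly the coin-flip example from the top of this post; with real prices the sharp book's two sides won't be symmetric, but the devig step handles that.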