u/stickPotatoe

r/datasets

Need reliable source for 30+ years of S&P 500 historical data for LSTM/Transformer research [P]

Hi everyone,

I'm starting a research project on financial time-series forecasting using LSTM and Transformer models for predicting S&P 500 market direction.

Right now, I'm struggling to obtain reliable long-term historical data.

I tried Yahoo Finance, but downloads are inconsistent or fail outright for me, and most Kaggle datasets I found cover only about 5–10 years.

I specifically need:

  • Around 30 years of historical S&P 500 data
  • Preferably daily OHLCV data
  • A reliable, clean source suitable for ML research
  • Ideally free or student-friendly
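
To make "reliable and clean" a bit more concrete, this is roughly the sanity check I'd run on whatever CSV I end up with. It's only a minimal sketch: the column names (`Date`, `Open`, `High`, `Low`, `Close`, `Volume`), the ISO date format, and the `check_ohlcv` helper itself are my own assumptions, not something any particular provider guarantees.

```python
import csv
import io
from datetime import date

# Assumed column names; real providers may label these differently.
REQUIRED = ["Date", "Open", "High", "Low", "Close", "Volume"]

def check_ohlcv(csv_text, min_years=30):
    """Return a list of human-readable problems found in a daily OHLCV CSV."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    if not rows:
        return ["no rows"]
    missing = [c for c in REQUIRED if c not in rows[0]]
    if missing:
        return [f"missing columns: {missing}"]

    problems = []
    dates = [date.fromisoformat(r["Date"]) for r in rows]  # assumes YYYY-MM-DD
    # Dates should be strictly increasing (no duplicates, no reordering).
    if any(b <= a for a, b in zip(dates, dates[1:])):
        problems.append("dates not strictly increasing")
    # Coverage: span between the earliest and latest row, in years.
    span_years = (max(dates) - min(dates)).days / 365.25
    if span_years < min_years:
        problems.append(f"only {span_years:.1f} years of data")

    for r in rows:
        try:
            o, hi, lo, c, vol = (
                float(r[k]) for k in ("Open", "High", "Low", "Close", "Volume")
            )
        except ValueError:
            problems.append(f"{r['Date']}: missing or non-numeric value")
            continue
        # Low must bound open/close from below, high from above.
        if not (lo <= min(o, c) and max(o, c) <= hi):
            problems.append(f"{r['Date']}: OHLC out of range")
        if vol < 0:
            problems.append(f"{r['Date']}: negative volume")
    return problems
```

An empty list back from `check_ohlcv(open("sp500.csv").read())` (the file name is a placeholder) would mean the file passes these basic checks. It says nothing about subtler issues such as missing trading days.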

I also want to understand what researchers typically use in academic work for financial forecasting:

  • Yahoo Finance?
  • Alpha Vantage?
  • WRDS/CRSP?
  • Polygon?
  • Kaggle?
  • Something else?

Additionally:

  • Is using only S&P 500 index data enough for a Master's level research project?
  • Or should I include technical indicators, macroeconomic data, sentiment, or constituent stock data?
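
For context, here is roughly what I mean by "market direction" and by a technical indicator. This is a minimal sketch under my own assumptions: the label is 1 when the next day's close is higher than today's (I haven't settled on a definition yet), and the indicator is a plain trailing moving average. The function names are just illustrative.

```python
def next_day_direction(closes):
    """Label each day 1 if the NEXT close is higher, else 0.

    The last day has no next close, so the result is one element shorter.
    """
    return [1 if nxt > cur else 0 for cur, nxt in zip(closes, closes[1:])]

def simple_moving_average(closes, window):
    """Trailing moving average; None until `window` closes are available.

    Only uses data up to and including each day, so it does not leak
    future prices into the feature.
    """
    out = []
    for i in range(len(closes)):
        if i + 1 < window:
            out.append(None)
        else:
            out.append(sum(closes[i + 1 - window:i + 1]) / window)
    return out

closes = [100.0, 102.0, 101.0, 105.0]
labels = next_day_direction(closes)       # [1, 0, 1]
sma2 = simple_moving_average(closes, 2)   # [None, 101.0, 101.5, 103.0]
```

The feature for day t would be paired with the label for day t, which describes the move from t to t+1, so the model never sees the price it is supposed to predict.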

I'd appreciate guidance from people who have actually worked on financial ML projects.

Thanks.

u/stickPotatoe — 1 day ago