Why my backtests kept lying to me (and what I did about it)
I've spent the last year building a live algorithmic trading system from scratch on Alpaca — momentum rotation on ETFs, RSI mean-reversion swing trades, proper risk management (1% per trade, ATR-based stops, daily circuit breaker, drawdown kill switch).
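In case it's useful, here's roughly what that sizing logic boils down to. This is a minimal sketch, and the 2x ATR stop multiplier is just an illustrative default, not a recommendation:

```python
import math

def size_position(equity: float, entry: float, atr: float,
                  risk_pct: float = 0.01, atr_mult: float = 2.0):
    """Return (shares, stop) for a long entry: the stop sits atr_mult
    ATRs below entry, and the share count caps the loss at the stop
    to roughly risk_pct of account equity."""
    stop = entry - atr_mult * atr
    risk_per_share = entry - stop            # = atr_mult * atr
    if risk_per_share <= 0:
        return 0, entry
    shares = math.floor(equity * risk_pct / risk_per_share)
    return shares, stop

# $50k account, $80 entry, $1.50 ATR -> 166 shares, stop at $77, ~$498 risked
print(size_position(50_000, 80.0, 1.5))
```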
The thing that humbled me most wasn't the coding. It was running what looked like a genuinely strong backtest, going live, and watching it fall apart within weeks.
After digging into why, I realised almost everything I'd read about backtesting was quietly skipping the hard parts:
- In-sample optimisation is basically cheating. If you tune your RSI period and stop-loss on the same data you're testing on, you're not finding a strategy — you're finding the parameters that fit that specific historical period. It will not repeat.
- Most retail backtesting tools don't model slippage honestly. Assuming you fill at the close price on a thinly traded ETF is fantasy (see the fill-model sketch after this list).
- Survivorship bias is invisible until you look for it. If your universe is "current S&P 500 constituents" you're testing on a list of companies that already survived.
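On the slippage point specifically, even a crude fill model beats assuming you trade at the close. Minimal sketch below; the flat spread-plus-impact haircut and the 5 bps / 2 bps defaults are placeholder assumptions you'd want to calibrate from your own fills:

```python
def fill_price(close: float, side: str,
               spread_bps: float = 5.0, impact_bps: float = 2.0) -> float:
    """Pessimistic fill estimate: cross half the spread plus a small
    impact haircut. Defaults are placeholders, not measured values."""
    slip = close * (spread_bps / 2 + impact_bps) / 10_000
    return close + slip if side == "buy" else close - slip

# a $100.00 "close fill" really costs ~$100.045 under these assumptions;
# on a thin ETF the honest number is often worse
print(fill_price(100.0, "buy"))
```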
What actually helped was walk-forward testing: train on one window, test on the next, roll forward, repeat. It produces worse-looking results, but the gap between backtest and live performance shrinks dramatically.
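Mechanically it's just a rolling split; the hard part is being disciplined about never touching the test window while tuning. A minimal sketch of the window generator (the 504/126-bar sizes, roughly 2 years train / 6 months test on daily bars, are just illustrative):

```python
def walk_forward_windows(n_bars: int, train: int, test: int):
    """Yield (train_slice, test_slice) pairs, stepping forward by `test`
    bars so consecutive out-of-sample windows never overlap."""
    start = 0
    while start + train + test <= n_bars:
        yield (slice(start, start + train),
               slice(start + train, start + train + test))
        start += test

# 1000 daily bars, ~2y train / ~6mo test (illustrative sizes)
for tr, te in walk_forward_windows(1000, train=504, test=126):
    # fit parameters on bars[tr], score them once on bars[te], never re-tune
    print(f"train {tr.start}-{tr.stop - 1}, test {te.start}-{te.stop - 1}")
```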
Curious how others here handle this. Are you using QuantConnect, TradingView Pine, something custom? And do your backtests actually predict your live performance or is there always a big gap?