u/noletovictor

I backtested QLD/TQQQ rotation rules from 1986-2026: top result 39.0% CAGR, but not a free lunch
▲ 54 r/LETFs+1 crossposts


You can access the dashboard at the link below.

This is a follow-up to my previous post here: 40-year LETF rotation backtest — 5 strategy families, 426 configs, here's the full result

In that post, I shared a 40-year LETF rotation study across 5 strategy families and 426 configurations. The main result was what I now call Quad Risk K2 in the dashboard: a QLD/ZROZ rotation that turns risk-on when at least 2 of 4 QQQ/QLD regime signals are true. The four signals were long trend, medium trend, realized volatility, and short-term return persistence. In simple terms: hold QLD when enough risk conditions are favorable; otherwise hold ZROZ.

That original strategy was the one I felt most comfortable calling the robust anchor: it cleared the full validation stack in that study, including DSR, PBO, walk-forward, OOS, forward-stress, bootstrap, and cross-library checks. It also had strong rolling-window behavior versus SPY.

I was honestly happy with the quality of the discussion in the comments on that post. I tried to answer every comment I could, but because of my limited time and because I kept focusing on the research work behind this follow-up, I may not have replied to everyone. I did read the feedback, and this new post is partly a response to the natural next question: can the original idea be improved, extended, or stress-tested further?

So the goal here was not to throw away the original result. The goal was to keep Quad Risk K2 as the anchor and continue searching for improvements and evolutions around it: rearm logic, TQQQ turbo windows, broader technical-vote systems, modern QLD/TQQQ variants, and simpler LRS baselines.

I built a small interactive web dashboard for comparing a set of LETF rotation strategies I have been researching. The app is focused on Nasdaq/S&P leveraged ETF rotation ideas: QLD/TQQQ, SSO/UPRO, defensive legs, trend/momentum/volatility votes, drawdowns, and rolling window behavior.

Link: https://letf-rotation-research.noletovictor.com/

>Important caveat up front: this is research, not financial advice and not a deploy recommendation. Several of the high-CAGR variants look economically interesting, but the formal validation stack still blocked promotion because DSR/PBO failed after accounting for the number of trials. I am sharing the tool because the comparisons are useful and because LETF strategy discussions are more productive when people can inspect drawdowns, windows, and benchmarks instead of only seeing a final CAGR number.

What The Strategies Are

The dashboard currently compares these strategies:

Full-sample metrics shown in the app use the long-history window 1986-01-03 to 2026-04-17:

| Name | CAGR | Max DD | Sharpe | Sortino |
|---|---|---|---|---|
| Octa Price K6 QLD | 32.05% | -57.81% | 0.983 | 1.375 |
| Octa Price K6 TQQQ | 40.26% | -64.24% | 0.951 | 1.268 |
| Quad Risk K2 | 31.06% | -64.50% | 0.919 | 1.258 |
| Rearm T20D90 | 38.99% | -55.48% | 0.975 | 1.228 |
| Rearm T20D120 | 39.01% | -55.48% | 0.961 | 1.207 |
| Rearm T35D60 | 36.66% | -55.48% | 0.962 | 1.207 |
| Quint TrendMomVol Overlay | 38.46% | -64.54% | 0.872 | 1.084 |
| Quint TrendMomVol K3 QLD | 19.38% | -70.07% | 0.668 | 0.907 |
| QQQ B&H | 14.58% | -82.97% | 0.658 | 0.866 |
| SPY B&H | 11.49% | -55.14% | 0.682 | 0.842 |
| Quint TrendMomVol K3 TQQQ | 21.48% | -87.69% | 0.637 | 0.833 |
| LRS SSO | 13.88% | -51.67% | 0.664 | 0.759 |
| LRS QLD | 18.33% | -82.54% | 0.648 | 0.741 |
| LRS TQQQ | 19.94% | -94.36% | 0.609 | 0.696 |
| LRS UPRO | 16.40% | -71.20% | 0.605 | 0.691 |

  • Rearm T20D120: the highest-CAGR long-history sensitivity in the final local grid. It keeps the same core Quad Risk K2 shell as Rearm T35D60, but changes the post-crash rearm geometry: after at least 20 OFF days, an OFF-to-ON transition opens a 120-trading-day TQQQ/LRS1.20 rearm window. In the 1986-2026 long-history test it reached about 39.01% CAGR, 1.207 Sortino, and -55.48% max drawdown.
  • Rearm T20D90: the more balanced T/D sensitivity. Same idea as T20D120, but with a 90-trading-day rearm window. It had nearly the same CAGR, about 38.99%, with the best Sortino in the local T/D grid, about 1.228.
  • Rearm T35D60: the main anchor strategy from the previous LETF rotation loop. It uses the Quad Risk K2 shell, QLD as the normal risk-on leg, ZROZ as the defensive leg, a rate/vol cash override, and a T35D60 post-crash TQQQ rearm window with LRS1.20. Long-history result: about 36.66% CAGR, 1.207 Sortino, and -55.48% max drawdown.
  • Quad Risk K2: the simpler four-gate shell. It turns risk-on when 2 of 4 filters pass: QLD above SMA250, QLD above SMA100, 21-day realized volatility below 40%, and AR(1) 30-day persistence above zero. It holds QLD when ON and ZROZ when OFF. Long-history result: about 31.06% CAGR, 1.258 Sortino, and -64.50% max drawdown.
  • Octa Price K6 QLD: an 8-signal price-only vote using SMA/EMA trend filters, ROC momentum filters, and RSI14. It turns ON when 6 of 8 signals pass and holds QLD when ON.
  • Octa Price K6 TQQQ: the same 8-signal price-only vote as Octa Price K6 QLD, but with TQQQ as the risk-on leg. It had the highest long-history CAGR among the listed long-history comparison rows, about 40.26%, but failed DSR/PBO validation.
  • Quint TrendMomVol K3 QLD: a 5-signal vote using SMA100>SMA250, ROC10>0, ROC120>0, StochRSI14>50, and realized-volatility percentile below 70. It turns ON when 3 of 5 signals pass and holds QLD when ON. On modern Tiingo 2010+ data, it reached about 36.26% CAGR with -37.54% max drawdown.
  • Quint TrendMomVol K3 TQQQ: the same 5-signal vote as Quint TrendMomVol K3 QLD, but with TQQQ as the risk-on leg. On modern Tiingo 2010+ data, it reached about 53.00% CAGR with -51.03% max drawdown. On the stricter 1986+ long-history reproduction, it weakened materially.
  • Quint TrendMomVol Overlay: a hybrid that keeps the Rearm T35D60 shell but allows the Quint TrendMomVol K3 vote to act as an additional QLD-to-TQQQ turbo trigger. It improved terminal equity/CAGR versus Rearm T35D60 in some comparisons, but worsened drawdown and Sortino, so it did not dominate the anchor.
  • LRS SSO, LRS UPRO, LRS QLD, and LRS TQQQ: simple Gayed-style trend baselines. If SPY/QQQ is above its 200-day SMA, hold the leveraged ETF next bar; otherwise hold cash. These are included as simple sanity-check baselines.
  • SPY B&H and QQQ B&H: passive comparators so the rotations can be judged against broad equity and Nasdaq exposure, not just against each other.
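
The rearm rule described for Rearm T20D120 above can be read as a small state machine. Here is a minimal sketch, assuming a daily list of ON/OFF flags coming out of the Quad Risk K2 shell. The function name `rearm_legs` and the leg labels are hypothetical, and the LRS1.20 condition and the rate/vol cash override are left out because the post does not spell them out.

```python
def rearm_legs(regime_on, min_off_days=20, window_days=120):
    """Map daily ON/OFF flags to a held leg: 'ZROZ', 'QLD', or 'TQQQ'.

    Assumption: an OFF-to-ON flip that follows at least `min_off_days`
    consecutive OFF days opens a `window_days`-long TQQQ window, and the
    window closes early if the regime turns OFF again.
    """
    legs = []
    off_streak = 0    # consecutive OFF days seen so far
    window_left = 0   # remaining days in the current rearm window
    for on in regime_on:
        if not on:
            off_streak += 1
            window_left = 0
            legs.append("ZROZ")
            continue
        if off_streak >= min_off_days:
            window_left = window_days   # OFF-to-ON after a long OFF spell
        off_streak = 0
        if window_left > 0:
            legs.append("TQQQ")
            window_left -= 1
        else:
            legs.append("QLD")
    return legs


# 5 ON days, a 20-day OFF spell, then 130 ON days.
flags = [True] * 5 + [False] * 20 + [True] * 130
legs = rearm_legs(flags)
print(legs.count("QLD"), legs.count("ZROZ"), legs.count("TQQQ"))
```

Changing `min_off_days` and `window_days` to 35 and 60 would give the T35D60 geometry under the same assumptions.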

What The Webapp Shows

The goal of the app is to make the comparison inspectable instead of static.

  • Date range control: choose the full sample or focus on recent 5y/10y/15y/20y periods. This matters a lot because some strategies look amazing in the modern sample and much weaker once older regimes are included.
  • Window Summary: quick cards showing start date, end date, number of bars, best CAGR in the selected window, best Sortino, and lowest max drawdown.
  • Equity Curves: log-scale equity curves for all strategies. You can toggle individual strategies from the side table and inspect values at the cursor date.
  • Drawdown Chart: synchronized drawdown plot for the same selected strategies. This is usually the fastest way to see whether a high CAGR is just hiding unacceptable path risk.
  • Interactive Series Table: shows each strategy's equity at the cursor date, CAGR, and max drawdown. You can sort by cursor equity or Sortino and click rows to show/hide lines.
  • Metrics Table: sortable CAGR, Sortino, Sharpe, max drawdown, Calmar, and ending multiple for the selected date window.
  • Rolling A/B Comparison: choose any strategy as A and any other as B. The app builds 3y, 5y, 10y, 15y, and 20y rolling comparisons, including win-rate heatmaps and final-ratio heatmaps. This is useful for questions like "how often did Rearm T35D60 beat Quad Risk K2 over rolling 10-year windows?" instead of only asking which strategy won over the full backtest.
  • A/B KPI Cards: final A equity, final B equity, A/B ratio, percent of days A was above B, and max drawdowns for both.
  • Rolling Window Hover Details: hover any heatmap cell to see the exact start/end dates, A growth, B growth, CAGR for both, and the final ratio.
  • Strategy Concepts Tab: plain-English descriptions of every strategy in the dashboard: the concept, the algorithm, and the current research status.
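
For readers who want to check the Metrics Table numbers against their own equity curves, here is a minimal sketch of the four headline metrics. The definitions are assumptions, not the dashboard's code: returns are simple period returns, the Sortino target and risk-free rate are both zero, and Calmar is CAGR divided by the absolute max drawdown.

```python
import math


def cagr(equity, periods_per_year=252):
    """Compound annual growth rate of an equity curve."""
    years = (len(equity) - 1) / periods_per_year
    return (equity[-1] / equity[0]) ** (1 / years) - 1


def max_drawdown(equity):
    """Worst peak-to-trough decline, returned as a negative fraction."""
    peak, worst = equity[0], 0.0
    for value in equity:
        peak = max(peak, value)
        worst = min(worst, value / peak - 1)
    return worst


def sortino(equity, periods_per_year=252):
    """Annualized mean return over downside deviation (target = 0)."""
    rets = [b / a - 1 for a, b in zip(equity, equity[1:])]
    mean_ret = sum(rets) / len(rets)
    downside = math.sqrt(sum(min(r, 0.0) ** 2 for r in rets) / len(rets))
    if downside == 0:
        return float("inf")
    return mean_ret / downside * math.sqrt(periods_per_year)


def calmar(equity, periods_per_year=252):
    """CAGR divided by the absolute max drawdown."""
    dd = max_drawdown(equity)
    return cagr(equity, periods_per_year) / abs(dd) if dd < 0 else float("inf")


curve = [100.0, 120.0, 90.0, 108.0, 150.0]   # toy yearly equity values
print(cagr(curve, 1), max_drawdown(curve), sortino(curve, 1), calmar(curve, 1))
```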

You can compare one strategy with another and see how the difference evolves over time.

Why I Built It

I wanted a cleaner way to discuss LETF rotation than posting one table of top backtest results. A strategy with 40% CAGR can still be a bad idea if it only works in one regime, has catastrophic drawdowns, or loses to a simpler anchor in rolling windows. The dashboard makes those trade-offs visible.

The short version of the research so far:

  • The robust long-history anchors are still Quad Risk K2 and Rearm T35D60.
  • The best local sensitivity was T20D120 at about 39.01% CAGR, but it is not a validated winner.
  • Quint TrendMomVol K3 TQQQ is very strong on Tiingo 2010+ data, about 53.00% CAGR, but weakens in 1986+ reproduction.
  • The high-CAGR variants are interesting challengers, but DSR/PBO failures mean I would not present them as deployable systems.

I would be interested in feedback from people here, especially on better validation ideas, realistic execution assumptions for QLD/TQQQ rotations, tax/cost modeling, and whether the rolling A/B view changes how you evaluate these LETF strategies.

Discussion Question: Would You Still Follow A Strategy If DSR/PBO Failed?

This is the part I am most interested in discussing.

The strategies in this dashboard are not just top rows from a random backtest table. The better candidates generally passed several practical robustness checks:

  • OOS holdout: they remained profitable on a reserved out-of-sample block.
  • FWD stress window: they survived the most recent forward/stress slice.
  • Walk-forward validation: most stayed positive across rolling train/test windows.
  • Bootstrap checks: resampled return paths usually did not destroy the result.
  • Rolling 3y/5y/10y/15y windows: the best anchors had broad positive rolling behavior, not just one lucky terminal point.

But they still failed the two gates that worry me the most: DSR and PBO.

Plain-English version:

  • DSR, or Deflated Sharpe Ratio, asks: "After accounting for all the strategies/parameters I tried, is this Sharpe still statistically impressive?" A Sharpe that looks good in isolation can become much less convincing after thousands or millions of trials, because some great-looking result is expected to appear by chance.
  • PBO, or Probability of Backtest Overfitting, asks: "When I split the data many different ways, do the configurations that look best in-sample also keep ranking well out-of-sample?" A high PBO means the selection process may be learning quirks of the backtest window rather than a durable rule.
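
To make the DSR intuition above concrete, here is a minimal sketch of the commonly cited Deflated Sharpe Ratio formula (Bailey and López de Prado). It is not the validation code behind the dashboard. The inputs are assumptions: `sharpe` and `sharpe_std` are per-period (not annualized) values, `sharpe_std` is the standard deviation of Sharpe ratios across the trials, and `n_trials` must be at least 2.

```python
import math
from statistics import NormalDist

EULER_GAMMA = 0.5772156649015329
_NORMAL = NormalDist()


def expected_max_sharpe(n_trials, sharpe_std):
    """Approximate expected best Sharpe among n_trials zero-skill trials."""
    a = _NORMAL.inv_cdf(1 - 1 / n_trials)
    b = _NORMAL.inv_cdf(1 - 1 / (n_trials * math.e))
    return sharpe_std * ((1 - EULER_GAMMA) * a + EULER_GAMMA * b)


def deflated_sharpe(sharpe, n_obs, n_trials, sharpe_std, skew=0.0, kurt=3.0):
    """Probability that the observed Sharpe beats the chance-level benchmark."""
    benchmark = expected_max_sharpe(n_trials, sharpe_std)
    denom = math.sqrt(1 - skew * sharpe + (kurt - 1) / 4 * sharpe ** 2)
    z = (sharpe - benchmark) * math.sqrt(n_obs - 1) / denom
    return _NORMAL.cdf(z)


# Same daily Sharpe, judged against 10 trials and against 1000 trials.
for trials in (10, 1000):
    print(trials, deflated_sharpe(0.06, n_obs=10_000, n_trials=trials, sharpe_std=0.02))
```

The only thing that changes between the two calls is the trial count, which is the point of the gate: the benchmark the strategy has to beat rises as the search gets larger.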

So the uncomfortable result here is:

  • economically, several strategies look very strong;
  • mechanically, they pass OOS/WF/bootstrap-style checks;
  • statistically, they still look too optimized once DSR/PBO account for the search process.

My current stance is that this makes them research-only, not deployable systems. But I am not sure everyone will draw the line in the same place.

For discussion:

  • If a LETF strategy passes OOS, walk-forward, bootstrap, and rolling-window checks, but fails DSR/PBO, would you still consider trading it with reduced size?
  • Do you treat DSR/PBO as hard blockers, or as warnings that should be balanced against economic intuition and simplicity?
  • Is there a point where a strategy is simple enough, economically plausible enough, or robust enough across regimes that you would accept weak DSR/PBO?
  • For LETFs specifically, do you think trend/momentum/volatility filters are a known structural effect, or just an overfit-prone family because everyone is searching the same indicators?
u/DesertEagleBR — 1 day ago
▲ 98 r/LETFs

40-year LETF rotation backtest — 5 strategy families, 426 configs, here's the full result

Methodology. 426 configurations across 5 strategy families, tested on 40 years of data (1986-2026, plus synthetic backfill to 1969 for cross-validation across 4 independent datasets). Pre-registered with 7 statistical gates: PBO, DSR, walk-forward, single-block OOS, forward-stress, bootstrap 99.9% CI, and cross-library CAGR delta. The anti-overfit margin was set upfront: a challenger must clear the incumbent's Sortino by +0.05, not just match it. Tier-by-tier elimination, no peeking.
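
For readers unfamiliar with the PBO gate, here is a minimal sketch of the usual combinatorially symmetric cross-validation (CSCV) estimate. It is an illustration under stated assumptions, not the study's implementation: the block count, the plain mean/stdev Sharpe, and the function names are all hypothetical choices.

```python
import math
from itertools import combinations
from statistics import mean, stdev


def sharpe(returns):
    """Per-period Sharpe ratio; 0.0 when the returns have no variance."""
    spread = stdev(returns)
    return mean(returns) / spread if spread > 0 else 0.0


def pbo_cscv(returns_by_config, n_blocks=4):
    """Share of splits where the in-sample winner lands at or below the
    out-of-sample median rank. `n_blocks` should be even; observations
    beyond n_blocks equal-sized blocks are ignored."""
    names = list(returns_by_config)
    size = len(returns_by_config[names[0]]) // n_blocks
    blocks = [range(i * size, (i + 1) * size) for i in range(n_blocks)]
    overfit = splits = 0
    for is_ids in combinations(range(n_blocks), n_blocks // 2):
        is_idx = [t for b in is_ids for t in blocks[b]]
        oos_idx = [t for b in range(n_blocks) if b not in is_ids for t in blocks[b]]
        is_sr = {n: sharpe([returns_by_config[n][t] for t in is_idx]) for n in names}
        oos_sr = {n: sharpe([returns_by_config[n][t] for t in oos_idx]) for n in names}
        best = max(names, key=is_sr.get)
        rank = sum(oos_sr[n] <= oos_sr[best] for n in names)  # 1 = worst, N = best
        omega = rank / (len(names) + 1)
        overfit += math.log(omega / (1 - omega)) <= 0
        splits += 1
    return overfit / splits


toy = {
    "good": [0.02, 0.01] * 20,
    "mid": [0.01, -0.01] * 20,
    "bad": [-0.01, -0.02] * 20,
}
print(pbo_cscv(toy))
```

In the toy data the same configuration wins in every block, so the in-sample winner never falls to the bottom half out-of-sample. A grid where winners change from block to block would push the estimate toward 1.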

Winner. Vote-K=2 regime filter on QLD (2× NDX). When at least 2 of these 4 signals are TRUE on the QQQ underlying — price > SMA(250), price > SMA(100), realized_vol(21d) < 40%, AR(1) coefficient over 30d > 0 — go 100% QLD; else go 100% ZROZ (25y zero-coupon Treasury). T+1 month-end execution.

Result on lh_56y, net of realistic annual-netting tax:

  • Sortino 1.18 (edge vs SPY +0.226), Sharpe 0.83 as secondary context
  • Net CAGR ~24% vs SPY ~11%; $10K → ~$60M over 40y vs SPY ~$793K (≈ 75× SPY end equity)
  • Beats SPY in 99.86% of 40-year days and 100% of all 10y / 15y / 20y rolling windows (zero exceptions across 909 windows)
  • Clears every pre-registered gate with margin: DSR p=0.04, PBO=0.18, walk-forward 7/8
  • Composite rolling-window robustness rank #5 of 21; SPY ranks #21 of 21
  • Edge survives even under per-swing worst-case tax: +0.127 Sortino — still above the +0.10 deploy bar

The rest of this post walks tier-by-tier through what won, what got killed, and the defensible-deploy case.

Disclosure first: my own allocation is 100% passive buy-and-hold today. This is research output, not advice.

https://preview.redd.it/v3eoxl1af00h1.png?width=787&format=png&auto=webp&s=c19d11f8d65973ffac949d888e637d7ab118c38f

Top 21 configs across all 5 tiers + SPY benchmark, lh_56y window, log scale. The bands of green/orange in the top half are LETF rotation winners; SPY (black) sits at the bottom.

Tier 1 — Single-LETF rotation

Question: does a simple SMA200 regime filter on a 2× or 3× LETF, with Treasury as the off-state, beat SPY 1× buy-and-hold?

Universe tested: 6 LETFs (QLD, UPRO, SOXL, SSO, TQQQ, UGL) × 6 off-state assets (TLT, EDV, IEF, ZROZ, BIL, GLD) × multiple SMA periods. 382 configurations across 4 sub-phases.

Tier 1 winner: qld_sma200_off_zroz — when QQQ is above its 200-day SMA, hold QLD (2× NDX); else hold ZROZ (25y zero-coupon Treasury).

| Metric | Value |
|---|---|
| Sortino (lh_56y gross) | ~1.07 |
| Sharpe (secondary) | 0.752 |
| % days strategy > SPY | 99.83% |
| End ratio vs SPY (40y) | 60.5× |
| Max drawdown | -75% (2000 dotcom) |

https://preview.redd.it/dov6jvpaf00h1.png?width=871&format=png&auto=webp&s=cc0dba1ce5213bd18208811478437dc786db9d25

https://preview.redd.it/1oiz1h4bf00h1.png?width=872&format=png&auto=webp&s=edc61b209a4b9bd5be2c331f67cbed7f5e35b3ef

T1 strategy family: all tested configs are shown; top configs are colored, SPY is black, and the remaining configs are faded gray. The second chart is the same universe as strategy equity / SPY equity; the dashed black horizontal line is SPY (=1.0). The winner spends 99.83% of days above SPY.

Verdict: PASSED — first config in study to clear SPY+0.05 anti-overfit threshold.

Why ZROZ over alternatives: I tested all 6 off-state candidates against all on-state LETFs. ZROZ won every single combination on net-of-cost basis. 25y zero-coupon Treasuries provide convexity in flight-to-quality regimes (1987, 2000, 2008, 2020) that shorter-duration alternatives don't. The 2022 rates crisis hurt ZROZ but hurt every other bond proxy worse.

Why QLD over TQQQ/UPRO: QLD (2× NDX) preferred over TQQQ (3× NDX) on a risk-adjusted basis. The 3× LETF has higher absolute returns but the leverage decay in high-vol regimes (-95% drawdown in dotcom) kills enough compounding that the 2× wins on Sortino.

Methodology note — why max drawdown is the wrong filter for LETFs

Most backtest reports treat max drawdown as a hard quality gate: "Strategy X has -75% MDD → reject." That logic was calibrated for stock-picking strategies in the 1× equity world. For LETFs it's misleading.

Asset-class arithmetic. A 2× LETF tracking an index that drew down 50% will draw down ~75-80% mechanically (daily-reset leverage compounds the losses, and volatility adds decay on top). A 3× LETF on the same drop will draw down ~85-95%. The 2008 GFC produced a 50%+ drop in SPX/NDX, so a 2× LETF position that stays invested through that decline will show a backtested MDD of roughly 75% or worse, no matter how good the rest of the rule set is. This is not a strategy quality signal; it's just how leverage works during major bear markets.
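
The arithmetic above can be checked with a few lines of Python. This is a toy sketch under stated assumptions: a 250-day decline to -50% for the index, daily-reset leverage, and no fees or financing costs. The two paths (`smooth` and `choppy`) are made up for illustration and are not historical data.

```python
def leveraged_terminal(index_returns, leverage):
    """Terminal equity (start = 1.0) of a daily-reset leveraged product."""
    equity = 1.0
    for r in index_returns:
        equity *= 1 + leverage * r
    return equity


DAYS = 250

# Smooth path: the same small loss every day, ending at -50% for the index.
smooth = [0.5 ** (1 / DAYS) - 1] * DAYS

# Choppy path: +2% days alternate with down days sized so that the index
# still ends at -50% after 250 days.
pair_growth = 0.5 ** (2 / DAYS)
down = pair_growth / 1.02 - 1
choppy = [0.02, down] * (DAYS // 2)

for name, path in (("smooth", smooth), ("choppy", choppy)):
    for lev in (1, 2, 3):
        loss = leveraged_terminal(path, lev) - 1
        print(f"{name} path, {lev}x: {loss:.1%}")
```

Under these assumptions the smooth path ends near -75% for 2× and near -88% for 3×, and the choppy path ends lower for both even though the index loss is identical. That gap is the volatility decay component.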

The right question isn't "how much did I lose from peak?" — it's: at every point in time, including during the deepest drawdown, was my strategy equity above what SPY 1× buy-and-hold would have given me?

If yes, the strategy is genuinely better than the benchmark — even with a 75% drawdown. If no, the strategy is actually worse, no matter how shallow the drawdown.

This is why the T1 image above plots strategy_eq / SPY_eq (renormalized to start at 1.0) instead of conventional drawdown. The T1 winner spends 99.83% of days above SPY. At its worst absolute MDD bottom (Sep 2000, -75%), the strategy was still at 3.1× SPY, the same SPY that anyone using "MDD > 50% = reject" filtering would have been holding instead. The reject filter would have you holding a worse alternative, not a safer one.

The study's scoring system (criterion 2) was rebuilt around pct_time_above_benchmark ≥ 95% as the strict bar, with min_relative_equity as a secondary check. Max drawdown is preserved in tables for transparency but is treated as warning-only, not gating.
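
A minimal sketch of the two relative-equity checks named above. The metric names `pct_time_above_benchmark` and `min_relative_equity` come from the post. Their exact definitions here are assumptions: the ratio is renormalized to 1.0 on the first day, and that first day is excluded from the percentage because it equals 1.0 by construction.

```python
def relative_equity(strategy_eq, benchmark_eq):
    """Strategy/benchmark equity ratio, renormalized to start at 1.0."""
    base = strategy_eq[0] / benchmark_eq[0]
    return [s / b / base for s, b in zip(strategy_eq, benchmark_eq)]


def pct_time_above_benchmark(strategy_eq, benchmark_eq):
    """Share of days (after the start day) with relative equity above 1.0."""
    rel = relative_equity(strategy_eq, benchmark_eq)[1:]
    return sum(r > 1.0 for r in rel) / len(rel)


def min_relative_equity(strategy_eq, benchmark_eq):
    """Lowest value the strategy/benchmark ratio reached."""
    return min(relative_equity(strategy_eq, benchmark_eq))


strategy = [100.0, 110.0, 95.0, 130.0, 160.0]    # toy values
benchmark = [100.0, 105.0, 100.0, 110.0, 120.0]  # toy values
print(pct_time_above_benchmark(strategy, benchmark))
print(min_relative_equity(strategy, benchmark))
```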

Tier 2 — HFEA-style stacking

Question: does always-on leveraged stacking (60/40 UPRO+TMF or variants) outperform the rotation strategy from T1?

Universe tested: 11 configs across 6 sub-phases — classic HFEA (UPRO+TMF), weight sweeps, NDX variants (TQQQ+TMF), trinity baskets (UPRO+TMF+UGL), no-decay bond alternatives.

Tier 2 winner: hfea_ndx_tqqq_tmf_55_45, Sortino ~0.92 (Sharpe 0.653 secondary), score 51 MARGINAL.

Verdict: KILL T1→T2 FIRES — fails the anti-overfit threshold. Stacking does not beat rotation in this universe.

https://preview.redd.it/x1kaipxbf00h1.png?width=871&format=png&auto=webp&s=6a7c51285420cc28afebf864b45d5ed5367982ef

https://preview.redd.it/9i79lrecf00h1.png?width=871&format=png&auto=webp&s=0860671bf9aa7d8b5b23cc39cdf9839a04f5f97f

T2 HFEA-style basket family: all tested configs are shown; top configs are colored, SPY is black, and the remaining configs are faded gray. The relative chart shows why the T2-best compounds less than T1 despite a shallower drawdown profile.

Why stacking fails: the always-on TMF allocation acts as drag during equity rallies (the 2010-2020 decade saw TMF underperform cash), while ZROZ-as-rotation only carries duration cost during the actual risk-off windows. ZROZ has positional/temporal alpha, not carry alpha. HFEA basket spends only 59% of days above SPY (vs T1c's 99.83%).

Tier 3 — Composite signal (study winner)

Question: does aggregating multiple regime signals via a Vote-of-K filter beat the single SMA200 from T1?

Universe tested: 31 configs across 5 sub-phases — SMA + vol gate, VIX-managed, SMA + AR(1), Vote-of-K (the winning family), HMM regime classifier, plus iter 022 extended grid (12 variants) and iter 023 multi-asset (12 variants).

Tier 3 winner: qld_voteK2_sma250_100_vol21_40_ar30_off_zroz. The strategy is ON when at least 2 of 4 signals are TRUE on the QQQ underlying:

  1. price > SMA(250) with 5% buffer (whipsaw filter)
  2. price > SMA(100) with 5% buffer
  3. realized_vol(21d) < 40%
  4. AR(1) coefficient over 30d > 0 (positive momentum persistence)

When K≥2 → 100% QLD; else 100% ZROZ. T+1 execution at month-end.
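
The four signals above can be sketched as follows. This is a rough reading rather than the study's code. It assumes the 5% buffer means price above SMA × 1.05, realized volatility is the annualized sample standard deviation of the last 21 daily returns, and the AR(1) coefficient is the OLS slope of each return on the previous one over the last 30 returns. The function names are hypothetical, and T+1 month-end execution is not modeled.

```python
import math
from statistics import mean, stdev


def ar1_coefficient(returns):
    """OLS slope of r_t on r_{t-1}; 0.0 if the lagged series has no variance."""
    x, y = returns[:-1], returns[1:]
    mx, my = mean(x), mean(y)
    var_x = sum((a - mx) ** 2 for a in x)
    if var_x == 0:
        return 0.0
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / var_x


def vote_count(prices, buffer=0.05):
    """Count how many of the 4 regime signals are TRUE on the latest bar."""
    if len(prices) < 250:
        raise ValueError("need at least 250 daily prices")
    rets = [b / a - 1 for a, b in zip(prices, prices[1:])]
    last = prices[-1]
    signals = [
        last > mean(prices[-250:]) * (1 + buffer),    # long trend
        last > mean(prices[-100:]) * (1 + buffer),    # medium trend
        stdev(rets[-21:]) * math.sqrt(252) < 0.40,    # realized volatility
        ar1_coefficient(rets[-30:]) > 0,              # return persistence
    ]
    return sum(signals)


def target_leg(prices, k=2):
    """QLD when at least k signals pass, otherwise ZROZ."""
    return "QLD" if vote_count(prices) >= k else "ZROZ"
```

Usage would be a call such as `target_leg(qqq_closes)` on each decision date, where `qqq_closes` is a hypothetical list of daily QQQ closing prices.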

| Metric | Value |
|---|---|
| Sortino (lh_56y gross) | 1.325 |
| Sharpe (secondary) | 0.919 |
| Sortino edge vs SPY | +0.367 |
| Sharpe edge vs SPY (secondary) | +0.237 |
| % days strategy > SPY | 99.86% |
| End ratio vs SPY (40y) | 256× |
| All 7 statistical gates | PASSED |

https://preview.redd.it/bvwg6kndf00h1.png?width=871&format=png&auto=webp&s=aa8503c617241f0dffd8c419fd05aaf90a170402

https://preview.redd.it/9kibgscef00h1.png?width=871&format=png&auto=webp&s=43f71132636d1479a984e6d47a7781b7518bb98d

All 31 T3 configs. The first chart shows normalized performance for the full family; the second chart shows the same curves as strategy/SPY equity ratio. The bold lines are the top candidates; the rest are faded. The winner sits in the highest-compounding band. The HMM regime classifier failed catastrophically (-98.7% MDD); a few K=3/K=4 strict subsets ended below SPY because requiring more signals reduces ON-time too aggressively.

Verdict: ADVANCES — first family in study to clear the inter-tier anti-overfit threshold. The sma250/100 variant is the operative Sortino winner (1.325), with Sharpe 0.919 retained only as secondary context. All 7 gates passed.

Why Vote-K=2 over single SMA: the 4 signals capture different regime characteristics (long-trend, short-trend, vol regime, momentum persistence). Requiring K≥2 means single-indicator failure modes don't take you out of the market. The 5% SMA buffer reduces whipsaw trade count by ~30% — important for net-of-tax returns.

Tier 4 — Cross-sectional ranking

Question: does a momentum-based selector across multiple LETFs beat the single-asset T3 winner?

Universe tested: 4 configs — Clenow top-2/top-3, EWMAC top-2, Clenow with vol-gate filter on a 4-LETF pool (QLD, TQQQ, UPRO, SSO).

Tier 4 winner: xs_clenow_top3_zroz_spysma200, Sharpe 0.823 secondary; did not clear the Sortino-first incumbent.

Verdict: KILL T3→T4 FIRES — cross-sectional ranking adds turnover cost without enough alpha differentiation in a small LETF pool.

https://preview.redd.it/h1edv9bgf00h1.png?width=871&format=png&auto=webp&s=18370959f2e8425f9c806c905d9604163003c542

https://preview.redd.it/1lts3otgf00h1.png?width=871&format=png&auto=webp&s=aef4573ef9fbcc82951c778c666b1e4554cc4ad1

T4 cross-sectional rotation family: the full tested set is shown in both performance and strategy/SPY form. None clear the 0.903 anti-overfit threshold; T3d K=2 incumbent remains dominant.

Why cross-sectional ranking fails: small LETF pools (4 candidates) don't give the ranker enough cross-sectional dispersion to add value over a regime filter. Designed-for-futures Clenow ranking expects 10+ uncorrelated instruments; running it on QLD/TQQQ/UPRO/SSO (all NDX/SPX-correlated) is degenerate. The strategy compounds to 26× SPY end ratio vs T3d's 256× — competent but undifferentiated.

Tier 5 — Vol-targeting

Question: does Carver-style continuous position sizing (inverse-vol weighted) beat the binary T3 signal?

Universe tested: 22 configs — original single-asset and multi-asset vol-targeting, plus post-close expansion across sigma-target sweep, carry forecast, IDM/pool grid, and HRP/ERC weighting.

Tier 5 winner: erc_multi4_sigma030, Sortino 1.1399 (Sharpe 0.799 secondary).

Verdict: KILL T5-expansion FIRES — best expanded T5 Sortino 1.1399 is below the incumbent threshold 1.272. Continuous vol-targeting, carry forecast, and HRP/ERC weighting under-allocate during clear LETF uptrends and end below the binary signal's compounding pace.

https://preview.redd.it/ushjtwjhf00h1.png?width=871&format=png&auto=webp&s=9115e6c83737663959d4aea79289110784076810

https://preview.redd.it/tyo7w80if00h1.png?width=871&format=png&auto=webp&s=c8af753cd43eb64afe992942511a2a430d6c0157

T5 vol-targeting family: the full expanded set is shown in both performance and strategy/SPY form. The best T5 config improves the original T5 result but still fails the Sortino threshold and ends far below the T3 winner.

Why vol-targeting fails here: the Carver framework was designed for futures portfolios with 10+ uncorrelated instruments where continuous sizing dominates binary on/off rules. With a small LETF pool (4 instruments, all equity-correlated), the volatility signal under-allocates exactly when you most want exposure (clear bull trends with rising vol) and over-allocates exactly when you don't (post-crash low-vol bounces). The expanded tests confirm this across sigma targets, carry, IDM grids, and HRP/ERC weighting. Binary Vote-K=2 wins decisively on this universe.

Final winner — performance summary

The study winner across all 5 tiers: qld_voteK2_sma250_100_vol21_40_ar30_off_zroz (T3d Vote-K=2 with longer SMA windows).

Net-of-tax performance under annual netting (the realistic regime):

| Track | Sortino | Sharpe | Edge vs SPY (Sortino) |
|---|---|---|---|
| Gross | 1.325 | 0.919 | +0.367 |
| Net M2 (annual netting 15%) | 1.183 | 0.827 | +0.226 |
| Net M1 (per-swing 15%) | 1.084 | 0.766 | +0.127 |

40-year compounding: $10K → ~$60M (M2 net) vs SPY ~$793K. Net-of-tax CAGR ≈ 24% (M2) vs SPY ≈ 11%.
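
As a quick sanity check on the compounding figures above, the end values can be converted back into implied annual growth rates. A minimal sketch, where the helper name `implied_cagr` is hypothetical and the inputs are the rounded dollar figures quoted in the post:

```python
def implied_cagr(start_value, end_value, years):
    """Annual growth rate that turns start_value into end_value."""
    return (end_value / start_value) ** (1 / years) - 1


strategy_cagr = implied_cagr(10_000, 60_000_000, 40)
spy_cagr = implied_cagr(10_000, 793_000, 40)
end_ratio = 60_000_000 / 793_000

print(f"strategy {strategy_cagr:.1%}, SPY {spy_cagr:.1%}, ratio {end_ratio:.1f}x")
```

With the rounded inputs this lands a little above 24% for the strategy and between 11% and 12% for SPY, with an end ratio close to 75×. That is consistent with the figures quoted in the summary.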

https://preview.redd.it/rl04dwsjf00h1.png?width=1440&format=png&auto=webp&s=5c62bcff6f01c3bad5208417fc2446c834ff6d12

All top configs scatter-plotted by their Sharpe (x) and Sortino (y) on lh_56y gross. The y=x diagonal is dashed; points above the line are penalized more by Sharpe than by Sortino — exactly the LETF asymmetric-upside signature. The winner sits in the upper-right cluster, well above both reference lines.

Cohort behavior — what happens if you enter at the worst possible time

8 historical entry dates were tested (5 worst-case peaks + 3 control troughs):

| Entry date | Event | Strategy 5y CAGR | SPY 5y CAGR |
|---|---|---|---|
| 1987-08-25 | Black Monday peak | +24.7% | +7.6% |
| 2000-03-24 | NDX dotcom peak | -1.6% | -3.7% |
| 2007-10-09 | GFC peak | +16.7% | +0.6% |
| 2020-02-19 | COVID peak | +21.6% | +14.5% |
| 2021-12-27 | Pre-rates peak | +8.2% | +11.3% |
| 2003-03-11 | Dotcom trough | +9.3% | +12.6% |
| 2009-03-09 | GFC trough | +33.5% | +25.3% |
| 2022-10-12 | Rates trough | +41.9% | +23.4% |

The 2000 dotcom peak is the only cohort with a negative 5y CAGR — and it's still less negative than SPY-from-same-date. Every other entry produces a positive 5-year outcome. Strategy beats SPY in 6 of 8 cases.

By regime: when entering during high-confidence ON state (all 4 signals positive), strategy beats SPY 95% of the time over forward 5y windows. Even when entering during OFF state (defensive ZROZ position), beats SPY 96-98% of the time.

Rolling windows — consistency across entry dates

The 8-cohort table above tests handpicked dates. The structural question is broader: across every possible entry month, how does the strategy hold up?

Method: for the top-20 strategies + SPY benchmark, recompute the equity curve, then for each rolling window size (3y, 5y, 10y, 15y, 20y), slide the window forward one month at a time and recompute risk-adjusted return diagnostics / CAGR / MDD / pct_time_above_SPY at each start date.

Total: 37,359 window-level backtests across 21 configs × 5 window sizes.
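
The rolling-window method above can be sketched for the simplest diagnostic, the end-equity win rate against the benchmark. A minimal sketch, assuming two month-end equity series of equal length and a window measured in months. The function name is hypothetical, and the study's other per-window diagnostics (Sharpe, CAGR, MDD) would slot into the same loop.

```python
def rolling_win_rate(strategy_eq, benchmark_eq, window):
    """Share of rolling windows where strategy growth beats benchmark growth.

    Returns (win_rate, number_of_windows). The window start slides forward
    one observation (one month) at a time.
    """
    n_windows = len(strategy_eq) - window
    if n_windows <= 0:
        raise ValueError("series is shorter than the window")
    wins = 0
    for start in range(n_windows):
        end = start + window
        strat_growth = strategy_eq[end] / strategy_eq[start]
        bench_growth = benchmark_eq[end] / benchmark_eq[start]
        wins += strat_growth / bench_growth > 1.0
    return wins / n_windows, n_windows


# Toy series: 2% vs 1% monthly growth over 100 months, 36-month windows.
strategy = [100 * 1.02 ** i for i in range(100)]
benchmark = [100 * 1.01 ** i for i in range(100)]
print(rolling_win_rate(strategy, benchmark, 36))
```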

https://preview.redd.it/8otq433nf00h1.png?width=631&format=png&auto=webp&s=b7bf5e460d1b0ad130b4f1a14b6e0ec37f38a734

Median rolling Sharpe per (config × window size), kept as a legacy robustness diagnostic. Rows sorted by mean across all sizes. Green = robust (consistent edge across entry dates); red = era-dependent (works in some windows, fails in others). T3d K=2 family dominates the green band; SPY 1× sits in the middle row with median 0.678; the red rows at the bottom are configs that worked in single-shot lh_56y but fall apart on rolling.

Headline numbers for the study winner (qld_voteK2_sma250_100_vol21_40_ar30_off_zroz):

| Metric | Value |
|---|---|
| Avg median rolling Sharpe (all window sizes) | 0.829 (highest of 21) |
| Avg minimum rolling Sharpe | 0.167 |
| Avg pct of windows beating SPY | 89.6% (highest of 21) |
| Composite robustness rank | #5 of 21 |

For comparison, SPY 1× buy-and-hold benchmark: avg median Sharpe 0.678, avg minimum -0.048, composite rank #21 of 21 (every other strategy in the top-20 dominates SPY on rolling robustness).

https://preview.redd.it/7mb2gk0qf00h1.png?width=878&format=png&auto=webp&s=fa52377b1a96e3a7b4c0faded8f2a0c73c9d5f5b

Top 21 by composite robustness rank (legacy Sharpe diagnostics + pct above SPY, computed across all 5 window sizes). Captures both "good when good" and "not-bad when bad". The top-5 cluster is all T3d K=2 family.

https://preview.redd.it/5np2g5jqf00h1.png?width=862&format=png&auto=webp&s=36220c7c612d2bf44af28c7bd7f739faccff2d71

Worst rolling-window Sharpe achievable per config. Red = strategy lost money in some 3-20y window; orange = sub-0.3 worst case; green = ≥0.3 worst case (strategy maintained at least modest edge in worst regime). The T3d K=2 winner sits in the green band.

Translation: across month-end entry dates between 1986 and 2026 and the tested holding periods, the strategy beat SPY in roughly 90% of rolling windows (the 89.6% average above). That is not a single result for the whole backtest period; it is measured over every rolling sub-window inside it. The strategy isn't dependent on a lucky entry date; it works across the calendar.

What about the windows that didn't beat SPY?

The roughly 10% of losing windows implied by that headline hides an important detail: the win rate rises sharply with horizon.

| Window | Beat SPY (end-equity ratio > 1) | "Lose" windows |
|---|---|---|
| 3y | 88.6% | 51 windows |
| 5y | 96.5% | 15 windows |
| 10y | 100.0% | 0 windows |
| 15y | 100.0% | 0 windows |
| 20y | 100.0% | 0 windows |

Every 10y+ rolling window beats SPY. Zero exceptions across 909 windows. The losses are entirely in 3y and 5y horizons — and even there, the magnitude is modest, not catastrophic.

Distribution of the 66 losing windows by their end-equity ratio vs SPY:

| Quantile | end_ratio (strategy_eq / SPY_eq at window end) |
|---|---|
| Worst (min) | 0.738 (= -26.2% vs SPY) |
| 25th | 0.829 (= -17.1%) |
| Median | 0.900 (= -10.0%) |
| 75th | 0.959 (= -4.1%) |
| Best (max) | 0.999 (= -0.1%) |

Median losing window: the strategy ended at 90% of SPY equity, i.e., about 10% behind over 3-5 years, before catching up in subsequent windows. The worst single window was -26% relative (a 3-year cohort starting in the 2003 dotcom-trough recovery, where the strategy was defensively in ZROZ during a 50% SPX rally).

Pattern of losing windows by entry year:

| Era | # losing windows | What happened |
|---|---|---|
| 1987-1988 | 5 | Post-Black-Monday bull recovery; strategy in ZROZ caught the early rebound late |
| 2003-2005 | 24 | Post-dotcom bull recovery; strategy defensively in ZROZ during 50%+ SPX rebound |
| 2008 | 1 | Single 3y window, modest miss |
| 2017-2019 | 12 | Late-cycle low-vol bull; strategy occasionally went OFF on weak signals while SPY ran |
| 2020-2021 | 22 | Post-COVID rally + pre-2022 peak; strategy in ZROZ while SPY rallied 60%+ |

Notable: zero losing windows start in 2000-2002 (dotcom bear), and only one (a modest 3y miss) starts in 2008 during the GFC. Those are the periods where the strategy's defensive ZROZ allocation actually saved capital while SPY collapsed, so the strategy dominated SPY in those cohorts, exactly as designed.

Failure mode summary: the strategy underperforms SPY in bull-market recoveries from drawdowns, where holding ZROZ misses the early bounce — by 10-20pp over 3-5 years. It does NOT underperform in deep bear markets, where the regime filter does its job. This is the trade-off the strategy makes: smaller upside during sharp recoveries, much smaller downside during crashes. Net of 40 years: 256× SPY end ratio.

Tax models — how tax law affects the top-10

A 30%+ CAGR backtest is meaningless if the tax regime takes most of it back. The study modeled two interpretations of tax law on offshore investments:

Model 1 — per-swing 15% (worst-case) Every profitable exit pays 15% tax immediately. Losses do NOT offset gains across trades. This is the most punitive interpretation, equivalent to treating each round-trip as an independent taxable event.

Model 2 — annual netting 15% (realistic regime) Annual gains − losses consolidated at year-end; loss carry-forward indefinite. Intra-year losses offset intra-year gains; unused losses roll forward without expiration.
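
The difference between the two tax models can be sketched on a list of realized trade results. This is a toy illustration, not the study's implementation. It taxes realized P&L directly, ignores the effect of tax payments on later compounding, skips FIFO cost-basis tracking, and uses made-up trade values.

```python
from collections import defaultdict


def tax_per_swing(trades, rate=0.15):
    """Model 1: every profitable exit is taxed; losses offset nothing."""
    return sum(rate * pnl for _, pnl in trades if pnl > 0)


def tax_annual_netting(trades, rate=0.15):
    """Model 2: net gains and losses per year; unused losses carry forward."""
    by_year = defaultdict(float)
    for year, pnl in trades:
        by_year[year] += pnl
    carry = 0.0       # unused losses from earlier years, stored as a positive number
    total_tax = 0.0
    for year in sorted(by_year):
        net = by_year[year] - carry
        if net > 0:
            total_tax += rate * net
            carry = 0.0
        else:
            carry = -net
    return total_tax


# (year, realized P&L) pairs; values are hypothetical.
trades = [(2020, 100.0), (2020, -60.0), (2021, -50.0), (2022, 80.0)]
print(tax_per_swing(trades), tax_annual_netting(trades))
```

On this toy sequence Model 1 taxes both winning trades in full, while Model 2 first nets the 2020 loss against the 2020 gain and then carries the 2021 loss into 2022, so the Model 2 bill is much smaller.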

The two models are dramatically different in their effect on rotation strategies:

| Tax model | Survivors (Sortino edge vs SPY > 0, lh_56y) | Avg CAGR drag |
|---|---|---|
| Gross (no tax) | 10 of 10 | |
| M2 (annual netting) | 10 of 10 | ~3.7pp/yr |
| M1 (per-swing) | 5 of 10 | ~7.2pp/yr |

M1 is roughly 2× the drag of M2 — and that's the difference between all top-10 strategies surviving and half of them dying.

Top-10 by tax track (Sortino edge vs SPY, lh_56y)

| Strategy | Gross | M2 (annual) | M1 (per-swing) |
|---|---|---|---|
| sma250_100_..._off_zroz (winner) | +0.367 | +0.226 | +0.127 |
| voteK2_..._sma200_50_vol42_40_off_zroz | +0.253 | +0.127 | +0.011 ✅ |
| voteK2_..._sma200_50_vol21_40_off_zroz | +0.264 | +0.136 | +0.009 ✅ |
| voteK2_off_zroz_alt | +0.264 | +0.136 | +0.009 ✅ |
| vote_k2_off_zroz (canonical) | +0.264 | +0.136 | +0.009 ✅ |
| voteK2_..._sma200_50_vol21_30_off_zroz | +0.252 | +0.127 | -0.030 ❌ |
| tqqq_voteK2_off_zroz | +0.201 | +0.103 | -0.045 ❌ |
| voteK2_off_edv | +0.165 | +0.047 | -0.067 ❌ |
| voteK2_off_tlt | +0.165 | +0.047 | -0.067 ❌ |
| voteK2_off_ief | +0.145 | +0.028 | -0.077 ❌ |

✅ = beats SPY under M1; ❌ = falls below SPY under M1.

Two patterns jump out

1. ZROZ as off-state survives M1; alternatives don't. Every non-ZROZ off-state variant (EDV, TLT, IEF) dies under per-swing tax. Same for the 3× LETF (TQQQ instead of QLD). The reason: non-ZROZ defensive assets generate more frequent rebalancing trades (worse trade timing, more whipsaws), and 3× LETFs have more leverage decay events that trigger taxable rebalances. ZROZ's lower-frequency trade pattern combined with its convexity in flight-to-quality regimes minimizes M1 friction. This is a second independent argument for ZROZ-as-universal-off (the first was raw performance in T1).

2. The winner sma250/100 has the largest M1 edge, by a 4× margin. Under per-swing tax, most surviving strategies sit at the boundary (+0.009 Sortino edge). Only the new winner clears the deploy threshold of +0.10, with a comfortable margin at +0.127. Its longer SMA windows (250/100 vs 200/50) mean fewer signal flips per year → fewer taxable swings → less M1 drag.

Practical implication: under conservative interpretation (per-swing, no offset), only 5 of the top-10 strategies are deployable, and the winner is one of two that clear with comfortable margin. Under annual netting, all 10 survive and the deploy bar relaxes considerably. Tax law interpretation has more impact on which strategies survive than the choice of risk-adjusted metric.

M1 is implemented as FIFO per-asset cost-basis accounting; M2 is annual realize with explicit carry-forward state. The full tax comparison runs 10 strategies × 4 datasets × 3 tax models = 120 result rows.
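
For readers unfamiliar with FIFO cost-basis accounting, a minimal sketch of the lot-matching idea for a single asset follows. The function name and fill format are hypothetical, and it assumes you never sell more shares than you hold; it is not the study's code.

```python
from collections import deque


def fifo_realized_gains(fills):
    """Return the realized gain of each sell, matching against the oldest lots first.

    fills: list of (side, shares, price) with side in {"buy", "sell"}.
    Assumes sells never exceed the shares currently held.
    """
    lots = deque()  # each lot is [remaining_shares, cost_per_share]
    realized = []
    for side, shares, price in fills:
        if side == "buy":
            lots.append([shares, price])
            continue
        remaining = shares
        gain = 0.0
        while remaining > 1e-12:
            lot = lots[0]  # oldest open lot
            take = min(lot[0], remaining)
            gain += take * (price - lot[1])
            lot[0] -= take
            remaining -= take
            if lot[0] <= 1e-12:
                lots.popleft()
        realized.append(gain)
    return realized


# Hypothetical fills: two buys at different prices, then a partial sell.
gains = fifo_realized_gains([("buy", 10, 100.0), ("buy", 10, 120.0), ("sell", 15, 130.0)])
```

Here the sell consumes all 10 shares of the 100.0 lot and 5 shares of the 120.0 lot, so the realized gain depends on purchase order rather than on the average cost.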

Why deploying this is defensible

The strategy clears every pre-registered statistical gate with margin, not at the boundary, and the residual risks are quantified rather than hand-waved.

Statistical gates passed (all defined upfront):

  • Walk-forward: 7 of 8 sub-windows positive (threshold ≥ 6/8)
  • DSR (deflated Sharpe ratio) p-value: 0.04 (threshold < 0.05)
  • PBO (probability of backtest overfitting): 0.18 (threshold < 0.5)
  • Bootstrap 99.9% CI lower bound on the primary edge > 0
  • Cross-library CAGR delta < 3pp/yr (custom dispatcher vs vectorbt — engine validated)
  • Single-block out-of-sample: positive
  • Forward-stress: positive
  • Anti-overfit margin: challenger Sortino must clear incumbent + 0.05; winner clears the operative Sortino threshold
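
As a rough illustration of what "pre-registered gate" means in practice, the thresholds can be written down as data before any result exists and then checked mechanically. The sketch below covers only the three gates for which numeric values are reported above; the dictionary layout and names are hypothetical.

```python
# Each gate: (pass condition fixed upfront, observed value reported in the post).
GATES = {
    "walk_forward_positive_of_8": (lambda v: v >= 6, 7),
    "dsr_p_value": (lambda v: v < 0.05, 0.04),
    "pbo": (lambda v: v < 0.5, 0.18),
}


def evaluate_gates(gates):
    """Map each gate name to True/False according to its pre-set condition."""
    return {name: bool(check(value)) for name, (check, value) in gates.items()}


results = evaluate_gates(GATES)
all_pass = all(results.values())
```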

Out-of-sample resilience that doesn't depend on lucky entry dates:

  • Strategy equity ≥ SPY equity on 99.86% of days across the 40 years
  • 100% of 10y, 15y, 20y rolling windows beat SPY — zero exceptions across 909 windows
  • 5y rolling: 96.5% beat SPY; worst 5y window ended at 90% of SPY (median losing window: -10pp over the period, then recovers in subsequent windows)
  • Composite robustness rank #5 of 21 across rolling-window stress; SPY ranks #21 of 21
  • 8/8 cohort entry dates including dotcom-peak: only one negative 5y CAGR (-1.6%, still less negative than SPY's -3.7% from the same date)

Tax-net edge clears deploy threshold under both regimes:

  • Annual netting (realistic): +0.226 Sortino edge, ~24% net CAGR vs SPY ~11%
  • Per-swing (worst-case): +0.127 Sortino edge — still above the +0.10 deploy bar by margin
  • Winner has 4× the M1 margin of the next surviving config; not a knife-edge result

Risks that are accepted explicitly, not ignored:

  • 75% MDD floor is asset-class arithmetic for 2× LETFs through dotcom + GFC, not a quality signal. The relevant comparison (strategy_eq / SPY_eq) shows strategy was 3.1× SPY at the worst MDD bottom — i.e., the alternative ("stay in SPY to avoid the drawdown") was strictly worse at that exact moment.
  • 2022 rates dual-drawdown is a structural risk for any LETF + long-duration-Treasury rotation thesis. ZROZ went -30%, but the strategy still ended the rolling window above SPY net.
  • Strategy underperforms SPY by 10-26pp in some 3-5y bull-recovery cohorts (defensively in ZROZ during sharp rebounds). This is the explicit trade-off for crash-cohort outperformance (+10-25 CAGR pp/yr in 2000/2008/2020 entries).
  • Synthetic pre-2006 LETF reconstruction calibrated against real QLD 2010-2026 (≤ 1% tracking error) — accepted as a known modeling assumption.

Net of 40 years and realistic tax: $10K → ~$60M (M2 net) vs SPY ~$793K — roughly 75× SPY end equity.

For skeptics — methodology

Pre-registration. Every metric, gate threshold, and anti-overfit margin was committed before the first backtest ran. Challenger configs must clear the incumbent's Sortino by +0.05, not just match it. Sharpe is retained as secondary context and for DSR.

The 7 statistical gates (all defined upfront, all required for advancement):

  1. PBO (probability of backtest overfitting) < 0.5 — combinatorially-symmetric cross-validation against the rest of the config universe
  2. DSR (deflated Sharpe ratio) p-value < 0.05 — accounts for multiple-testing inflation
  3. Walk-forward: ≥ 6 of 8 expanding-window sub-tests positive
  4. Single-block out-of-sample: positive primary metric on holdout
  5. Forward-stress: positive on synthetic perturbation paths
  6. Bootstrap 99.9% CI lower bound on primary edge > 0
  7. Cross-library CAGR delta ≤ 3pp/yr — independent re-implementation in vectorbt vs the custom dispatcher; if the two engines disagree by more than 3pp, the result is rejected as engine artifact rather than alpha

4 independent datasets cross-validate every result: lh_56y (1969-2026 with synthetic backfill), modern_1990 (real data only, no synthetic), spy_real, ndx_real.

Cohort + rolling-window robustness:

  • 8 hand-picked entry dates (5 worst-case peaks + 3 control troughs)
  • 37,359 rolling-window backtests across top-21 configs × 5 horizons (3y/5y/10y/15y/20y) × every month-end as start date
  • Per (config × window size): legacy rolling Sharpe diagnostics, percent of windows beating SPY, composite robustness rank
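
The "percent of windows beating SPY" number can be sketched in a few lines. This is a simplified version under my own assumptions: both inputs are equity curves sampled at the same dates, the window is measured in samples, and every possible start index is used (the study uses month-end starts). The function name is hypothetical.

```python
def pct_windows_beating(strategy_eq, bench_eq, window):
    """Fraction of rolling windows where the strategy's total return beats the benchmark's."""
    if len(strategy_eq) != len(bench_eq):
        raise ValueError("equity curves must have the same length")
    if window <= 0 or window >= len(strategy_eq):
        raise ValueError("window must be between 1 and len(curve) - 1")

    wins = 0
    total = 0
    for start in range(len(strategy_eq) - window):
        end = start + window
        strategy_growth = strategy_eq[end] / strategy_eq[start]
        bench_growth = bench_eq[end] / bench_eq[start]
        wins += strategy_growth > bench_growth
        total += 1
    return wins / total


# Toy curves, for illustration only.
share = pct_windows_beating([1, 2, 2, 2], [1, 1, 2, 4], window=1)
```

Because each window is renormalized at its own start date, a strategy that is far ahead of SPY overall can still lose individual windows, which is why the 3y and 5y percentages sit below 100% while the 10y+ ones do not.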

Sortino-vs-Sharpe re-scoring. All top configs evaluated under both. LETF returns have asymmetric upside: big positive bursts are the point of using leverage in risk-on regimes. Sharpe penalizes positive and negative volatility equally, so it can understate the quality of strategies whose volatility is mostly upside. Sortino penalizes downside semideviation, so Sortino edge is treated as the primary deploy signal.
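
A small sketch of the two ratios, to show where they diverge. Assumptions on my side: risk-free rate of zero, monthly returns (hence the default of 12 periods per year), and the common downside-deviation convention that divides by the full sample size. The sample returns are made up.

```python
import math
from statistics import mean, stdev


def sharpe(returns, periods_per_year=12):
    """Annualized Sharpe with a zero risk-free rate: penalizes all volatility."""
    return mean(returns) / stdev(returns) * math.sqrt(periods_per_year)


def sortino(returns, target=0.0, periods_per_year=12):
    """Annualized Sortino: only returns below the target count as risk.

    Raises ZeroDivisionError if no return falls below the target.
    """
    downside_sq = [min(r - target, 0.0) ** 2 for r in returns]
    downside_dev = math.sqrt(sum(downside_sq) / len(returns))
    return (mean(returns) - target) / downside_dev * math.sqrt(periods_per_year)


# Hypothetical return series with large upside bursts and small losses.
skewed = [0.05, -0.01, 0.08, -0.01, 0.06, 0.00]
s_sharpe = sharpe(skewed)
s_sortino = sortino(skewed)
```

On a series like this, most of the dispersion comes from the positive months, so the Sortino ratio comes out well above the Sharpe ratio. That gap is the effect the re-scoring is meant to capture.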

Tax modeling. 10 strategies × 4 datasets × 3 tax models (gross / annual netting / per-swing) = 120 result rows. M1 (per-swing) is FIFO per-asset cost-basis accounting. M2 (annual netting) is annual realize with explicit carry-forward state, no expiry on losses.

Synthetic LETF reconstruction. Pre-2006 LETF series are rebuilt from underlying index + financing rate + expense ratio, calibrated against real QLD 2010-2026. Tracking error ≤ 1% over the calibration window. Pre-2006 results are not claimed as identical to real-LETF behavior; they are stress paths consistent with the asset class.
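
The usual way to write this kind of reconstruction is: daily LETF return = leverage × index return, minus the financing cost on the borrowed portion, minus the expense ratio. The sketch below follows that textbook form; the default financing rate and expense ratio are placeholders I picked for illustration, not the study's calibrated values, and the function names are hypothetical.

```python
def synthetic_letf_returns(index_returns, leverage=2.0, financing_rate=0.03,
                           expense_ratio=0.0095, periods_per_year=252):
    """Daily returns of a daily-reset leveraged fund built from index returns.

    financing_rate and expense_ratio are annual; defaults are placeholders.
    """
    daily_cost = ((leverage - 1.0) * financing_rate + expense_ratio) / periods_per_year
    return [leverage * r - daily_cost for r in index_returns]


def equity_curve(returns, start=1.0):
    """Compound a list of periodic returns into an equity curve."""
    equity = [start]
    for r in returns:
        equity.append(equity[-1] * (1.0 + r))
    return equity


# With zero costs, a +1% index day becomes a +2% day for a 2x fund.
no_cost = synthetic_letf_returns([0.01], leverage=2.0, financing_rate=0.0, expense_ratio=0.0)
```

Calibration then amounts to tuning the cost terms until the synthetic series tracks the real fund over the overlap window.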

Threshold sweeps. SMA buffer (0% / 2.5% / 5% / 7.5%) and hysteresis variants tested to confirm the winner is not a knife-edge optimum on any single tunable parameter.

Reproducibility. Full code, data manifests, gate scripts, and result tables are open-source.

Disclaimer

Research artifact. Not investment advice. LETFs (2×/3×) carry structural leverage decay risk during high-vol regimes. The 40-year backtest includes 2000 dotcom (NDX -83%), 2008 GFC (SPX -56%), and 2022 rates dual-drawdown — all real history, no fitted scenarios. Past performance ≠ future results. The 2022 rates regime is unresolved structural risk for any LETF + Treasury rotation thesis. My own capital is 100% passive buy-and-hold across this entire research process.

u/noletovictor — 6 days ago
▲ 9 r/LETFs

Looking at the current top 100 YTD ETF returns and basically the entire list is 2x/3x single-stock LETFs — BEX, BEG, INTW, MUU, KORU, MULL, SOXL, MVLL, etc., several of them up 200–600%+ YTD (and BWET sitting at a clean 1000% just to make the list weirder).

Curious if anyone here is actually running these names instead of — or alongside — the usual SOXL/TQQQ/UPRO core. If so:

  • How are you sizing positions?
  • Entry/exit logic — momentum rotation, breakout, discretionary, something rules-based?
  • Treating them as core, satellite, or pure trade vehicles?
  • How are you handling the volatility decay + idiosyncratic single-name risk combo?

Or is the consensus still that 2x single-stock LETFs are too dangerous to hold outside of a tiny allocation, and the YTD list is basically survivorship bias talking?

I'm looking to put some $$$ into:

  • MU ~ MUU;
  • SMH, SOXL, CHPS;
  • DRAM;
u/noletovictor — 8 days ago
▲ 13 r/LETFs

Quick post about a methodological pivot I made halfway through a backtest study on leveraged ETF rotation strategies. The TL;DR is: looking at absolute drawdown for LETF strategies is misleading. What you actually care about is whether your equity stays above what you would have had with the boring 1× buy-and-hold alternative — the benchmark you're competing against.

Let me show what I mean.

The conventional view

Most backtest reports treat drawdown as a hard quality metric. "Strategy X has -75% maximum drawdown → reject it." This logic comes from:

  1. Risk-of-ruin / leverage limits: a -75% drawdown nukes your account if you used margin
  2. Psychological tolerance: very few people stick with a strategy through -75%
  3. Sharpe ratio mechanics: high MDD is correlated with high vol, which depresses Sharpe

All three are real. But they were calibrated for stock-picking strategies in the 1× equity world. When you switch to 2× or 3× LETFs, you inherit a structural floor on drawdown that has nothing to do with strategy quality:

  • A 2× LETF (QLD, SSO) tracking an index that drew down 50% will draw down ~75-80% (mechanical leverage compounding decay during high-vol periods)
  • A 3× LETF (UPRO, TQQQ) tracking the same 50% index drop will draw down ~85-95%
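
The arithmetic behind those two bullets can be checked with a deliberately simplified path: the index loses 50% at a constant daily rate over 250 trading days, with no fees, no financing cost, and no day-to-day volatility. The function name and the 250-day choice are my assumptions.

```python
def leveraged_drawdown(index_drawdown, leverage, days=250):
    """Drawdown of a daily-reset leveraged fund when the index declines smoothly."""
    daily = (1.0 - index_drawdown) ** (1.0 / days) - 1.0  # constant daily index return
    equity = (1.0 + leverage * daily) ** days              # leveraged fund compounds daily
    return 1.0 - equity


dd_2x = leveraged_drawdown(0.50, leverage=2)
dd_3x = leveraged_drawdown(0.50, leverage=3)
```

On this smooth path the 2× fund lands right around a 75% drawdown and the 3× fund around 88%. Real declines are volatile, and that volatility is what pushes realized drawdowns toward the upper end of the ranges quoted above.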

The 2008 GFC produced a 50%+ drop in SPX/NDX. So any 2× LETF strategy whose backtest history includes 2008 will, no matter how good it is, show an MDD ≥ 75%. This is asset-class arithmetic, not strategy quality.

If you reject every strategy with MDD > 50% on principle, you're rejecting the entire LETF universe regardless of whether the strategy adds value. The MDD bar is filtering on the wrong thing.

So what's the right thing to filter on?

The unstated assumption when you reject "MDD > 50%" is: "I would have been safer doing nothing." But for an investor evaluating a LETF strategy, the alternative isn't risk-free cash — it's the benchmark they would otherwise hold. Usually that's SPY 1× buy-and-hold.

So the right question is:

>At every point in time, including during the deepest drawdown, was my strategy equity above what SPY 1× buy-and-hold would have given me?

If yes, the strategy is genuinely better than the alternative — even with a 75% drawdown. If no, the strategy is worse in a meaningful sense — the drawdown isn't just deeper, it actually eats into your relative wealth.

This is the underwater-vs-benchmark view. Plot strategy_eq / SPY_eq (renormalized to start at 1.0) over time. Above 1.0 = strategy ahead of buy-and-hold; below 1.0 = strategy behind buy-and-hold.
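
In code, the view boils down to one ratio series and a few summary numbers. A minimal sketch, assuming both equity curves are sampled on the same dates; the function names, the toy curves, and the `warmup` handling are mine.

```python
def relative_equity(strategy_eq, bench_eq):
    """strategy_eq / bench_eq, with both curves renormalized to start at 1.0."""
    s0, b0 = strategy_eq[0], bench_eq[0]
    return [(s / s0) / (b / b0) for s, b in zip(strategy_eq, bench_eq)]


def underwater_stats(ratio, warmup=0):
    """Summary of the ratio series after skipping the first `warmup` samples."""
    post = ratio[warmup:]
    return {
        "pct_days_above": sum(r > 1.0 for r in post) / len(post),
        "min_ratio": min(post),
        "end_ratio": post[-1],
    }


# Toy curves, for illustration only.
ratio = relative_equity([100, 90, 120, 150], [50, 50, 55, 60])
stats = underwater_stats(ratio, warmup=1)
```

The three outputs correspond to the numbers quoted under the plots in this post: share of days above the benchmark, minimum ratio after warmup, and end ratio.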

A concrete example

Here's a Gayed-style canonical rotation strategy I tested: 100% QLD when QQQ > SMA200, otherwise 100% ZROZ (25y zero-coupon Treasury). Backtest 1986-2026, 40 years. Sharpe 0.752, MDD -75% in September 2000 (dotcom bottom).

By the conventional metric → reject (MDD > 50%).

But here's the underwater-vs-benchmark plot:

https://preview.redd.it/a8msa8pxgkzg1.png?width=871&format=png&auto=webp&s=0a3515530a9db20334f56bd901c6bd629e87f6ac

Strategy/SPY ratio over 40 years (log-scale). Green band = strategy above SPY equity. Red band (very thin, only first ~7 days) = strategy below SPY equity. 99.83% of days the strategy is above SPY. End ratio: 60.5× SPY equity.

At the worst absolute drawdown moment (Sept 2000, -75% MDD), the strategy was still 3.1× SPY equity. Even if you'd entered at the all-time-high right before the dotcom crash, you'd have ended up with 3× more money than holding SPY through it — at the absolute worst point of the strategy's history.

Per-crisis ratio at the bottom of each crisis (strategy_eq / SPY_eq):

| Crisis | Strategy MDD | At-trough ratio vs SPY |
|---|---|---|
| 1987 Black Monday | -55% | 1.76× |
| 2000 dotcom (worst MDD) | -75% | 3.11× |
| 2008 GFC | -67% | 8.96× |
| 2020 COVID | -42% | 36.29× |
| 2022 rates | -28% | 40.65× |

The strategy's lead over SPY widens over time because it compounds at a higher rate (~13% vs SPY's ~8.5%). The at-trough ratio is higher at each successive crisis, even though absolute MDD looks scary.

Going further — the breakthrough strategy

The same study found a stronger configuration: same QLD/ZROZ rotation pair, but with a 4-signal Vote-of-K=2 gate (any 2 of [SMA200, SMA50, vol_21d<40%, AR(1)_30d>0]). Sharpe 0.853, MDD still -75% (LETF-intrinsic 2008 GFC floor doesn't change).
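
For anyone who wants to tinker, here is one way the 2-of-4 vote could be written. This is a sketch under my reading of the four signals, not the study's code: the two SMA signals are taken as "last price above the SMA", vol is 21-day realized volatility annualized with √252, and AR(1) is the lag-1 autocorrelation of the last 30 daily returns. All function names are hypothetical.

```python
import math
from statistics import mean, pstdev


def sma(prices, n):
    """Simple moving average of the last n prices."""
    return mean(prices[-n:])


def realized_vol(returns, n=21, periods_per_year=252):
    """Annualized realized volatility of the last n daily returns."""
    return pstdev(returns[-n:]) * math.sqrt(periods_per_year)


def lag1_autocorr(returns, n=30):
    """Lag-1 autocorrelation of the last n returns (0.0 if they are constant)."""
    r = returns[-n:]
    m = mean(r)
    num = sum((a - m) * (b - m) for a, b in zip(r[:-1], r[1:]))
    den = sum((a - m) ** 2 for a in r)
    return num / den if den else 0.0


def vote_k(prices, k=2):
    """Return (risk_on, votes): risk-on when at least k of the 4 signals are true."""
    returns = [b / a - 1.0 for a, b in zip(prices[:-1], prices[1:])]
    votes = [
        prices[-1] > sma(prices, 200),  # long trend
        prices[-1] > sma(prices, 50),   # medium trend
        realized_vol(returns) < 0.40,   # calm enough
        lag1_autocorr(returns) > 0,     # short-term return persistence
    ]
    return sum(votes) >= k, votes


# Toy input: a steadily rising price series of 260 days.
rising = [100.0 * 1.001 ** i for i in range(260)]
risk_on, votes = vote_k(rising)
```

Risk-on would map to holding QLD and risk-off to holding ZROZ, as in the rotation described above.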

Conventional MDD-filter: also reject. But the underwater plot is even more dramatic:

https://preview.redd.it/193lv5y0hkzg1.png?width=894&format=png&auto=webp&s=7ea33aa17b45bda4255a84a0ef25fe4032b116ce

Same plot for the breakthrough strategy. 100.00% of days above SPY, min ratio post-warmup 1.44×, end ratio 256× ($10k seed → $2.6M strategy vs $80k SPY) over 40 years.

This strategy was never below SPY equity in 40 years (after a 252-day warmup window). Even at peak drawdown, it was 1.44× SPY. Calling this strategy "high drawdown" without context misses what's actually happening: for every dollar you would have had in SPY, the strategy always held at least $1.44.

Counter-example — HFEA basket FAILS the underwater test

To make sure this isn't just a "LETFs always win" cope: I also tested HFEA-style baskets (UPRO 55% + TMF 45%, with various weights and rotation gates). 11 configs in total. Best one hit Sharpe 0.653.

Here's the relative-to-SPY plot for the T2 family:

https://preview.redd.it/mvjx0g62hkzg1.png?width=871&format=png&auto=webp&s=e879356aef339be34b0d46d686a93970cc712392

HFEA-style basket (UPRO+TMF). Only 59% of days above SPY. Min ratio post-warmup 0.38× (basket fell to 38% of SPY equity at one point). End ratio 4.5×.

So the HFEA basket family genuinely does spend material time below SPY equity, especially during the 2022 rates selloff, when both UPRO and TMF dropped together (no shelter from the bond sleeve). The underwater-vs-benchmark plot correctly identifies this as a real weakness of the strategy, not just an artifact of LETF leverage.

This is the test working as intended: it doesn't hide that some strategies actually are worse than SPY. It just rejects the false positives where "scary-looking absolute MDD" is hiding genuine outperformance vs the alternative.

Why I think this matters

Three implications for how I evaluate LETF (or any leveraged) strategies:

  1. Drop the MDD-absolute hard threshold for LETFs. Use it as a warning-only diagnostic. If your strategy has 75% MDD but is always above SPY, the 75% number is an asset-class fact, not a strategy failure.
  2. Add an underwater-vs-benchmark strict bar instead. A reasonable bar I've used: "post-warmup, ≥95% of days the strategy must be above SPY equity (renormalized to same start)." This catches genuine weakness (basket structures that spend periods below SPY) while not penalizing leverage-class arithmetic.
  3. Per-crisis test should be relative too. Instead of "did the strategy drop more than X% in crisis Y?", ask "in crisis Y, did the strategy stay above SPY equity (renormalized within the crisis window) for more than half the days?". This generalizes the same logic to crisis-specific evaluation.

The second and third points lifted the score on a real strategy from "fail" to "STRONG" in my study, just by changing the lens — same underlying performance, more honest evaluation.

TL;DR

  • Conventional MDD bars (e.g., reject if MDD > 50%) reject the entire LETF universe based on asset-class arithmetic, not strategy quality
  • The right metric is underwater-vs-benchmark: at every point in time, is the strategy equity above what the buy-and-hold alternative would have given you?
  • Real LETF rotation strategies can have -75% absolute MDD AND still be 1.44× SPY equity at every post-warmup point, ending 256× SPY over 40 years
  • Underwater-vs-benchmark is not a free pass — HFEA-style baskets genuinely fail it (they spend ~40% of days below SPY equity), so the lens is calibrated correctly
  • Per-crisis evaluation should also use relative equity, not absolute drawdown

If you're evaluating a leveraged strategy and your filter rejects on absolute MDD without considering the benchmark alternative, you're probably filtering noise. Switch to the underwater-vs-benchmark view and see what survives.

This was a methodological pivot during a 25-iteration study on LETF rotation strategies. Will post the full study results separately. Drawdown thinking is calibrated for the 1× equity world; once you're leveraged, it stops measuring what you care about.

u/noletovictor — 8 days ago

I've had a Dell G15 (i5, 16 GB RAM, originally 8 GB, RTX 3050) since Nov 2022. It was a really nice upgrade over my previous laptop, both for work and for gaming.

However, as should be news to no one here, 16 GB of RAM "is nothing" these days, so I'm selling it to buy an i7 (or AMD equivalent) with 32 GB of RAM (this one can't be upgraded any further).

I listed it on OLX for R$4500 (the laptop is in good shape; the only real problem is that I always work with 80~90% of the RAM in use, and I don't like that). If I can sell it for around R$4000, that's still fine (I paid R$4600 at the time).

The point of this post: what's the best value for money these days? At work I have an i7 laptop with 32 GB of RAM, but no dedicated GPU. For work it's totally fine, it never hangs. But for home/home office I haven't decided whether to "spend a bit more to get a GPU and be able to game once in a while" or drop the GPU idea because "I can't even remember the last time I had time to play anything..."

I appreciate any reply/comment here, especially suggestions for models/machines.

Thanks!

u/noletovictor — 9 days ago
▲ 1 r/ETFs

I have been researching a long-term ETF portfolio that tries to combine capital-efficient diversification with a small return-seeking satellite.

The base portfolio is a static “B4” stack:

25% NTSX
25% GDE
25% RSST
25% ZROZ

My current more aggressive idea is:

15% NTSX
25% GDE
25% RSST
10% ZROZ
10% AVUV   # small-cap value sleeve
5%  SPMO   # momentum sleeve
5%  FMTM   # faster momentum sleeve
5%  BTC    # Bitcoin satellite

Because AVUV/SPMO/FMTM have limited live history and BTC is the limiting asset anyway, I tested the longer proxy version from 2010 onward. Since FMTM does not have long history, I used MTUMSIM as a proxy for the combined 10% momentum sleeve:

15% NTSX
25% GDE
25% RSST
10% ZROZ
10% VBRSIM   # proxy for small-cap value
10% MTUMSIM  # proxy for SPMO + FMTM momentum sleeve
5%  BTCSIM

Monthly rebalance, dividends reinvested, explicit estimated ETF expense drag. I am not claiming this is tradable history for every sleeve; this is a proxy stress test.

Results: 2010-2026

| Portfolio | CAGR | Max DD | Sharpe |
|---|---|---|---|
| SPY | 14.71% | -33.70% | 0.893 |
| B4 base | 14.64% | -25.84% | 1.091 |
| B4 + 5% BTC | 23.18% | -27.26% | 1.472 |
| B4 + BTC + SCV + Momentum | 24.40% | -29.90% | 1.412 |
| ZROZ-only funding variant | 25.49% | -33.57% | 1.351 |

Images: Growth of $10k; Drawdown; Rolling CAGR; CAGR vs Max. DD scatter

Main takeaway

The satellite version slightly increases CAGR versus B4 + 5% BTC, but it does not improve risk-adjusted return:

B4 + 5% BTC:                  23.18% CAGR / -27.26% MDD / 1.472 Sharpe
B4 + BTC + SCV + Momentum:    24.40% CAGR / -29.90% MDD / 1.412 Sharpe

So this is not a clean “better portfolio”. It is more like: if I want to push for higher CAGR, I can add SCV/momentum, but I pay with worse drawdown and lower Sharpe.

The most aggressive funding version was to take the whole satellite out of ZROZ. That produced the highest CAGR, but I dislike it structurally because it almost removes the long-duration convexity sleeve:

ZROZ-only funding: 25.49% CAGR / -33.57% MDD / 1.351 Sharpe

My preferred funding version keeps RSST and GDE intact and funds mostly from NTSX + ZROZ:

15% NTSX / 25% GDE / 25% RSST / 10% ZROZ / 10% SCV / 10% Momentum / 5% BTC

No-BTC check

I also tested the factor satellite without BTC over a longer 2000+ window to see whether SCV/momentum improves the B4 core by itself.

| Portfolio | CAGR | Max DD | Sharpe |
|---|---|---|---|
| B4 base | 12.27% | -29.02% | 0.881 |
| B4 + SCV/Momentum, NTSX+ZROZ funded | 12.68% | -37.74% | 0.836 |
| B4 + SCV/Momentum, pro-rata funded | 11.98% | -34.79% | 0.844 |
| B4 + SCV/Momentum, ZROZ funded | 12.78% | -43.34% | 0.774 |
| SPY | 8.28% | -55.20% | 0.509 |

This makes me more cautious. Without BTC, the factor satellite adds a bit of CAGR in some versions, but worsens drawdown and Sharpe. So the portfolio’s strong 2010+ performance is heavily helped by BTC.

Caveats

  • BTC history starts after Bitcoin survived its earliest failure modes.
  • VBRSIM/MTUMSIM are proxies, not AVUV/SPMO/FMTM live history.
  • GDE and RSST also need proxy assumptions before their actual ETF launch dates.
  • This is gross of taxes other than ETF expense drag.
  • Monthly rebalancing is assumed; in taxable accounts I would probably rebalance mostly with contributions.
  • This is not a recommendation. I am trying to stress-test the idea.

Questions for the sub

  1. Would you keep the factor satellite, or just use B4 + BTC?
  2. If using the satellite, would you fund it from NTSX, ZROZ, GDE, or pro-rata?
  3. Is SPMO a reasonable live momentum sleeve, or would you use something else?
  4. Would you replace VBR/AVUV with a different small-cap value ETF?
  5. Is 5% BTC too much, too little, or reasonable in this kind of portfolio?
u/noletovictor — 11 days ago
▲ 39 r/LETFs

TL;DR: Follow-up to my Post 1 where I shared 4 boring static portfolios that beat SPY on both CAGR and Max DD over 1987-2026. The 4 candidates are simple buy-and-hold stacks built on capital-efficient ETFs — NTSX (90% SPY + 60% Treasury futures), GDE (90% SPY + 90% gold futures), RSST (100% SPY + 100% systematic managed futures) — plus a duration sleeve (TMF / ZROZ / TLT). No signals, no regime gates, no rebal whipsaw.

2026-05-02 methodology note before posting: I made one tax correction and one explicit proxy choice. First, for static buy-and-hold/lazy-rebal portfolios I should not apply terminal DARF in the comparison; tax drag belongs in swing/tactical strategies that realize gains through position changes, not in these buy-and-hold stacks. Second, this post intentionally uses the longer-window RSST proxy SPY + KMLM - cash so the main study can run back to 1987. A live RSST tracking check suggests real RSST is closer to SPY + 70% DBMF + 30% KMLM - cash, but DBMFSIM starts in 2000, which would cut the study to 26 years. I prefer the longer backtest for the main post and treat the 70/30 DBMF/KMLM version as a final caveat/sensitivity check. Return stacking rationale: [risk_parity, ch.5, p.10]; diversified managed-futures engine rationale: [ilmanen_expected_returns, ch.19].

This Post 2 is the methodology + the community-critique integration. After Post 1 went up, you gave me 4 specific empirical critiques. I ran each one as a separate iteration:

  • u/perky_python — "your sim ignores rebal cadence + ERs" → monthly rebal + explicit ERs across all configs
  • u/Fun-Sundae4060 + u/no_simpsons — "TQQQ + 200d SMA gives ~10,000%" → 6 G3 regime-gate variants tested
  • u/Grouchy_Release_2321 + u/perky_python — "SPY-base is US-survivorship-bias" → 5 G4 international variants (NTSD / RSSB / VT)
  • u/laurenthu — "re-fit weights on rolling 5y windows" → walk-forward max-Sharpe G8 gate

Plus the full 7-gate anti-overfit battery (PBO / DSR / Walk-Forward / OOS 70-30 / FWD stress / Bootstrap CI / Cross-library) + the new G8 weight-drift gate. Full data, full methodology, what changed, what survived, what didn't.

Headline change from Post 1: top-level rebalance is now monthly (not yearly) and expense ratios are explicit. This shifts the Pareto frontier in interesting ways: most notably, the MDD of Popular 50/25/25 SSO/GLD/ZROZ worsens by 10.71pp (-39.84% → -50.55%) when you rebalance monthly. The capital-efficient stacks (NTSX/GDE/RSST) are virtually immune.

Headline finding: on the longer 1987-2026 window, B4 Conservative (25 NTSX / 25 GDE / 25 RSST-like KMLM trend stack / 25 ZROZ) is still the best balanced pick: high enough CAGR, materially lower drawdown than SPY, and the best Sharpe among the long-window static stacks. L1 CEGB remains the lower-stress alternative; B5/B2/T1 buy more CAGR with materially deeper drawdowns.

What changed since Post 1 (4 community-driven follow-ups)

| Critic | Critique | Test | Verdict |
|---|---|---|---|
| u/perky_python | "Your sim ignores rebal cadence + ERs. Real CAGR is ~1pp lower." | Re-ran with Monthly rebal + explicit ERs (NTSX 0.20%, GDE 0.20%, RSST 0.99%, KMLM 0.92%, GLD 0.40%, ZROZ 0.15%, etc). | ⚠️ Partial. CAGR drops 0.5-0.9pp on stacks (less than 1pp). MDD on Popular 50/25/25 worsens -10.71pp (huge finding). |
| u/Fun-Sundae4060 + u/no_simpsons | "Try TQQQ/QQQ regime-gate × diversifiers above/below 200d SMA. ~10,000% return." | Tested 6 G3 variants (Fun-Sundae spec, NDX-heavy, with bonds, minimal, Gayed-NDX, pure TQQQ/QQQ swap). | ❌ The "~10,000%" return is computed over 2012-2025 (cherry-picked window without dotcom). On 1987-2026 with 2000-2002 included, the regime gate produces much worse drawdowns than B4. |
| u/Grouchy_Release_2321 + u/perky_python | "SPY-only base is US-survivorship-bias. Try VT/RSSB/NTSI." | Tested 5 G4 variants with NTSD, RSSB, mixed US/International. | ⚠️ In the long-window table, US-bias accounted for only ~4% of B4's Sharpe edge. This is directional because synthetic international/stacked proxies add their own uncertainty. G4d (RSSB-based) breaks the MDD record at -22.56%, the lowest in the broader study. |
| u/laurenthu | "Re-fit weights on rolling 5y windows. If they drift, edge is window-specific." | Walk-forward max-Sharpe optimization on B4/B2/T1 universes. | ✅ G8 PASS. Weights drift wildly (60-75pp range) BUT static portfolio Sharpe beats walk-forward in all 3 universes. Static = optimal shrinkage estimator (DeMiguel/Garlappi/Uppal 2009 RFS). |

Overall: B4 Conservative survives the critique process as the balanced pick on the long-window study. The final caveat is that live RSST may be better approximated by a 70/30 DBMF/KMLM trend sleeve, which starts only in 2000 and produces somewhat different absolute numbers.

Methodology — Monthly rebal + explicit ERs (refresh from Post 1)

Post 1 used rebalance_freq: Yearly with no ER drag. Per perky_python's critique, that's an idealized backtest. Realistic deployment would:

  • Top-level rebal: monthly (matches monthly contribution cadence; what most retail investors actually do).
  • Internal ETF rebal: NTSX/GDE quarterly (5% threshold), RSST daily — not our concern (vendor's responsibility).
  • Real ERs applied as drag on each portfolio: weighted average per portfolio.

ERs used (per issuer prospectus):

| ETF | ER (% / yr) |
|---|---|
| NTSX | 0.20 |
| GDE | 0.20 |
| RSST | 0.99 |
| KMLM | 0.92 |
| GLD | 0.40 |
| ZROZ / TLT / IEF | 0.15 each |
| GOVZ | 0.10 |
| SPY | 0.0945 |
| SSO (= SPYSIM?L=2&E=0.89) | 0.89 |
| UPRO (= SPYSIM?L=3&E=0.91) | 0.91 |
| TMF (= TLTSIM?L=3&E=1.05) | 1.05 |

Per-portfolio drag (weighted ER):

  • B4 Conservative: 0.385%
  • B2 Balanced: 0.417%
  • T1 Aggressive: 0.358%
  • L1 Sleeping pills: 0.317%
  • L2 Bogleheads: 0.296%
  • Popular 50/25/25: 0.138% (SSO ER baked into the leveraged SIM)
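
The drag numbers above are plain weighted averages of the ERs in the table. A short sketch that reproduces two of them, using the weights stated in this post; the function name and the `baked_in` argument (for sleeves whose ER is already inside the leveraged SIM) are my own.

```python
# Expense ratios in % per year, taken from the table above.
ER = {"NTSX": 0.20, "GDE": 0.20, "RSST": 0.99, "GLD": 0.40, "ZROZ": 0.15}


def weighted_er(weights, er=ER, baked_in=()):
    """Weighted-average ER in % per year, skipping tickers whose ER is already in the SIM."""
    return sum(w * er[ticker] for ticker, w in weights.items() if ticker not in baked_in)


b4_conservative = {"NTSX": 0.25, "GDE": 0.25, "RSST": 0.25, "ZROZ": 0.25}
popular = {"SSO": 0.50, "GLD": 0.25, "ZROZ": 0.25}

b4_drag = weighted_er(b4_conservative)                  # matches the 0.385% above
popular_drag = weighted_er(popular, baked_in=("SSO",))  # 0.1375, shown above as 0.138%
```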

Updated results — Monthly rebal + ERs, no DARF for buy-and-hold (long 1987-2026 window)

Tax model: no DARF applied for the static buy-and-hold/lazy-rebal portfolios in this section. Taxes should be modeled for swing/tactical strategies that realize gains through position changes, not for static stacks that are accumulated and held.

RSST proxy for this main study: SPY + KMLM - CASHX. This is not the closest live RSST tracking proxy; it is the longer-history managed-futures proxy that lets the study run from 1987-12-30 → 2026-04-29.

Why use KMLM-only here? Because DBMFSIM starts in 2000. A 70/30 DBMF/KMLM trend sleeve probably tracks live RSST better, but it removes 12+ years of useful stress history. For this post I prefer the longer regime sample and explicitly caveat that the absolute RSST-containing results may shift under the shorter 70/30 proxy.

This is the main long-window methodology for the post. The 2000+ 70% DBMF / 30% KMLM version is a tracking-sensitivity check, not the headline table.

Main contenders (Pareto frontier)

| portfolio | CAGR | Max DD | Sharpe | Sortino | Calmar |
|---|---|---|---|---|---|
| SPY 1× buy-hold | 11.37% | -55.20% | 0.523 | 0.740 | 0.206 |
| 🔵 Sleeping pills (L1 CEGB) | 11.06% | -25.43% | 0.729 | 1.044 | 0.435 |
| ⚪ Bogleheads 67% NTSX (L2) | 11.06% | -26.30% | 0.722 | 1.037 | 0.420 |
| 🟢 Conservative (B4 ZROZ) | 13.31% | -28.94% | 0.745 | 1.071 | 0.460 |
| 🟠 T1 gold-heavy | 13.34% | -34.65% | 0.688 | 0.984 | 0.385 |
| 🔴 B2 TMF10 | 13.89% | -36.38% | 0.718 | 1.028 | 0.382 |
| B5 no duration | 14.22% | -41.12% | 0.687 | 0.981 | 0.346 |

Tax note: Gayed/LRS strategies still need a separate after-tax model because regime flips realize gains. The static stacks above do not get recurring DARF in this comparison.

Equity curves ($10k start, log scale, long-window proxy 1987-12-30 → 2026-04-29):

Visual ranking by terminal value using the longer-history RSST proxy (SPY + KMLM - cash), monthly rebalance, explicit ERs, and no DARF for static buy-and-hold/lazy-rebal. B5 has the highest terminal value but much higher drawdown; B4 remains the best Sharpe/CAGR/MDD compromise; L1 has the smoothest ride.

Pareto frontier — CAGR vs Max DD (the "interesting zone" is the upper-right quadrant, where CAGR > SPY AND |MaxDD| < SPY):

Green region = beats SPY on both axes. This chart shows the long-window static-stack universe; DBMF-only/blend variants are omitted from the scatter because DBMFSIM starts in 2000 and is not directly comparable. The practical frontier is L1/L2 for lowest stress, B4 for balanced Sharpe/CAGR/MDD, and B5/B2/T1 if you accept progressively larger drawdowns.

Key changes vs Post 1 (after monthly rebal, explicit ERs, and the explicit long-window RSST proxy):

  • B4 remains the highest-Sharpe balanced pick: CAGR 13.31%, MDD -28.94%, Sharpe 0.745 over ~38.3y.
  • L1 CEGB remains the lowest-stress reference: CAGR 11.06%, MDD -25.43%, Sharpe 0.729.
  • B2/T1/B5 still offer more CAGR than B4, but require accepting much deeper drawdowns (34.6-41.1%).
  • Popular 50/25/25's monthly-rebal MDD blowup remains an important caution from the earlier monthly/ER test.

Updated full sweep — long-window static stack table (Monthly + ERs, no DARF)

| config | family | CAGR | Max DD | Sharpe | Notes |
|---|---|---|---|---|---|
| Conservative (B4 ZROZ) | B/Static | 13.31% | -28.94% | 0.745 | balanced pick: best Sharpe, sub-30% MDD |
| B3 TLT instead of TMF | B/Static | 12.44% | -30.06% | 0.735 | second-line backup if ZROZ/GOVZ unavailable |
| Sleeping pills (L1 CEGB) | L/Static | 11.06% | -25.43% | 0.729 | lowest stress |
| Bogleheads 67 NTSX (L2) | L/Static | 11.06% | -26.30% | 0.722 | low-risk reference |
| B2 TMF10 | B/Static | 13.89% | -36.38% | 0.718 | high-CAGR alternative |
| T2 equity-heavy | B/Static | 13.40% | -33.14% | 0.707 | NTSX 35% |
| T1 gold-heavy | B/Static | 13.34% | -34.65% | 0.688 | more CAGR, worse drawdown |
| B5 no duration | B/Static | 14.22% | -41.12% | 0.687 | highest CAGR, high MDD |
| B1 user baseline 25 TMF | B/Static | 12.93% | -38.78% | 0.665 | original spec: TMF 25% costs MDD |
| M4 RSST+KMLM blend | M/Static | 11.85% | -37.27% | 0.645 | dual MF source, still long-window |
| T3 RSSB global | B/Static | 12.31% | -41.39% | 0.623 | global stack, MDD inflated |
| M1 KMLM no RSST | M/Static | 10.74% | -35.92% | 0.610 | KMLM-only stack |
| M2 DBMF no RSST | M/Static | 9.76% | -37.97% | 0.610 | DBMF-only MF source; 2000+ window only |
| M3 KMLM+DBMF blend | M/Static | 9.56% | -36.94% | 0.600 | split MF no RSST; 2000+ window only |
| SPY 1× | Benchmark | 11.37% | -55.20% | 0.523 | floor |

All rows except M2/M3 share the same ~38.3y window. M2/M3 contain DBMF directly and are shown for context only because DBMFSIM starts in 2000. Regime-gated LRS and G3 variants are omitted from this long-window static-stack table because their tax treatment differs and must include realization drag.

What we learned (4 main findings)

1. Monthly rebal hurts SSO-based portfolios; capital-efficient stacks shrug it off

The biggest surprise was Popular 50/25/25 SSO/GLD/ZROZ losing 10.71pp of MDD when switching from yearly to monthly rebal (-39.84% → -50.55%). Why?

In bear markets:

  • SSO 2× falls ~2× the SPY drawdown.
  • Monthly rebal forces re-buying SSO every month at the new (lower) price to maintain 50% target weight.
  • This accelerates the bleed: you keep adding to a falling position.
  • Yearly rebal naturally "lets SSO die" through the year and only restores weight at year-end — accidentally protective.
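
The mechanism in the bullets above can be illustrated with a toy two-sleeve portfolio. This is a minimal sketch, not the testfol.io engine: the return path (a sleeve losing 10% every month next to a flat sleeve) and the function name `final_value` are hypothetical, chosen only to isolate the effect of topping up a falling position.

```python
def final_value(returns_a, returns_b, weight_a, rebalance_every):
    """Grow a two-sleeve portfolio from 1.0, resetting weights every N months."""
    a = weight_a
    b = 1.0 - weight_a
    for month, (ra, rb) in enumerate(zip(returns_a, returns_b), start=1):
        a *= 1.0 + ra
        b *= 1.0 + rb
        if month % rebalance_every == 0:
            total = a + b
            a, b = total * weight_a, total * (1.0 - weight_a)
    return a + b


# Hypothetical bear year: the leveraged sleeve loses 10% each month, the other is flat.
falling = [-0.10] * 12
flat = [0.0] * 12

monthly = final_value(falling, flat, 0.5, rebalance_every=1)
yearly = final_value(falling, flat, 0.5, rebalance_every=12)
print(f"monthly rebalance ends at {monthly:.4f}")
print(f"yearly rebalance ends at  {yearly:.4f}")
```

In a persistent downtrend like this one the monthly version ends lower because it keeps buying the falling sleeve. In a choppy, mean-reverting market the ordering can flip, so the sketch shows the mechanism rather than a general rule.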

Capital-efficient stacks (NTSX/GDE/RSST) are virtually immune because they don't have a single 2× LETF that's leveraging-against-trend on its own. Each stack contains multiple asset classes natively (NTSX = SPY + bonds; GDE = SPY + gold; RSST = SPY + managed futures); the internal balance is the diversification.

| portfolio | ΔMDD (Yearly → Monthly) |
|---|---|
| Popular 50/25/25 SSO/GLD/ZROZ | -10.71pp |
| Aggressive (T1 gold-heavy) | -3.99pp |
| Bogleheads 67% NTSX (L2) | -3.83pp |
| Sleeping pills (L1 CEGB) | -3.16pp |
| Conservative (B4 ZROZ) | -0.29pp |
| Balanced (B2 high-equity) | -0.17pp |
| Gayed LRS variants (signal-driven) | ±0pp |

Underwater chart (peak-to-trough drawdown, long-window proxy 1987-12-30 → 2026-04-29):

SPY still hits roughly -55% in the GFC. L1/B4 keep drawdowns materially shallower, while B2/T1/B5 buy more CAGR by accepting deeper stress. The deepest drawdowns cluster around 2000-2002, 2008, and 2022; the 2022 joint stock/bond shock is where duration-heavy sleeves show their main weakness.
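
For readers replicating the underwater chart, the drawdown series is just each value divided by its running peak, minus one. The sketch below assumes a plain list of portfolio values; the sample equity curve is hypothetical.

```python
def underwater(values):
    """Drawdown series: value / running peak - 1 (0.0 at each new high)."""
    peak = float("-inf")
    series = []
    for v in values:
        peak = max(peak, v)
        series.append(v / peak - 1.0)
    return series


def max_drawdown(values):
    """Deepest point of the underwater curve, as a negative fraction."""
    return min(underwater(values))


equity = [100, 120, 90, 110, 54, 80, 130]  # hypothetical equity curve
print(f"MDD = {max_drawdown(equity):.2%}")
```

MDD depends on sampling frequency: daily closes usually show a deeper trough than month-end values for the same portfolio.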

Implication: if you're a real investor making monthly contributions (which most are), the popular 50/25/25 SSO mix is strictly dominated by every capital-efficient stack — both on Sharpe AND on realized MDD.

2. The TQQQ + 200d SMA regime-gate doesn't survive 1987-2026 (vs. cherry-picked 2012-2025)

Tested 6 variants (G3a-G3f) per Fun-Sundae4060 + no_simpsons specs:

| variant | bull | bear | CAGR | MDD | Sharpe |
|---|---|---|---|---|---|
| G3a Fun-Sundae | TQQQ 34 / KMLM 33 / GLD 33 | QQQ 34 / KMLM 33 / GLD 33 | 15.60% | -58.53% | 0.661 |
| G3b NDX-heavy | TQQQ 50 / KMLM 25 / GLD 25 | QQQ 50 / KMLM 25 / GLD 25 | 18.34% | -75.98% | 0.621 |
| G3c with bonds | TQQQ/KMLM/GLD/IEF 25/25/25/25 | QQQ/KMLM/GLD/IEF 25/25/25/25 | 13.36% | -42.63% | 0.703 |
| G3d minimal | TQQQ 50 / KMLM 50 | QQQ 50 / KMLM 50 | 18.58% | -75.47% | 0.629 |
| G3e Gayed-NDX | TQQQ 100 | IEF 100 | 18.61% | -90.05% | 0.535 |
| G3f pure swap | TQQQ 100 | QQQ 100 | 19.97% | -96.90% | 0.556 |

Best G3 = G3c (with bonds) with Sharpe 0.703, but with MDD -42.63%. This is a different risk profile from B4's -28.94% MDD, and G3/LRS variants need separate after-tax treatment because they realize gains through regime flips.

The community-cited "~10,000% return on TQQQ + 200d SMA" comes from Bogleheads/Petrou backtests over 2012-2025 — a cherry-picked window without a dotcom-equivalent crash. On 1987-2026 (which includes 2000-2002):

  • TQQQ standalone buy-hold MDD: -99.98% (effectively a total loss of capital by the 2002 trough).
  • G3f pure swap (TQQQ → QQQ regime-gate): MDD -96.90%. Gate saved only 3pp of drawdown.
  • G3e Gayed-NDX (TQQQ → IEF): MDD -90.05%. Gate saved 10pp because bonds are uncorrelated to equity in bear.
  • Gayed canonical SSO (SPY 2× → IEF): MDD -43.48%. Works well because SPX vol is much lower than NDX vol.

Lesson: regime-gate works only if the bear sleeve is uncorrelated to the bull sleeve. Pure equity-on-equity swap (G3f) gives basically no cushion. NDX-leveraged is too volatile for the SMA gate to handle gracefully — by the time the 200d signal fires, you've already taken a 25-35% hit, and whipsaws around the boundary compound it.

3. US bias accounts for only ~4% of B4's Sharpe (0.029 of 0.745); structural diversification is the real driver

Tested 5 G4 variants per Grouchy_Release_2321 + perky_python:

| variant | allocation | CAGR | MDD | Sharpe |
|---|---|---|---|---|
| G4c mixed US/Intl | 12.5 NTSX / 12.5 NTSD / 25 GDE / 25 RSST / 25 ZROZ | 13.31% | -32.65% | 0.716 |
| G4a NTSD swap | 25 NTSD / 25 GDE / 25 RSST / 25 ZROZ | 13.29% | -36.24% | 0.684 |
| G4d 4-sleeve global | 25 RSSB / 25 GDE / 25 ZROZ / 25 KMLM | 10.54% | -22.56% | 0.678 |
| G4b RSSB-heavy | 50 RSSB / 25 GDE / 25 KMLM | 10.59% | -34.35% | 0.610 |
| G4e full Intl | 50 NTSD / 25 GDE / 25 KMLM | 11.57% | -48.52% | 0.555 |

Two findings here:

(a) G4c (50/50 US/Intl split) got Sharpe 0.716 in the long-window table — only 0.029 below B4's Sharpe of 0.745. Read this as directional rather than final because RSSB/NTSD synthetic histories add uncertainty, but the structural diversification (capital-efficient stacking via NTSX/GDE/RSST embedding leverage across asset classes) still appears to be the dominant driver.

(b) G4d (RSSB-based 4-sleeve) breaks the MDD record at -22.56% — the lowest of any portfolio in the entire study. Calmar ratio 0.467 (also a record). Trade-off: CAGR 10.54% (below SPY's 11.37%). For a CAGR-flexible investor with strong MDD aversion, G4d is a new top-tier candidate.
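
The Calmar figure quoted for G4d can be reproduced from the table values. This sketch assumes the simple full-period definition, CAGR divided by the absolute max drawdown (some sources define Calmar on a trailing 36-month window instead); the comparison rows are taken from the long-window tables in this post.

```python
def calmar(cagr_pct, mdd_pct):
    """Full-period Calmar ratio: CAGR over the absolute max drawdown."""
    return cagr_pct / abs(mdd_pct)


# (CAGR %, MDD %) as listed in the long-window tables
rows = {
    "G4d": (10.54, -22.56),
    "B4": (13.31, -28.94),
    "L1": (11.06, -25.43),
    "SPY": (11.37, -55.20),
}

for name, (cagr, mdd) in rows.items():
    print(f"{name}: Calmar {calmar(cagr, mdd):.3f}")
```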

Caveat: RSSB only has ~2 years of live data (launched Jan 2024). RSSBSIM extends back to 1987 via simulation. Take G4d with extra skepticism until 5+ years of live track record.

4. Walk-forward weight optimization — drift is real but doesn't translate to better OOS performance (G8 PASS)

Per laurenthu's critique: re-fit max-Sharpe weights on rolling 5y windows; if optimal weights drift wildly, the structural edge is window-specific.

Step 1 — drift magnitude (rolling 5y max-Sharpe, scipy SLSQP, ~400 windows per universe):

| universe | NTSX range | GDE range | RSST range | ZROZ/TMF range | Max drift vs static |
|---|---|---|---|---|---|
| B4 | 0-87pp | 0-86pp | 0-100pp | ZROZ 0-75pp | 75pp |
| B2 | 0-92pp | 0-86pp | 0-100pp | TMF 0-59pp | 70pp |
| T1 | 0-92pp | 0-86pp | 0-100pp | TMF 0-59pp | 75pp |

Optimal weights pick corner solutions (0% or 100% in many windows). On the surface, this looks bad — laurenthu's prediction is right.
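
A rough picture of the walk-forward procedure, for anyone who wants to sanity-check the idea without the full pipeline. This is not the SLSQP implementation used in the study: a coarse grid search over long-only weights stands in for the optimizer, Sharpe is raw mean/vol, and the function names and the two-asset return rows are hypothetical.

```python
from itertools import product
from statistics import mean, pstdev


def weight_grid(n_assets, steps):
    """All long-only weight vectors on a grid that sum to 1 (steps=4 -> 25% increments)."""
    for combo in product(range(steps + 1), repeat=n_assets):
        if sum(combo) == steps:
            yield tuple(c / steps for c in combo)


def raw_sharpe(returns):
    sd = pstdev(returns)
    return mean(returns) / sd if sd > 0 else float("-inf")


def best_weights(window, steps=4):
    """window: list of per-period rows, one return per asset."""
    def score(w):
        return raw_sharpe([sum(wi * ri for wi, ri in zip(w, row)) for row in window])
    return max(weight_grid(len(window[0]), steps), key=score)


def walk_forward(rows, lookback, steps=4):
    """Fit on the trailing `lookback` rows, then apply those weights to the next row."""
    fitted, realized = [], []
    for t in range(lookback, len(rows)):
        w = best_weights(rows[t - lookback:t], steps)
        fitted.append(w)
        realized.append(sum(wi * ri for wi, ri in zip(w, rows[t])))
    return fitted, realized


# Hypothetical two-asset history: asset 0 is steady, asset 1 is volatile.
rows = [[0.02, -0.03], [0.01, 0.05]] * 4
fitted, realized = walk_forward(rows, lookback=4)
print(fitted)
```

Even on this toy history the fit lands on a corner (100% in the steadier asset), which is the behavior described above. Comparing `realized` against a fixed-weight series over the same dates gives the static vs walk-forward comparison.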

Step 2 — what realized OOS performance does this drift produce?

| universe | CAGR static | CAGR walk-fwd | MDD static | MDD walk-fwd | Sharpe static | Sharpe walk-fwd | ΔSharpe |
|---|---|---|---|---|---|---|---|
| B4 | 11.47% | 11.92% | -26.53% | -27.96% | 0.940 | 0.879 | -0.061 |
| B2 | 11.86% | 12.22% | -31.75% | -34.37% | 0.886 | 0.824 | -0.062 |
| T1 | 11.86% | 12.22% | -36.19% | -34.37% | 0.853 | 0.824 | -0.029 |

(Note: these Sharpes use raw mean/vol, not the Rf-adjusted Sharpe testfol.io uses; the absolute values differ from earlier sections but the DELTA between static and walk-forward is fair.)
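
To make the raw vs Rf-adjusted distinction concrete, here is a small sketch. It assumes monthly returns and a constant per-period risk-free rate; both the sample returns and the 0.3% monthly rate are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def sharpe(returns, rf_per_period=0.0, periods_per_year=12):
    """Annualized Sharpe; rf_per_period=0 gives the raw mean/vol version."""
    excess = [r - rf_per_period for r in returns]
    return mean(excess) / stdev(excess) * sqrt(periods_per_year)


returns = [0.02, -0.01, 0.03, 0.01, -0.02, 0.015]  # hypothetical monthly returns
raw = sharpe(returns)
adjusted = sharpe(returns, rf_per_period=0.003)
print(f"raw {raw:.3f} vs Rf-adjusted {adjusted:.3f}")
```

With a constant Rf, the adjustment lowers each Sharpe by Rf divided by that series' volatility, so the static vs walk-forward gap is closely preserved when the two series have similar volatility.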

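Static beats walk-forward on Sharpe in all 3 universes despite walk-forward picking up +0.36-0.45pp of CAGR. Walk-forward's gain in returns is more than offset by the extra risk it takes on: MDD is deeper for B4 and B2 (though not for T1), and a lower raw Sharpe at a higher return implies higher volatility in all three.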

Why? Three structural effects:

  1. 5-year window is too short for stable covariance estimation — out-of-sample noise dominates.
  2. Max-Sharpe optimization picks corner solutions (0%/100%), causing high turnover, which costs MDD.
  3. Equal-weight (B4 25/25/25/25) acts as a shrinkage estimator. With n_assets ~ 4 and modest correlation, estimation error in optimized weights tends to outweigh their theoretical gain (DeMiguel/Garlappi/Uppal 2009 RFS, "Optimal Versus Naive Diversification: How Inefficient is the 1/N Portfolio Strategy?").

G8 verdict: PASS. Static weights are not curve-fit. Drift is real; curve-fit is not.

Rolling CAGR consistency (5y / 10y / 15y / 20y windows, named static contenders):

The CE stacks (B2/T1/B4/L1) maintain positive rolling CAGR across virtually all windows >=10y. SPY dips negative in rolling-10y around 2008-2010, while B4/L1 stay materially steadier. The static stacks aren't just better in aggregate; they're more consistent across overlapping windows, which is the practical definition of "non-curve-fit."

Why this works (the mechanism — unchanged from Post 1)

Capital-efficient stacking removes the LETF decay tax

Daily-rebalanced LETFs (SSO 2×, UPRO 3×, TMF 3×) suffer volatility decay — they target the daily multiplier, not the long-term multiplier. In choppy markets (2022, 2000-2003), this can eat 5-10%/year of return.
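
Volatility decay is easy to see on a toy path. The sketch below assumes an idealized daily-reset product (exactly L times the index's daily return, no fees or financing cost) and a hypothetical choppy index that alternates +10% and -9.09% so that it ends exactly where it started.

```python
def compound(daily_returns, leverage=1.0):
    """Compound a return path with daily-reset leverage (no fees, no financing)."""
    value = 1.0
    for r in daily_returns:
        value *= 1.0 + leverage * r
    return value


# Hypothetical choppy path: +10% then -1/11 (about -9.09%) leaves the index flat.
chop = [0.10, -1.0 / 11.0] * 10

for lev in (1, 2, 3):
    print(f"{lev}x ends at {compound(chop, lev):.4f}")
```

The index is flat, yet the 2× and 3× paths end below 1.0, and the gap widens with leverage. The same compounding works in favor of a daily-reset product during a steady trend, which is why the decay shows up mainly in choppy regimes.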

Capital-efficient ETFs use futures overlays instead of daily-reset leverage:

  • NTSX: 90% S&P 500 + 60% Treasury futures = 1.5× notional, no daily decay
  • GDE: 90% S&P 500 + 90% gold futures = 1.8× notional
  • RSST: 100% S&P 500 + 100% systematic managed futures = 2× notional

You get the leverage; you don't pay the daily-reset decay tax.

Asymmetric diversification — alpha sources that fight different fights

  • 2008 GFC: bonds ✅ rallied, gold ✅ rallied, MF ✅ trend-followed shorts
  • 2020 COVID flash: bonds ✅, gold ✅, MF mixed (too fast)
  • 2022 inflation: gold ✅ flat, MF ✅ +20-30%, bonds ❌ catastrophic
  • 2000-2003 dot-com: value ✅, bonds ✅, gold ✅ slow

No regime kills more than 2 of 4 simultaneously.

How I tested for overfit (the 7-gate battery + new G8)

| gate | what it tests | threshold |
|---|---|---|
| G1 PBO (CSCV) | Probability of backtest overfit | < 0.5 |
| G2 DSR | Deflated Sharpe with Bonferroni | p < 0.05 |
| G3 Walk-Forward | Rolling 8 windows, MDD < 25% per window | 6+/8 windows |
| G4 OOS 70/30 | Train 70%, test 30% Sharpe > 0 | Sharpe > 0 |
| G5 FWD stress | Post-2020 OOS Sharpe > 0 | Sharpe > 0 |
| G6 Bootstrap CI | 99.9% CI low > 0 | CI low > 0 |
| G7 Cross-library | Same backtest in 2 libs ±3pp | ±3pp |
| G8 Walk-forward weight drift 🆕 | Re-fit max-Sharpe rolling 5y; static portfolio Sharpe ≥ walk-forward Sharpe | static ≥ WF |
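
As an illustration of what a gate like G6 checks, here is a minimal bootstrap sketch. The post does not specify the statistic or the resampling scheme, so this assumes the mean periodic return and a plain IID bootstrap (a block bootstrap would be the usual choice if autocorrelation matters); the function name and sample returns are hypothetical.

```python
import random
from statistics import mean


def bootstrap_ci_low(returns, n_boot=2000, alpha=0.001, seed=0):
    """Lower bound of a two-sided (1 - alpha) bootstrap CI for the mean return."""
    rng = random.Random(seed)
    n = len(returns)
    means = sorted(mean(rng.choices(returns, k=n)) for _ in range(n_boot))
    return means[int(alpha / 2 * n_boot)]


winning = [0.01, 0.02, 0.015, 0.005, 0.03]  # hypothetical periodic returns
losing = [-r for r in winning]

print("gate passes:", bootstrap_ci_low(winning) > 0)
print("gate passes:", bootstrap_ci_low(losing) > 0)
```

A 99.9% interval needs many resamples to resolve its tail; with `n_boot=2000` the lower bound is effectively the second-smallest resampled mean, so a real run would want a much larger `n_boot`.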

Gates pass:

  • ✅ G2 DSR: p < 0.001 across both datasets
  • ✅ G4 OOS 70/30: stable post-2003
  • ✅ G5 FWD: survived COVID + 2022 inflation
  • ✅ G6 Bootstrap: CI low > 0
  • ✅ G7 Cross-lib: ±3pp agreement testfol.io vs internal Python
  • ✅ G8 walk-forward: static ≥ walk-forward on ALL universes (this Post 2 addition)
  • ⚠️ G1 PBO: grid-level inflated 0.5-0.9 because configs are similar (Principle M from López de Prado AFML — PBO is grid-composition-dependent)
  • ⚠️ G3 Walk-Forward 25% per-window: fails for any leveraged strategy in 2008/2022 stress — structural, not overfit

Bottom line: edge is real (G2/G4/G5/G6/G8 confirm), per-window stress is structural for leveraged exposure (G3 expected fail), grid noise means don't fine-tune weights (G1 warning).

Honest caveats (updated)

  1. TMF (3× LTT) lost 71% in 2022 alone. At a 25% allocation that is a -17.7pp portfolio drag. Aggressive (B2) cuts TMF to 10% (-7pp drag). Conservative (B4 ZROZ) replaces TMF entirely with zero-coupon Treasuries (-53% in 2022; at 25% that is a -13pp drag, but with no LETF decay).
  2. NTSX/GDE/RSST are recent ETFs. NTSX 2018, GDE 2022, RSST 2022. The main table and charts use synthetic proxies over 1987-2026, including RSST = SPY + KMLM - cash. Real ETFs have execution drag and tracking error not fully captured (see u/perky_python's critique addressed in this post).
  3. RSSB has only ~2 years of live data (launched Jan 2024). G4d (which has the best MDD in the study) is partially synthetic. Take with extra skepticism.
  4. Bear markets in NDX-3× are catastrophic. G3e/G3f confirm: the 200d SMA gate doesn't save you from -90% to -97% MDD on TQQQ-leveraged regime gates in 1987-2026. The "10,000% TQQQ" backtest claims you'll see online are over 2012-2025. Don't extrapolate.
  5. A ~38-year backtest assumes regimes repeat. 1987-2026 covers 5 major stress events but NOT 1970s stagflation and NOT a Japan-style lost decade. A different decade could behave differently.
  6. Behavioral risk is real. A 30% drawdown over 18-24 months tests discipline. If you panic-sell at the bottom, you destroy the strategy.
  7. Pre-1987 data limitation: backtest can't extend before KMLM SIM start. u/Fit-Librarian279 correctly pointed out that 1980-1982 was a tough drawdown for both gold and ZROZ (verified: gold -53% peak-to-trough, long-bonds bottomed 1981-82 with negative real returns through 1979). The Hurst/Ooi/Pedersen 2017 "Century of Evidence" extends MF data further back; backfilling is on the roadmap but won't be in this Post 2.
  8. ZROZ/GOVZ are duration bets, not guaranteed crisis hedges. A commenter correctly flagged the forward-looking risk: deficits, rising term premia, sticky inflation, or central-bank reserve shifts could make long STRIPS fail to rally in a future equity crisis. I still prefer ZROZ/GOVZ over TMF because they avoid daily-reset decay and give more convexity than TLT, but this sleeve is not magic insurance.
  9. RSST proxy caveat: the longer-window study uses SPY + KMLM - cash because KMLMSIM lets us start in 1987. A live tracking check suggests actual RSST is closer to SPY + 70% DBMF + 30% KMLM - cash; using that proxy cuts the common window to 2000+ and shifts B4 to roughly CAGR 11.00% / MDD -29.60% / Sharpe 0.671. I use the KMLM-only proxy here to gain 12+ years of regime history, not because it is a perfect RSST reconstruction.
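
The drag figures in caveat 1 are simply sleeve weight times the sleeve's calendar-year return. A quick check, assuming the weights are held fixed for the year (no intra-year rebalancing) and using the 2022 returns quoted in that caveat:

```python
def sleeve_drag_pp(weight, sleeve_return_pct):
    """Portfolio-level contribution of one sleeve, in percentage points."""
    return weight * sleeve_return_pct


tmf_25 = sleeve_drag_pp(0.25, -71.0)   # TMF at 25%
tmf_10 = sleeve_drag_pp(0.10, -71.0)   # TMF at 10%
zroz_25 = sleeve_drag_pp(0.25, -53.0)  # ZROZ at 25%

print(f"TMF 25%: {tmf_25:.2f}pp, TMF 10%: {tmf_10:.2f}pp, ZROZ 25%: {zroz_25:.2f}pp")
```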

My pick — what I'd actually hold for the next 30 years (UPDATED)

Post 1 had T1 gold-heavy as my pick. Post 2 update: switching to B4 Conservative, with L1 CEGB as the lower-drawdown alternative (Sharpe 0.729 vs B4's 0.745).

| candidate | CAGR | MDD | Sharpe | 30y verdict |
|---|---|---|---|---|
| 🏆 Conservative (B4 ZROZ) | 13.31% | -28.94% | 0.745 | MY PICK: best Sharpe in the long-window static sweep and the cleanest CAGR/MDD compromise. |
| 🔵 Sleeping pills (L1 CEGB) | 11.06% | -25.43% | 0.729 | Lowest stress. Give up 2.25pp CAGR vs B4. Pick this if drawdown tolerance is the binding constraint. |
| B2 TMF10 | 13.89% | -36.38% | 0.718 | Higher CAGR, much larger drawdown. |
| T1 gold-heavy | 13.34% | -34.65% | 0.688 | Was Post 1 pick; demoted because B4 has similar CAGR with materially lower MDD. |
| 🛡️ G4d (RSSB+GDE+ZROZ+KMLM) 🆕 | 10.54% | -22.56% | 0.678 | Best MDD in entire study. Lower CAGR (below SPY) is the trade-off. Consider as complement, not substitute, to B4. |

Why B4 wins now:

  1. Best balanced trade-off on the long-window study: B4 adds +2.25pp CAGR over L1 while keeping MDD below 30%.
  2. ZROZ instead of TMF removes the LETF decay tax — same duration role, no daily-reset decay. GOVZ is an operationally close substitute for ZROZ because both target long Treasury STRIPS / zero-coupon duration.
  3. Small CAGR/MDD penalty from monthly rebal in the prior cadence test (B4 was much less sensitive than T1 or Popular 50/25/25).
  4. Survives G8 walk-forward gate: static B4 25/25/25/25 beats rolling max-Sharpe optimization on the same universe. Equal-weight is the optimum shrinkage.
  5. Structural diversification holds geographically: G4c (50/50 US/Intl swap) only lost modest Sharpe vs B4 in the prior G4 test. The edge isn't purely a US-equity-premium curve fit.

Single-portfolio commitment for next 30y: 🟢 B4 Conservative

25% NTSX  (90% SPY  + 60% Treasury futures = 1.5x notional)
25% GDE   (90% SPY  + 90% Gold futures      = 1.8x notional)
25% RSST  (100% SPY + 100% Trend            = 2.0x notional)
25% ZROZ  (zero-coupon 25y Treasuries, no LETF decay; GOVZ is a close substitute)
=========
100% capital, ~163% notional exposure, ~74.5% equity beta

Monthly rebalance via contributions only (don't sell unless ±10pp drift). No regime gates, no signals to watch, no whipsaw cost. Boring buy-and-hold.
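
One way to implement "rebalance via contributions only" is to send each new contribution to the sleeves that are below target, in proportion to their shortfall, and flag a sell-rebalance only when a sleeve is still more than 10pp off target afterwards. This is a sketch under that reading of the rule; the function name, the holdings, and the interpretation of "±10pp drift" as absolute weight deviation are assumptions.

```python
def allocate_contribution(holdings, targets, contribution, drift_limit=0.10):
    """Split a contribution across underweight sleeves; report if selling is needed."""
    total = sum(holdings.values()) + contribution
    gaps = {k: max(targets[k] * total - holdings[k], 0.0) for k in targets}
    gap_sum = sum(gaps.values())
    if gap_sum > 0:
        buys = {k: contribution * gaps[k] / gap_sum for k in targets}
    else:
        buys = {k: contribution * targets[k] for k in targets}
    after = {k: holdings[k] + buys[k] for k in targets}
    drift = max(abs(after[k] / total - targets[k]) for k in targets)
    return buys, drift > drift_limit


targets = {"NTSX": 0.25, "GDE": 0.25, "RSST": 0.25, "ZROZ": 0.25}
holdings = {"NTSX": 30000.0, "GDE": 25000.0, "RSST": 25000.0, "ZROZ": 20000.0}  # hypothetical

buys, needs_sell = allocate_contribution(holdings, targets, contribution=2000.0)
print(buys, needs_sell)
```

Here the overweight sleeve receives nothing, most of the contribution goes to the most underweight sleeve, and no selling is triggered because every sleeve ends within 10pp of its target.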

Validate ZROZ/GOVZ availability at your broker. GOVZ (iShares 25+ Year Treasury STRIPS Bond ETF) is a close operational substitute for ZROZ (PIMCO 25+ Year Zero Coupon U.S. Treasury Index ETF). In quick testfol.io checks their behavior was practically the same because both are long-duration STRIPS/zero-coupon Treasury exposure. If ZROZ is unavailable but GOVZ is available with acceptable spread/liquidity, I would use GOVZ before falling back to TLT. TLT is the second-line fallback because it changes the sleeve more: lower duration/convexity than STRIPS, even if still long Treasury exposure.

Optional MDD-extreme tier: 🛡️ G4d (RSSB + GDE + ZROZ + KMLM)

25% RSSB  (100% global stocks + 100% global bonds via futures = 2.0x notional)
25% GDE
25% ZROZ
25% KMLM  (KFA MLM Index, rules-based managed futures)

CAGR 10.54% (below SPY's 11.37%) but MDD only -22.56% — lowest in the entire study. Caveat: RSSB has ~2y live track record. Use as complement, not core.

Replicate (testfol.io)

Set rebalance "Monthly", invest_dividends=true. Apply ER drag per portfolio (sum of weighted ERs).
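
The ER drag is a weighted sum of fund expense ratios. The expense ratios below are assumptions for illustration, not figures taken from this post (check each fund's current page); with these inputs the sum comes out at the 0.385% drag listed for B4.

```python
def weighted_er(weights, expense_ratios_pct):
    """Portfolio-level expense drag in percent: sum of weight x ER."""
    return sum(weights[t] * expense_ratios_pct[t] for t in weights)


b4_weights = {"NTSX": 0.25, "GDE": 0.25, "RSST": 0.25, "ZROZ": 0.25}

# ASSUMED expense ratios in percent; verify against current fund documents.
assumed_er_pct = {"NTSX": 0.20, "GDE": 0.20, "RSST": 0.99, "ZROZ": 0.15}

b4_drag = weighted_er(b4_weights, assumed_er_pct)
print(f"B4 drag: {b4_drag:.3f}%")
```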

Long-window RSST expansion used in the main table:

RSST proxy = 100 SPYSIM + 100 KMLMSIM - 100 CASHX
B4 Conservative = 47.5 SPYSIM + 25 GDESIM + 25 KMLMSIM + 25 ZROZSIM + 15 IEFSIM - 37.5 CASHX  (drag 0.385%)
B2 TMF10        = 57 SPYSIM + 30 GDESIM + 30 KMLMSIM + 18 IEFSIM + 10 TLTSIM?L=3&E=1.05 - 45 CASHX  (drag 0.417%)
T1 gold-heavy   = 43 SPYSIM + 35 GDESIM + 25 KMLMSIM + 20 TLTSIM?L=3&E=1.05 + 12 IEFSIM - 35 CASHX  (drag 0.358%)
G4d MDD-extreme = 25 RSSBSIM + 25 GDESIM + 25 ZROZSIM + 25 KMLMSIM  (drag 0.490%)

The 70/30 DBMF/KMLM tracking-sensitivity version is:

RSST tracking proxy = 100 SPYSIM + 70 DBMFSIM + 30 KMLMSIM - 100 CASHX?E=-2

The capital-efficient SIMs (NTSX/GDE/RSST/RSSB/NTSD) decompose into base equity + leveraged sleeves via CASHX (the -X% leg models the implicit T-bill borrow funding the futures notional).
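
The expansion of the B4 line can be reproduced mechanically. The sleeve definitions below are inferred from the formulas in this section (NTSX as 90% SPY + 60% IEF with a -50% cash leg, RSST as 100% SPY + 100% KMLM with a -100% cash leg); GDE and ZROZ are left as their own SIM tickers, as in the B4 line. The `SLEEVES` table and `expand` helper are illustrative names, not part of testfol.io.

```python
# Per 1 unit of the ETF, the underlying legs (inferred from the formulas above).
SLEEVES = {
    "NTSX": {"SPYSIM": 0.90, "IEFSIM": 0.60, "CASHX": -0.50},
    "RSST": {"SPYSIM": 1.00, "KMLMSIM": 1.00, "CASHX": -1.00},
}


def expand(portfolio):
    """Expand ETF weights (in %) into SIM legs; unknown tickers map to TICKER + 'SIM'."""
    legs = {}
    for ticker, weight in portfolio.items():
        for leg, mult in SLEEVES.get(ticker, {ticker + "SIM": 1.0}).items():
            legs[leg] = legs.get(leg, 0.0) + weight * mult
    return legs


b4 = expand({"NTSX": 25, "GDE": 25, "RSST": 25, "ZROZ": 25})
print(b4)
print("net weight:", sum(b4.values()))
```

The legs net to 100% because each sleeve's negative CASHX weight offsets its extra notional. The same helper can be pointed at the B2 and T1 weights to check their SPYSIM, IEFSIM, and CASHX legs.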

For LRS strategies, use the testfol.io tactical builder:

Signal: SMA(SPYSIM, 200) < Price(SPYSIM)  tolerance: 2%
  IF TRUE:  100% SPYSIM?L=2 (SSO) or SPYSIM?L=3 (UPRO)
  IF FALSE: 100% IEFSIM
Rebal: Daily. Trading freq: Daily.
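
For anyone replicating the signal outside testfol.io, a gate of this shape can be sketched as below. How testfol.io applies its tolerance internally is not documented here, so this assumes a symmetric hysteresis band: go risk-on when price is more than 2% above the SMA, go risk-off when it is more than 2% below, otherwise keep the previous state. The price list and short window in the example are hypothetical.

```python
def sma_gate(prices, window=200, tolerance=0.02):
    """Risk-on flags, one per day starting at the first day with a full SMA window."""
    risk_on = False
    flags = []
    for i in range(window - 1, len(prices)):
        sma = sum(prices[i - window + 1:i + 1]) / window
        if prices[i] > sma * (1.0 + tolerance):
            risk_on = True
        elif prices[i] < sma * (1.0 - tolerance):
            risk_on = False
        flags.append(risk_on)  # inside the band: keep the previous state
    return flags


# Tiny hypothetical series with a 3-day window so the output is easy to follow.
prices = [100, 100, 100, 110, 111, 100, 90, 90]
flags = sma_gate(prices, window=3)
print(flags)
```

A flag computed on day t's close can only be traded on day t+1, so a backtest should apply `flags[t]` to the following day's return to avoid look-ahead.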

What I want from this post

Primary: share the methodology + community-driven feedback integration so others can replicate. Post 1's data was good but Post 2 reflects 4 substantive corrections from the community.

Secondary: get more honest critique. Specifically:

  • Did I miss something in the G4 international tests? I tested NTSD/RSSB/VT-base but didn't run NTSI (US+Intl combined stack) since NTSI isn't a SIM on testfol.io.
  • The G3 NDX regime-gate finding is uncomfortable. TQQQ does very well on cherry-picked windows but breaks on full 1987-2026. Is there a version (with smarter signal, shorter MA, dual-signal like u/no_simpsons suggested) I should test?
  • If you posted a screenshot that beats B4, please share weights or a testfol.io link. One LOWDDPORT screenshot looked excellent (roughly 12.7% CAGR / -26.6% MDD / 0.84 Sharpe), but without allocation details it isn't replicable.
  • G4d (RSSB+GDE+ZROZ+KMLM) is the new MDD record (-22.56%). Anyone running this live? Is the synthetic backfill of RSSB pre-2024 trustworthy?
  • Is there a known issue with the "static beats walk-forward" finding in G8? This is documented in DeMiguel et al. 2009 RFS for 1/N portfolios but I want to make sure nothing's wrong with my SLSQP implementation.

What I would NOT find useful: "just hold VTI bro" / "leverage is gambling" / "this won't work". I've heard those. Specific empirical critiques only.

Happy to share the spec JSONs, full per-config metrics tables, and the Python pipeline if anyone wants to replicate.

Anyone holding something not in my expanded sweep that lands in the upper-right quadrant of the long-window Pareto frontier (CAGR > B4 13.31% AND |MaxDD| < B4 28.94%)? Or something with MDD < G4d 22.56% AND CAGR > G4d 10.54%?
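
If you want to test a candidate against that challenge, the check is a two-axis dominance test. The sketch below uses rows from the long-window static table above; the `dominators` helper is an illustrative name.

```python
# (CAGR %, MDD %) from the long-window static stack table
TABLE = {
    "B4": (13.31, -28.94),
    "B3": (12.44, -30.06),
    "L1": (11.06, -25.43),
    "L2": (11.06, -26.30),
    "B2": (13.89, -36.38),
    "T2": (13.40, -33.14),
    "T1": (13.34, -34.65),
    "B5": (14.22, -41.12),
    "SPY": (11.37, -55.20),
}


def dominators(table, ref):
    """Names with strictly higher CAGR AND strictly shallower MDD than `ref`."""
    ref_cagr, ref_mdd = table[ref]
    return [
        name
        for name, (cagr, mdd) in table.items()
        if name != ref and cagr > ref_cagr and abs(mdd) < abs(ref_mdd)
    ]


print("beat B4 on both axes:", dominators(TABLE, "B4"))
print("beat SPY on both axes:", dominators(TABLE, "SPY"))
```

Adding your own portfolio as a new `TABLE` row and re-running `dominators(TABLE, "B4")` shows whether it clears the bar, provided it was measured on the same window, cadence, and ER assumptions.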

u/noletovictor — 12 days ago
▲ 23 r/LETFs

TL;DR: Convince me there's something better. CAGR > SPY's 11.48%, Max DD < SPY's -55.14%, Sharpe > 0.78. Holding period 30 years. What do you got?

I've been backtesting capital-efficient stacks for the last few weeks. Best 4 candidates I found, vs the obvious benchmarks (SPY, SSO/UPRO buy-hold, Gayed LRS, popular 50/25/25 SSO/GLD/ZROZ). All on testfol.io, common start 1987-12-31 → 2026-04-30, annual rebal, dividends reinvested, full SIM proxies for newer ETFs.

My question to the sub: anyone running something that beats my top 4 on both axes (higher CAGR AND lower Max DD than SPY)? I'd love to be proven wrong.

The 9 portfolios

| # | name | weights |
|---|---|---|
| 1 | Aggressive (B2) | 30% NTSX + 30% GDE + 30% RSST + 10% TMF |
| 2 | Balanced (T1) | 20% NTSX + 35% GDE + 25% RSST + 20% TMF |
| 3 | Conservative (B4) | 25% NTSX + 25% GDE + 25% RSST + 25% ZROZ |
| 4 | Sleeping pills (L1 CEGB) | 40% NTSX + 25% GDE + 17.5% KMLM + 17.5% TLT |
| 5 | Popular 50/25/25 | 50% SSO + 25% GLD + 25% ZROZ |
| 6 | SPY 1× | 100% SPY |
| 7 | SSO 2× buy-hold | 100% SSO |
| 8 | UPRO 3× buy-hold | 100% UPRO |
| 9 | Gayed LRS 2× / 3× | SPY > 200d SMA → SSO/UPRO else IEF |

Results (testfol.io, ~38y window, sorted by Sharpe)

| portfolio | CAGR | Max DD | Sharpe | $10k → today |
|---|---|---|---|---|
| 🟢 Conservative (B4 ZROZ) | 13.96% | -28.65% | 0.798 | $1.50M |
| 🔵 Sleeping pills (L1 CEGB) | 11.56% | -22.27% | 0.782 | $662k |
| ⚪ Bogleheads 67% NTSX | 11.55% | -22.48% | 0.778 | $660k |
| 🔴 Aggressive (B2) | 14.61% | -36.21% | 0.772 | $1.86M |
| 🟠 Balanced (T1) | 14.19% | -30.66% | 0.744 | $1.62M |
| Popular 50/25/25 SSO/GLD/ZROZ | 13.47% | -39.84% | 0.637 | $1.27M |
| Gayed LRS 2× (SSO 200d) | 15.62% | -43.49% | 0.595 | $2.60M |
| Gayed LRS 3× (UPRO 200d) | 18.77% | -57.59% | 0.575 | $7.30M |
| SPY 1× buy-hold | 11.48% | -55.14% | 0.528 | $643k |
| SSO 2× buy-hold | 14.59% | -88.67% | 0.476 | $5.95M |
| UPRO 3× buy-hold | 14.92% | -98.29% | 0.475 | $6.49M ‡ |

‡ SSO/UPRO buy-hold show high terminal values, but the −88% / −98% drawdowns make them unholdable in practice: anyone who held UPRO from 1× down to 0.017× through 2008 either had an iron stomach or sold out. The Sharpe captures this brutally (0.47).
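
The terminal-value column can be cross-checked against the CAGR column with the compounding identity. This sketch assumes a window of about 38.33 years (1987-12-31 to 2026-04-30) and no contributions.

```python
def terminal_value(start, cagr_pct, years):
    """Value of `start` compounded at `cagr_pct` percent per year for `years` years."""
    return start * (1.0 + cagr_pct / 100.0) ** years


YEARS = 38.33  # approx. 1987-12-31 -> 2026-04-30

for name, cagr in [("SPY", 11.48), ("B4", 13.96), ("Gayed LRS 2x", 15.62)]:
    print(f"{name}: ${terminal_value(10_000, cagr, YEARS):,.0f}")
```

The SPY, B4, and Gayed rows reconcile with their listed terminal values to within rounding. Applied to the SSO and UPRO buy-hold rows, the same identity gives roughly $1.8M-$2.1M rather than the listed $5.95M and $6.49M, so one of the two columns in those rows may be worth re-checking against the testfol.io output.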

What this looks like

Equity curves ($10k start, log scale):

Equity 1987-2026

Drawdowns (the part that hurts):

Drawdown 1987-2026

CAGR vs Max DD (upper-right = beats SPY on both):

Pareto frontier

The question

I'm seeing 4 portfolios in the upper-right quadrant of the scatter (CAGR > SPY AND |MaxDD| < SPY). All static, all annual rebal, no signals.

Anyone hold something not on this list that lands in that quadrant?

I'm specifically curious about:

  • Different MF blends (CTA? simplify managed futures combos?)
  • Different duration handling (DBLTX? VGLT? TLTW with covered calls?)
  • Modified risk-parity weights (different from CEGB)
  • Capital-efficient ETFs I haven't considered (RSBT? RSSB? NTSI?)

Also genuinely curious if anyone has UPRO LRS variants with a smarter signal than 200d SMA that crosses the upper-right quadrant. I tested Gayed canonical and it lands in lower-right (better CAGR, worse MDD).

Replicate on testfol.io

testfol.io backtest results

u/noletovictor — 14 days ago