u/IllPut1820 — reddlx

My MBot ML crypto bot backtested at +63% ROI / Sharpe 4.6 over 90 days OOS. Live paper after 10 days: -23%, PF 0.14. I spent the day auditing. The culprit was a default trailing_enabled=True flag that fires a Phase-1 breakeven on every winning trade, capping winners at around +0.05% before they can reach the full TP. I re-ran the sim with identical data and predictions, toggling only trailing on vs off: +63.05% (off) → +5.19% (on). That's roughly a 12x ROI reduction, with ZERO full TP hits when trailing is on. Posting because this failure mode is sneaky and I suspect it's a common silent killer.

⚙️ Setup

5-model ensemble: LightGBM + CatBoost + ModernTCN + FT-Transformer + TabM. Trained on 18 months of Binance data. Vol-mode regression (target = forward realized vol, not direction; direction comes from an EMA50 filter). 5 pairs (BTC/ETH/SOL/BNB/XRP), 15-min bars, 144-bar (36h) horizon. Walk-forward CPCV with 12 paths. The sweep-tuned config landed on vol-Z gate 1.30, daily vol target 0.7%, TP/SL ratio 2.22, max 1 position, leverage 5x.
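To make the vol-mode setup concrete, here is a minimal sketch of the label and the two filters, assuming plain close prices as input. The post doesn't specify the realized-vol estimator or how the vol-Z gate is computed, so `forward_realized_vol` (sqrt of summed squared log returns) and `vol_z_gate` (z-score of the current prediction against recent predictions) are hypothetical stand-ins, not the bot's actual code.

```python
import math
import statistics

def forward_realized_vol(closes, t, horizon=144):
    """Label for bar t: realized vol over the next `horizon` bars.

    Assumption: realized vol = sqrt(sum of squared log returns).
    """
    window = closes[t:t + horizon + 1]
    if len(window) < horizon + 1:
        return None  # not enough future bars, so no label for this row
    rets = [math.log(b / a) for a, b in zip(window, window[1:])]
    return math.sqrt(sum(r * r for r in rets))

def ema(values, span=50):
    """Standard recursive EMA with alpha = 2 / (span + 1)."""
    alpha = 2 / (span + 1)
    out = [values[0]]
    for v in values[1:]:
        out.append(alpha * v + (1 - alpha) * out[-1])
    return out

def ema_direction(closes, t, span=50):
    """Direction filter: +1 (long) if the close is above EMA50, else -1 (short)."""
    return 1 if closes[t] > ema(closes[:t + 1], span)[-1] else -1

def vol_z_gate(pred_vol, recent_preds, threshold=1.30):
    """Hypothetical gate: trade only if the predicted vol sits at least
    `threshold` standard deviations above the recent mean prediction."""
    mu = statistics.fmean(recent_preds)
    sd = statistics.pstdev(recent_preds)
    return sd > 0 and (pred_vol - mu) / sd >= threshold
```

The model only has to say *how much* the price will move; the EMA50 filter decides *which way*, and the gate keeps only the bars where the forecast move is unusually large.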

📊 OOS sweep — 90-day held-out

ROI +63.05%, Sharpe 4.59, PF 2.73, WR 58.7%, n_trades 46, DD trail 6.61%, DD daily 4.34%. Looked clean. Deployed to paper via a wrapper script that fills a config.json and injects sweep-specific env vars before launching the live trading engine.

🔥 Live paper, 10 days in

ROI -23%, PF 0.14, WR around 80% (higher than the backtest), mean_win around +0.6% (way smaller than the expected +3%). That high-WR + low-PF + tiny-mean-win combo is the textbook signature of a breakeven cap: lots of small wins from positions exiting near flat, plus occasional full SL hits that wipe out a chunk.
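A rough way to see why that combo is so damaging: if every trade carries the same size, PF can be rewritten in terms of WR, mean win $\bar{W}$ and mean loss $\bar{L}$. The reported PF is probably dollar-based with varying position sizes, so treat this as an approximation rather than an exact reconciliation with the numbers above.

```latex
\mathrm{PF} = \frac{\sum_{\text{wins}} r_i}{\left|\sum_{\text{losses}} r_i\right|}
            = \frac{\mathrm{WR}\cdot\bar{W}}{(1-\mathrm{WR})\cdot\lvert\bar{L}\rvert},
\qquad
\mathrm{PF} \ge 1 \iff \lvert\bar{L}\rvert \le \frac{\mathrm{WR}\cdot\bar{W}}{1-\mathrm{WR}}

\text{Live: } \frac{0.80 \times 0.6\%}{0.20} = 2.4\%
\qquad
\text{Backtest (A): } \frac{0.587 \times 3.25\%}{0.413} \approx 4.62\%
```

With ~80% wins averaging +0.6%, any average loss above about 2.4% already pushes PF below 1. The backtest's +3.25% winners could absorb average losses up to about 4.6%, so its -1.80% mean loss had plenty of margin.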

🧪 A/B methodology

Re-implemented the original sweep simulator. Same predictions, same entries, same sizing, same SL and TP at open. One flag toggled — 3-phase ATR trailing in the live engine. Phase 1 BE moves the SL to breakeven-plus-buffer when profit reaches +1×ATR. Phase 2 trails behind the peak at +2×ATR. Phase 3 tightens further at +4×ATR. Identical seed, identical bars, identical model output. Pure isolation of the trailing effect.
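The A/B isolation can be sketched as a single-trade replay where only the `trailing` flag changes. This is a simplified stand-in for the sweep simulator, not the real one: it is long-only, looks at bar closes only (no intrabar high/low), and fills exactly at the stop/TP level. The trigger multiples (+1, +2, +4 ×ATR) come from the post. The breakeven buffer and both trail distances in `TrailParams` are hypothetical placeholders, and the phase triggers are read as profit thresholds.

```python
from dataclasses import dataclass

@dataclass
class TrailParams:
    be_trigger_atr: float = 1.0    # Phase 1: breakeven once profit >= 1xATR
    be_buffer_pct: float = 0.0005  # hypothetical buffer: entry + 0.05%
    p2_trigger_atr: float = 2.0    # Phase 2: start trailing at +2xATR
    p2_trail_atr: float = 1.0      # hypothetical trail distance behind peak
    p3_trigger_atr: float = 4.0    # Phase 3: tighten at +4xATR
    p3_trail_atr: float = 0.5      # hypothetical tighter trail distance

def trailing_stop(entry, peak, atr, p):
    """Candidate stop (and phase label) for a LONG given the best price so far."""
    gain = peak - entry
    if gain >= p.p3_trigger_atr * atr:
        return peak - p.p3_trail_atr * atr, "TIGHT_P3"
    if gain >= p.p2_trigger_atr * atr:
        return peak - p.p2_trail_atr * atr, "TRAIL_P2"
    if gain >= p.be_trigger_atr * atr:
        return entry * (1 + p.be_buffer_pct), "BE_P1"
    return None, None

def simulate_long(entry, sl, tp, atr, closes, trailing, p=None, max_bars=144):
    """Replay one long trade; returns (exit_price, exit_reason)."""
    p = p or TrailParams()
    stop, reason = sl, "SL"
    peak = entry
    window = closes[:max_bars]
    for px in window:
        if px >= tp:
            return tp, "TP"
        if px <= stop:
            return stop, reason  # reason = whichever phase last set the stop
        peak = max(peak, px)
        if trailing:
            cand, phase = trailing_stop(entry, peak, atr, p)
            if cand is not None and cand > stop:  # stops only ratchet upward
                stop, reason = cand, phase
    return window[-1], "TIME"

if __name__ == "__main__":
    entry, atr = 100.0, 0.4            # ATR = 0.4% of price
    sl = entry - 0.9                   # hypothetical 0.9% stop distance
    tp = entry + 0.9 * 2.22            # TP/SL ratio 2.22 from the post
    path = [100.2, 100.5, 100.0, 100.8, 101.5, 102.1]
    print(simulate_long(entry, sl, tp, atr, path, trailing=False))  # hits TP
    print(simulate_long(entry, sl, tp, atr, path, trailing=True))   # BE_P1 near 100.05
```

On the same price path, the trailing-off run rides the dip and reaches TP, while the trailing-on run gets stopped out at breakeven on the first pullback.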

📉 A/B result

| metric | A: trailing OFF | B: trailing ON | Δ |
|---|---|---|---|
| ROI | +63.05% | +5.19% | -57.86 pp |
| Sharpe | 4.59 | 0.97 | -3.62 |
| PF | 2.73 | 1.29 | -1.45 |
| WR | 58.7% | 82.8% | +24.1 pp |
| n_trades | 46 | 58 | +12 |
| DD trail | 6.61% | 9.98% | +3.38 pp |
| mean_win | +3.25% | +0.63% | -2.61 pp |
| mean_loss | -1.80% | -2.04% | -0.23 pp |
| capital_final | $8152 | $5259 | -$2893 |

Exit-reason breakdown. A (trailing OFF) — TP=12, SL=11, TIME=23. B (trailing ON) — TP=0, SL=8, TIME=1, BE_P1=17, TRAIL_P2=13, TIGHT_P3=19. Zero full TP hits with trailing on. Every winner that would have gone the distance got capped earlier in one of the three trailing phases.
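A quick consistency check on that breakdown, using only the counts above: both runs sum to their n_trades (46 and 58), and 49 of B's 58 exits (about 84%) come from one of the three trailing phases.

```python
from collections import Counter

# Exit-reason counts as reported in the post.
exits_a = Counter({"TP": 12, "SL": 11, "TIME": 23})
exits_b = Counter({"TP": 0, "SL": 8, "TIME": 1,
                   "BE_P1": 17, "TRAIL_P2": 13, "TIGHT_P3": 19})
TRAIL_REASONS = ("BE_P1", "TRAIL_P2", "TIGHT_P3")

def summarize(exits):
    """Total trades plus the share of exits at full TP and at a trailing phase."""
    total = sum(exits.values())
    trail = sum(exits[r] for r in TRAIL_REASONS)  # missing keys count as 0
    return {"total": total,
            "tp_share": exits["TP"] / total,
            "trail_share": trail / total}

print(summarize(exits_a))
print(summarize(exits_b))
```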

🎯 Why so brutal

ATR_14 on a 15-minute BTC chart is typically 0.3 to 0.5% of price. Phase 1 BE triggers at +1×ATR favorable, so around +0.3 to +0.5% in profit. The original take-profit lives at barrier_width × tp_sl_ratio, which lands between +1% and +3% favorable. Phase 1 fires WAY before TP on basically every trade that gets any sustained move. The new SL locks at entry plus 0.05 to 0.15%, so any retracement at all kicks the position out near flat. Math is unforgiving.
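The same arithmetic as a small calculator, using the ranges quoted above (ATR 0.3 to 0.5% of price, TP at +1% to +3%). Reading barrier_width as the SL distance in percent is an assumption; the post only gives the product barrier_width × tp_sl_ratio.

```python
def tp_distance(barrier_width_pct, tp_sl_ratio=2.22):
    """TP distance in %, as described in the post: barrier_width x tp_sl_ratio."""
    return barrier_width_pct * tp_sl_ratio

def be_fraction_of_tp(atr_pct, tp_pct, be_mult=1.0):
    """Fraction of the way to TP the trade has moved when Phase 1 BE fires."""
    return (be_mult * atr_pct) / tp_pct

# Extremes of the quoted ranges:
earliest = be_fraction_of_tp(0.3, 3.0)  # ~0.1: BE fires 10% of the way to TP
latest = be_fraction_of_tp(0.5, 1.0)    # 0.5: BE fires halfway to TP
print(earliest, latest, tp_distance(0.9))  # tp_distance(0.9) ~ 1.998
```

Across the whole range, the stop jumps to breakeven after only 10 to 50% of the move toward TP. Any ordinary pullback during the remaining 50 to 90% then closes the trade near flat.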

🔧 The fix

In the wrapper config override: cfg['trailing_enabled'] = False. That's it. The sweep simulator defaulted trailing to OFF; the live engine defaulted it to ON. My wrapper had about twenty explicit overrides for sizing, RL, conformal, partial TP, dynamic leverage and so on, but missed this one flag. Result: a silent live divergence from the backtest.
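One way to catch this class of bug before launch is to diff the sweep simulator's effective config against the live engine's effective config (defaults plus wrapper overrides). This is a minimal sketch: the dict contents and key names other than `trailing_enabled` are hypothetical. The sim values of 1 position and 5x leverage mirror the sweep config, and the live defaults for those keys are invented for illustration.

```python
_MISSING = object()  # sentinel for "key not present on this side"

# HYPOTHETICAL defaults; only trailing_enabled's OFF/ON split comes from the post.
SIM_DEFAULTS = {"trailing_enabled": False, "max_positions": 1, "leverage": 5}
LIVE_DEFAULTS = {"trailing_enabled": True, "max_positions": 3, "leverage": 1}

def effective_config(defaults, overrides):
    """What the engine actually runs with: its defaults, then explicit overrides."""
    return {**defaults, **overrides}

def config_divergence(sim_cfg, live_cfg):
    """Sorted keys whose values differ, or that exist on only one side."""
    keys = sim_cfg.keys() | live_cfg.keys()
    return sorted(k for k in keys
                  if sim_cfg.get(k, _MISSING) != live_cfg.get(k, _MISSING))

overrides = {"max_positions": 1, "leverage": 5}  # forgot trailing_enabled
print(config_divergence(SIM_DEFAULTS, effective_config(LIVE_DEFAULTS, overrides)))
```

Running the check as a hard gate in the wrapper (refuse to launch if the list is non-empty) turns a silent divergence into a loud one.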

⚠️ Bonus trap I found while patching: there was a second breakeven block elsewhere in the engine, in a different code path, running independently of the trailing_enabled flag. Same effect: it moves the SL to entry ± 0.05% on a separate threshold trigger. I had to gate that one explicitly too, with if self.trailing_enabled and not getattr(position, 'breakeven_set', False):. Live engines accumulate "safety nets" over time, and they all need to be audited together.
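Here is a sketch of what that gated second BE path might look like. Only the guard condition and the ± 0.05% buffer come from the post. The `Position`/`Engine` classes, the `maybe_breakeven` method and the 0.5% trigger threshold are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Position:
    side: str                  # "long" or "short"
    entry: float
    stop: float
    breakeven_set: bool = False

class Engine:
    def __init__(self, trailing_enabled, be_trigger_pct=0.005, be_buffer_pct=0.0005):
        self.trailing_enabled = trailing_enabled
        self.be_trigger_pct = be_trigger_pct  # HYPOTHETICAL threshold (0.5%)
        self.be_buffer_pct = be_buffer_pct    # entry +/- 0.05%, per the post

    def maybe_breakeven(self, position, price):
        """Independent BE path, now gated on the same flag as trailing."""
        if self.trailing_enabled and not getattr(position, 'breakeven_set', False):
            sign = 1 if position.side == "long" else -1
            favorable = sign * (price - position.entry) / position.entry
            if favorable >= self.be_trigger_pct:
                be = position.entry * (1 + sign * self.be_buffer_pct)
                # Only ever tighten: up for longs, down for shorts.
                position.stop = max(position.stop, be) if sign > 0 else min(position.stop, be)
                position.breakeven_set = True
        return position.stop
```

With `trailing_enabled=False` this path becomes a no-op, so the live engine's exits match the sweep simulator again.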

💡 Lessons

Backtest config never equals live config by default. Audit every field that could diverge, not just the ones you remember. I had twenty-plus explicit overrides and still missed the critical one.
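A complementary, fail-closed approach is to make the wrapper refuse to build a config unless every live-engine key is either explicitly overridden or explicitly accepted as a default. That way "I forgot that field" raises an error instead of silently running. This is a minimal sketch: `build_live_config`, `write_config` and the `accepted_defaults` parameter are hypothetical names, and the only detail taken from the post is that the wrapper writes a config.json.

```python
import json

def build_live_config(live_defaults, overrides, accepted_defaults=()):
    """Fail closed: every live key must be overridden or knowingly accepted.
    Override keys the live engine doesn't know about are rejected too."""
    missing = set(live_defaults) - set(overrides) - set(accepted_defaults)
    unknown = set(overrides) - set(live_defaults)
    if missing or unknown:
        raise ValueError(f"unreviewed keys: {sorted(missing)}; "
                         f"unknown keys: {sorted(unknown)}")
    return {**live_defaults, **overrides}

def write_config(cfg, path="config.json"):
    """Persist the reviewed config for the live engine to read."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(cfg, f, indent=2, sort_keys=True)
```

The `accepted_defaults` list doubles as documentation: every default the live run relies on has been written down by someone on purpose.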

The PF + WR signature is diagnostic gold. High win rate plus tiny mean win plus low PF almost always means a breakeven cap. If you see this pattern in a live deployment, audit every SL-mover code path immediately.
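That signature can be turned into an automated alert on live trades. This is a sketch under stated assumptions: trades are treated as equal-size percent returns, and all three thresholds in `looks_like_breakeven_cap` are hypothetical and would need tuning against your own backtests.

```python
import math
import statistics
from dataclasses import dataclass

@dataclass
class TradeStats:
    win_rate: float
    mean_win: float
    mean_loss: float
    profit_factor: float

def trade_stats(returns):
    """Per-trade % returns -> WR, mean win, mean loss, PF (equal-size trades)."""
    wins = [r for r in returns if r > 0]
    losses = [r for r in returns if r < 0]
    return TradeStats(
        win_rate=len(wins) / len(returns),
        mean_win=statistics.fmean(wins) if wins else 0.0,
        mean_loss=statistics.fmean(losses) if losses else 0.0,
        profit_factor=sum(wins) / abs(sum(losses)) if losses else math.inf,
    )

def looks_like_breakeven_cap(live, expected_mean_win,
                             min_wr=0.75, max_pf=1.5, max_win_ratio=0.3):
    """Heuristic flag: high WR, low PF, and winners far smaller than in the
    backtest. All thresholds are HYPOTHETICAL defaults."""
    return (live.win_rate >= min_wr
            and live.profit_factor <= max_pf
            and live.mean_win <= max_win_ratio * expected_mean_win)
```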

An A/B with the same predictions is the cleanest causal proof: both sims use identical signals, and only the exit logic differs. Runtime was about 10 minutes. No need for fresh data, no need to retrain, no need for fancy stats.

Multiple BE mechanisms are common in mature engines. After finding the obvious one, search for ALL of them. The second one I found wasn't even called "trailing", it was a separate AUDIT-43 fix added at some point and forgotten.
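A crude but effective way to "search for ALL of them" is to grep for what code *does* to the stop, not for one feature's name, since the second BE block here lived under an unrelated audit-fix label. This is a minimal sketch; the regex patterns are hypothetical and should be adapted to your engine's naming. It deliberately over-matches (for example, it also hits `==` comparisons), because reviewing a few false positives is cheap.

```python
import os
import re

# HYPOTHETICAL patterns: breakeven/trailing vocabulary plus stop assignments.
SL_MOVER = re.compile(
    r"breakeven|break_even|trail|stop_loss\s*=|\bsl\s*=|\.stop\s*=",
    re.IGNORECASE,
)

def find_sl_movers(root, exts=(".py",)):
    """Return (path, line_no, line) for every source line that may move a stop."""
    hits = []
    for dirpath, _dirs, files in os.walk(root):
        for name in sorted(files):
            if not name.endswith(exts):
                continue
            path = os.path.join(dirpath, name)
            with open(path, encoding="utf-8", errors="replace") as f:
                for line_no, line in enumerate(f, start=1):
                    if SL_MOVER.search(line):
                        hits.append((path, line_no, line.strip()))
    return hits

if __name__ == "__main__":
    for path, line_no, line in find_sl_movers("."):
        print(f"{path}:{line_no}: {line}")
```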

PS: I post live signals on Telegram (t. me/Mbot904) if u wanna check