u/LordWeirdDude

Got Burned By Using Z-Scores For Dirty API Data, But I Think I Figured Out Something


I’m on my third complete rewrite of a deterministic value investing engine. The hardest lesson from the first two tries was that standard deviation is practically useless for filtering raw API data.

I used to run a rolling Z-score to drop exchange glitches and fat fingers. But I found out the HARD way that a single garbage tick at $5.00 on a $150 stock distorts the mean so violently, and inflates the standard deviation so much, that subsequent bad wicks hide comfortably inside the expanded band. So my execution engine failed.
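To make the failure mode concrete, here's a minimal sketch (prices are made-up illustrative numbers, not the author's data): one $5.00 glitch in a $150 series drags the mean down and blows up the standard deviation, so a later bad wick to $130 scores nowhere near a typical 3-sigma cutoff.

```python
import numpy as np

# Hypothetical series: five clean ~$150 ticks, one $5.00 garbage tick,
# then a bad $130 wick that the filter SHOULD reject.
prices = np.array([150.1, 149.9, 150.0, 150.2, 5.00, 150.1, 130.0])

mean = prices.mean()
std = prices.std()
z = np.abs(prices - mean) / std

# The glitch pulls the mean to ~$126 and the std to ~$50, so the $130
# wick scores ~0.07 sigma; even the $5 glitch itself only scores ~2.4,
# inside a standard 3-sigma cut.
print(z.round(2))
```

That last point is the nasty part: the outlier doesn't just mask later ticks, it can mask itself, because it contributes its own squared deviation to the variance it's being measured against.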

Instead, I moved my Layer 1 ingestion to Median Absolute Deviation (MAD) (flowchart attached). The median gives a robust anchor that a single massive glitch barely moves, so one bad tick can't poison the scale the filter measures against.
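For anyone who wants to poke at it, here's a sketch of a standard MAD-based outlier test (the modified Z-score approach; the 0.6745 constant and 3.5 threshold are the conventional textbook choices, not necessarily what the author uses, and `mad_filter` is a hypothetical name):

```python
import numpy as np

def mad_filter(prices, thresh=3.5):
    """Flag outliers via modified Z-scores.

    0.6745 rescales the MAD so it estimates sigma under normality,
    making `thresh` comparable to a classic Z-score cutoff.
    Returns a boolean mask: True = reject the tick.
    """
    prices = np.asarray(prices, dtype=float)
    med = np.median(prices)
    mad = np.median(np.abs(prices - med))
    modified_z = 0.6745 * (prices - med) / mad
    return np.abs(modified_z) > thresh

# Same hypothetical series as before: both the $5 glitch and the
# $130 wick now get flagged, because the median stays at $150 and
# the MAD stays tiny no matter how wild any single tick is.
prices = [150.1, 149.9, 150.0, 150.2, 5.00, 150.1, 130.0]
print(mad_filter(prices))
```

The key property is the 50% breakdown point: up to half the window can be garbage before the median or MAD moves meaningfully, versus a single tick being enough to wreck a mean/std pair.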

(Note: I had to hardcode a minimum variance "Penny Stock Shield" to prevent division-by-zero errors on frozen assets. Still looking for a more elegant math fix for that).
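One common way to express that shield, sketched below under my own assumptions (the name `robust_scale` and both floor parameters are illustrative, not the author's actual values): instead of a hardcoded minimum variance, floor the MAD at the larger of a fixed tick size and a small fraction of the median price, so the floor scales with the asset instead of being one magic number.

```python
import numpy as np

def robust_scale(prices, floor_frac=1e-4, min_tick=0.01):
    """MAD with a floor so a frozen asset (all ticks identical,
    raw MAD = 0) can't cause a division-by-zero downstream.

    floor_frac: minimum scale as a fraction of the median price
    min_tick:   absolute minimum scale (e.g. one cent)
    Both values are illustrative guesses, not tuned constants.
    """
    prices = np.asarray(prices, dtype=float)
    med = np.median(prices)
    mad = np.median(np.abs(prices - med))
    return max(mad, floor_frac * abs(med), min_tick)

# A frozen $2.00 asset: raw MAD would be 0, the floor kicks in.
print(robust_scale([2.00] * 50))
```

A fraction-of-median floor has the nice property that a $2 penny stock and a $2,000 stock get proportionate floors, whereas a single hardcoded variance is too tight for one and too loose for the other.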

Before I permanently bake this into my TimescaleDB pipeline, I need to know my blind spots. Are there specific volatility regimes or edge cases where a MAD filter completely breaks down?

Roast my logic before I risk capital on it. Let me know if you want to see the code.
