Got Burned By Using Z-Scores For Dirty API Data, But I Think I Figured Out Something
Processing img 4eckd4gsrvvg1...
I’m on my third complete rewrite of a deterministic value investing engine. The hardest lesson from the first two tries was that standard deviation is practically useless for filtering raw API data.
I used to run a rolling Z-score to drop exchange glitches and fat fingers. But I found out the HARD way that a single garbage tick at $5.00 on a $150 stock distorts the mean so violently that subsequent wicks hide perfectly inside expanded standard deviation. So my execution engine failed.
Instead, I moved my Layer 1 ingestion to Median Absolute Deviation (MAD) (flowchart attached). The median provides an immovable floor that completely ignores the chaos of a massive glitch.
(Note: I had to hardcode a minimum variance "Penny Stock Shield" to prevent division-by-zero errors on frozen assets. Still looking for a more elegant math fix for that).
Before I permanently bake this into my TimescaleDB pipeline, I need to know my blind spots. Are there specific volatility regimes or edge cases where a MAD filter completely breaks down?
Roast my logic before I risk capital on it. Let me know if you want to see the code.