Six weeks into building a public crypto signal scanner, I audited the data before publishing results and found two integrity bugs hiding in the pipeline. Here’s the system architecture, what broke, and what I learned from catching it before going public.
I’ve been building an automated crypto signal system for the past six weeks with one goal in mind: total transparency. Every signal is logged, hashed, tracked against live market data, and resolved automatically. The original plan was to come here and post six weeks of performance data.
Instead, I’m posting the bugs first.
While preparing the stats, I audited my own pipeline more deeply than I ever had before and found two issues serious enough that I can no longer present the earlier numbers with confidence. I figured sharing the mistakes and the architecture would be more valuable than pretending the data was cleaner than it really is.
Here’s what the system actually does.
Every five minutes it pulls Binance’s /api/v3/ticker/24hr, filters for USDT spot pairs with more than $500k in 24-hour quote volume, then scores roughly 300 to 400 symbols using a simple momentum and relative-volume model built on that same ticker data.
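The fetch-and-filter step is simple enough to sketch. This is an illustrative TypeScript version, not the production worker; the field names follow Binance’s documented 24hr ticker response, and the filter mirrors the $500k quote-volume cutoff:

```typescript
// One scan cycle: pull the full 24hr ticker, keep liquid USDT spot pairs.
interface Ticker24h {
  symbol: string;
  priceChangePercent: string; // 24h % move, returned as a string
  lastPrice: string;
  highPrice: string;
  lowPrice: string;
  quoteVolume: string;        // 24h volume in the quote asset (USDT here)
  count: number;              // number of trades in the window
}

async function fetchCandidates(): Promise<Ticker24h[]> {
  const res = await fetch("https://api.binance.com/api/v3/ticker/24hr");
  const tickers = (await res.json()) as Ticker24h[];
  return tickers.filter(
    (t) => t.symbol.endsWith("USDT") && parseFloat(t.quoteVolume) > 500_000,
  );
}
```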
The scoring formula is roughly:
- 24h percentage move weighted heavily
- Relative volume compared to the rest of the market
- Bonus points for directional momentum
- Bonus points for high trade activity
Anything scoring 50 or higher triggers a signal, with a one-hour cooldown per symbol to avoid repeated alerts on the same move.
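Here’s a minimal sketch of that scoring pass, reusing the Ticker24h shape from the fetch sketch above. The weights and the medianQuoteVolume baseline are invented for illustration; only the four components, the 50-point threshold, and the one-hour cooldown come from the actual system:

```typescript
// Illustrative scoring pass. Weights are placeholders, not the real model.
const COOLDOWN_MS = 60 * 60 * 1000;              // one hour per symbol
const lastAlert = new Map<string, number>();     // symbol -> last alert time

function score(t: Ticker24h, medianQuoteVolume: number): number {
  const move = parseFloat(t.priceChangePercent);
  const relVol = parseFloat(t.quoteVolume) / medianQuoteVolume;
  let s = Math.abs(move) * 5;                    // 24h move, weighted heavily
  s += Math.min(relVol, 10) * 3;                 // relative volume vs. the market
  if (move > 0) s += 5;                          // directional-momentum bonus
  if (t.count > 100_000) s += 5;                 // high trade-activity bonus
  return s;
}

function maybeSignal(t: Ticker24h, medianQuoteVolume: number, now: number): boolean {
  if (now - (lastAlert.get(t.symbol) ?? 0) < COOLDOWN_MS) return false;
  if (score(t, medianQuoteVolume) < 50) return false;
  lastAlert.set(t.symbol, now);                  // start the cooldown window
  return true;                                   // hand off to the ingest queue
}
```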
To be clear, this is not a true breakout engine. There’s no candle-close confirmation, no N-bar high detection, no proper structural breakout logic yet. It’s basically a momentum + relative-volume ranking system with a threshold trigger. Calling it a “breakout scanner” right now would be overselling it. A real breakout module using kline data is still on the roadmap.
Risk management is derived from the 24-hour trading range. Stop losses and targets are calculated using a pseudo-ATR approach, with TP1 and TP2 set at fixed R multiples. The entire pipeline runs through BullMQ workers: scanner → ingest → resolver → public archive.
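A rough sketch of that risk calc for a long signal, again using the Ticker24h shape from above. The 0.5× range stop and the 1R/2R targets are assumed multiples for illustration; all the post establishes is that stops derive from a pseudo-ATR on the 24-hour range and that TP1/TP2 sit at fixed R multiples:

```typescript
// Range-based risk levels for a long entry. Multipliers are placeholders.
function buildRiskLevels(entry: number, t: Ticker24h) {
  const pseudoAtr = parseFloat(t.highPrice) - parseFloat(t.lowPrice); // 24h range as a volatility proxy
  const stop = entry - 0.5 * pseudoAtr;   // stop derived from the range
  const risk = entry - stop;              // 1R in price terms
  return {
    stop,
    tp1: entry + 1 * risk,                // TP1 at a fixed R multiple
    tp2: entry + 2 * risk,                // TP2 at a higher fixed multiple
  };
}
```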
Then came the first bug.
When I started aggregating results, the win rate for the early dataset looked absurdly bad. Between April 28 and May 2, the system recorded 490 signals, 264 stop losses, and zero take profits.
That’s not variance. That’s broken.
The issue turned out to be tied to a recent ingest pipeline rewrite. The resolver was correctly detecting stop losses, but take-profit transitions were failing silently for several days. Signals that hit TP simply stayed marked as OPEN.
After May 3, things partially improved, but another issue remained: dozens of TP1 hits per day were being stored with valid result fields while resolved_at stayed NULL.
I only caught it because I started thinking like a skeptical reader and wrote sanity-check queries against my own database. That’s when I found 473 “ghost rows.” I tagged them as unverified, backfilled timestamps as best I could, and queued a resolver patch for this week.
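The check that caught them was essentially this. Everything except resolved_at — the table name, the other column names, the pg client — is an assumption about a schema I can only infer from the post:

```typescript
import { Pool } from "pg";

// Ghost-row check: outcomes recorded, resolution timestamp never written.
// "signals", "result", and "created_at" are hypothetical names; resolved_at
// is the column named in the post.
const GHOST_ROW_CHECK = `
  SELECT id, symbol, result, created_at
  FROM signals
  WHERE result IS NOT NULL      -- a TP/SL outcome was stored...
    AND resolved_at IS NULL     -- ...but resolution never timestamped
  ORDER BY created_at`;

const pool = new Pool(); // connects via standard PG* environment variables
const { rows } = await pool.query(GHOST_ROW_CHECK);
console.log(`ghost rows: ${rows.length}`);
```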
The second issue is less of a bug and more of an architectural limitation.
The resolver checks price once every 60 seconds using last traded price. It does not inspect candle highs and lows between polls.
That creates three major blind spots:
- If price briefly touches TP or SL and reverses within the minute, the touch can be missed entirely.
- If both TP and SL occur inside the same window, the resolver loses the actual sequence of events.
- R multiples are recorded at exact target levels with no slippage or execution variance modeled.
So the dataset is directionally useful, but slightly optimistic compared to real execution conditions. The current system measures whether the market moved in the predicted direction, not necessarily what a trader would have captured after spreads, slippage, latency, and fills.
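For what it’s worth, here is roughly what the kline-based resolver on the roadmap looks like in my head: fetch 1-minute candles and test highs and lows instead of polling last price. This is a long-side-only sketch with a hypothetical signal shape, and the one-candle TP+SL case still has to be flagged rather than guessed:

```typescript
// Sketch of a kline-based resolver: candle highs/lows catch intraminute
// wicks that last-price polling misses. Long-side signal assumed.
type Outcome = "TP" | "SL" | "AMBIGUOUS" | "OPEN";

interface OpenSignal { symbol: string; stop: number; tp1: number; openedAt: number; }

async function resolveWithKlines(sig: OpenSignal): Promise<Outcome> {
  const url =
    `https://api.binance.com/api/v3/klines?symbol=${sig.symbol}` +
    `&interval=1m&startTime=${sig.openedAt}&limit=1000`;
  const klines = (await (await fetch(url)).json()) as (string | number)[][];

  for (const k of klines) {
    const high = parseFloat(k[2] as string); // index 2 = candle high
    const low = parseFloat(k[3] as string);  // index 3 = candle low
    const hitTp = high >= sig.tp1;
    const hitSl = low <= sig.stop;
    // Both levels inside one candle: the true sequence is unknowable at
    // 1m resolution, so flag it instead of guessing (blind spot #2).
    if (hitTp && hitSl) return "AMBIGUOUS";
    if (hitTp) return "TP";
    if (hitSl) return "SL";
  }
  return "OPEN";
}
```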
The interesting part is this:
I probably would not have discovered these issues if I had kept the project private.
Nightly checks looked fine. Surface metrics looked fine. The bugs only became obvious once I started preparing the data for public scrutiny and imagined how someone skeptical would try to tear it apart. That mindset exposed problems I’d been missing for weeks.
So now the roadmap is straightforward:
- Patch resolver timestamp handling and realized R calculations
- Add correlation caps so one BTC move doesn’t trigger 30 nearly identical altcoin alerts
- Add stale-range protections
- Move to a kline-based resolver to eliminate wick blindness
- Publish weekly CSV snapshots with SHA-256 checksums for public verification (a minimal checksum sketch follows this list)
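The checksum step is the easy one. A minimal version using Node’s built-in crypto module, writing the digest in sha256sum’s output format so anyone can verify a snapshot with `sha256sum -c`:

```typescript
import { createHash } from "node:crypto";
import { readFileSync, writeFileSync } from "node:fs";

// Hash a weekly CSV snapshot so readers can verify it hasn't been
// edited after publication.
function publishChecksum(csvPath: string): string {
  const digest = createHash("sha256").update(readFileSync(csvPath)).digest("hex");
  writeFileSync(`${csvPath}.sha256`, `${digest}  ${csvPath}\n`);
  return digest;
}
```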
I’ll post updated results after another 30 days of clean data collection once the fixes are fully deployed. In the meantime, I’m happy to discuss any part of the architecture, the scoring logic, the tradeoffs, or why the first version relied on /24hr ticker data instead of full klines.