u/DeLaCompostela

Open-sourced a library for filtering proxies
▲ 22 r/scrapingtheweb+2 crossposts

Most residential pool providers sell a lot of proxies that are technically reachable but already burnt: datacenter IPs sneaking in as "residential", TCP fingerprints screaming Linux when you're trying to look like Windows, IPs already on FingerprintJS Pro's watchlist. You only find out after Cloudflare/reCAPTCHA has already tanked your score.

I put together a Python lib that runs four cheap checks per proxy in 2s total (configurable, and it can be run in parallel across a pool):

ipapi: geo + ASN-level reputation (bogon, datacenter, Tor, VPN, known abuser)

TCP stack fingerprint: TTL + TCP options, catches Linux-stack proxies claiming to be Windows

pixelscan: second-opinion IP reputation

FingerprintJS Pro pre-probe: checks whether the IP is already flagged or overused in the last 24h
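To give a feel for the TTL half of the TCP stack check, here is a minimal sketch of the general heuristic. It is not taken from the library: the function names are made up for illustration. It relies on the common defaults (initial TTL 64 on Linux and most Unix-likes, 128 on Windows, 255 on a lot of network gear) and on the fact that each hop decrements the TTL, so the observed value is rounded up to the nearest common initial value.

```python
from typing import Optional

# Common default initial TTLs. These are conventions, not guarantees:
# any host can be configured with a different value.
OS_FAMILY_BY_INITIAL_TTL = {
    64: "linux",      # Linux, macOS, most Unix-likes
    128: "windows",
    255: "network",   # many routers and embedded stacks
}


def guess_initial_ttl(observed_ttl: int) -> Optional[int]:
    """Round an observed TTL up to the nearest common initial TTL."""
    if not 1 <= observed_ttl <= 255:
        return None
    for initial in sorted(OS_FAMILY_BY_INITIAL_TTL):
        if observed_ttl <= initial:
            return initial
    return None


def ttl_matches_claimed_os(observed_ttl: int, claimed_os: str) -> bool:
    """True if the TTL-implied OS family matches the claimed one.

    Hypothetical helper for illustration only, not the library's API.
    """
    initial = guess_initial_ttl(observed_ttl)
    if initial is None:
        return False
    return OS_FAMILY_BY_INITIAL_TTL[initial] == claimed_os.lower()
```

A proxy answering with TTL 52 while the browser profile claims Windows would fail this check, since 52 rounds up to 64. TTL alone is a weak signal, which is presumably why the check above also looks at TCP options.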

Each check can be disabled independently. Pools, retries, and concurrency are your job: the lib is intentionally one-shot and stateless, so it composes with whatever orchestrator you already have.
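As a rough idea of what "your job" can look like, here is a sketch of a tiny orchestrator around a one-shot check. The `check` argument is a placeholder for whatever callable you wrap the library in (it takes a proxy string and returns a truthy verdict); it is an assumption for illustration, not the library's actual API, so see the repo for the real entry point. Retries only kick in when the check raises, not when it returns a clean "bad" verdict.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List


def filter_pool(
    proxies: Iterable[str],
    check: Callable[[str], bool],  # hypothetical one-shot, stateless checker
    max_workers: int = 16,
    retries: int = 1,
) -> List[str]:
    """Return the proxies that pass `check`, preserving input order."""

    def passes(proxy: str) -> bool:
        # Retry only on exceptions (timeouts, connection errors, ...).
        for _ in range(retries + 1):
            try:
                return bool(check(proxy))
            except Exception:
                continue
        return False  # every attempt raised: treat the proxy as bad

    proxy_list = list(proxies)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the same order as the inputs.
        verdicts = list(pool.map(passes, proxy_list))
    return [p for p, ok in zip(proxy_list, verdicts) if ok]
```

Threads fit here because each check is network-bound, and since the checker is stateless there is nothing to lock or share between workers.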

Repo: https://github.com/P0st3rw-max/proxyquality

MIT license.

u/DeLaCompostela — 1 day ago