I'm asking from the reporting side more than the pure engineering side. We keep running into the same problem: getting data once is easy enough, but getting it consistently is where everything starts falling apart. Sometimes it's JS-heavy pages, sometimes rate limits, sometimes the site layout changes and nobody notices until the numbers look weird in a dashboard.
I'm also curious where people draw the line between "annoying but manageable" and "not worth owning in-house anymore." Are the main problems still proxies / blocking / CAPTCHAs, or are maintenance and monitoring the bigger issue now? Would love practical answers, especially from people who've dealt with this beyond a weekend script.
u/Amitk2405 — 15 days ago