r/webscraping

Building a Quick Commerce Price Comparison Site - Need Guidance

I’m planning to build a price comparison platform, starting with quick commerce (Zepto, Instamart, etc.), and later expanding into ecommerce, pharmacy, and maybe even services like cabs.

I know there are already some well-known players doing similar things, but I still want to build this partly to learn, and partly to see if I can do it better (or at least differently).

What I’m thinking so far:

• Reverse engineer / analyze APIs of quick commerce platforms

• Build a search orchestration layer to query multiple sources

• Implement product search + matching across platforms

• Normalize results (since naming, units, packaging differ a lot)

• Eventually add location-aware availability + pricing
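To make the orchestration idea concrete, here is a minimal sketch of fanning one query out to several sources in parallel, with a time budget per source. Everything in it is an assumption for illustration: `Offer`, `search_store_a`, `search_store_b`, `search_slow_store` and `search_all` are hypothetical names, and the per-store functions are stubs returning made-up data rather than real platform clients.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class Offer:
    source: str
    title: str
    price: float


# Hypothetical stand-ins for per-platform clients. Real ones would call
# whatever data source each platform actually permits.
async def search_store_a(query: str) -> list[Offer]:
    await asyncio.sleep(0.01)
    return [Offer("store_a", f"{query} 500 ml", 32.0)]


async def search_store_b(query: str) -> list[Offer]:
    await asyncio.sleep(0.01)
    return [Offer("store_b", f"{query} 0.5 L", 30.0)]


async def search_slow_store(query: str) -> list[Offer]:
    await asyncio.sleep(5)  # simulates a source that blows the time budget
    return [Offer("slow_store", query, 1.0)]


async def search_all(query: str, sources, timeout: float = 0.5) -> list[Offer]:
    """Query every source concurrently; skip sources that fail or time out."""

    async def guarded(fn):
        try:
            return await asyncio.wait_for(fn(query), timeout)
        except Exception:
            # One slow or broken source should not sink the whole search.
            return []

    batches = await asyncio.gather(*(guarded(fn) for fn in sources))
    offers = [offer for batch in batches for offer in batch]
    return sorted(offers, key=lambda o: o.price)


if __name__ == "__main__":
    stores = [search_store_a, search_store_b, search_slow_store]
    for offer in asyncio.run(search_all("milk", stores, timeout=0.2)):
        print(offer)
```

The point of the per-source timeout is that the slowest source does not hold up the rest; a real version would probably also log which sources were skipped.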

What I need help with:

• Is reverse engineering APIs the right approach, or is there a better/cleaner way?

• Any open-source projects / frameworks I can build on?

• Best practices for:

    • Search orchestration

    • Product normalization / deduplication

    • Handling inconsistent catalogs
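For the normalization part, this is roughly the kind of thing I have in mind: pull the pack size out of the title, convert it to a base unit, and build a crude key for matching the same product across platforms. It is only a sketch under assumptions: the unit table, the regex, and the helper names (`parse_pack_size`, `match_key`, `unit_price`) are all made up for illustration, and real catalogs would need far more cases (multipacks, "litre", brand aliases, and so on).

```python
import re
from typing import Optional, Tuple

# Assumed unit table: maps a unit spelling to (base unit, multiplier).
_UNIT_TO_BASE = {
    "g": ("g", 1.0), "gm": ("g", 1.0), "kg": ("g", 1000.0),
    "ml": ("ml", 1.0), "l": ("ml", 1000.0), "ltr": ("ml", 1000.0),
}

# A number followed by one of the known units, e.g. "500 ml", "0.5 L", "1kg".
_SIZE_RE = re.compile(r"(\d+(?:\.\d+)?)\s*(kg|gm|g|ml|ltr|l)\b", re.IGNORECASE)


def parse_pack_size(title: str) -> Optional[Tuple[float, str]]:
    """Return (quantity in base unit, base unit), or None if no size is found."""
    m = _SIZE_RE.search(title)
    if m is None:
        return None
    base_unit, factor = _UNIT_TO_BASE[m.group(2).lower()]
    return float(m.group(1)) * factor, base_unit


def match_key(title: str) -> Tuple[str, Optional[Tuple[float, str]]]:
    """Crude dedup key: sorted lowercase name tokens plus normalized pack size."""
    size = parse_pack_size(title)
    name = _SIZE_RE.sub(" ", title).lower()
    tokens = sorted(set(re.findall(r"[a-z0-9]+", name)))
    return " ".join(tokens), size


def unit_price(price: float, title: str) -> Optional[float]:
    """Price per base unit (per g or per ml), or None if the size is unknown."""
    size = parse_pack_size(title)
    if size is None or size[0] == 0:
        return None
    return price / size[0]


if __name__ == "__main__":
    a, b = "Toned Milk 500 ml", "Milk Toned 0.5 L"
    print(match_key(a) == match_key(b))
    print(unit_price(30.0, b))
```

Exact-key matching like this only catches the easy cases. Fuzzy matching on the name tokens would be the next step, but comparing on unit price at least makes different pack sizes comparable.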

Would love to hear from anyone who has worked on aggregators, scraping systems, or similar platforms.

Even if you think this idea is flawed, I’m open to criticism.

Thanks!

reddit.com
u/assalTora — 7 hours ago

How to Scrape 200k+ YouTube Channels Daily Without Blocking

If you attempt to orchestrate 200k daily headless browser sessions via Puppeteer or Playwright, your server costs will bankrupt you.

To succeed at this volume, you have to prioritize lightweight network requests, use smart filtering, and implement advanced fingerprint spoofing. Here is the architecture necessary to pull it off.

scrapingenthusiast.github.io
u/catmewo — 9 hours ago

Irritated by coworker

Not sure if this is the right place to post this. Newish to scraping because I usually only scrape a particular site.

A coworker left after developing a script to scrape this site. The site’s HTML was updated afterwards, and I had to scrap the code and rewrite it from scratch to get it working. People think I am still using the old code I received from this person.

Another suckup coworker was told to get this script from me and run it on other versions of the same site. They can run code but know nothing about debugging (think giving up and calling me for every small error). Now they get all the managers’ requests to scrape the site (and hence the hours), while I get calls from this person on Teams to debug it, which I cannot charge to said project(s) even though I am essentially doing the job.

Am I wrong for being territorial about my script and wanting the site to change its HTML again ASAP so I get my chance to shine? Let me know.

reddit.com
u/Street-Tea-9674 — 21 hours ago

Issue bypassing a reCaptcha

Hello everyone, I am having an issue while trying to automate a data scrape on a site. I am using the Pydoll framework instead of Selenium to bypass Cloudflare, along with paid mobile/residential proxies and a mobile spoofing configuration, but I’ve had no luck so far. The problem seems to be related to a misconfiguration on the website owner’s backend. The process works when done manually, but it fails when executed as an agent.

https://preview.redd.it/2t43t23kifug1.png?width=1092&format=png&auto=webp&s=1fe4903de78e0e745e4d05ac800ae7d2daa870f8

Would appreciate any help or suggestions. Thank you.

reddit.com
u/PhoeniX8089 — 16 hours ago