u/FusionMaster-04

I’m working on a web scraping task where I need to collect structured data like company name, category, turnover, and basic details from EPC-related listings.

I’m facing a few technical challenges and would appreciate guidance:

  1. The website is React-based, so content loads dynamically. What is the best approach to reliably extract such data (Selenium, Playwright, or something else)?
  2. Some elements, such as lists, have inconsistent HTML structure: <ul> tags sometimes have classes, sometimes have none, and sometimes several appear on the same page. How do you design a robust parser for this?
  3. There are “Load more” or dynamically loaded sections. What is the recommended way to handle these in automation scripts?
  4. How do you structure scraping workflows to minimize failures due to layout changes?
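
To make question 2 concrete, here is a rough sketch of the kind of parser I have in mind. It uses only Python's standard-library `html.parser` and collects every `<ul>` regardless of whether it has a class. The names `ListCollector` and `extract_lists` are my own placeholders, and it assumes flat (non-nested) lists, so treat it as a starting point rather than a finished solution.

```python
from html.parser import HTMLParser


class ListCollector(HTMLParser):
    """Collect <li> text grouped by <ul>, ignoring class attributes.

    Assumption: lists are flat. Nested <ul> elements are not handled.
    """

    def __init__(self):
        super().__init__()
        self.lists = []       # one list of item strings per <ul> seen
        self._current = None  # items of the <ul> currently open
        self._buf = None      # text chunks of the <li> currently open

    def handle_starttag(self, tag, attrs):
        if tag == "ul":
            # attrs (class, id, ...) are deliberately not inspected
            self._current = []
            self.lists.append(self._current)
        elif tag == "li" and self._current is not None:
            self._buf = []

    def handle_data(self, data):
        if self._buf is not None:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag == "li" and self._buf is not None:
            # collapse whitespace so " Two " and "Two" compare equal
            text = " ".join("".join(self._buf).split())
            if text:
                self._current.append(text)
            self._buf = None
        elif tag == "ul":
            self._current = None


def extract_lists(html):
    """Return a list of lists: the item texts of each non-empty <ul>."""
    parser = ListCollector()
    parser.feed(html)
    parser.close()
    return [items for items in parser.lists if items]
```

The idea is to key on tag structure rather than class names, then decide afterwards which of the returned lists is the one I actually need (for example by position or by the text it contains).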

I am looking for a code-based, free solution (preferably Python).
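
For questions 1 and 3, this is roughly what I imagine a Playwright version would look like (Playwright for Python is free and open source). It is an untested sketch: the helper names, the click limit, and the pause length are all my own assumptions, and the URL and the "Load more" selector would have to come from the real site.

```python
def click_until_gone(page, button_selector, max_clicks=50, pause_ms=1000):
    """Click a "Load more" button until it disappears or max_clicks is hit.

    `page` is expected to behave like a Playwright sync Page.
    Returns the number of clicks performed.
    """
    clicks = 0
    while clicks < max_clicks:
        button = page.locator(button_selector)
        # stop when the button is no longer in the DOM or is hidden
        if button.count() == 0 or not button.first.is_visible():
            break
        button.first.click()
        # crude fixed pause so the new items have time to render
        page.wait_for_timeout(pause_ms)
        clicks += 1
    return clicks


def fetch_rendered_html(url, button_selector):
    """Open the page in headless Chromium, expand it, return the final HTML."""
    # Imported here so the helper above can be used without Playwright.
    # Setup: pip install playwright && playwright install chromium
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        page.goto(url, wait_until="networkidle")
        click_until_gone(page, button_selector)
        html = page.content()
        browser.close()
    return html
```

The plan would be to call something like `fetch_rendered_html(url, "button.load-more")` (a made-up selector) and hand the resulting HTML to a separate parsing step, so that rendering and parsing can fail and be fixed independently.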

Any guidance, best practices, or learning resources would help.

reddit.com
u/FusionMaster-04 — 11 days ago