u/Total_Nectarine_3623

Curated list of AI-powered web scraping tools
▲ 21 r/webscraping+1 crossposts

Was researching AI scraping tools for a project and noticed the existing awesome lists either cover traditional scraping (Scrapy, BeautifulSoup) or web agents broadly. Couldn't find one focused specifically on LLM-powered scraping, so I put one together.

Covers frameworks (Crawl4AI, Scrapling, ScrapeGraphAI, llm-scraper), hosted APIs (Firecrawl, Jina Reader, Diffbot), browser infrastructure for AI agents, MCP servers, and search APIs built for LLMs.

Open to more suggestions. What am I missing?

github.com
▲ 48 r/webscraping+1 crossposts

I started Obscura because every existing headless browser was too heavy, too slow, or detected as a bot. It's a headless browser engine written in Rust.

The repo just hit 10k stars, and I'm very happy about it. I've decided to open a waitlist for the Cloud version: a hosted offering with managed infrastructure and residential proxies, for people who want the engine without operating it themselves.

Some specs:

- 30 MB binary (vs. 200 MB+ for headless Chrome)
- ~85 ms page loads (vs. ~500 ms for Chrome)
- Built-in stealth that beats most fingerprint detectors
- Pure Rust, not a Chromium fork

More performance optimizations are on the way.

Happy to answer questions about the cloud or the engine itself.

u/Total_Nectarine_3623 — 14 days ago