
Just published an Avito.ma (Morocco) Scraper: Bypassing brittle CSS by tapping directly into Next.js JSON payloads
Hey, fellow scrapers! 👋
I’ve been working on a data pipeline for the MENA region and just published my first major Actor: Avito Maroc Scraper. Avito is the absolute giant of classifieds in Morocco (cars, real estate, electronics, jobs), but its UI structure changes frequently, which makes traditional CSS selectors a nightmare to maintain.
The Technical Approach: Instead of fighting the DOM, I built this scraper to read the underlying __NEXT_DATA__ JSON payloads (the page state that Next.js serializes into the HTML).
- Zero CSS Reliance: If the data is on the page, the actor reads it directly from the serialized page state instead of the rendered markup, so layout and class-name changes don't break it.
- Dynamic Attribute Parsing: Avito has vastly different attributes per category (e.g., Mileage and Transmission for cars vs. Rooms and Square Meters for apartments). The actor dynamically maps these into clean, structured JSON objects.
- HD Images: It bypasses the compressed UI thumbnails and extracts the full high-res image URLs.
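For anyone curious what this looks like in practice, here is a minimal sketch in Python, not the actor's actual code. It assumes the page carries the standard Next.js script tag with id __NEXT_DATA__; the props layout (props.pageProps.ad) and the {"key", "value"} attribute shape are invented for illustration, and extract_next_data and map_attributes are hypothetical helper names.

```python
import json
import re

# Next.js (pages router) serializes page props into a script tag with this id.
NEXT_DATA_RE = re.compile(
    r'<script[^>]*\bid="__NEXT_DATA__"[^>]*>(.*?)</script>',
    re.DOTALL,
)


def extract_next_data(html: str) -> dict:
    """Return the parsed __NEXT_DATA__ payload, or raise if the tag is missing."""
    match = NEXT_DATA_RE.search(html)
    if match is None:
        raise ValueError("no __NEXT_DATA__ script tag found")
    return json.loads(match.group(1))


def map_attributes(params: list[dict]) -> dict:
    """Flatten a list of {"key", "value"} pairs into one dict.

    The {"key", "value"} shape is an assumption made for this example.
    """
    return {p["key"]: p["value"] for p in params if "key" in p and "value" in p}


# Hypothetical page: the structure under "props" is made up for the sketch.
SAMPLE_HTML = """
<html><body>
<script id="__NEXT_DATA__" type="application/json">
{"props": {"pageProps": {"ad": {"title": "Dacia Logan",
 "params": [{"key": "mileage", "value": "120000"},
            {"key": "transmission", "value": "manual"}]}}}}
</script>
</body></html>
"""

ad = extract_next_data(SAMPLE_HTML)["props"]["pageProps"]["ad"]
record = {"title": ad["title"], **map_attributes(ad["params"])}
print(record)
```

Because the attribute list is flattened generically, a car listing and an apartment listing can go through the same function and simply end up with different keys.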
AI-Ready Output: I specifically designed the output to be ingested into LLM context windows and RAG pipelines. It spits out pristine, standardized JSON that you can immediately pipe into your vector databases or autonomous agents.
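As a rough sketch of the ingestion side (this is my illustration of one way to do it, not part of the actor): each record can be turned into a text chunk plus metadata before embedding. The field names in the sample record and the to_rag_document helper are hypothetical, and the URL is a placeholder.

```python
import json


def to_rag_document(record: dict, id_field: str = "url") -> dict:
    """Turn one scraped record into an {"id", "text", "metadata"} document."""
    # Sort keys so the text is deterministic; skip empty values.
    lines = [
        f"{key}: {value}"
        for key, value in sorted(record.items())
        if value not in (None, "", [])
    ]
    return {
        "id": record.get(id_field),
        "text": "\n".join(lines),
        "metadata": record,
    }


# Hypothetical record; the real output fields may differ.
sample = {
    "url": "https://www.avito.ma/example-ad",  # placeholder URL
    "title": "Appartement 3 pièces",
    "rooms": 3,
    "price": None,
}

doc = to_rag_document(sample)
jsonl_line = json.dumps(doc, ensure_ascii=False)  # one line per document
print(doc["text"])
```

The "text" field is what you would embed, while "metadata" keeps the original structured values available for filtering.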
Quick Note on Proxies: Avito has some pretty aggressive anti-bot protection. While datacenter proxies might work for tiny runs, you really need Apify Residential Proxies if you want to scale this for thousands of items.
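If you start runs programmatically, a residential setup might look like the sketch below. The proxyConfiguration object follows the usual Apify proxy input convention, but startUrls and maxItems are assumed field names (check the actor's input tab for the real schema), build_run_input is a hypothetical helper, and the category URL is a placeholder.

```python
def build_run_input(start_urls: list[str], max_items: int = 1000) -> dict:
    """Build a run input that requests the residential proxy group.

    "startUrls" and "maxItems" are assumed names, not the confirmed schema.
    """
    return {
        "startUrls": [{"url": url} for url in start_urls],
        "maxItems": max_items,
        "proxyConfiguration": {
            "useApifyProxy": True,
            "apifyProxyGroups": ["RESIDENTIAL"],
        },
    }


# Placeholder category URL, used only to show the shape of the input.
run_input = build_run_input(["https://www.avito.ma/fr/maroc/voitures"], max_items=50)
print(run_input["proxyConfiguration"])
```

With the apify-client package installed, this dict could then be passed along the lines of ApifyClient(token).actor("scraper_guru/avito-maroc-scraper").call(run_input=run_input).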
👇 How to get started:
Don't want to deal with code or infrastructure? You can run it directly from the cloud and download the data in Excel/CSV/JSON. Just paste an Avito link and click start: 👉 https://apify.com/scraper_guru/avito-maroc-scraper
I’m on a mission to build out the "Data Mine" for the MENA region. I'd love your feedback!