r/thewebscrapingclub

What happens when you make a browser that is identical to Chrome but its use is scraping
▲ 71 r/thewebscrapingclub+2 crossposts


I built a real C++ browser and gave you a TypeScript library to control it — here's why it changes scraping

Most tools like Puppeteer and Playwright bolt automation onto Chrome from the outside. They're always playing catch-up with anti-bot systems.

I took a different approach. I built the actual browser — Qt6 + Chromium engine, written in C++. Then I wrote a TypeScript library (Piggy) that controls it over a local socket. That's why Cloudflare bypasses are almost trivial and the code stays dead simple.

Two repos, one ecosystem:

🖥️ Nothing Browser (the C++ browser) https://github.com/BunElysiaReact/nothing-browser

📦 Piggy (the TS library) — https://github.com/ernest-tech-house-co-operation/nothing-browser

What you get out of the box:

🪪 Persistent TLS fingerprint identical to real Chrome — sites can't flag you at the TLS layer

🧠 Human Mode — randomized delays, natural scrolling, no robotic timing

⚡ Socket-based IPC — millisecond latency between your script and the browser

🌐 Remote deployment — the binary runs on a VPS while you drive it from your local machine

💾 Session persistence — save/restore cookies and storage, stay logged in

🏊 Tab pooling — concurrent requests inside one browser instance

🚀 Built-in API server — one line turns your scraper into a REST endpoint with OpenAPI docs

🔄 Proxy rotation — built-in fetch, test, switch, rotate
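For context on the Human Mode bullet: the general idea behind jittered, non-robotic timing can be sketched in a few lines. This is an illustration of the concept only — `humanDelay` is my own name, not Piggy's actual implementation or API:

```typescript
// Illustration only: jittered "human" pacing instead of fixed timeouts.
// Not Piggy's actual implementation or API.
function humanDelay(minMs = 300, maxMs = 1200): Promise<void> {
  const ms = minMs + Math.random() * (maxMs - minMs);
  return new Promise(resolve => setTimeout(resolve, ms));
}

// Short bounds here so the demo returns quickly.
const t0 = Date.now();
await humanDelay(50, 100);
console.log(`waited ${Date.now() - t0}ms`);
```

The point is that every wait lands somewhere random inside a band, so repeated actions never show the identical intervals that anti-bot timing heuristics look for.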

The code looks like this:

```ts
import piggy from "nothing-browser";

await piggy.launch();
await piggy.register("books", "https://books.toscrape.com");
await piggy.books.navigate();

const books = await piggy.books.evaluate(() =>
  Array.from(document.querySelectorAll(".product_pod")).map(el => ({
    title: el.querySelector("h3 a")?.getAttribute("title") ?? "",
    price: el.querySelector(".price_color")?.textContent?.trim() ?? "",
  }))
);

console.log(books);
await piggy.close();
```

That's a real browser. Not a wrapper around someone else's.

Bun-first but Node compatible. Headless and headful ship as separate binaries so you're not carrying GPU overhead when you don't need it.

📚 Docs: https://nothing-browser-docs.pages.dev

Would love issues, feedback, and ⭐ stars — built in Kenya 🇰🇪

u/PeaseErnest — 5 days ago

built a browser MCP because every other one stunk, especially for scraping work

i scrape a lot. fifty plus sources, anti-bot stacks, login walls, geo gates. spent months copy-pasting HTML and headers into Claude/Cursor because they couldn't see the page themselves. they'd guess from my secondhand summary and get it wrong. just bringing them up to speed on a new source took forever.

tried every browser MCP out there. all stunk for the same reason.

  • Anthropic's Chrome extension. sandbox, macOS only, screen has to be awake. only works inside Claude.
  • Playwright MCP. empty Chromium, not your Chrome. re-auth from scratch. local only.
  • Browserbase / Stagehand. decent, but cloud Chromium from a datacenter IP. for scraping that's suicide. you lose your fingerprint, your residential IP, the whole moat.
  • BrowserMCP (open source). real browser via extension, gets that right. local stdio only. one tab. half-built.

so i built Reins: https://reins.vulcanos.pro

the thing nobody else does: hosted, but drives your real Chrome. Browserbase is hosted but cloud. BrowserMCP is your browser but local. Reins is both. extension in your actual Chrome with your real cookies, fingerprint, residential IP. MCP server is hosted so it works from Claude Code, Cursor, Zed, web Claude, anywhere over OAuth.

what that gets you:

  • your own session does the work. anti-bot sees your real fingerprint, real IP, warm cookies, normal mouse. nothing looks like a bot because nothing is a bot.
  • gated sources stop being special. SSO, geo-locked, login walled. you log in once like a human, agent runs on top.
  • multi-profile, one account. split work across profiles for ip diversity or regional accounts, pick from your MCP client. nobody else does this.
  • dumps can live remote. HARs, full DOMs, network logs stored off your laptop, LLM pulls on demand from any client.
  • runs anywhere MCP runs. every other "real browser" tool is local stdio that dies when you close your terminal.

install: https://chromewebstore.google.com/detail/reins/ifnmhlnmioieckkknedkikfbpkhkfpdi

my brother also uses it. takes his school quizzes, hunts apartments, does his online shopping. totally different use case, works because it's his browser, already logged into everything.

free tier covers normal use. only hit metered if you scrape at scale and want dumps off your local disk.

DM me if you have any questions

u/NoTicket660 — 4 days ago
▲ 11 r/thewebscrapingclub+2 crossposts

If you have been looking for a no-browser alternative, feel free to give this a go!

Fast and lightweight.

Would love feedback or bug reports if you run it against anything weird.

u/jinef_john — 3 days ago
▲ 3 r/thewebscrapingclub+1 crossposts

How do you tell if failures are caused by bad proxies or bad automation?

I'm dealing with a recurring problem where automated jobs fail inconsistently when proxies are involved.

Sometimes the browser test passes locally but fails in CI. Sometimes the request works without a proxy but times out with one. Sometimes one proxy provider works fine for one domain but performs terribly on another.

For me, the hard part right now is diagnosis. I don't want to waste hours debugging selectors, waits, or test code if the real issue is proxy quality.

For those using proxies with Playwright, Selenium, scraping tests, or geo-based QA checks, what's your process for proving whether the proxy is the problem?

Do you benchmark providers before adding them to your automation stack? What metrics are actually useful?

I'm thinking:

  • success rate
  • median and p95 response time
  • timeout frequency
  • CAPTCHA/block rate
  • repeatability over time
  • results per target site, not just generic speed

Is there a standard way to test this properly?
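One way to make the proxy-vs-automation question concrete is to log one result per attempt and compute those metrics per proxy and per target, then repeat the run without a proxy as a control. A minimal sketch — the type and function names here are my own, not a standard tool:

```typescript
// One record per request attempt; collect these per proxy and per target.
type AttemptResult = {
  ok: boolean;       // got a usable 2xx response
  ms: number;        // wall-clock latency
  timedOut: boolean;
  blocked: boolean;  // CAPTCHA page or 403-style block
};

function percentile(sorted: number[], p: number): number {
  const idx = Math.min(sorted.length - 1, Math.ceil((p / 100) * sorted.length) - 1);
  return sorted[Math.max(0, idx)];
}

function summarize(results: AttemptResult[]) {
  const times = results.map(r => r.ms).sort((a, b) => a - b);
  return {
    successRate: results.filter(r => r.ok).length / results.length,
    medianMs: percentile(times, 50),
    p95Ms: percentile(times, 95),
    timeoutRate: results.filter(r => r.timedOut).length / results.length,
    blockRate: results.filter(r => r.blocked).length / results.length,
  };
}

// In practice you would fill this array by fetching each target through the
// proxy (e.g. via undici's ProxyAgent) and re-running the same requests with
// no proxy as a control: if the control passes and the proxied run fails,
// the proxy is the variable, not your selectors or waits.
const sample: AttemptResult[] = [
  { ok: true,  ms: 420,  timedOut: false, blocked: false },
  { ok: true,  ms: 510,  timedOut: false, blocked: false },
  { ok: false, ms: 8000, timedOut: true,  blocked: false },
  { ok: false, ms: 900,  timedOut: false, blocked: true  },
  { ok: true,  ms: 460,  timedOut: false, blocked: false },
];

console.log(summarize(sample));
```

Comparing these summaries side by side (same target, proxy A vs proxy B vs no proxy) separates proxy quality from test-code problems much faster than reading individual failures.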

u/Beardybear93 — 1 day ago

Are mobile proxies best for social media scraping?

Been looking into mobile proxies for scraping social platforms and the price jump over residential is pretty significant. Wondering if it's actually necessary or if good residential proxies do the same job. Do platforms like Instagram or TikTok detect residential IPs differently than mobile? What are you using for this?

u/SorinxD — 1 day ago
▲ 13 r/thewebscrapingclub+1 crossposts

A stealth Playwright(Firefox) version that passes all anti-bot and CAPTCHA checks

Hey guys,
I’ve been working on browser automation that can actually survive modern anti-bot systems (especially for AI agents).
So I created a fork of Playwright for Firefox patched directly at the C++ level. It generates a different but internally consistent fingerprint per session:
• CreepJS → 0% fake
• reCAPTCHA v3 → Score 0.90
• hCaptcha → Pass
• Fingerprint Pro → bot=false, tampering=false

Repo: https://github.com/feder-cr/invisible_playwright

If you’re fighting heavy anti-bot protection or building resilient agents, I’d love to hear your thoughts or test results. Feedback, issues, and contributions are very welcome!

Thanks in advance 🚀

u/Elieroos — 17 hours ago