u/mynameisyahiabakour

Hey r/rag,

I've been working on a lot of RAG / agent workflows lately and kept running into the same issue:

getting clean website data into the context window is way harder than it should be.

Most sites either:

  • return noisy HTML
  • block scrapers
  • convert to markdown badly
  • or require building a whole crawling pipeline just to ingest docs

So I ended up building an API for this, used by a few hundred companies in production today.

You can:

  • scrape any page as clean markdown
  • crawl an entire website
  • pull sitemaps
  • extract images/html
  • basically turn a website into LLM-ready context in one call

One thing I focused on heavily was making the markdown actually usable for RAG instead of just dumping raw DOM content.
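To illustrate what "usable for RAG instead of raw DOM" means, here's a minimal stdlib-only sketch (not the actual API's implementation, just the general idea): strip boilerplate tags like nav, script, and footer before converting, so the context window gets content rather than markup noise.

```python
# Hedged sketch: strip boilerplate tags and emit simple markdown.
# A production pipeline would handle many more cases (tables, links,
# nested lists, encoding issues); this only shows the core idea.
from html.parser import HTMLParser

SKIP = {"script", "style", "nav", "footer", "header", "aside"}
HEADINGS = {"h1": "# ", "h2": "## ", "h3": "### "}

class MarkdownExtractor(HTMLParser):
    def __init__(self):
        super().__init__()
        self.lines = []
        self.skip_depth = 0   # >0 while inside a boilerplate tag
        self.prefix = ""      # markdown prefix for the current block

    def handle_starttag(self, tag, attrs):
        if tag in SKIP:
            self.skip_depth += 1
        elif tag in HEADINGS:
            self.prefix = HEADINGS[tag]
        elif tag == "li":
            self.prefix = "- "

    def handle_endtag(self, tag):
        if tag in SKIP and self.skip_depth:
            self.skip_depth -= 1
        elif tag in HEADINGS or tag in ("p", "li"):
            self.prefix = ""

    def handle_data(self, data):
        text = data.strip()
        if text and not self.skip_depth:
            self.lines.append(self.prefix + text)
            self.prefix = ""

def html_to_markdown(html: str) -> str:
    parser = MarkdownExtractor()
    parser.feed(html)
    return "\n\n".join(parser.lines)

page = """<html><body>
<nav><a href='/'>Home</a></nav>
<h1>Docs</h1>
<p>Install with pip.</p>
<script>track();</script>
</body></html>"""
print(html_to_markdown(page))  # "# Docs" and the paragraph; nav/script dropped
```

The nav link and the tracking script never reach the output, which is the difference between "LLM-ready context" and dumping the DOM.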

Curious what everyone else here is using for live web ingestion / crawling in production right now.

API is here if anyone wants to try it.

Would genuinely love feedback from people building agent/RAG systems.

PS: I read the subreddit rules; it seems this is allowed at least once, since I've never posted here and usually just lurk :)
