How does Perplexity actually get webpage data? According to Reddit's lawsuit, through Google.
After digging through the Reddit vs Perplexity lawsuit, I found a pretty interesting example.
To prove that Perplexity was indirectly getting Reddit data through Google, Reddit set up what they called a “honeypot test,” basically the digital version of marked bills.
The idea was simple: Reddit created test posts that could only be accessed by Google’s search crawler. A few hours after Google indexed those pages, content from the test posts started appearing in Perplexity’s answers.
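For anyone curious what a honeypot like that could look like in practice, here is a minimal sketch. To be clear, this is my own illustration and not Reddit's actual setup: the function names (`make_canary`, `serve_test_post`, `leaked`) are hypothetical, and the `crawler_verified` flag stands in for whatever check a site uses to confirm a request really comes from Google's crawler.

```python
import secrets


def make_canary() -> str:
    """Generate a unique marker string that should not exist anywhere else on the web."""
    return "canary-" + secrets.token_hex(8)


def serve_test_post(user_agent: str, crawler_verified: bool, canary: str) -> tuple[int, str]:
    """Return (status, body). Only a verified Googlebot request gets the marked post.

    Assumption: crawler_verified is the result of some separate check
    (for example a reverse DNS lookup) done by the site; it is not shown here.
    """
    if "Googlebot" in user_agent and crawler_verified:
        return 200, f"Test post containing marker {canary}"
    # Everyone else, including other bots and regular visitors, is refused.
    return 403, "Forbidden"


def leaked(answer_text: str, canary: str) -> bool:
    """True if the marker shows up in text produced by some other service."""
    return canary in answer_text


if __name__ == "__main__":
    canary = make_canary()
    status, body = serve_test_post("Mozilla/5.0 (compatible; Googlebot/2.1)", True, canary)
    print(status, leaked(body, canary))
```

If the marker later shows up in another product's answers, the only plausible path is through whoever was allowed to fetch the page, which is the "marked bills" logic.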
So Perplexity was apparently not building its own search engine for this. Instead, it was buying services from third-party providers and, through them, getting Reddit data that Google had already crawled.
My guess is that a lot of other AI companies work in a similar way and mostly rely on Google’s data layer to build answers.
The original filing mentions this on page 27.