Cloudflare accuses Perplexity of using stealth tactics to evade AI crawler blocks on websites
Cloudflare has alleged that AI startup Perplexity bypassed website blocks on AI crawlers by disguising its automated scraping. The company claims Perplexity rotated user-agent strings and switched between autonomous system networks (ASNs) to avoid detection on sites that explicitly blocked automated access via robots.txt files and similar methods. The activity reportedly spanned millions of daily requests across tens of thousands of domains.
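For context, robots.txt is the opt-out mechanism at issue: a site that wants Perplexity's declared crawlers to stay away publishes rules like the minimal sketch below. The user-agent tokens match the crawler names Perplexity documents publicly; the blanket Disallow is just one common configuration. Cloudflare's allegation is that requests ignoring such rules arrived under other identities.

```
# Minimal robots.txt sketch: block Perplexity's declared crawlers site-wide
User-agent: PerplexityBot
Disallow: /

User-agent: Perplexity-User
Disallow: /
```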
To identify the origin, Cloudflare used machine learning and a variety of network signals to fingerprint Perplexity’s crawler. When blocked, Perplexity reportedly switched to a generic browser user-agent imitating Google Chrome on macOS, further obscuring its identity. Cloudflare began the investigation after customers complained about persistent scraping activity from Perplexity despite configured blocking rules.
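Cloudflare has not published its exact detection logic, so the snippet below is only a rough illustration of the kind of mismatch such fingerprinting can key on: genuine Chrome sends sec-ch-ua client-hint headers, so a request whose user-agent claims Chrome on macOS but lacks them looks suspicious. The function name, the sample user-agent string, and the lowercased header dictionary are assumptions made for this sketch, not details from Cloudflare's report.

```python
# Toy heuristic, not Cloudflare's method: flag a "Chrome on macOS" user-agent
# that is missing the client-hint headers real Chrome sends.
# Assumes header names have already been lowercased.

def looks_like_spoofed_chrome(headers: dict[str, str]) -> bool:
    ua = headers.get("user-agent", "")
    claims_chrome = "Chrome/" in ua and "Macintosh" in ua
    has_client_hints = "sec-ch-ua" in headers
    return claims_chrome and not has_client_hints

# Example: a request with a Chrome-on-macOS user-agent but no client hints is flagged.
request_headers = {
    "user-agent": ("Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
                   "AppleWebKit/537.36 (KHTML, like Gecko) "
                   "Chrome/124.0.0.0 Safari/537.36"),
    "accept": "*/*",
}
print(looks_like_spoofed_chrome(request_headers))  # True
```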
In response, Cloudflare has delisted Perplexity from its verified bot list and rolled out new techniques to block stealth crawling attempts. Perplexity's spokesperson denied the claims, calling the report a publicity stunt, asserting that the bot Cloudflare named wasn't Perplexity's, and arguing that the screenshots offered as evidence showed no content actually being accessed.
These allegations add to claims from 2024 that Perplexity bypassed paywalls and ignored robots.txt. Cloudflare, meanwhile, has positioned itself against unauthorized AI scraping, having launched a Pay Per Crawl marketplace for monetizing bot access just weeks earlier, with CEO Matthew Prince warning of business model disruption for publishers.


Comments
I'd trust Cloudflare assessments 1e+6 times before any "startup" AI company.
I have mixed feelings about this because Perplexity has managed to give me important information that I simply can't find via Google searches, and this could be why. [Things like pricing information, and ToS/support information for websites that hide it, and such.]
I'm just trying to understand what your comment means.
For example, I run a website that has changeable information: names, dates, times, venues, etc. It plainly says, in English and other languages, "don't copy the page, just link to it" so that there won't be incompatible versions of the information found on the 'net. By the time the AI "learns" my page, it may have already changed. It's that kind of "intelligence" that makes people say that AI is actually UNintelligence.
I want a robots.txt-style method to tell AI what not to scrape. Oh, that already exists! Perplexity, and others, should be intelligent enough to follow what the website controllers request.