Cloudflare accuses Perplexity of evading AI crawler blocks on sites using stealth tactics

Cloudflare has alleged that AI startup Perplexity bypassed website blocks on AI crawlers by disguising its automated scraping behavior. The company claims Perplexity rotated user-agent strings and switched between autonomous systems (ASNs) to avoid detection on sites that explicitly blocked automated access via robots.txt files and similar methods. The activity reportedly spanned millions of daily requests across tens of thousands of domains.

To identify the origin, Cloudflare used machine learning and a variety of network signals to fingerprint Perplexity’s crawler. When blocked, Perplexity reportedly switched to a generic browser user-agent imitating Google Chrome on macOS, further obscuring its identity. Cloudflare began the investigation after customers complained about persistent scraping activity from Perplexity despite configured blocking rules.
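As a rough illustration of why a robots.txt rule on its own cannot stop a crawler that changes its user-agent, the sketch below parses a hypothetical robots.txt with Python's standard `urllib.robotparser`. The bot name `PerplexityBot`, the example URL, and the Chrome-on-macOS string are assumptions chosen for illustration and are not taken from Cloudflare's report. The check is purely advisory: it only tells a well-behaved client what the site has asked for.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: blocks one declared AI crawler, allows everyone else.
# The bot name is an assumption for illustration, not quoted from the article.
ROBOTS_TXT = """\
User-agent: PerplexityBot
Disallow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def is_allowed(user_agent: str, url: str = "https://example.com/pricing") -> bool:
    """Return True if robots.txt permits this user-agent to fetch the URL."""
    return parser.can_fetch(user_agent, url)


# A crawler that identifies itself honestly matches the Disallow rule.
DECLARED_UA = "PerplexityBot"

# A generic Chrome-on-macOS string (illustrative value) only matches the "*" group.
GENERIC_UA = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
    "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36"
)

declared_allowed = is_allowed(DECLARED_UA)
generic_allowed = is_allowed(GENERIC_UA)

print("declared crawler allowed:", declared_allowed)
print("generic browser UA allowed:", generic_allowed)
```

Because the rules are keyed on the self-reported user-agent, the same request sent under a browser-like string falls through to the permissive `*` group. That gap is why detection of the kind described above has to rely on signals other than the user-agent header.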

In response, Cloudflare has delisted Perplexity from its verified bot list and implemented new techniques to block stealth crawling attempts. Perplexity’s spokesperson denied the claims, calling the report a publicity stunt and asserting that the named bot wasn’t theirs. The company also claimed the screenshots offered as evidence showed no actual content being accessed.

These allegations add to prior claims in 2023 that Perplexity bypassed paywalls and ignored robots.txt. Meanwhile, Cloudflare has positioned itself against unauthorized AI scraping and, just a few weeks ago, launched a Pay Per Crawl marketplace for monetizing bot access, with CEO Matthew Prince warning of business model disruption for publishers.

by Mauricio B. Holguin

Perplexity is an AI chatbot designed to enhance search experiences by delivering precise answers via a conversational interface. It adapts to context and user preferences, ensuring relevant results. Rated 4.2, Perplexity stands out with its AI-powered capabilities and ad-free environment.

Comments

Augusto Goulart
3

I'd trust Cloudflare assessments 1e+6 times before any "startup" AI company.

superstickynotemealt
1

I have mixed feelings about this because Perplexity has managed to give me important information that I simply can't find via Google searches and this could be why. [things like pricing information, and ToS/support information for websites that hide it and such.]

1 reply
Eden

I'm just trying to understand what your comment means.

  • Google search doesn't find what you want.
  • Perplexity finds "information for websites that hide it". You seem to be agreeing that Perplexity is getting information that they have been told NOT to get.

For example, I run a website that has changeable information: names; dates; times; venues; etc. It plainly says in English, and other languages, "don't copy the page, just link to it" so that there won't be incompatible versions of the information found on the 'net. By the time the AI "learns" my page, it may have already changed. It's that kind of "intelligence" that makes people say that AI is actually UNIntelligence.

I want a robots.txt type of method to tell AI what not to scrape. Oh, that already exists! Perplexity, and others, should be intelligent enough to follow what is requested by the website controllers.

Gu