Reddit to block Wayback Machine from indexing its content over AI data scraping concerns
Reddit will block the Internet Archive’s Wayback Machine from indexing most of its content, citing concerns that AI companies are scraping data from archived pages to bypass the platform's controls. Under the new policy, the Wayback Machine loses access to Reddit post detail pages, user profiles, and comments. Only the Reddit.com homepage will remain available for daily archival.
As a result, the Internet Archive can now capture only basic daily snapshots of trending headlines, without preserving full post content or discussion threads. According to Reddit, some AI companies have used archived pages to scrape Reddit data in violation of the company’s policies. These restrictions will remain until the Internet Archive can better prevent scraping, comply with Reddit's privacy rules, and reliably delete removed content.
Reddit informed the Internet Archive in advance and said the limits would begin ramping up immediately. The move aligns with Reddit’s ongoing efforts to curb bulk data extraction, including 2023 API restrictions and paid data deals with AI and search firms. In 2024 and 2025, Reddit signed agreements with Google and OpenAI, blocked major search engines, and sued Anthropic for alleged continued scraping.
Comments
honestly i think using a frontend to reddit is a workaround, have to check later
Oh no, we won't be preserving all those genuinely retarded opinion posts!
Funny how this helps Reddit's own AI scraping and their compulsion to say bullshit
Only because they are using their own weak AI and want to kill competition. As everyone knows, the Reddit team is all about showing they are socially progressive while being strongly capitalist.
All so they can just scrape data themselves. Fake AI is killing the internet.
and their bribers
Uh, well, start using your own archiving solution. But that's a bummer...