Internet Archive overwhelmed by AI data harvesting company and pleads for responsible use
The Internet Archive, a non-profit library of millions of free books, movies, software, music, websites, and more, had its servers overloaded when an AI company harvested data from the archive at an “extreme rate”. Mark Graham, who runs the Internet Archive's Wayback Machine, made the announcement on Twitter a couple of days ago.
The website was receiving tens of thousands of requests per second from someone using Amazon Web Services, which overloaded the servers and took the website offline. A few hours later, the Internet Archive was back online. Brewster Kahle, the co-founder of the Internet Archive, wrote a short blog post about what happened.
The blog post also asked those wanting to use the Internet Archive's materials in bulk to start slowly and ramp up, or to contact the organization directly so that any issues can be anticipated. The post ended with a plea to “Please use the Internet Archive, but don't bring us down in the process”.
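For readers wondering what "start slowly and ramp up" might look like in practice, here is a minimal Python sketch. It is an illustration only, not anything published by the Internet Archive: the function names, the starting rate, the ceiling, and the growth factor are all hypothetical values chosen for the example, and the `fetch` callable stands in for whatever download code a client already has.

```python
import time
from typing import Callable, Iterable, Iterator


def ramp_up_delays(
    start_rps: float = 1.0,      # hypothetical starting rate (requests per second)
    max_rps: float = 8.0,        # hypothetical ceiling the client never exceeds
    factor: float = 2.0,         # rate multiplier applied after each step
    requests_per_step: int = 5,  # requests sent before the rate is raised
) -> Iterator[float]:
    """Yield, indefinitely, the pause in seconds to take before each request."""
    if start_rps <= 0 or max_rps < start_rps or factor < 1 or requests_per_step < 1:
        raise ValueError("invalid ramp-up parameters")
    rate = start_rps
    while True:
        for _ in range(requests_per_step):
            yield 1.0 / rate
        # Raise the rate gradually, but never past the ceiling.
        rate = min(rate * factor, max_rps)


def fetch_all(
    urls: Iterable[str],
    fetch: Callable[[str], object],
    sleep: Callable[[float], object] = time.sleep,
) -> None:
    """Call fetch(url) for each URL, pausing according to the ramp-up schedule."""
    for url, delay in zip(urls, ramp_up_delays()):
        sleep(delay)
        fetch(url)
```

A real bulk client would probably also slow down or stop when the server signals trouble, for example with HTTP 429 or 503 responses; that part is left out of the sketch.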
The Internet Archive is a valuable resource for researchers, students, and anyone interested in accessing free information. It is a non-profit organization that relies on donations to keep the servers running and the website accessible.
The incident highlights the importance of responsible use of online resources and the need for companies to consider the impact of their actions on others, so that the archive remains accessible to all who need it.
This is why we can't have nice things.
Selfish people force services to implement rate limiting, mandatory registration, and quality downgrades for everyone, just to keep the lights on.