OpenAI launches GPTBot: A new web crawler to enhance AI model performance
OpenAI has introduced GPTBot, a new web crawler that gathers publicly accessible web pages to help improve its AI models, such as GPT-4. According to OpenAI, pages collected by GPTBot may be used to train future models, potentially making them more accurate and safer.
GPTBot's operation is detailed in OpenAI's blog post, with a focus on how it filters data. Crawled pages are filtered to remove sources that require paywall access, sources known to primarily aggregate personally identifiable information (PII), and text that violates OpenAI's policies. Given past controversies over data collection, copyright infringement, and privacy violations, OpenAI also lets website owners restrict GPTBot's access to their content, either by blocking the IP ranges the crawler uses, which OpenAI publishes, or by editing their robots.txt file. Separately, ChatGPT users can disable chat history, which gives them more control over whether their conversations are used for training. However, there is currently no way to remove content from the datasets already used to train models such as GPT-3.5 and GPT-4.
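The IP-blocking option can be sketched as a simple server-side check. This minimal Python sketch assumes the site operator has already copied the IP ranges OpenAI publishes for GPTBot into a list; the two ranges below are placeholder documentation addresses (RFC 5737), not GPTBot's real ranges, and `is_gptbot_ip` is a hypothetical helper name.

```python
import ipaddress

# PLACEHOLDER ranges from the RFC 5737 documentation blocks.
# These are NOT GPTBot's real addresses; replace them with the list OpenAI publishes.
GPTBOT_RANGES = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/25"),
]


def is_gptbot_ip(remote_addr: str) -> bool:
    """Return True if remote_addr falls inside any configured GPTBot range.

    IPv6 addresses never match the IPv4 networks above; ipaddress simply
    reports them as "not contained" instead of raising an error.
    """
    ip = ipaddress.ip_address(remote_addr)
    return any(ip in net for net in GPTBOT_RANGES)


if __name__ == "__main__":
    for addr in ["192.0.2.17", "198.51.100.200", "203.0.113.5"]:
        print(addr, "blocked" if is_gptbot_ip(addr) else "allowed")
```

A web application could call `is_gptbot_ip()` on each request's remote address and return HTTP 403 on a match. Unlike robots.txt, which relies on the crawler choosing to honor it, an IP check is enforced by the server itself.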
Website owners who want to keep GPTBot away from some or all of their content can add rules for the GPTBot user agent to their robots.txt file. These rules specify which sections of the site the crawler may or may not access.
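To illustrate such rules, the sketch below embeds a hypothetical robots.txt (the /blog/ and /private/ paths are invented for the example) and checks it with Python's standard-library `urllib.robotparser`. Real crawlers use their own parsers, so this only approximates how GPTBot itself would read the file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: GPTBot may crawl /blog/ but not /private/;
# every other crawler is unrestricted (an empty Disallow allows everything).
ROBOTS_TXT = """\
User-agent: GPTBot
Allow: /blog/
Disallow: /private/

User-agent: *
Disallow:
"""

parser = RobotFileParser()
# parse() accepts an iterable of lines and marks the rules as loaded,
# so can_fetch() works without fetching anything over the network.
parser.parse(ROBOTS_TXT.splitlines())


def gptbot_allowed(path: str) -> bool:
    """Return True if the rules above let the GPTBot user agent fetch path."""
    return parser.can_fetch("GPTBot", path)


if __name__ == "__main__":
    for path in ["/blog/post-1", "/private/data", "/"]:
        print(path, gptbot_allowed(path))
```

To block GPTBot from the entire site, its group would contain just `Disallow: /`. Note that `urllib.robotparser` applies the first matching rule in a group, while RFC 9309 and many crawlers use the most specific (longest) match, so keeping Allow and Disallow paths non-overlapping avoids ambiguity.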