ChatGPT Has a Big Privacy Problem
When OpenAI released GPT-3 in July 2020, it offered a glimpse of the data used to train the large language model. Millions of pages scraped from the web, Reddit posts, books, and more are used to create the generative text system, according to a technical paper. Scooped up in this data is some of the personal information you share about yourself online. This data is now getting OpenAI into trouble.
On March 31, Italy’s data regulator issued a temporary emergency decision demanding OpenAI stop using the personal information of millions of Italians that’s included in its training data. According to the regulator, Garante per la Protezione dei Dati Personali, OpenAI doesn’t have the legal right to use people’s personal information in ChatGPT. In response, OpenAI has stopped people in Italy from accessing its chatbot while it provides responses to the officials, who are investigating further.
The action is the first taken against ChatGPT by a Western regulator and highlights privacy tensions around the creation of giant generative AI models, which are often trained on vast swathes of internet data. Just as artists and media companies have complained that generative AI developers have used their work without permission, the data regulator is now saying the same for people’s personal information.