July 30, 2024

Removing undesirable data can harm AI model performance


Unlearning techniques, which aim to erase specific undesirable information from generative AI models, such as sensitive data or copyrighted material, can have a detrimental effect on the models' capabilities. A recent study co-authored by researchers from the University of Washington, Princeton, the University of Chicago, USC, and Google found that popular unlearning techniques often degrade models to the point of making them unusable. The evaluation showed that currently feasible unlearning methods are not yet ready for practical deployment in real-world scenarios. Weijia Shi, a researcher involved in the study, emphasized that there are as yet no efficient methods that let a model forget specific data without a significant loss of utility.

How models learn

Generative AI models lack true intelligence; they are statistical systems that make predictions based on patterns in data. Trained on vast numbers of examples, such as movies, voice recordings, essays, and more, they learn how likely a given piece of data is to occur given its surrounding context.

For instance, if given an email ending with the phrase “Looking forward…”, a model trained to autocomplete messages might suggest “…to hearing back,” based on patterns it has learned from the input data. However, it is important to note that there is no intention or consciousness behind the model’s suggestions—it is simply making an educated guess.
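To make this concrete, the sketch below queries a small, freely available language model for its most likely next tokens after a prompt. GPT-2 and the Hugging Face transformers library are used purely for accessibility; they are not the models discussed in the study, and the prompt is the article's own example.

```python
# Illustrative sketch: next-token prediction with a small open model.
# GPT-2 is a placeholder chosen because it is freely downloadable.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "Looking forward"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits          # shape: (1, seq_len, vocab_size)

# Probability distribution over the next token, given the prompt so far.
next_token_probs = torch.softmax(logits[0, -1], dim=-1)
top = torch.topk(next_token_probs, k=5)

for prob, token_id in zip(top.values, top.indices):
    print(f"{tokenizer.decode(int(token_id))!r}: {prob:.3f}")
```

The model simply ranks continuations by learned likelihood; there is no understanding behind the suggestion it produces.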

Many models, including prominent ones like GPT-4o, are trained on data collected from public websites and datasets available on the internet. Vendors developing these models often argue that their practice of scraping and using data for training falls under fair use, and they may not inform, compensate, or credit the owners of the data.

These copyright concerns have led some copyright holders, including authors, publishers, and record labels, to file lawsuits against vendors in an effort to force changes in their practices.

The copyright dilemma has drawn increased attention to unlearning techniques. In response, Google and several academic institutions launched a competition last year to encourage the development of new unlearning approaches.

Unlearning techniques could also offer a solution for removing sensitive information, such as medical records or compromising photos, from existing models in response to requests or government orders. Models, due to their training process, tend to accumulate private information, ranging from phone numbers to more problematic examples. While some vendors have introduced opt-out tools that allow data owners to request the removal of their data from future models, these tools do not apply to models trained before their implementation. Unlearning would provide a more comprehensive approach to data deletion.

However, unlearning is not as simple as pressing the “Delete” button. It poses technical challenges that need to be addressed for effective implementation.

The art of forgetting

Unlearning techniques rely on algorithms that steer models away from specific data, minimizing the likelihood that the model outputs that data. To evaluate the effectiveness of these algorithms, the researchers developed a benchmark called MUSE (Machine Unlearning Six-way Evaluation), which assesses an algorithm's ability both to prevent a model from regurgitating its training data and to eliminate the model's knowledge of that data.
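As a rough illustration of what such an algorithm can look like, the sketch below applies gradient ascent on a "forget" set, one common family of unlearning methods. This is not the implementation evaluated in the MUSE paper, and the model, data, and learning rate are placeholders.

```python
# Minimal sketch of one common unlearning family: gradient ascent on a "forget" set.
# Model, data, and hyperparameters are placeholders for illustration only.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("gpt2")      # placeholder model
tokenizer = AutoTokenizer.from_pretrained("gpt2")
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

forget_texts = ["Example passage the model should no longer reproduce."]  # placeholder data

model.train()
for text in forget_texts:
    batch = tokenizer(text, return_tensors="pt")
    out = model(**batch, labels=batch["input_ids"])
    # Ordinary training *minimizes* the language-modeling loss; unlearning by
    # gradient ascent *maximizes* it on the forget set, pushing the model away
    # from the memorized text (often at the cost of general utility).
    (-out.loss).backward()
    optimizer.step()
    optimizer.zero_grad()
```

Because the update pushes against the same weights that encode unrelated knowledge, this kind of procedure is exactly where the utility loss described in the study tends to appear.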

The study focused on making models forget information from the Harry Potter series and from news articles. The researchers tested whether an unlearned model could be kept from reproducing verbatim sentences from the books, from answering questions about their scenes, and from otherwise revealing that it had been trained on the text. They also examined whether the model retained related general knowledge, such as who wrote the Harry Potter series, which they referred to as the model's overall utility.
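A drastically simplified version of the verbatim-memorization check might look like the sketch below: prompt the supposedly unlearned model with the opening of a passage from the forget corpus and measure how much of the remainder it reproduces. The model, passage, and overlap metric are illustrative assumptions, not the benchmark's actual protocol.

```python
# Simplified check in the spirit of a verbatim-memorization test.
# Model, passage, and metric are illustrative placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("gpt2")      # placeholder "unlearned" model
tokenizer = AutoTokenizer.from_pretrained("gpt2")

passage = "A passage from the forget corpus that the model memorized during training goes here."
prompt, reference = passage[:50], passage[50:]

inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    output_ids = model.generate(
        **inputs,
        max_new_tokens=30,
        do_sample=False,
        pad_token_id=tokenizer.eos_token_id,
    )

continuation = tokenizer.decode(output_ids[0][inputs["input_ids"].shape[1]:])

# Crude overlap score: fraction of reference words that appear in the continuation.
ref_words = reference.split()
overlap = sum(w in continuation for w in ref_words) / len(ref_words)
print(f"verbatim overlap: {overlap:.2f}")   # high overlap suggests the text was not forgotten
```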

While the tested unlearning algorithms did make models forget certain information, they also negatively impacted the models’ general question-answering capabilities. This trade-off highlights the challenge of designing effective unlearning methods due to the intricate entanglement of knowledge within the model.

Currently, there are no definitive solutions to address this issue, emphasizing the need for further research. Vendors relying on unlearning as a solution for training data concerns may need to explore alternative approaches until a technical breakthrough makes unlearning more feasible.