Noise Correction on Subjective Datasets

arXiv:2311.00619

Published 6/5/2024 by Uthman Jinadu, Yi Ding

Abstract

Incorporating every annotator's perspective is crucial for unbiased data modeling. Annotator fatigue and changing opinions over time can distort dataset annotations. To combat this, we propose to learn a more accurate representation of diverse opinions by utilizing multitask learning in conjunction with loss-based label correction. We show that using our novel formulation, we can cleanly separate agreeing and disagreeing annotations. Furthermore, this method provides a controllable way to encourage or discourage disagreement. We demonstrate that this modification can improve prediction performance in a single or multi-annotator setting. Lastly, we show that this method remains robust to additional label noise that is applied to subjective data.

Overview

Incorporating diverse perspectives from annotators is crucial for unbiased data modeling, but annotator fatigue and changing opinions can distort dataset annotations.
The researchers propose using multitask learning and loss-based label correction to learn a more accurate representation of diverse opinions, which can separate agreeing and disagreeing annotations and provide a way to control the level of disagreement.
This method can improve prediction performance in single or multi-annotator settings and remains robust to additional label noise in subjective data.

Plain English Explanation

When building machine learning models, it's important to capture a wide range of perspectives from the people who label the training data. However, over time, the people doing the labeling can become tired or change their minds, which can introduce errors into the dataset.

To address this, the researchers developed a new technique that uses multitask learning and loss-based label correction to better capture the diverse opinions of the annotators. This allows the model to clearly distinguish between annotations that the annotators agree on and those they disagree on. Additionally, the researchers can control how much disagreement the model encourages or discourages.

With this approach, the model's prediction performance improves, and it holds up even when the data contains some additional "noise" (labeling errors). This is especially important for subjective tasks, where people's opinions can vary.

Overall, this research helps address the challenges of capturing diverse human perspectives in machine learning datasets, leading to more accurate and unbiased models.

Technical Explanation

The key innovation of this paper is the use of multitask learning and loss-based label correction to learn a more accurate representation of diverse annotator opinions. The researchers formulate the problem as a multitask learning setup, where the model must simultaneously predict the original label and a corrected label that represents the consensus across annotators.
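As a rough illustration of loss-based label correction in general (not necessarily the paper's exact procedure), the Python sketch below flags an annotation as suspect when its cross-entropy loss under the model exceeds a threshold, then replaces it with the model's most probable class. The function names, the fixed `loss_threshold`, and the toy probabilities are assumptions made for this example only.

```python
import math


def cross_entropy(probs, label):
    """Negative log-likelihood of `label` under a class-probability vector."""
    return -math.log(max(probs[label], 1e-12))


def correct_labels(probs_per_sample, labels, loss_threshold):
    """Return (corrected_labels, flagged) for one annotator's labels.

    Assumption for this sketch: a label whose loss exceeds `loss_threshold`
    is treated as suspect and replaced by the model's most probable class.
    All other labels are kept unchanged.
    """
    corrected, flagged = [], []
    for probs, label in zip(probs_per_sample, labels):
        is_suspect = cross_entropy(probs, label) > loss_threshold
        flagged.append(is_suspect)
        if is_suspect:
            # Index of the highest-probability class.
            corrected.append(max(range(len(probs)), key=probs.__getitem__))
        else:
            corrected.append(label)
    return corrected, flagged


# Toy model outputs for three samples and one annotator's labels.
probs = [[0.9, 0.1], [0.2, 0.8], [0.95, 0.05]]
labels = [0, 1, 1]
corrected, flagged = correct_labels(probs, labels, loss_threshold=1.0)
print(corrected, flagged)
```

Only the third label is changed here: the model assigns it probability 0.05, so its loss (about 3.0) exceeds the threshold, while the first two losses stay well below it.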

The loss function encourages the model to learn representations that separate agreeing and disagreeing annotations. By adjusting the relative weighting of these two tasks, the researchers can control the level of disagreement the model learns to capture. This allows the model to adapt to the needs of the particular application, encouraging more or less consensus as desired.
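The relative weighting described above can be sketched as a convex combination of two per-task losses. This is a minimal sketch under stated assumptions: the name `multitask_loss`, the single scalar `weight`, and the use of cross-entropy for both tasks are hypothetical choices for illustration, and the summary does not specify the paper's actual loss.

```python
import math


def cross_entropy(probs, label):
    """Negative log-likelihood of `label` under a class-probability vector."""
    return -math.log(max(probs[label], 1e-12))


def multitask_loss(probs_original, probs_corrected,
                   original_label, corrected_label, weight):
    """Weighted sum of the original-label loss and the corrected-label loss.

    `weight` (hypothetical, in [0, 1]) shifts emphasis between the two tasks:
    0 trains only on the annotator's original label, 1 only on the
    corrected label.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    loss_original = cross_entropy(probs_original, original_label)
    loss_corrected = cross_entropy(probs_corrected, corrected_label)
    return (1.0 - weight) * loss_original + weight * loss_corrected


# Toy outputs of the two prediction heads for a single sample.
head_original = [0.5, 0.5]
head_corrected = [0.25, 0.75]
for w in (0.0, 0.5, 1.0):
    print(w, multitask_loss(head_original, head_corrected, 0, 1, w))
```

Sweeping `weight` between 0 and 1 is one simple way to trade off fidelity to individual annotations against agreement with the corrected labels.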

The researchers demonstrate the effectiveness of this approach in both single- and multi-annotator settings. They show that the method improves prediction performance compared to baselines and, importantly, remains robust to additional label noise that may be present in subjective data.
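One common way to probe robustness to additional label noise is to flip a fixed fraction of labels to a different class before training. The sketch below shows that generic setup. The helper `add_label_noise`, the uniform choice of replacement class, and the seed handling are assumptions for illustration. The summary does not say how the paper injects noise.

```python
import random


def add_label_noise(labels, num_classes, noise_rate, seed=0):
    """Return a copy of `labels` with a fraction `noise_rate` flipped.

    Each selected label is replaced by a different class chosen uniformly
    at random, so exactly round(noise_rate * len(labels)) entries change.
    """
    if num_classes < 2:
        raise ValueError("need at least two classes to flip a label")
    rng = random.Random(seed)
    n_flip = round(noise_rate * len(labels))
    flip_idx = set(rng.sample(range(len(labels)), n_flip))
    noisy = []
    for i, label in enumerate(labels):
        if i in flip_idx:
            # Pick any class other than the current one.
            noisy.append(rng.choice([c for c in range(num_classes) if c != label]))
        else:
            noisy.append(label)
    return noisy


clean = [0, 1] * 50
noisy = add_label_noise(clean, num_classes=2, noise_rate=0.2)
n_changed = sum(a != b for a, b in zip(clean, noisy))
print(n_changed)
```

Training once on `clean` and once on `noisy`, then comparing performance on untouched test labels, gives a simple robustness check.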

Critical Analysis

The researchers acknowledge several key limitations of their work. First, the method relies on having access to multiple annotators for each data point, which may not always be feasible. Additionally, the researchers only evaluate on a limited set of datasets, and further testing on a wider range of tasks and domains would be valuable.

Another potential concern is the interpretability of the learned representations. While the method can separate agreeing and disagreeing annotations, it's unclear how the model is using this information internally to make predictions. More analysis of the model's decision-making process could provide additional insights.

Furthermore, the researchers do not explore the long-term impacts of their approach on dataset curation and model development. It's possible that overly encouraging disagreement could lead to instability or make it difficult to converge on a reliable ground truth. Careful consideration of these broader implications would be an important area for future research.

Despite these limitations, this work represents a promising step towards building machine learning systems that can more effectively leverage diverse human perspectives. By addressing the challenges of annotator fatigue and opinion drift, the researchers have developed a technique that could lead to more robust and unbiased models across a variety of applications.

Conclusion

This research proposes a novel approach to incorporating diverse annotator perspectives into machine learning models. By using multitask learning and loss-based label correction, the method can separate agreeing and disagreeing annotations and provide a way to control the level of disagreement learned by the model.

The researchers demonstrate that this technique can improve prediction performance in both single- and multi-annotator settings and, importantly, remains robust to additional label noise in subjective data. While the current work has some limitations, it represents an important step forward in addressing the challenges of capturing diverse human perspectives in machine learning.

As the field of AI continues to advance, techniques like this will be crucial for building models that are truly representative of the diverse experiences and opinions of the people they aim to serve.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Capturing Perspectives of Crowdsourced Annotators in Subjective Learning Tasks

Negar Mokhberian, Myrl G. Marmarelis, Frederic R. Hopp, Valerio Basile, Fred Morstatter, Kristina Lerman

Supervised classification heavily depends on datasets annotated by humans. However, in subjective tasks such as toxicity classification, these annotations often exhibit low agreement among raters. Annotations have commonly been aggregated by employing methods like majority voting to determine a single ground truth label. In subjective tasks, aggregating labels will result in biased labeling and, consequently, biased models that can overlook minority opinions. Previous studies have shed light on the pitfalls of label aggregation and have introduced a handful of practical approaches to tackle this issue. Recently proposed multi-annotator models, which predict labels individually per annotator, are vulnerable to under-determination for annotators with few samples. This problem is exacerbated in crowdsourced datasets. In this work, we propose Annotator Aware Representations for Texts (AART) for subjective classification tasks. Our approach involves learning representations of annotators, allowing for exploration of annotation behaviors. We show the improvement of our method on metrics that assess the performance on capturing individual annotators' perspectives. Additionally, we demonstrate fairness metrics to evaluate our model's equability of performance for marginalized annotators compared to others.

5/17/2024

cs.CL

Noisy Label Processing for Classification: A Survey

Mengting Li, Chuang Zhu

In recent years, deep neural networks (DNNs) have gained remarkable achievement in computer vision tasks, and the success of DNNs often depends greatly on the richness of data. However, the acquisition process of data and high-quality ground truth requires a lot of manpower and money. In the long, tedious process of data annotation, annotators are prone to make mistakes, resulting in incorrect labels of images, i.e., noisy labels. The emergence of noisy labels is inevitable. Moreover, since research shows that DNNs can easily fit noisy labels, the existence of noisy labels will cause significant damage to the model training process. Therefore, it is crucial to combat noisy labels for computer vision tasks, especially for classification tasks. In this survey, we first comprehensively review the evolution of different deep learning approaches for noisy label combating in the image classification task. In addition, we also review different noise patterns that have been proposed to design robust algorithms. Furthermore, we explore the inner pattern of real-world label noise and propose an algorithm to generate a synthetic label noise pattern guided by real-world data. We test the algorithm on the well-known real-world dataset CIFAR-10N to form a new real-world data-guided synthetic benchmark and evaluate some typical noise-robust methods on the benchmark.

4/8/2024

cs.CV cs.AI

The Perspectivist Paradigm Shift: Assumptions and Challenges of Capturing Human Labels

Eve Fleisig, Su Lin Blodgett, Dan Klein, Zeerak Talat

Longstanding data labeling practices in machine learning involve collecting and aggregating labels from multiple annotators. But what should we do when annotators disagree? Though annotator disagreement has long been seen as a problem to minimize, new perspectivist approaches challenge this assumption by treating disagreement as a valuable source of information. In this position paper, we examine practices and assumptions surrounding the causes of disagreement--some challenged by perspectivist approaches, and some that remain to be addressed--as well as practical and normative challenges for work operating under these assumptions. We conclude with recommendations for the data labeling pipeline and avenues for future research engaging with subjectivity and disagreement.

5/10/2024

cs.LG cs.CL cs.CY

Active Label Correction for Building LLM-based Modular AI Systems

Karan Taneja, Ashok Goel

Large Language Models (LLMs) have been used to build modular AI systems such as HuggingGPT, Microsoft Bing Chat, and more. To improve such systems after deployment using the data collected from human interactions, each module can be replaced by a fine-tuned model but the annotations received from LLMs are low quality. We propose that active label correction can be used to improve the data quality by only examining a fraction of the dataset. In this paper, we analyze the noise in datasets annotated by ChatGPT and study denoising it with human feedback. Our results show that active label correction can lead to oracle performance with feedback on fewer examples than the number of noisy examples in the dataset across three different NLP tasks.

5/21/2024

cs.LG cs.AI