NoisyAG-News: A Benchmark for Addressing Instance-Dependent Noise in Text Classification

Read original: arXiv:2407.06579 - Published 7/10/2024 by Hongfei Huang, Tingting Liang, Xixi Sun, Zikang Jin, Yuyu Yin


Overview

This paper introduces a new benchmark dataset called NoisyAG-News for evaluating machine learning models on text classification tasks with instance-dependent noise.
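Instance-dependent noise means that whether a label is wrong, and which wrong label it receives, depends on the individual sample. For example, an article that straddles two topics, such as a tech company's earnings report, is far more likely to be mislabeled than a clear-cut sports story. This makes it a more realistic and challenging scenario than the synthetic noise used in most earlier benchmarks.
The NoisyAG-News dataset is derived from the popular AG News corpus through manual annotation, so its label noise reflects real human mistakes. The authors also build corresponding synthetic-noise versions, so researchers can compare how robust their models are to real instance-dependent noise versus synthetic noise.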
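To make the distinction concrete, here is a minimal Python sketch of two noise models. In symmetric (uniform) noise, every label flips with the same probability. In instance-dependent noise, both the flip probability and the wrong label depend on the sample. The per-sample `ambiguity` scores and `confusable` classes are hypothetical inputs for illustration. NoisyAG-News itself was built from human annotations, not from a generator like this.

```python
import random

NUM_CLASSES = 4  # AG News topics: World, Sports, Business, Sci/Tech


def symmetric_noise(labels, rate, seed=0):
    """Flip every label with the same probability `rate`, to a uniformly chosen other class."""
    rng = random.Random(seed)
    noisy = []
    for y in labels:
        if rng.random() < rate:
            noisy.append(rng.choice([c for c in range(NUM_CLASSES) if c != y]))
        else:
            noisy.append(y)
    return noisy


def instance_dependent_noise(labels, ambiguity, confusable, seed=0):
    """Flip label i with probability ambiguity[i], always to confusable[i].

    Hypothetical inputs: ambiguity[i] in [0, 1] scores how much sample i straddles
    topics; confusable[i] is the class a human would most plausibly pick instead.
    """
    rng = random.Random(seed)
    return [c if rng.random() < a else y
            for y, a, c in zip(labels, ambiguity, confusable)]


# Only the two ambiguous samples (indices 2 and 3) can flip.
print(instance_dependent_noise([0, 1, 2, 3], [0.0, 0.0, 1.0, 1.0], [1, 0, 3, 2]))
# -> [0, 1, 3, 2]
```

Under symmetric noise, a clean-looking sample and an ambiguous one are equally likely to be corrupted. Under the instance-dependent version, errors cluster on the hard samples. That clustering is the property the paper argues real annotator noise has.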

Plain English Explanation

The paper presents a new dataset called NoisyAG-News that can be used to test how well machine learning models perform on text classification tasks when the training data contains "noisy" or incorrect labels. In this type of noise, how likely a label is to be wrong depends on the particular sample. That is more realistic, but also more challenging, than the synthetic noise patterns used in most previous benchmark datasets.

The researchers created NoisyAG-News by having people manually annotate articles from an existing dataset called AG News, which contains news articles labeled with their topic categories. Because the incorrect labels come from real human annotators rather than random corruption, researchers can evaluate how well their text classification models handle instance-dependent noise. In that setting, some samples are labeled consistently, while others confuse annotators and end up with less reliable labels.

By providing a standardized benchmark dataset with realistic, human-generated label noise, the authors hope to spur progress in developing more robust machine learning models that perform well even when the training data contains inconsistent or unreliable labels. This is an important challenge, as real-world datasets often suffer from some degree of label noise that can degrade model performance.

Technical Explanation

The paper introduces the NoisyAG-News dataset, which is derived from the popular AG News corpus of news article text and labels. To create NoisyAG-News, the authors collected new labels through manual annotation and then analyzed the annotated data. They showed, both qualitatively and quantitatively, that the resulting real-world noisy labels follow instance-dependent patterns: the chance of a labeling error varies with the content of each sample.
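The abstract says NoisyAG-News was built through manual annotation and that samples differ in their "confusion level." The sketch below assumes a hypothetical data layout: each article carries its original AG News label plus several annotator labels. This is not the released file format. Under that assumption, a per-sample confusion level and an overall noise rate could be computed like this:

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Sample:
    gold: int               # original AG News label (treated as clean)
    annotations: list[int]  # labels from several human annotators (hypothetical layout)


def confusion_level(sample: Sample) -> float:
    """Fraction of annotators whose label disagrees with the gold label.

    Assumes at least one annotation per sample.
    """
    wrong = sum(1 for a in sample.annotations if a != sample.gold)
    return wrong / len(sample.annotations)


def majority_label(sample: Sample) -> int:
    """Most common annotator label; ties go to the label seen first."""
    return Counter(sample.annotations).most_common(1)[0][0]


def noise_rate(samples: list[Sample], pick) -> float:
    """Share of samples whose chosen noisy label (selected by `pick`) differs from gold."""
    return sum(pick(s) != s.gold for s in samples) / len(samples)


samples = [
    Sample(gold=2, annotations=[2, 2, 2]),  # unambiguous
    Sample(gold=2, annotations=[3, 3, 2]),  # most annotators chose another topic
    Sample(gold=0, annotations=[0, 1, 0]),  # mild disagreement
]
print([round(confusion_level(s), 2) for s in samples])  # -> [0.0, 0.67, 0.33]
print(round(noise_rate(samples, majority_label), 2))    # -> 0.33
```

The `pick` argument is a design choice. Passing `majority_label` gives an aggregated noisy label, while `lambda s: s.annotations[0]` gives a single annotator's view. Each choice yields a different noise rate over the same annotations.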

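This contrasts with most prior work on learning with noisy labels, which relies on synthetic noise whose structure does not depend on the individual sample. Controllable instance-dependent noise datasets already exist for image classification, but comparable resources for text classification have been scarce. The related NoiseBench benchmark does study real label noise, but for named entity recognition rather than text classification. By capturing instance-dependent noise, NoisyAG-News aims to better reflect the real-world challenges faced by text classification models, where the reliability of labels can be inconsistent.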
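The abstract also mentions synthetic noise datasets that correspond to NoisyAG-News, but the exact recipe is not given here. One standard way to build such a counterpart has two steps. First, estimate a class-conditional transition matrix T[i][j] = P(noisy = j | clean = i) from the real labels. Then resample noisy labels from T. This preserves per-class flip rates but discards any dependence on the text. A minimal sketch with hypothetical function names, not necessarily the authors' procedure:

```python
import random


def estimate_transition(clean, noisy, num_classes):
    """Estimate T[i][j] = P(noisy = j | clean = i) from paired clean/noisy labels."""
    counts = [[0] * num_classes for _ in range(num_classes)]
    for c, n in zip(clean, noisy):
        counts[c][n] += 1
    T = []
    for i, row in enumerate(counts):
        total = sum(row)
        if total:
            T.append([x / total for x in row])
        else:
            # Class never observed: fall back to "no noise" so the row is a valid distribution.
            T.append([1.0 if j == i else 0.0 for j in range(num_classes)])
    return T


def resample_class_conditional(clean, T, seed=0):
    """Draw a synthetic noisy label for each sample from row T[clean].

    Only the clean class is consulted, so the sample's text plays no role.
    """
    rng = random.Random(seed)
    classes = range(len(T))
    return [rng.choices(classes, weights=T[c])[0] for c in clean]


clean = [0, 0, 1, 1]
noisy = [0, 1, 1, 1]
T = estimate_transition(clean, noisy, num_classes=2)
print(T)  # -> [[0.5, 0.5], [0.0, 1.0]]
```

Because resampling looks only at the clean class, two articles from the same class get identical flip probabilities, however ambiguous each one is. Instance-dependent noise violates exactly that structure, which is why the two kinds of datasets can produce very different results.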

The authors ran learning experiments on NoisyAG-News and its corresponding synthetic-noise datasets, using pre-trained language models and noise-handling techniques. The pre-trained models proved resilient to synthetic noise but struggled with instance-dependent noise. Samples with different confusion levels also behaved inconsistently during training and testing. These results suggest that existing noisy-label methods need to be reevaluated against realistic noise.
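The abstract reports that samples with different confusion levels behave inconsistently during training and testing. One way to surface this is to report accuracy per confusion-level bucket rather than as a single number. The sketch below assumes you already have model predictions, clean labels, and a confusion level in [0, 1] for each test sample. The bucket edges are arbitrary illustrative choices, not values from the paper.

```python
import bisect
from collections import defaultdict


def accuracy_by_confusion(preds, gold, confusion, edges=(0.0, 0.34, 0.67, 1.0)):
    """Return {(lo, hi): accuracy} for samples grouped by confusion level.

    Bucket k holds levels in [edges[k], edges[k+1]); the last bucket also
    includes edges[-1]. Assumes every confusion level lies in [edges[0], edges[-1]].
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    last = len(edges) - 2
    for p, g, c in zip(preds, gold, confusion):
        k = min(bisect.bisect_right(edges, c) - 1, last)
        key = (edges[k], edges[k + 1])
        totals[key] += 1
        hits[key] += int(p == g)
    return {key: hits[key] / totals[key] for key in sorted(totals)}


report = accuracy_by_confusion(
    preds=[0, 1, 2, 3],
    gold=[0, 1, 0, 0],
    confusion=[0.0, 0.0, 0.5, 1.0],
)
print(report)  # -> {(0.0, 0.34): 1.0, (0.34, 0.67): 0.0, (0.67, 1.0): 0.0}
```

A flat overall accuracy would hide the pattern shown here, where easy samples are predicted correctly and highly confusing ones are not.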

Critical Analysis

The NoisyAG-News benchmark represents an important step forward in evaluating how machine learning models handle noisy or unreliable labels, a pervasive issue in many real-world applications. Because its noise comes from human annotators and is instance-dependent, the dataset poses a more realistic and challenging test than prior text benchmarks built on synthetic noise.

That said, NoisyAG-News likely captures only part of the noise patterns that arise in practical settings. In other real-world datasets, the noise may be more complex, with interdependencies between the features, labels, and noise distribution that a single benchmark cannot fully capture.

Additionally, the paper focuses solely on text classification tasks, leaving open the question of how instance-dependent noise might impact other domains, such as image segmentation. Further research is needed to understand the broader implications of this type of noise and develop generalizable solutions.

Overall, the NoisyAG-News dataset represents a valuable contribution to the field, providing a standardized platform for researchers to test their models' robustness to instance-dependent label noise. Continued advancement in this area could lead to more reliable and trustworthy machine learning systems that can operate effectively in the face of real-world data challenges.

Conclusion

The NoisyAG-News benchmark introduced in this paper represents an important step forward in evaluating machine learning models for text classification in the presence of instance-dependent label noise. By manually annotating articles from the popular AG News dataset, the authors have created a resource that mimics the challenges of real-world applications more closely than synthetic-noise benchmarks do.

The finding that pre-trained language models withstand synthetic noise but struggle with real instance-dependent noise highlights the need for continued progress on robust learning approaches. Such approaches must maintain accuracy even when training data quality is inconsistent. Further research extending the NoisyAG-News approach to other domains and noise patterns could yield valuable insights for the field of machine learning.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!


Related Papers

NoisyAG-News: A Benchmark for Addressing Instance-Dependent Noise in Text Classification

Hongfei Huang, Tingting Liang, Xixi Sun, Zikang Jin, Yuyu Yin

Existing research on learning with noisy labels predominantly focuses on synthetic label noise. Although synthetic noise possesses well-defined structural properties, it often fails to accurately replicate real-world noise patterns. In recent years, there has been a concerted effort to construct generalizable and controllable instance-dependent noise datasets for image classification, significantly advancing the development of noise-robust learning in this area. However, studies on noisy label learning for text classification remain scarce. To better understand label noise in real-world text classification settings, we constructed the benchmark dataset NoisyAG-News through manual annotation. Initially, we analyzed the annotated data to gather observations about real-world noise. We qualitatively and quantitatively demonstrated that real-world noisy labels adhere to instance-dependent patterns. Subsequently, we conducted comprehensive learning experiments on NoisyAG-News and its corresponding synthetic noise datasets using pre-trained language models and noise-handling techniques. Our findings reveal that while pre-trained models are resilient to synthetic noise, they struggle against instance-dependent noise, with samples of varying confusion levels showing inconsistent performance during training and testing. These real-world noise patterns pose new, significant challenges, prompting a reevaluation of noisy label handling methods. We hope that NoisyAG-News will facilitate the development and evaluation of future solutions for learning with noisy labels.

7/10/2024

AlleNoise -- large-scale text classification benchmark dataset with real-world label noise

Alicja Rączkowska, Aleksandra Osowska-Kurczab, Jacek Szczerbiński, Kalina Jasinska-Kobus, Klaudia Nazarko

Label noise remains a challenge for training robust classification models. Most methods for mitigating label noise have been benchmarked using primarily datasets with synthetic noise. While the need for datasets with realistic noise distribution has partially been addressed by web-scraped benchmarks such as WebVision and Clothing1M, those benchmarks are restricted to the computer vision domain. With the growing importance of Transformer-based models, it is crucial to establish text classification benchmarks for learning with noisy labels. In this paper, we present AlleNoise, a new curated text classification benchmark dataset with real-world instance-dependent label noise, containing over 500,000 examples across approximately 5,600 classes, complemented with a meaningful, hierarchical taxonomy of categories. The noise distribution comes from actual users of a major e-commerce marketplace, so it realistically reflects the semantics of human mistakes. In addition to the noisy labels, we provide human-verified clean labels, which help to get a deeper insight into the noise distribution, unlike web-scraped datasets typically used in the field. We demonstrate that a representative selection of established methods for learning with noisy labels is inadequate to handle such real-world noise. In addition, we show evidence that these algorithms do not alleviate excessive memorization. As such, with AlleNoise, we set the bar high for the development of label noise methods that can handle real-world label noise in text classification tasks. The code and dataset are available for download at https://github.com/allegro/AlleNoise.

7/17/2024

Noisy Label Processing for Classification: A Survey

Mengting Li, Chuang Zhu

In recent years, deep neural networks (DNNs) have achieved remarkable success in computer vision tasks, and this success often depends greatly on the richness of data. However, acquiring data and high-quality ground truth requires a lot of manpower and money. In the long, tedious process of data annotation, annotators are prone to make mistakes, resulting in incorrect image labels, i.e., noisy labels. The emergence of noisy labels is inevitable. Moreover, since research shows that DNNs can easily fit noisy labels, their presence can significantly damage the model training process. Therefore, it is crucial to combat noisy labels in computer vision, especially for classification tasks. In this survey, we first comprehensively review the evolution of deep learning approaches for combating noisy labels in image classification. In addition, we review different noise patterns that have been proposed to design robust algorithms. Furthermore, we explore the inner patterns of real-world label noise and propose an algorithm to generate a synthetic label noise pattern guided by real-world data. We test the algorithm on the well-known real-world dataset CIFAR-10N to form a new real-world data-guided synthetic benchmark and evaluate some typical noise-robust methods on the benchmark.

4/8/2024


NoiseBench: Benchmarking the Impact of Real Label Noise on Named Entity Recognition

Elena Merdjanovska, Ansar Aynetdinov, Alan Akbik

Available training data for named entity recognition (NER) often contains a significant percentage of incorrect labels for entity types and entity boundaries. Such label noise poses challenges for supervised learning and may significantly deteriorate model quality. To address this, prior work proposed various noise-robust learning approaches capable of learning from data with partially incorrect labels. These approaches are typically evaluated using simulated noise where the labels in a clean dataset are automatically corrupted. However, as we show in this paper, this leads to unrealistic noise that is far easier to handle than real noise caused by human error or semi-automatic annotation. To enable the study of the impact of various types of real noise, we introduce NoiseBench, an NER benchmark consisting of clean training data corrupted with 6 types of real noise, including expert errors, crowdsourcing errors, automatic annotation errors and LLM errors. We present an analysis that shows that real noise is significantly more challenging than simulated noise, and show that current state-of-the-art models for noise-robust learning fall far short of their theoretically achievable upper bound. We release NoiseBench to the research community.

5/14/2024