Robust Noisy Label Learning via Two-Stream Sample Distillation

2404.10499

Published 4/17/2024 by Sihan Bai, Sanping Zhou, Zheng Qin, Le Wang, Nanning Zheng

Robust Noisy Label Learning via Two-Stream Sample Distillation

Abstract

Noisy label learning aims to learn robust networks under the supervision of noisy labels, which plays a critical role in deep learning. Existing work either conducts sample selection or label correction to deal with noisy labels during the model training process. In this paper, we design a simple yet effective sample selection framework, termed Two-Stream Sample Distillation (TSSD), for noisy label learning, which can extract more high-quality samples with clean labels to improve the robustness of network training. Firstly, a novel Parallel Sample Division (PSD) module is designed to generate a certain training set with sufficient reliable positive and negative samples by jointly considering the sample structure in feature space and the human prior in loss space. Secondly, a novel Meta Sample Purification (MSP) module is further designed to mine adequate semi-hard samples from the remaining uncertain training set by learning a strong meta classifier with extra golden data. As a result, more and more high-quality samples will be distilled from the noisy training set to train networks robustly in every iteration. Extensive experiments on four benchmark datasets, including CIFAR-10, CIFAR-100, Tiny-ImageNet, and Clothing-1M, show that our method has achieved state-of-the-art results over its competitors.

Create account to get full access

Overview

This paper introduces a novel approach called "Two-Stream Sample Distillation" for handling noisy labels in image classification tasks.
The key idea is to leverage two parallel neural networks - one focused on learning robust features from clean samples, and the other on distilling knowledge from noisy samples.
The authors demonstrate that this two-stream architecture can significantly improve classification performance in the presence of label noise, outperforming existing state-of-the-art methods.

Plain English Explanation

Image classification is a common task in machine learning where computers try to identify what's in a given image. However, the labels (the correct answers) used to train these models are not always perfect - they can be "noisy" or incorrect. This makes it challenging for the model to learn effectively.

The researchers in this paper developed a new technique called "Two-Stream Sample Distillation" to address this problem. The core idea is to use two separate neural networks working together. One network focuses on learning robust, high-quality features from the clean (correct) training samples. The other network then takes what it has learned and "distills" that knowledge into the noisy (incorrect) training samples.

By having these two networks work in parallel, the researchers found that the model could learn much more effectively in the presence of noisy labels, outperforming other state-of-the-art methods. This is an important advance, as noisy labels are a common issue in real-world machine learning applications, and being able to handle them robustly can unlock new capabilities.

Technical Explanation

The paper introduces a "Two-Stream Sample Distillation" (TSSD) framework for robust noisy label learning. The key innovation is the use of two parallel neural network streams:

The clean stream: This network is trained on clean (correctly labeled) samples to learn robust feature representations.
The noisy stream: This network is trained on noisy (potentially incorrectly labeled) samples, but it leverages the knowledge distilled from the clean stream to learn effectively despite the label noise.

The clean stream acts as a teacher, providing guidance to the noisy stream through a distillation process. This allows the noisy stream to learn discriminative features despite the corruption in the training labels.

The authors demonstrate the effectiveness of TSSD through extensive experiments on several benchmark image classification datasets with varying levels of label noise. They show that TSSD significantly outperforms other state-of-the-art approaches, such as Pairwise Similarity Distribution Clustering, Dual Branch Knowledge Distillation, and Data Stream Sampling.

Critical Analysis

The paper provides a compelling solution to the problem of learning from noisy labels, which is a common challenge in many real-world machine learning applications. The authors have carefully designed the TSSD framework and demonstrated its effectiveness through thorough experiments.

One potential limitation of the approach is that it requires access to clean training samples, which may not always be available in practice. The authors acknowledge this and suggest exploring self-supervised learning techniques, such as Self-Supervised Dataset Distillation and Label Revision, to address this issue.

Additionally, the paper does not provide extensive analysis on the computational and memory requirements of the TSSD framework, which could be an important consideration for practical deployment, especially in resource-constrained environments.

Conclusion

The "Robust Noisy Label Learning via Two-Stream Sample Distillation" paper presents a novel and effective approach to handling label noise in image classification tasks. By leveraging a two-stream architecture that learns robust features from clean samples and distills that knowledge into noisy samples, the researchers have demonstrated significant performance improvements over existing state-of-the-art methods.

This work is an important contribution to the field of machine learning, as it addresses a common and challenging problem that has real-world implications. The techniques developed in this paper could potentially be applied to a wide range of applications, from medical image analysis to autonomous vehicle perception, where the ability to learn effectively from noisy data is crucial.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

🔗

Pairwise Similarity Distribution Clustering for Noisy Label Learning

Sihan Bai

Noisy label learning aims to train deep neural networks using a large amount of samples with noisy labels, whose main challenge comes from how to deal with the inaccurate supervision caused by wrong labels. Existing works either take the label correction or sample selection paradigm to involve more samples with accurate labels into the training process. In this paper, we propose a simple yet effective sample selection algorithm, termed as Pairwise Similarity Distribution Clustering~(PSDC), to divide the training samples into one clean set and another noisy set, which can power any of the off-the-shelf semi-supervised learning regimes to further train networks for different downstream tasks. Specifically, we take the pairwise similarity between sample pairs to represent the sample structure, and the Gaussian Mixture Model~(GMM) to model the similarity distribution between sample pairs belonging to the same noisy cluster, therefore each sample can be confidently divided into the clean set or noisy set. Even under severe label noise rate, the resulting data partition mechanism has been proved to be more robust in judging the label confidence in both theory and practice. Experimental results on various benchmark datasets, such as CIFAR-10, CIFAR-100 and Clothing1M, demonstrate significant improvements over state-of-the-art methods.

4/3/2024

cs.LG cs.CV

🗣️

Dual-Branch Knowledge Distillation for Noise-Robust Synthetic Speech Detection

Cunhang Fan, Mingming Ding, Jianhua Tao, Ruibo Fu, Jiangyan Yi, Zhengqi Wen, Zhao Lv

Most research in synthetic speech detection (SSD) focuses on improving performance on standard noise-free datasets. However, in actual situations, noise interference is usually present, causing significant performance degradation in SSD systems. To improve noise robustness, this paper proposes a dual-branch knowledge distillation synthetic speech detection (DKDSSD) method. Specifically, a parallel data flow of the clean teacher branch and the noisy student branch is designed, and interactive fusion module and response-based teacher-student paradigms are proposed to guide the training of noisy data from both the data distribution and decision-making perspectives. In the noisy student branch, speech enhancement is introduced initially for denoising, aiming to reduce the interference of strong noise. The proposed interactive fusion combines denoised features and noisy features to mitigate the impact of speech distortion and ensure consistency with the data distribution of the clean branch. The teacher-student paradigm maps the student's decision space to the teacher's decision space, enabling noisy speech to behave similarly to clean speech. Additionally, a joint training method is employed to optimize both branches for achieving global optimality. Experimental results based on multiple datasets demonstrate that the proposed method performs effectively in noisy environments and maintains its performance in cross-dataset experiments. Source code is available at https://github.com/fchest/DKDSSD.

4/17/2024

cs.SD eess.AS

📊

Data Stream Sampling with Fuzzy Task Boundaries and Noisy Labels

Yu-Hsi Chen

In the realm of continual learning, the presence of noisy labels within data streams represents a notable obstacle to model reliability and fairness. We focus on the data stream scenario outlined in pertinent literature, characterized by fuzzy task boundaries and noisy labels. To address this challenge, we introduce a novel and intuitive sampling method called Noisy Test Debiasing (NTD) to mitigate noisy labels in evolving data streams and establish a fair and robust continual learning algorithm. NTD is straightforward to implement, making it feasible across various scenarios. Our experiments benchmark four datasets, including two synthetic noise datasets (CIFAR10 and CIFAR100) and real-world noise datasets (mini-WebVision and Food-101N). The results validate the efficacy of NTD for online continual learning in scenarios with noisy labels in data streams. Compared to the previous leading approach, NTD achieves a training speedup enhancement over two times while maintaining or surpassing accuracy levels. Moreover, NTD utilizes less than one-fifth of the GPU memory resources compared to previous leading methods.

4/9/2024

cs.LG cs.AI cs.CV

Improve Knowledge Distillation via Label Revision and Data Selection

Weichao Lan, Yiu-ming Cheung, Qing Xu, Buhua Liu, Zhikai Hu, Mengke Li, Zhenghua Chen

Knowledge distillation (KD) has become a widely used technique in the field of model compression, which aims to transfer knowledge from a large teacher model to a lightweight student model for efficient network development. In addition to the supervision of ground truth, the vanilla KD method regards the predictions of the teacher as soft labels to supervise the training of the student model. Based on vanilla KD, various approaches have been developed to further improve the performance of the student model. However, few of these previous methods have considered the reliability of the supervision from teacher models. Supervision from erroneous predictions may mislead the training of the student model. This paper therefore proposes to tackle this problem from two aspects: Label Revision to rectify the incorrect supervision and Data Selection to select appropriate samples for distillation to reduce the impact of erroneous supervision. In the former, we propose to rectify the teacher's inaccurate predictions using the ground truth. In the latter, we introduce a data selection technique to choose suitable training samples to be supervised by the teacher, thereby reducing the impact of incorrect predictions to some extent. Experiment results demonstrate the effectiveness of our proposed method, and show that our method can be combined with other distillation approaches, improving their performance.

4/8/2024

cs.LG cs.AI