Meta-learning for Positive-unlabeled Classification

2406.03680

Published 6/7/2024 by Atsutoshi Kumagai, Tomoharu Iwata, Yasuhiro Fujiwara

Abstract

We propose a meta-learning method for positive and unlabeled (PU) classification, which improves the performance of binary classifiers obtained from only PU data in unseen target tasks. PU learning is an important problem since PU data naturally arise in real-world applications such as outlier detection and information retrieval. Existing PU learning methods require many PU data, but sufficient data are often unavailable in practice. The proposed method minimizes the test classification risk after the model is adapted to PU data by using related tasks that consist of positive, negative, and unlabeled data. We formulate the adaptation as an estimation problem of the Bayes optimal classifier, which is an optimal classifier to minimize the classification risk. The proposed method embeds each instance into a task-specific space using neural networks. With the embedded PU data, the Bayes optimal classifier is estimated through density-ratio estimation of PU densities, whose solution is obtained as a closed-form solution. The closed-form solution enables us to efficiently and effectively minimize the test classification risk. We empirically show that the proposed method outperforms existing methods with one synthetic and three real-world datasets.

Overview

This paper introduces a meta-learning approach for positive-unlabeled (PU) classification tasks.
PU classification is a weakly supervised setting in which only some positive (relevant) examples are labeled; the rest of the data is unlabeled and contains both positive and negative examples.
The proposed method aims to learn, from related tasks, a model that can adapt to an unseen task using only a small amount of PU data and still separate positive from negative examples.

Plain English Explanation

In many real-world machine learning problems, only a small part of the data is labeled while the majority remains unlabeled. Positive-unlabeled (PU) learning is one such setting: we know that some examples belong to the positive (relevant) class, but we have no labeled examples of the negative (irrelevant) class. PU data arise, for example, in outlier detection, information retrieval, and disease classification from limited medical data.
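To make the PU setting concrete, the sketch below builds a PU dataset from fully labeled binary data by keeping a few labeled positives and hiding the labels of everything else. The function name `make_pu_dataset` and its parameters are hypothetical, and the sampling scheme (unlabeled points drawn from the whole pool) is an assumption for illustration, not the paper's protocol.

```python
import random


def make_pu_dataset(examples, labels, n_positive, n_unlabeled, seed=0):
    """Return (labeled_positives, unlabeled) from fully labeled binary data.

    Hypothetical helper: labels are 1 (positive) or 0 (negative).
    """
    rng = random.Random(seed)
    # Labeled positives are drawn only from the positive class.
    positive_pool = [x for x, y in zip(examples, labels) if y == 1]
    labeled_positives = rng.sample(positive_pool, n_positive)
    # Unlabeled data is drawn from the whole pool with labels hidden,
    # so it mixes positives and negatives.
    unlabeled = rng.sample(list(examples), n_unlabeled)
    return labeled_positives, unlabeled


# Toy data: integers below 40 play the role of the positive class.
toy_examples = list(range(100))
toy_labels = [1 if x < 40 else 0 for x in toy_examples]
toy_pos, toy_unl = make_pu_dataset(toy_examples, toy_labels, 5, 30)
print(len(toy_pos), "labeled positives;", len(toy_unl), "unlabeled examples")
```

No negative labels are ever exposed to the learner here, which is what separates PU learning from ordinary semi-supervised classification.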

The paper proposes a meta-learning approach to this PU classification challenge. The key idea is to train a model on a diverse set of related tasks during a meta-training phase so that it can adapt quickly to a new PU task. According to the abstract, the related tasks contain positive, negative, and unlabeled data, so the model can be trained to perform well after adapting to PU data alone, even when the new task offers only a small amount of data.
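The meta-training idea can be sketched as episode sampling. The abstract states that related tasks contain positive, negative, and unlabeled data, so one plausible episode uses PU data for adaptation and held-out labeled data for evaluation. The task dictionary layout (`"pos"`, `"neg"`, `"unl"`) and the function `sample_episode` are assumptions made for this illustration only.

```python
import random


def sample_episode(tasks, n_support_pos, n_support_unl, n_query, rng):
    """Sample one meta-training episode from a list of related tasks.

    Hypothetical layout: each task is a dict with "pos", "neg", "unl" lists.
    """
    task = rng.choice(tasks)
    # Shuffle positives so support and query positives do not overlap.
    pos = list(task["pos"])
    rng.shuffle(pos)
    support_pos = pos[:n_support_pos]
    held_out_pos = pos[n_support_pos:]
    # Support set: PU data only, mimicking what an unseen target task provides.
    support_unl = rng.sample(task["unl"], n_support_unl)
    # Query set: labeled positives and negatives, used to measure test risk
    # after the model has adapted to the support set.
    query_pool = [(x, 1) for x in held_out_pos] + [(x, 0) for x in task["neg"]]
    query = rng.sample(query_pool, n_query)
    return support_pos, support_unl, query


toy_tasks = [
    {"pos": list(range(10)), "neg": list(range(10, 20)), "unl": list(range(20, 40))}
]
s_pos, s_unl, query = sample_episode(toy_tasks, 3, 5, 6, random.Random(0))
print(len(s_pos), len(s_unl), len(query))
```

Negative labels appear only in the query set, so they shape the meta-learned parameters without ever being needed at adaptation time.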

The work sits alongside other recent PU methods such as soft-label PU learning and the PUAL classifier, both listed under Related Papers. Its distinguishing aim is to let a model adapt rapidly to new PU tasks when little data is available.

Technical Explanation

The paper introduces a meta-learning framework for PU classification tasks. The core idea is to train a model that can quickly adapt to new PU classification problems by learning from a diverse set of similar tasks during a meta-training phase.
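The abstract frames adaptation as estimating the Bayes optimal classifier through density-ratio estimation of PU densities. A standard way to write that connection is shown below. The notation is an assumption introduced here rather than the paper's own: p_p is the density of positive data, p_u is the density of unlabeled data (assumed to follow the marginal p(x)), and pi is the positive class prior.

```latex
% Assumed notation: p_p(x) = p(x | y = +1), p_u(x) = p(x), \pi = p(y = +1).
\begin{align}
p(y = +1 \mid x) &= \frac{\pi \, p_{\mathrm{p}}(x)}{p_{\mathrm{u}}(x)} = \pi \, r(x),
\qquad r(x) := \frac{p_{\mathrm{p}}(x)}{p_{\mathrm{u}}(x)}, \\
f^{*}(x) &= \operatorname{sign}\!\left( \pi \, r(x) - \tfrac{1}{2} \right).
\end{align}
```

Under these assumptions, estimating the density ratio r(x) from positive and unlabeled samples is enough to recover the classifier that minimizes the 0-1 classification risk, without any negative labels.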

The authors formulate adaptation to a PU task as estimating the Bayes optimal classifier, the classifier that minimizes the classification risk. Neural networks embed each instance into a task-specific space, and the classifier is then estimated from the embedded PU data through density-ratio estimation of the PU densities, which has a closed-form solution. This gives a two-level structure: the inner step adapts to a specific task in closed form, and the outer step updates the shared network parameters to minimize the test classification risk after adaptation, measured on related tasks that contain positive, negative, and unlabeled data.
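As a rough illustration of closed-form density-ratio estimation, the sketch below fits a least-squares (uLSIF-style) estimator of the ratio between the positive and unlabeled densities and thresholds it with a class prior. This is a generic technique swapped in for illustration, not the paper's exact formulation: the fixed Gaussian features stand in for the learned neural embedding, and the names `fit_density_ratio`, `predict_positive`, `sigma`, `lam`, and `prior` are assumptions.

```python
import numpy as np


def gaussian_features(x, centers, sigma):
    """Gaussian basis functions; x is (n, d), centers is (b, d), result is (n, b)."""
    sq_dist = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq_dist / (2.0 * sigma ** 2))


def fit_density_ratio(x_pos, x_unl, centers, sigma=1.0, lam=0.1):
    """Closed-form ridge solution alpha = (H + lam * I)^{-1} h."""
    phi_p = gaussian_features(x_pos, centers, sigma)
    phi_u = gaussian_features(x_unl, centers, sigma)
    H = phi_u.T @ phi_u / len(x_unl)  # second moment under unlabeled data
    h = phi_p.mean(axis=0)            # first moment under positive data
    return np.linalg.solve(H + lam * np.eye(len(centers)), h)


def predict_positive(x, alpha, centers, sigma, prior):
    """Predict positive where prior * estimated ratio exceeds 1/2."""
    ratio = np.maximum(gaussian_features(x, centers, sigma) @ alpha, 0.0)
    return prior * ratio > 0.5


# Toy 1-D task: positives near +2, negatives near -2, class prior 0.5 (assumed known).
rng = np.random.default_rng(0)
x_pos = rng.normal(2.0, 1.0, size=(200, 1))
x_unl = np.concatenate([
    rng.normal(2.0, 1.0, size=(500, 1)),
    rng.normal(-2.0, 1.0, size=(500, 1)),
])
centers = x_pos[:50]  # use some positive samples as basis centers
alpha = fit_density_ratio(x_pos, x_unl, centers)
print(predict_positive(np.array([[2.0], [-2.0]]), alpha, centers, 1.0, 0.5))
```

Because the fit is a single linear solve, it is differentiable with respect to the inputs, which is what makes a closed-form adaptation step convenient inside a meta-learning loop.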

Compared with related PU methods such as soft-label PU learning and the PUAL classifier, the distinguishing element is the meta-learning component, which enables the model to adapt rapidly to new PU tasks.

Related directions include PU learning for multi-label classification ("Positive Label Is All You Need for Multi-Label Classification") and semi-supervised disease classification.

Critical Analysis

The paper presents a promising meta-learning approach for PU classification tasks, which addresses the challenge of learning from limited PU data. The authors report that their method outperforms existing methods in experiments on one synthetic and three real-world datasets.

One potential limitation of the approach is the computational cost of meta-training across many tasks, which may pose challenges for large-scale or real-time applications, although the abstract notes that the closed-form adaptation step keeps risk minimization efficient. Additionally, the robustness of the method to noisy or imbalanced data, both common in real-world PU classification problems, is not addressed in the material summarized here.

Further research could investigate ways to reduce the computational burden of the meta-learning process, as well as explore the performance of the proposed method in the presence of noisy or imbalanced data. Comparisons to other state-of-the-art PU classification techniques, such as adversarial PU learning, could also provide additional insights into the strengths and limitations of the proposed approach.

Conclusion

This paper introduces a meta-learning framework for positive-unlabeled (PU) classification tasks, where only a small set of labeled data is available. The proposed method leverages the inherent structure and patterns in PU data to enable rapid adaptation to new tasks, achieving improved performance compared to traditional PU classification techniques.

The work relates to other recent PU methods such as soft-label PU learning and the PUAL classifier, and to neighboring directions such as PU learning for multi-label classification ("Positive Label Is All You Need for Multi-Label Classification") and semi-supervised disease classification.

The proposed method could be useful in areas where labeled data is scarce, since it is reported to outperform existing methods on one synthetic and three real-world datasets. How well this meta-learning approach carries over to other real-world challenges remains to be seen.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Positive Unlabeled Contrastive Learning

Anish Acharya, Sujay Sanghavi, Li Jing, Bhargav Bhushanam, Dhruv Choudhary, Michael Rabbat, Inderjit Dhillon

Self-supervised pretraining on unlabeled data followed by supervised fine-tuning on labeled data is a popular paradigm for learning from limited labeled examples. We extend this paradigm to the classical positive unlabeled (PU) setting, where the task is to learn a binary classifier given only a few labeled positive samples, and (often) a large amount of unlabeled samples (which could be positive or negative). We first propose a simple extension of standard infoNCE family of contrastive losses, to the PU setting; and show that this learns superior representations, as compared to existing unsupervised and supervised approaches. We then develop a simple methodology to pseudo-label the unlabeled samples using a new PU-specific clustering scheme; these pseudo-labels can then be used to train the final (positive vs. negative) classifier. Our method handily outperforms state-of-the-art PU methods over several standard PU benchmark datasets, while not requiring a-priori knowledge of any class prior (which is a common assumption in other PU methods). We also provide a simple theoretical analysis that motivates our methods.

4/1/2024

cs.LG cs.AI

Soft Label PU Learning

Puning Zhao, Jintao Deng, Xu Cheng

PU learning refers to the classification problem in which only part of positive samples are labeled. Existing PU learning methods treat unlabeled samples equally. However, in many real tasks, from common sense or domain knowledge, some unlabeled samples are more likely to be positive than others. In this paper, we propose soft label PU learning, in which unlabeled data are assigned soft labels according to their probabilities of being positive. Considering that the ground truth of TPR, FPR, and AUC are unknown, we then design PU counterparts of these metrics to evaluate the performances of soft label PU learning methods within validation data. We show that these new designed PU metrics are good substitutes for the real metrics. After that, a method that optimizes such metrics is proposed. Experiments on public datasets and real datasets for anti-cheat services from Tencent games demonstrate the effectiveness of our proposed method.

5/6/2024

cs.LG

PUAL: A Classifier on Trifurcate Positive-Unlabeled Data

Xiaoke Wang, Xiaochen Yang, Rui Zhu, Jing-Hao Xue

Positive-unlabeled (PU) learning aims to train a classifier using the data containing only labeled-positive instances and unlabeled instances. However, existing PU learning methods are generally hard to achieve satisfactory performance on trifurcate data, where the positive instances distribute on both sides of the negative instances. To address this issue, firstly we propose a PU classifier with asymmetric loss (PUAL), by introducing a structure of asymmetric loss on positive instances into the objective function of the global and local learning classifier. Then we develop a kernel-based algorithm to enable PUAL to obtain non-linear decision boundary. We show that, through experiments on both simulated and real-world datasets, PUAL can achieve satisfactory classification on trifurcate data.

6/3/2024

stat.ML cs.LG

Positive Label Is All You Need for Multi-Label Classification

Zhixiang Yuan, Kaixin Zhang, Tao Huang

Multi-label classification (MLC) faces challenges from label noise in training data due to annotating diverse semantic labels for each image. Current methods mainly target identifying and correcting label mistakes using trained MLC models, but still struggle with persistent noisy labels during training, resulting in imprecise recognition and reduced performance. Our paper addresses label noise in MLC by introducing a positive and unlabeled multi-label classification (PU-MLC) method. To counteract noisy labels, we directly discard negative labels, focusing on the abundance of negative labels and the origin of most noisy labels. PU-MLC employs positive-unlabeled learning, training the model with only positive labels and unlabeled data. The method incorporates adaptive re-balance factors and temperature coefficients in the loss function to address label distribution imbalance and prevent over-smoothing of probabilities during training. Additionally, we introduce a local-global convolution module to capture both local and global dependencies in the image without requiring backbone retraining. PU-MLC proves effective on MLC and MLC with partial labels (MLC-PL) tasks, demonstrating significant improvements on MS-COCO and PASCAL VOC datasets with fewer annotations. Code is available at: https://github.com/TAKELAMAG/PU-MLC.

4/17/2024

cs.CV