Rethinking Multiple Instance Learning: Developing an Instance-Level Classifier via Weakly-Supervised Self-Training

Read original: arXiv:2408.04813 - Published 8/12/2024 by Yingfan Ma, Xiaoyuan Luo, Mingzhi Yuan, Xinrong Chen, Manning Wang

Overview

The paper "Rethinking Multiple Instance Learning: Developing an Instance-Level Classifier via Weakly-Supervised Self-Training" proposes a novel approach to multiple instance learning (MIL) using weakly-supervised self-training.
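The key idea is to treat MIL as a semi-supervised instance classification problem and train an instance-level classifier directly from weakly-labeled data, using the positive bag labels only as constraints on the pseudo labels assigned to unlabeled instances.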
This contrasts with traditional MIL methods that focus on predicting bag-level labels.

Plain English Explanation

In machine learning, there are often situations where we have data that is weakly labeled. This means we don't have detailed information about individual data points, but rather have a higher-level label for a "bag" of data points.

For example, imagine trying to classify microscope images of cells as cancerous or non-cancerous. The full dataset might consist of "bags" of multiple cell images, where the bag as a whole is labeled as cancerous or not. However, we don't know which individual cells within the bag are actually cancerous.

This is the setting of multiple instance learning (MIL), where the goal is to learn a model that can predict the label of the overall bag, even though we don't have labels for the individual instances (cells) inside.
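
The bag setting can be made concrete with a small sketch. It relies on the standard MIL assumption (a bag is positive if and only if at least one of its instances is positive), which this summary does not state explicitly. The function names and the tuple layout are illustrative only and do not come from the paper.

```python
from typing import List, Optional, Sequence, Tuple

# A bag is (instances, bag_label); each instance here is a single float feature.
# This layout is an assumption made for illustration.
Bag = Tuple[List[float], int]

def bag_label(instance_labels: Sequence[int]) -> int:
    """Standard MIL assumption: positive iff at least one instance is positive."""
    return int(any(instance_labels))

def inferable_instance_labels(bag: Bag) -> List[Optional[int]]:
    """Return the instance labels that follow from the bag label alone.

    Every instance in a negative bag must be negative (0). In a positive
    bag the individual labels are unknown, so they are returned as None.
    """
    instances, label = bag
    if label == 0:
        return [0] * len(instances)
    return [None] * len(instances)

negative_bag: Bag = ([0.1, 0.5, 0.9], 0)
positive_bag: Bag = ([0.3, 0.7, 3.5], 1)

print(inferable_instance_labels(negative_bag))
print(inferable_instance_labels(positive_bag))
```

Under this assumption, the only instances with known labels are the negatives from negative bags, which is why the instances in positive bags can be viewed as unlabeled data.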

Traditional MIL methods focus on modeling the relationship between the bag-level labels and the unknown instance-level labels. However, this paper proposes a different approach: training an instance-level classifier directly from the weakly-labeled data, using a technique called weakly-supervised self-training.

The key insight is that by iteratively refining pseudo labels for the unlabeled instances, while using the positive bag labels to keep those pseudo labels from degenerating, the method can gradually learn an accurate instance-level classifier without training a separate bag-level classifier. The authors report that this outperforms traditional MIL approaches.

Technical Explanation

The paper proposes a weakly-supervised self-training approach for MIL, which involves the following steps:

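Initialize an instance-level classifier: The authors start by training an initial instance-level classifier using the weakly-labeled data. In MIL, the only instances with known labels are those in negative bags, which are all negative; instances in positive bags are treated as unlabeled.
Iteratively refine the instance-level predictions: In each iteration, the instance-level classifier predicts pseudo labels for the individual instances within each bag, and these pseudo labels are then used to update the classifier, effectively "self-training" it. Because plain self-training tends to degenerate when every labeled instance is negative, the positive bag labels are used to build a global constraint and a local constraint on the pseudo labels.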
Aggregate instance-level predictions to obtain bag-level labels: Once the instance-level classifier is trained, the bag-level labels can be obtained by aggregating the instance-level predictions within each bag (e.g., using max-pooling).
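
The three steps above can be sketched on toy data. This is a minimal illustration under stated assumptions, not the authors' implementation: the nearest-centroid "classifier", the 1-D features, and the rule that forces at least one positive pseudo label in every positive bag are all stand-ins chosen for simplicity. The paper's actual global and local constraints are not described in this summary and are not reproduced here.

```python
from statistics import mean

def fit_centroids(xs, ys):
    """Toy classifier (assumption): one centroid per class for 1-D features."""
    pos = [x for x, y in zip(xs, ys) if y == 1]
    neg = [x for x, y in zip(xs, ys) if y == 0]
    return mean(neg), mean(pos)

def score(x, model):
    """Positive score means x is closer to the positive centroid."""
    c_neg, c_pos = model
    return abs(x - c_neg) - abs(x - c_pos)

def self_train(bags, rounds=5):
    # Step 1: initialise pseudo labels by copying each bag label to its instances.
    pseudo = [[label] * len(instances) for instances, label in bags]
    model = None
    for _ in range(rounds):
        xs = [x for instances, _ in bags for x in instances]
        ys = [y for labels in pseudo for y in labels]
        model = fit_centroids(xs, ys)
        # Step 2: refresh pseudo labels with the current classifier.
        new_pseudo = []
        for instances, label in bags:
            if label == 0:
                # Instances in negative bags are known to be negative.
                new_pseudo.append([0] * len(instances))
                continue
            scores = [score(x, model) for x in instances]
            labels = [int(s > 0) for s in scores]
            # Assumed stand-in constraint: a positive bag keeps at least one
            # positive pseudo label (its top-scoring instance).
            labels[scores.index(max(scores))] = 1
            new_pseudo.append(labels)
        if new_pseudo == pseudo:
            break  # pseudo labels stopped changing
        pseudo = new_pseudo
    return model, pseudo

def predict_bag(instances, model):
    # Step 3: max-pooling over instance scores gives the bag prediction.
    return int(max(score(x, model) for x in instances) > 0)

# Toy bags: (instances, bag_label). Values near 3-4 play the role of positives.
bags = [
    ([0.1, 0.5, 0.9], 0),
    ([0.2, 0.4, 0.8, 1.0], 0),
    ([0.3, 0.7, 3.5], 1),
    ([0.6, 3.2, 3.8, 0.2], 1),
]
model, pseudo = self_train(bags)
print(pseudo)
print(predict_bag([0.5, 3.4], model), predict_bag([0.4, 0.6], model))
```

On this toy data the pseudo labels settle after one refinement: only the instances near 3 to 4 stay positive, and the low-valued instances in positive bags are relabeled as negative. Without a constraint of some kind, nothing would prevent the pseudo labels in a positive bag from all collapsing to negative, which is the degeneration problem the paper's constraints are meant to address.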

The authors evaluate the approach on two synthetic MNIST datasets, five traditional MIL benchmark datasets, and two histopathology whole slide image datasets, and report state-of-the-art performance on all of them.

Critical Analysis

The paper presents a novel and promising approach to MIL that avoids the complexity of explicitly modeling the relationship between instance-level and bag-level labels. However, a few potential limitations or areas for further research are worth noting:

Sensitivity to initialization: The performance of the self-training approach may depend on the quality of the initial instance-level classifier. The authors do not explore the impact of different initialization strategies.
Potential overfitting: By iteratively refining the instance-level predictions, the model may risk overfitting to the training data. The authors do not provide a thorough analysis of the generalization capabilities of the approach.
Applicability to complex data: The experiments cover synthetic MNIST data, traditional MIL benchmarks, and two histopathology whole slide image datasets. Further research is needed to understand how the approach would carry over to other complex, high-dimensional domains, such as natural language processing.
Interpretability: The self-training approach may produce a less interpretable instance-level classifier compared to traditional MIL methods that explicitly model the instance-to-bag relationship.

Conclusion

This paper presents a novel and promising approach to multiple instance learning, which sidesteps the complexity of traditional MIL methods by training an instance-level classifier directly from weakly-labeled data using self-training. The authors report strong results on synthetic, benchmark, and histopathology datasets, and state that the code will be made publicly available, which should facilitate further research and adoption of the method.

While the paper highlights several interesting aspects of this approach, further exploration of its limitations and potential extensions could help solidify its role in the MIL landscape and unlock its full potential for real-world applications.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Rethinking Multiple Instance Learning: Developing an Instance-Level Classifier via Weakly-Supervised Self-Training

Yingfan Ma, Xiaoyuan Luo, Mingzhi Yuan, Xinrong Chen, Manning Wang

Multiple instance learning (MIL) problem is currently solved from either bag-classification or instance-classification perspective, both of which ignore important information contained in some instances and result in limited performance. For example, existing methods often face difficulty in learning hard positive instances. In this paper, we formulate MIL as a semi-supervised instance classification problem, so that all the labeled and unlabeled instances can be fully utilized to train a better classifier. The difficulty in this formulation is that all the labeled instances are negative in MIL, and traditional self-training techniques used in semi-supervised learning tend to degenerate in generating pseudo labels for the unlabeled instances in this scenario. To resolve this problem, we propose a weakly-supervised self-training method, in which we utilize the positive bag labels to construct a global constraint and a local constraint on the pseudo labels to prevent them from degenerating and force the classifier to learn hard positive instances. It is worth noting that easy positive instances are instances far from the decision boundary in the classification process, while hard positive instances are those close to the decision boundary. Through iterative optimization, the pseudo labels can gradually approach the true labels. Extensive experiments on two MNIST synthetic datasets, five traditional MIL benchmark datasets and two histopathology whole slide image datasets show that our method achieved new SOTA performance on all of them. The code will be publicly available.

8/12/2024

Rethinking Multiple Instance Learning for Whole Slide Image Classification: A Good Instance Classifier is All You Need

Linhao Qu, Yingfan Ma, Xiaoyuan Luo, Manning Wang, Zhijian Song

Weakly supervised whole slide image classification is usually formulated as a multiple instance learning (MIL) problem, where each slide is treated as a bag, and the patches cut out of it are treated as instances. Existing methods either train an instance classifier through pseudo-labeling or aggregate instance features into a bag feature through attention mechanisms and then train a bag classifier, where the attention scores can be used for instance-level classification. However, the pseudo instance labels constructed by the former usually contain a lot of noise, and the attention scores constructed by the latter are not accurate enough, both of which affect their performance. In this paper, we propose an instance-level MIL framework based on contrastive learning and prototype learning to effectively accomplish both instance classification and bag classification tasks. To this end, we propose an instance-level weakly supervised contrastive learning algorithm for the first time under the MIL setting to effectively learn instance feature representation. We also propose an accurate pseudo label generation method through prototype learning. We then develop a joint training strategy for weakly supervised contrastive learning, prototype learning, and instance classifier training. Extensive experiments and visualizations on four datasets demonstrate the powerful performance of our method. Codes are available at https://github.com/miccaiif/INS.

5/14/2024

SC-MIL: Sparsely Coded Multiple Instance Learning for Whole Slide Image Classification

Peijie Qiu, Pan Xiao, Wenhui Zhu, Yalin Wang, Aristeidis Sotiras

Multiple Instance Learning (MIL) has been widely used in weakly supervised whole slide image (WSI) classification. Typical MIL methods include a feature embedding part, which embeds the instances into features via a pre-trained feature extractor, and an MIL aggregator that combines instance embeddings into predictions. Most efforts have typically focused on improving these parts. This involves refining the feature embeddings through self-supervised pre-training as well as modeling the correlations between instances separately. In this paper, we proposed a sparsely coding MIL (SC-MIL) method that addresses those two aspects at the same time by leveraging sparse dictionary learning. The sparse dictionary learning captures the similarities of instances by expressing them as sparse linear combinations of atoms in an over-complete dictionary. In addition, imposing sparsity improves instance feature embeddings by suppressing irrelevant instances while retaining the most relevant ones. To make the conventional sparse coding algorithm compatible with deep learning, we unrolled it into a sparsely coded module leveraging deep unrolling. The proposed SC module can be incorporated into any existing MIL framework in a plug-and-play manner with an acceptable computational cost. The experimental results on multiple datasets demonstrated that the proposed SC module could substantially boost the performance of state-of-the-art MIL methods. The codes are available at https://github.com/sotiraslab/SCMIL.git.

8/2/2024

MergeUp-augmented Semi-Weakly Supervised Learning for WSI Classification

Mingxi Ouyang, Yuqiu Fu, Renao Yan, ShanShan Shi, Xitong Ling, Lianghui Zhu, Yonghong He, Tian Guan

Recent advancements in computational pathology and artificial intelligence have significantly improved whole slide image (WSI) classification. However, the gigapixel resolution of WSIs and the scarcity of manual annotations present substantial challenges. Multiple instance learning (MIL) is a promising weakly supervised learning approach for WSI classification. Recent research revealed that employing pseudo bag augmentation can encourage models to learn from varied data, thus bolstering models' performance. However, directly inheriting the parents' labels can introduce more noise through mislabeling during training. To address this issue, we translate the WSI classification task from weakly supervised learning to semi-weakly supervised learning, termed SWS-MIL, where adaptive pseudo bag augmentation (AdaPse) is employed to assign labeled and unlabeled data based on a threshold strategy. Using the student-teacher pattern, we introduce a feature augmentation technique, MergeUp, which merges bags with low-priority bags to enhance inter-category information, increasing training data diversity. Experimental results on the CAMELYON-16, BRACS, and TCGA-LUNG datasets demonstrate the superiority of our method over existing state-of-the-art approaches, affirming its efficacy in WSI classification.

8/26/2024