Rethinking Pre-trained Feature Extractor Selection in Multiple Instance Learning for Whole Slide Image Classification

Read original: arXiv:2408.01167 - Published 8/6/2024 by Bryan Wong, Mun Yong Yi

Overview

This paper explores how to effectively leverage pre-trained feature extractors in multiple instance learning (MIL) for whole slide image (WSI) classification.
MIL is a machine learning paradigm well-suited for WSI analysis: it handles the gigapixel size of WSIs and needs only slide-level labels, not patch-level annotations.
The researchers investigate how the choice of pre-trained feature extractor affects MIL performance for WSI classification along three dimensions: pre-training dataset, backbone model, and pre-training method.

Plain English Explanation

Whole slide images (WSIs) are high-resolution digital scans of entire tissue samples, which are commonly used in medical diagnosis and pathology. These WSIs can be incredibly large and complex, making them challenging to analyze using traditional machine learning techniques.

Multiple instance learning (MIL) is a machine learning approach that is well-suited for handling WSIs. MIL treats the entire WSI as a "bag" of smaller regions, called "instances," and tries to classify the entire WSI based on the properties of these instances.
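
To make the bag-and-instance idea concrete, here is a minimal sketch in Python with NumPy. It assumes the slide is a small array that fits in memory, which a real gigapixel WSI does not; the function name `tile_into_patches` and the patch size are illustrative choices, not taken from the paper.

```python
import numpy as np

def tile_into_patches(slide: np.ndarray, patch_size: int) -> np.ndarray:
    """Cut an (H, W, C) image into non-overlapping square patches.

    Edge regions that do not fill a whole patch are dropped.
    Returns an array of shape (num_patches, patch_size, patch_size, C).
    """
    height, width, channels = slide.shape
    rows, cols = height // patch_size, width // patch_size
    # Crop so that both sides are exact multiples of the patch size.
    cropped = slide[: rows * patch_size, : cols * patch_size]
    # Split each spatial axis into (grid index, offset inside the patch).
    grid = cropped.reshape(rows, patch_size, cols, patch_size, channels)
    # Bring the two grid indices together, then flatten them into one axis.
    return grid.transpose(0, 2, 1, 3, 4).reshape(-1, patch_size, patch_size, channels)

# Toy "slide": one bag of six instances, with a single slide-level label.
slide = np.arange(512 * 768 * 3, dtype=np.int64).reshape(512, 768, 3)
bag = tile_into_patches(slide, patch_size=256)
slide_label = 1  # hypothetical label for the whole bag; patches carry no labels
```

Here `bag` holds the instances and `slide_label` is the only supervision. Practical pipelines typically also discard background patches before any further processing, a step this sketch leaves out.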

In this paper, the researchers investigate how to choose a pre-trained feature extractor to improve the performance of MIL for WSI classification. The most commonly used extractor is a ResNet50 trained with supervision on ImageNet-1K, but the choice of extractor can substantially change the final classification performance, so the researchers compare options that differ in pre-training dataset, backbone model, and pre-training method.

Technical Explanation

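The researchers conduct extensive experiments to evaluate pre-trained feature extractors in embedding-based MIL for WSI classification. Against the backdrop of the most commonly used extractor, a ResNet50 trained with supervision on ImageNet-1K, they vary three dimensions: the pre-training dataset (its size and variety), the backbone model ('standard' ResNet and ViT versus 'modern and deeper' CNN and Transformer backbones), and the pre-training method (including the choice of self-supervised learning, or SSL, method).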

The key steps in their approach are:

Feature Extraction: The researchers extract features from WSI patches using the pre-trained feature extractors.
MIL Pooling: The extracted features are then pooled using different MIL pooling strategies, such as max pooling or attention-based pooling, to obtain a single feature representation for the entire WSI.
Classification: The pooled features are then used to train a final classifier, such as a linear layer, to predict the label of the WSI.
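
The three steps above can be sketched as follows in Python with NumPy. This is a toy illustration under stated assumptions, not the authors' implementation: `extract_features` is a random projection standing in for a frozen pre-trained network, the attention pooling uses a common tanh-based scoring form, and all weights (`V`, `w`, `W`, `b`) are random rather than trained.

```python
import numpy as np

rng = np.random.default_rng(0)

def extract_features(patches: np.ndarray, projection: np.ndarray) -> np.ndarray:
    """Stand-in (assumption) for a frozen pre-trained extractor:
    flatten each patch and apply a fixed linear projection."""
    flat = patches.reshape(len(patches), -1).astype(np.float64)
    return flat @ projection  # (num_patches, feature_dim)

def max_pool(features: np.ndarray) -> np.ndarray:
    """Max pooling: keep the largest value of each feature across patches."""
    return features.max(axis=0)

def attention_pool(features: np.ndarray, V: np.ndarray, w: np.ndarray):
    """Attention pooling: score each patch, softmax the scores,
    and return the weighted average of patch features plus the weights."""
    scores = np.tanh(features @ V) @ w   # one score per patch
    scores = scores - scores.max()       # shift for a numerically stable softmax
    weights = np.exp(scores) / np.exp(scores).sum()
    return weights @ features, weights

def classify(bag_feature: np.ndarray, W: np.ndarray, b: float) -> float:
    """Linear layer followed by a sigmoid, giving a slide-level probability."""
    return 1.0 / (1.0 + np.exp(-(bag_feature @ W + b)))

# Toy bag: 6 patches of 8x8 pixels with 3 channels.
patches = rng.random((6, 8, 8, 3))
projection = rng.normal(size=(8 * 8 * 3, 16)) / np.sqrt(8 * 8 * 3)

features = extract_features(patches, projection)             # step 1
V, w = rng.normal(size=(16, 4)), rng.normal(size=4)
bag_feature, weights = attention_pool(features, V, w)         # step 2
probability = classify(bag_feature, rng.normal(size=16), 0.0) # step 3
```

In this setup only the pooling and classifier parameters would be trained; swapping the extractor changes `features` while everything downstream stays the same, which is what makes extractor choice an isolated variable to study.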

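The researchers evaluate their approach on two public WSI datasets, TCGA-NSCLC and Camelyon16, using four state-of-the-art MIL models. Their results show that the choice of pre-trained feature extractor has a significant impact on performance: larger and more varied pre-training datasets help both CNN and Transformer backbones; 'modern and deeper' backbones greatly outperform 'standard' ResNet and ViT backbones, with the gains more consistent for Transformer-based backbones; and the choice of SSL method is crucial, with the largest benefits observed on the ViT backbone.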

Critical Analysis

The paper provides a comprehensive evaluation of the impact of pre-trained feature extractors on MIL for WSI classification, which is an important and practical problem in digital pathology. The researchers vary the pre-training dataset, the backbone model, and the pre-training method independently, which is a strength of the study.

However, the paper does not address several potential limitations and areas for further research:

Dataset Bias: The performance of pre-trained feature extractors can be heavily influenced by the datasets used for pre-training. Although the study varies the size and variety of the pre-training data, it does not appear to discuss the potential biases or limitations of those datasets.
Interpretability: The paper focuses solely on classification performance and does not explore the interpretability or explainability of the different pre-trained feature extractors. Understanding the underlying features and decision-making process could be valuable for clinical applications.
Computational Efficiency: The paper does not consider the computational efficiency or inference time of the different pre-trained feature extractors, which can be an important practical consideration for deployment in real-world scenarios.
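
The computational-efficiency point could be checked with a simple timing harness like the hedged sketch below. The two "extractors" are hypothetical random projections of different widths, used only so the example runs without downloading any model; the helper name `patches_per_second` is likewise an illustrative choice.

```python
import time
import numpy as np

def patches_per_second(extractor, patches: np.ndarray, repeats: int = 3) -> float:
    """Return the best observed throughput of `extractor` over `repeats` runs."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        extractor(patches)
        best = min(best, time.perf_counter() - start)
    # Guard against a zero reading from a very coarse timer.
    return len(patches) / max(best, 1e-12)

rng = np.random.default_rng(0)
patches = rng.random((64, 8 * 8 * 3))  # 64 flattened toy patches

# Hypothetical stand-ins for a small and a large feature extractor.
small_weights = rng.normal(size=(8 * 8 * 3, 16))
large_weights = rng.normal(size=(8 * 8 * 3, 1024))

throughput = {
    "small": patches_per_second(lambda x: x @ small_weights, patches),
    "large": patches_per_second(lambda x: x @ large_weights, patches),
}
```

Reporting such throughput figures next to classification results would show the cost of each extractor. On a GPU, asynchronous execution would require explicit synchronization before reading the clock.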

Future research could address these limitations by investigating dataset bias, incorporating interpretability analysis, and evaluating computational efficiency alongside classification performance.

Conclusion

This paper presents a comprehensive study of pre-trained feature extractor selection in multiple instance learning for whole slide image classification. The researchers demonstrate that the choice of extractor can significantly influence classification performance: larger and more varied pre-training datasets, 'modern and deeper' backbones, and a well-chosen SSL method all help, with the clearest gains on Transformer-based backbones.

The findings of this work have important implications for the development of robust and efficient MIL-based WSI analysis systems, which are crucial for advancing digital pathology and improving medical diagnosis. The insights provided in this paper can guide researchers and practitioners in selecting appropriate pre-trained feature extractors to optimize the performance of their MIL-based WSI classification models.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Rethinking Pre-trained Feature Extractor Selection in Multiple Instance Learning for Whole Slide Image Classification

Bryan Wong, Mun Yong Yi

Multiple instance learning (MIL) has become a preferred method for classifying gigapixel whole slide images (WSIs), without requiring patch label annotation. The focus of the current MIL research stream is on the embedding-based MIL approach, which involves extracting feature vectors from patches using a pre-trained feature extractor. These feature vectors are then fed into an MIL aggregator for slide-level prediction. Despite prior research suggestions on enhancing the most commonly used ResNet50 supervised model pre-trained on ImageNet-1K, there remains a lack of clear guidance on selecting the optimal feature extractor to maximize WSI performance. This study aims at addressing this gap by examining MIL feature extractors across three dimensions: pre-training dataset, backbone model, and pre-training method. Extensive experiments were carried out on the two public WSI datasets (TCGA-NSCLC and Camelyon16) using four SOTA MIL models. The main findings indicate the following: 1) Performance significantly improves with larger and more varied pre-training datasets in both CNN and Transformer backbones. 2) 'Modern and deeper' backbones greatly outperform 'standard' backbones (ResNet and ViT), with performance improvements more guaranteed in Transformer-based backbones. 3) The choice of self-supervised learning (SSL) method is crucial, with the most significant benefits observed when applied to the Transformer (ViT) backbone. The study findings have practical implications, including designing more effective pathological foundation models. Our code is available at: https://anonymous.4open.science/r/MIL-Feature-Extractor-Selection

8/6/2024

Rethinking Multiple Instance Learning for Whole Slide Image Classification: A Good Instance Classifier is All You Need

Linhao Qu, Yingfan Ma, Xiaoyuan Luo, Manning Wang, Zhijian Song

Weakly supervised whole slide image classification is usually formulated as a multiple instance learning (MIL) problem, where each slide is treated as a bag, and the patches cut out of it are treated as instances. Existing methods either train an instance classifier through pseudo-labeling or aggregate instance features into a bag feature through attention mechanisms and then train a bag classifier, where the attention scores can be used for instance-level classification. However, the pseudo instance labels constructed by the former usually contain a lot of noise, and the attention scores constructed by the latter are not accurate enough, both of which affect their performance. In this paper, we propose an instance-level MIL framework based on contrastive learning and prototype learning to effectively accomplish both instance classification and bag classification tasks. To this end, we propose an instance-level weakly supervised contrastive learning algorithm for the first time under the MIL setting to effectively learn instance feature representation. We also propose an accurate pseudo label generation method through prototype learning. We then develop a joint training strategy for weakly supervised contrastive learning, prototype learning, and instance classifier training. Extensive experiments and visualizations on four datasets demonstrate the powerful performance of our method. Codes are available at https://github.com/miccaiif/INS.

5/14/2024

Attention Is Not What You Need: Revisiting Multi-Instance Learning for Whole Slide Image Classification

Xin Liu, Weijia Zhang, Min-Ling Zhang

Although attention-based multi-instance learning algorithms have achieved impressive performances on slide-level whole slide image (WSI) classification tasks, they are prone to mistakenly focus on irrelevant patterns such as staining conditions and tissue morphology, leading to incorrect patch-level predictions and unreliable interpretability. Moreover, these attention-based MIL algorithms tend to focus on salient instances and struggle to recognize hard-to-classify instances. In this paper, we first demonstrate that attention-based WSI classification methods do not adhere to the standard MIL assumptions. From the standard MIL assumptions, we propose a surprisingly simple yet effective instance-based MIL method for WSI classification (FocusMIL) based on max-pooling and forward amortized variational inference. We argue that synergizing the standard MIL assumption with variational inference encourages the model to focus on tumour morphology instead of spurious correlations. Our experimental evaluations show that FocusMIL significantly outperforms the baselines in patch-level classification tasks on the Camelyon16 and TCGA-NSCLC benchmarks. Visualization results show that our method also achieves better classification boundaries for identifying hard instances and mitigates the effect of spurious correlations between bags and labels.

8/20/2024

Whole Slide Image Survival Analysis Using Histopathological Feature Extractors

Kleanthis Marios Papadopoulos

The abundance of information present in Whole Slide Images (WSIs) makes them useful for prognostic evaluation. A large number of models utilizing a pretrained ResNet backbone have been released and employ various feature aggregation techniques, primarily based on Multiple Instance Learning (MIL). By leveraging the recently released UNI feature extractor, existing models can be adapted to achieve higher accuracy, which paves the way for more robust prognostic tools in digital pathology.

5/29/2024