Attention-Challenging Multiple Instance Learning for Whole Slide Image Classification

Read original: arXiv:2311.07125 - Published 7/8/2024 by Yunlong Zhang, Honglin Li, Yuxuan Sun, Sunyi Zheng, Chenglu Zhu, Lin Yang


Overview

The researchers present a new method called Attention-Challenging MIL (ACMIL) to address overfitting issues in Whole Slide Image (WSI) classification using Multiple Instance Learning (MIL) approaches.
ACMIL combines two techniques to mitigate attention value concentration, which is a common problem in attention-based MIL models.
The first technique, Multiple Branch Attention (MBA), captures more discriminative instances by using multiple attention branches.
The second technique, Stochastic Top-K Instance Masking (STKIM), masks out a portion of instances with high attention values and redistributes their attention to the remaining instances.

Plain English Explanation

When using Multiple Instance Learning (MIL) methods for Whole Slide Image (WSI) classification, attention mechanisms often focus on a small subset of important instances, which can lead to overfitting. To address this issue, the researchers developed a new approach called Attention-Challenging MIL (ACMIL).

ACMIL combines two main techniques. First, the researchers used UMAP (Uniform Manifold Approximation and Projection) to visualize the instance features and found that existing attention mechanisms only capture a limited set of discriminative patterns. To solve this, they introduced Multiple Branch Attention (MBA), which uses multiple attention branches to identify a wider range of discriminative instances.

Second, the researchers examined the cumulative attention scores and found that a small number of instances receive the majority of the attention. To address this, they developed a technique called Stochastic Top-K Instance Masking (STKIM). STKIM randomly masks out a portion of the instances with the highest attention scores and redistributes their attention to the remaining instances. This helps to prevent the model from overfocusing on a few key instances.
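The concentration described above can be measured with a very small helper. The sketch below is illustrative only: the function name `top_k_attention_share` and the attention values are made up for this example and are not taken from the paper.

```python
def top_k_attention_share(attention, k):
    """Fraction of the total attention held by the k highest-scoring instances."""
    ranked = sorted(attention, reverse=True)
    return sum(ranked[:k]) / sum(attention)


# Hypothetical attention scores for a bag of seven patches (invented numbers).
attention = [0.60, 0.25, 0.05, 0.04, 0.03, 0.02, 0.01]

# Share of attention captured by the two most-attended patches.
share = top_k_attention_share(attention, k=2)
print(f"Top-2 share of attention: {share:.2f}")
```

A share close to 1 for a small k indicates that a handful of instances dominate the bag representation, which is the situation STKIM is designed to counter.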

The researchers tested ACMIL on three different WSI datasets using two pre-trained backbones and found that it outperformed state-of-the-art methods. The visualizations and analyses they provided also demonstrate ACMIL's effectiveness in suppressing attention value concentration and overcoming the overfitting challenge.

Technical Explanation

The researchers present Attention-Challenging MIL (ACMIL), a new approach to address the overfitting issue in Whole Slide Image (WSI) classification using Multiple Instance Learning (MIL) methods. Existing attention-based MIL models often focus on a subset of discriminative instances, which is closely linked to overfitting.
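For readers unfamiliar with attention-based MIL pooling, the following is a minimal sketch of the general idea, not the authors' implementation. It assumes a single linear scoring vector (`weights`, a hypothetical stand-in for learned parameters) and toy two-dimensional patch features; real models typically use a small neural network to score instances.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(instances, weights):
    """Pool a bag of instance feature vectors into a single bag embedding.

    instances: list of feature vectors, one per patch.
    weights:   hypothetical scoring vector with the same length as a feature.
    """
    # Score each instance with a dot product, then normalize across the bag.
    scores = [sum(w * x for w, x in zip(weights, inst)) for inst in instances]
    attention = softmax(scores)
    # The bag embedding is the attention-weighted sum of instance features.
    dim = len(instances[0])
    bag = [sum(a * inst[d] for a, inst in zip(attention, instances))
           for d in range(dim)]
    return attention, bag


# Toy bag: the third patch has much larger features than the others.
bag_of_patches = [[0.1, 0.2], [0.0, 0.1], [3.0, 2.5], [0.2, 0.0]]
attention, embedding = attention_pool(bag_of_patches, weights=[2.0, 2.0])
print(attention)
```

In this toy bag almost all of the attention lands on one patch, so the bag embedding is essentially that patch's feature vector. This is the concentration behaviour the summary links to overfitting.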

To mitigate this problem, ACMIL combines two key techniques based on separate analyses of attention value concentration. First, the researchers used UMAP [1] to visualize the instance features and found that existing attention mechanisms only capture a limited set of discriminative patterns. To address this, they introduced Multiple Branch Attention (MBA), which uses multiple attention branches to identify a wider range of discriminative instances.
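One way to picture multiple attention branches is shown below. This is a sketch under stated assumptions rather than the paper's method: each branch is modelled as its own linear scoring vector, and the branches are combined by simple averaging, which is an illustrative choice that the summary does not confirm. The names `multi_branch_attention` and `branch_weights` are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def multi_branch_attention(instances, branch_weights):
    """Compute one attention map per branch, plus their element-wise average."""
    per_branch = []
    for weights in branch_weights:
        scores = [sum(w * x for w, x in zip(weights, inst)) for inst in instances]
        per_branch.append(softmax(scores))
    # Assumption: branches are merged by averaging their attention maps.
    averaged = [sum(branch[i] for branch in per_branch) / len(per_branch)
                for i in range(len(instances))]
    return per_branch, averaged


# Two patches show different discriminative patterns; the third is background.
instances = [[4.0, 0.0], [0.0, 4.0], [0.1, 0.1]]
# Each hypothetical branch is sensitive to a different feature dimension.
branch_weights = [[2.0, 0.0], [0.0, 2.0]]

per_branch, averaged = multi_branch_attention(instances, branch_weights)
print(averaged)
```

Each branch alone attends almost entirely to one patch, while the combined map spreads attention over both discriminative patches, which is the intuition behind capturing a wider range of patterns.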

Second, the researchers examined the cumulative value of Top-K attention scores and discovered that a tiny number of instances dominate the majority of attention. In response, they present Stochastic Top-K Instance Masking (STKIM), which randomly masks out a portion of instances with Top-K attention values and allocates their attention values to the remaining instances.
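The masking step can be sketched as follows. This is a minimal illustration, assuming that each Top-K instance is masked independently with probability `mask_prob` and that the freed attention is redistributed by renormalizing the surviving scores; both details, along with the function name and the fallback for an empty result, are assumptions made for this example rather than facts from the paper.

```python
import random


def stochastic_top_k_mask(attention, k, mask_prob, rng):
    """Randomly mask top-k instances and renormalize the remaining attention.

    Returns the new attention list and the set of masked instance indices.
    """
    # Indices of the k instances with the highest attention.
    top_k = sorted(range(len(attention)),
                   key=lambda i: attention[i], reverse=True)[:k]
    # Assumption: each top-k instance is masked independently.
    masked = {i for i in top_k if rng.random() < mask_prob}
    kept = [0.0 if i in masked else a for i, a in enumerate(attention)]
    total = sum(kept)
    if total == 0.0:
        # Nothing left to attend to; fall back to the unmasked scores.
        return list(attention), set()
    # Renormalizing hands the masked attention to the remaining instances.
    return [a / total for a in kept], masked


# Hypothetical attention scores for a bag of seven patches (invented numbers).
attention = [0.60, 0.25, 0.05, 0.04, 0.03, 0.02, 0.01]
rng = random.Random(0)  # seeded only to make the example repeatable
new_attention, masked = stochastic_top_k_mask(attention, k=2, mask_prob=0.6, rng=rng)
print(masked, new_attention)
```

Because the mask is resampled on every call, the model would see different subsets of the dominant instances during training and could not rely on the same few patches each time. Masking of this kind would normally be applied only during training.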

The extensive experimental results on three WSI datasets with two pre-trained backbones reveal that ACMIL outperforms state-of-the-art methods, such as [2], [3], and [4]. Additionally, the researchers provide heatmap and UMAP visualizations to illustrate ACMIL's effectiveness in suppressing attention value concentration and overcoming the overfitting challenge.

[1] Finding Regions of Interest in Whole Slide Images Using Uniform Manifold Approximation and Projection
[2] Generalizable Whole-Slide Image Classification Using Fine-Grained Attention Networks
[3] AMUNet: Multi-Scale Attention Map Merging for Remote Sensing Image Segmentation
[4] GLIMS: Attention-Guided Lightweight Multi-Scale Hybrid Network for Remote Sensing Image Segmentation

Critical Analysis

The researchers identify a real problem in applying MIL methods to WSI classification: attention value concentration and its link to overfitting. Their proposed ACMIL approach, which combines multiple attention branches with stochastic top-k instance masking, appears to address it effectively.

One potential limitation of the study is that it only evaluates ACMIL on three WSI datasets and two pre-trained backbones. While the results are promising, further testing on a wider range of datasets and architectures would help to validate the generalizability of the method.

Additionally, the paper could have provided more details on the computational complexity and training time of ACMIL compared to the baseline methods. This information would be useful for practitioners looking to implement the technique in their own projects.

Overall, the researchers have presented a well-designed and thoughtful solution to a significant challenge in WSI classification using MIL. The extensive visualizations and analyses help to demonstrate the effectiveness of ACMIL, and the open-source availability of the code is a welcome addition. However, further research on the scalability and generalizability of the approach would strengthen the conclusions.

Conclusion

The Attention-Challenging MIL (ACMIL) method developed by the researchers addresses a crucial issue in the application of Multiple Instance Learning (MIL) for Whole Slide Image (WSI) classification. By combining Multiple Branch Attention (MBA) and Stochastic Top-K Instance Masking (STKIM), ACMIL is reported to suppress attention value concentration and reduce overfitting.

The extensive experimental results and visualizations provided in the paper showcase the effectiveness of ACMIL in outperforming state-of-the-art methods. This research represents an important step forward in improving the robustness and reliability of MIL-based WSI classification, which has significant implications for medical image analysis and digital pathology.

While further testing on a broader range of datasets and architectures would be beneficial, the ACMIL approach demonstrated in this paper is a promising solution to a longstanding problem in the field. The open-source availability of the code also allows other researchers and practitioners to build upon this work and continue advancing the state of the art in WSI classification.



Related Papers


Attention-Challenging Multiple Instance Learning for Whole Slide Image Classification

Yunlong Zhang, Honglin Li, Yuxuan Sun, Sunyi Zheng, Chenglu Zhu, Lin Yang

In the application of Multiple Instance Learning (MIL) methods for Whole Slide Image (WSI) classification, attention mechanisms often focus on a subset of discriminative instances, which are closely linked to overfitting. To mitigate overfitting, we present Attention-Challenging MIL (ACMIL). ACMIL combines two techniques based on separate analyses for attention value concentration. Firstly, UMAP of instance features reveals various patterns among discriminative instances, with existing attention mechanisms capturing only some of them. To remedy this, we introduce Multiple Branch Attention (MBA) to capture more discriminative instances using multiple attention branches. Secondly, the examination of the cumulative value of Top-K attention scores indicates that a tiny number of instances dominate the majority of attention. In response, we present Stochastic Top-K Instance Masking (STKIM), which masks out a portion of instances with Top-K attention values and allocates their attention values to the remaining instances. The extensive experimental results on three WSI datasets with two pre-trained backbones reveal that our ACMIL outperforms state-of-the-art methods. Additionally, through heatmap visualization and UMAP visualization, this paper extensively illustrates ACMIL's effectiveness in suppressing attention value concentration and overcoming the overfitting challenge. The source code is available at https://github.com/dazhangyu123/ACMIL.

7/8/2024

Attention Is Not What You Need: Revisiting Multi-Instance Learning for Whole Slide Image Classification

Xin Liu, Weijia Zhang, Min-Ling Zhang

Although attention-based multi-instance learning algorithms have achieved impressive performances on slide-level whole slide image (WSI) classification tasks, they are prone to mistakenly focus on irrelevant patterns such as staining conditions and tissue morphology, leading to incorrect patch-level predictions and unreliable interpretability. Moreover, these attention-based MIL algorithms tend to focus on salient instances and struggle to recognize hard-to-classify instances. In this paper, we first demonstrate that attention-based WSI classification methods do not adhere to the standard MIL assumptions. From the standard MIL assumptions, we propose a surprisingly simple yet effective instance-based MIL method for WSI classification (FocusMIL) based on max-pooling and forward amortized variational inference. We argue that synergizing the standard MIL assumption with variational inference encourages the model to focus on tumour morphology instead of spurious correlations. Our experimental evaluations show that FocusMIL significantly outperforms the baselines in patch-level classification tasks on the Camelyon16 and TCGA-NSCLC benchmarks. Visualization results show that our method also achieves better classification boundaries for identifying hard instances and mitigates the effect of spurious correlations between bags and labels.

8/20/2024


Multi-head Attention-based Deep Multiple Instance Learning

Hassan Keshvarikhojasteh, Josien Pluim, Mitko Veta

This paper introduces MAD-MIL, a Multi-head Attention-based Deep Multiple Instance Learning model, designed for weakly supervised Whole Slide Images (WSIs) classification in digital pathology. Inspired by the multi-head attention mechanism of the Transformer, MAD-MIL simplifies model complexity while achieving competitive results against advanced models like CLAM and DS-MIL. Evaluated on the MNIST-BAGS and public datasets, including TUPAC16, TCGA BRCA, TCGA LUNG, and TCGA KIDNEY, MAD-MIL consistently outperforms ABMIL. This demonstrates enhanced information diversity, interpretability, and efficiency in slide representation. The model's effectiveness, coupled with fewer trainable parameters and lower computational complexity makes it a promising solution for automated pathology workflows. Our code is available at https://github.com/tueimage/MAD-MIL.

4/9/2024

SAM-MIL: A Spatial Contextual Aware Multiple Instance Learning Approach for Whole Slide Image Classification

Heng Fang, Sheng Huang, Wenhao Tang, Luwen Huangfu, Bo Liu

Multiple Instance Learning (MIL) represents the predominant framework in Whole Slide Image (WSI) classification, covering aspects such as sub-typing, diagnosis, and beyond. Current MIL models predominantly rely on instance-level features derived from pretrained models such as ResNet. These models segment each WSI into independent patches and extract features from these local patches, leading to a significant loss of global spatial context and restricting the model's focus to merely local features. To address this issue, we propose a novel MIL framework, named SAM-MIL, that emphasizes spatial contextual awareness and explicitly incorporates spatial context by extracting comprehensive, image-level information. The Segment Anything Model (SAM) represents a pioneering visual segmentation foundational model that can capture segmentation features without the need for additional fine-tuning, rendering it an outstanding tool for extracting spatial context directly from raw WSIs. Our approach includes the design of group feature extraction based on spatial context and a SAM-Guided Group Masking strategy to mitigate class imbalance issues. We implement a dynamic mask ratio for different segmentation categories and supplement these with representative group features of categories. Moreover, SAM-MIL divides instances to generate additional pseudo-bags, thereby augmenting the training set, and introduces consistency of spatial context across pseudo-bags to further enhance the model's performance. Experimental results on the CAMELYON-16 and TCGA Lung Cancer datasets demonstrate that our proposed SAM-MIL model outperforms existing mainstream methods in WSIs classification. Our open-source implementation code is available at https://github.com/FangHeng/SAM-MIL.

7/26/2024