Distilling High Diagnostic Value Patches for Whole Slide Image Classification Using Attention Mechanism

Read original: arXiv:2407.19821 - Published 8/19/2024 by Tianhang Nan, Hao Quan, Yong Ding, Xingyu Li, Kai Yang, Xiaoyu Cui


Overview

Multiple Instance Learning (MIL) is a popular approach for classifying whole slide images (WSIs) in medical imaging.
MIL replaces pixel-level manual annotation with diagnostic reports as labels, reducing labor costs.
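Bag-level MIL methods often perform better than patch-level methods because they consider all patches of a WSI as a whole, but they also incorporate more redundant patches, which interfere with classification.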
To address this, the researchers developed an attention-based feature distillation MIL (AFD-MIL) approach.

Plain English Explanation

Multiple Instance Learning (MIL) is a way to classify whole slide images (WSIs) in medical imaging using diagnostic reports as labels instead of manually annotating every pixel. This is a big advantage because pixel-level annotation is slow and expensive.

Bag-level MIL methods generally work better than patch-level methods because they consider the whole WSI at once. However, they also take in many redundant, irrelevant patches, which can interfere with the classification.

To address this problem, the researchers developed a new approach called attention-based feature distillation MIL (AFD-MIL). This method first excludes the redundant, interfering patches as a preprocessing step. Then, it uses attention mechanisms to focus on the patches that are most important for making the diagnosis. This helps extract the most valuable features from the WSI.

The researchers also introduced a global loss optimization technique to fine-tune the feature distillation process. This allows the model to better learn which features are truly diagnostic.

Overall, the AFD-MIL approach outperformed the current state-of-the-art methods on two different medical imaging datasets, demonstrating its effectiveness at extracting the most relevant information from WSIs.

Technical Explanation

The key innovation in the AFD-MIL approach is the exclusion of redundant patches as a preprocessing step, combined with the use of attention mechanisms to distill the most diagnostic features.

Traditionally, bag-level MIL methods have incorporated all patches from the WSI, even if many of them are redundant or irrelevant. This can lead to interference and reduced classification performance. The researchers addressed this by first filtering out the less informative patches before feeding the remaining patches into the MIL model.
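
The summary does not state how patches are scored before they are excluded. As a minimal sketch, assuming each patch already carries a relevance score (for example from a first-pass attention model), the preprocessing step amounts to keeping only the top-scoring fraction of a slide's patches before they reach the MIL aggregator. The function name `exclude_redundant_patches`, the score source and the `keep_ratio` parameter are hypothetical illustrations, not the paper's actual rule:

```python
import numpy as np

def exclude_redundant_patches(features, scores, keep_ratio=0.2):
    """Keep the highest-scoring fraction of patches in one WSI bag.

    features:   (N, D) array of patch embeddings for a single slide.
    scores:     (N,) per-patch relevance scores (hypothetical source, e.g. a
                first-pass attention model; not the paper's exact criterion).
    keep_ratio: fraction of patches to retain (illustrative assumption).
    """
    features = np.asarray(features, dtype=float)
    scores = np.asarray(scores, dtype=float)
    if features.shape[0] != scores.shape[0]:
        raise ValueError("features and scores must have the same number of patches")
    n_keep = max(1, int(round(keep_ratio * len(scores))))
    # argsort is ascending: take the last n_keep, then reverse so the best patch comes first
    keep_idx = np.argsort(scores)[-n_keep:][::-1]
    return keep_idx, features[keep_idx]

# Toy bag: 10 patches with 4-dimensional features (made-up data).
rng = np.random.default_rng(0)
feats = rng.normal(size=(10, 4))
scores = np.array([0.1, 0.9, 0.05, 0.3, 0.8, 0.2, 0.15, 0.02, 0.4, 0.6])
idx, kept = exclude_redundant_patches(feats, scores, keep_ratio=0.3)
print(idx)         # [1 4 9]
print(kept.shape)  # (3, 4)
```

Only the retained rows are passed on, so the downstream aggregator never sees the excluded patches at all, which is the "exclusion as preprocessing" idea rather than down-weighting inside the model.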

Additionally, the authors describe their use of attention mechanisms to distill features with high diagnostic value as pioneering, in contrast to the traditional practice of indiscriminately integrating all patches. This "feature distillation" process allows the model to home in on the most diagnostic regions of the WSI.
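
The summary does not give AFD-MIL's exact attention formulation. A common choice in MIL is the attention pooling of Ilse et al. (2018), where each patch feature h_k gets a weight a_k = softmax_k(wᵀ tanh(V h_k)). The sketch below assumes that formulation, uses random stand-in weights `V` and `w` (normally learned), and adds a hypothetical `top_k` parameter to show "distillation": pooling only the most-attended patches instead of all of them. It illustrates the general idea, not the authors' implementation:

```python
import numpy as np

def attention_weights(H, V, w):
    """Attention-MIL weights: a_k = softmax_k(w^T tanh(V h_k)).

    H: (N, D) patch features; V: (L, D) projection; w: (L,) scoring vector.
    """
    logits = np.tanh(H @ V.T) @ w   # (N,) unnormalised attention scores
    logits = logits - logits.max()  # subtract max for numerical stability
    exp = np.exp(logits)
    return exp / exp.sum()

def distill_and_pool(H, V, w, top_k):
    """Keep the top_k highest-attention patches and pool only those.

    top_k is a hypothetical hyperparameter for illustration; the paper's
    actual distillation rule may differ.
    """
    a = attention_weights(H, V, w)
    top = np.argsort(a)[-top_k:][::-1]  # indices of the most-attended patches
    a_top = a[top] / a[top].sum()       # renormalise weights over kept patches
    z = a_top @ H[top]                  # (D,) slide-level embedding
    return top, a, z

rng = np.random.default_rng(42)
H = rng.normal(size=(50, 16))        # 50 patches, 16-dim features (toy data)
V = rng.normal(size=(8, 16)) * 0.1   # random stand-in for learned weights
w = rng.normal(size=8)
top, a, z = distill_and_pool(H, V, w, top_k=5)
print(top.shape, z.shape)  # (5,) (16,)
print(round(a.sum(), 6))   # 1.0
```

Plain attention pooling would use all 50 weights; restricting the weighted sum to the top-k patches is what keeps low-attention, likely redundant patches from contributing to the slide embedding.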

The researchers also introduced global loss optimization to finely control the feature distillation module, helping ensure that the most diagnostic features are emphasized during training.

The AFD-MIL approach was evaluated on two WSI datasets: Camelyon16 (breast cancer) and TCGA-NSCLC (non-small cell lung cancer). Different feature distillation methods were used for each dataset, tailored to the specific disease characteristics. AFD-MIL surpassed the previous state-of-the-art method, achieving 91.47% accuracy and 94.29% AUC on Camelyon16, and 93.33% accuracy and 98.17% AUC on TCGA-NSCLC.
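
The reported figures are slide-level accuracy (ACC) and area under the ROC curve (AUC). For readers less familiar with the two metrics, here is a minimal sketch of how they are computed from slide labels and predicted probabilities. AUC uses the pairwise (Mann-Whitney) formulation: the probability that a randomly chosen positive slide scores higher than a randomly chosen negative one, with ties counted as half. The 0.5 threshold and the toy labels and scores are assumptions for illustration, not data from the paper:

```python
def accuracy(labels, probs, threshold=0.5):
    """Fraction of slides whose thresholded prediction matches the label."""
    preds = [1 if p >= threshold else 0 for p in probs]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)

def roc_auc(labels, probs):
    """AUC = P(score of a random positive > score of a random negative); ties count 0.5."""
    pos = [p for p, y in zip(probs, labels) if y == 1]
    neg = [p for p, y in zip(probs, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative slide")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

labels = [1, 1, 1, 0, 0, 0, 0, 1]
probs = [0.9, 0.7, 0.4, 0.2, 0.6, 0.1, 0.3, 0.8]
print(accuracy(labels, probs))  # 0.75
print(roc_auc(labels, probs))   # 0.9375
```

Accuracy depends on the chosen threshold, while AUC only depends on how the slides are ranked, which is why papers usually report both. The pairwise loop is quadratic; in practice a library routine such as scikit-learn's `roc_auc_score` would be used.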

Critical Analysis

The AFD-MIL approach represents an important advancement in whole slide image classification, as it addresses a key limitation of existing bag-level MIL methods. By proactively excluding redundant patches and using attention-based feature distillation, the model is able to focus on the most diagnostically relevant aspects of the WSI.

However, the paper does not provide a detailed analysis of the specific types of patches that are excluded or the criteria used to determine which patches are redundant. Additionally, while the attention mechanisms are shown to improve performance, the paper does not delve deeply into the interpretability of the attention weights or how they correlate with clinically meaningful features.

Further research could explore more sophisticated patch selection and feature distillation techniques, potentially incorporating domain-specific knowledge to better understand which image regions and characteristics are truly indicative of disease. Evaluating the approach on a broader range of medical imaging datasets would also help validate its generalizability.

Overall, the AFD-MIL method represents a promising step forward in leveraging weakly supervised learning for whole slide image analysis. By carefully managing the input data and feature extraction process, the model achieves state-of-the-art performance while providing a foundation for more interpretable and clinically relevant WSI classification.

Conclusion

The AFD-MIL approach introduces an innovative way to perform whole slide image classification using multiple instance learning. By excluding redundant patches and distilling the most diagnostic features using attention mechanisms, the model is able to achieve superior performance on two medical imaging datasets.

This research highlights the importance of carefully managing the input data and feature extraction process when working with complex, high-dimensional medical images. The ability to leverage weakly supervised learning techniques like MIL, while still focusing on the most relevant image regions and characteristics, represents a significant advancement in the field of computational pathology.

The AFD-MIL method provides a strong foundation for future research into more interpretable and clinically relevant WSI analysis, with potential applications in computer-aided diagnosis, digital pathology, and beyond.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!


Related Papers


Distilling High Diagnostic Value Patches for Whole Slide Image Classification Using Attention Mechanism

Tianhang Nan, Hao Quan, Yong Ding, Xingyu Li, Kai Yang, Xiaoyu Cui

Multiple Instance Learning (MIL) has garnered widespread attention in the field of Whole Slide Image (WSI) classification as it replaces pixel-level manual annotation with diagnostic reports as labels, significantly reducing labor costs. Recent research has shown that bag-level MIL methods often yield better results because they can consider all patches of the WSI as a whole. However, a drawback of such methods is the incorporation of more redundant patches, leading to interference. To extract patches with high diagnostic value while excluding interfering patches to address this issue, we developed an attention-based feature distillation multi-instance learning (AFD-MIL) approach. This approach proposed the exclusion of redundant patches as a preprocessing operation in weakly supervised learning, directly mitigating interference from extensive noise. It also pioneers the use of attention mechanisms to distill features with high diagnostic value, as opposed to the traditional practice of indiscriminately and forcibly integrating all patches. Additionally, we introduced global loss optimization to finely control the feature distillation module. AFD-MIL is orthogonal to many existing MIL methods, leading to consistent performance improvements. This approach has surpassed the current state-of-the-art method, achieving 91.47% ACC (accuracy) and 94.29% AUC (area under the curve) on the Camelyon16 (Camelyon Challenge 2016, breast cancer), while 93.33% ACC and 98.17% AUC on the TCGA-NSCLC (The Cancer Genome Atlas Program: non-small cell lung cancer). Different feature distillation methods were used for the two datasets, tailored to the specific diseases, thereby improving performance and interpretability.

8/19/2024

Attention Is Not What You Need: Revisiting Multi-Instance Learning for Whole Slide Image Classification

Xin Liu, Weijia Zhang, Min-Ling Zhang

Although attention-based multi-instance learning algorithms have achieved impressive performances on slide-level whole slide image (WSI) classification tasks, they are prone to mistakenly focus on irrelevant patterns such as staining conditions and tissue morphology, leading to incorrect patch-level predictions and unreliable interpretability. Moreover, these attention-based MIL algorithms tend to focus on salient instances and struggle to recognize hard-to-classify instances. In this paper, we first demonstrate that attention-based WSI classification methods do not adhere to the standard MIL assumptions. From the standard MIL assumptions, we propose a surprisingly simple yet effective instance-based MIL method for WSI classification (FocusMIL) based on max-pooling and forward amortized variational inference. We argue that synergizing the standard MIL assumption with variational inference encourages the model to focus on tumour morphology instead of spurious correlations. Our experimental evaluations show that FocusMIL significantly outperforms the baselines in patch-level classification tasks on the Camelyon16 and TCGA-NSCLC benchmarks. Visualization results show that our method also achieves better classification boundaries for identifying hard instances and mitigates the effect of spurious correlations between bags and labels.

8/20/2024


Establishing Truly Causal Relationship Between Whole Slide Image Predictions and Diagnostic Evidence Subregions in Deep Learning

Tianhang Nan, Yong Ding, Hao Quan, Deliang Li, Mingchen Zou, Xiaoyu Cui

In the field of deep learning-driven Whole Slide Image (WSI) classification, Multiple Instance Learning (MIL) has gained significant attention due to its ability to be trained using only slide-level diagnostic labels. Previous MIL researches have primarily focused on enhancing feature aggregators for globally analyzing WSIs, but overlook a causal relationship in diagnosis: model's prediction should ideally stem solely from regions of the image that contain diagnostic evidence (such as tumor cells), which usually occupy relatively small areas. To address this limitation and establish the truly causal relationship between model predictions and diagnostic evidence regions, we propose Causal Inference Multiple Instance Learning (CI-MIL). CI-MIL integrates feature distillation with a novel patch decorrelation mechanism, employing a two-stage causal inference approach to distill and process patches with high diagnostic value. Initially, CI-MIL leverages feature distillation to identify patches likely containing tumor cells and extracts their corresponding feature representations. These features are then mapped to random Fourier feature space, where a learnable weighting scheme is employed to minimize inter-feature correlations, effectively reducing redundancy from homogenous patches and mitigating data bias. These processes strengthen the causal relationship between model predictions and diagnostically relevant regions, making the prediction more direct and reliable. Experimental results demonstrate that CI-MIL outperforms state-of-the-art methods. Additionally, CI-MIL exhibits superior interpretability, as its selected regions demonstrate high consistency with ground truth annotations, promising more reliable diagnostic assistance for pathologists.

7/25/2024


Attention-Challenging Multiple Instance Learning for Whole Slide Image Classification

Yunlong Zhang, Honglin Li, Yuxuan Sun, Sunyi Zheng, Chenglu Zhu, Lin Yang

In the application of Multiple Instance Learning (MIL) methods for Whole Slide Image (WSI) classification, attention mechanisms often focus on a subset of discriminative instances, which are closely linked to overfitting. To mitigate overfitting, we present Attention-Challenging MIL (ACMIL). ACMIL combines two techniques based on separate analyses for attention value concentration. Firstly, UMAP of instance features reveals various patterns among discriminative instances, with existing attention mechanisms capturing only some of them. To remedy this, we introduce Multiple Branch Attention (MBA) to capture more discriminative instances using multiple attention branches. Secondly, the examination of the cumulative value of Top-K attention scores indicates that a tiny number of instances dominate the majority of attention. In response, we present Stochastic Top-K Instance Masking (STKIM), which masks out a portion of instances with Top-K attention values and allocates their attention values to the remaining instances. The extensive experimental results on three WSI datasets with two pre-trained backbones reveal that our ACMIL outperforms state-of-the-art methods. Additionally, through heatmap visualization and UMAP visualization, this paper extensively illustrates ACMIL's effectiveness in suppressing attention value concentration and overcoming the overfitting challenge. The source code is available at https://github.com/dazhangyu123/ACMIL.

7/8/2024