An interpretable machine learning system for colorectal cancer diagnosis from pathology slides

Read original: arXiv:2301.02608 - Published 5/2/2024 by Pedro C. Neto, Diana Montezuma, Sara P. Oliveira, Domingos Oliveira, João Fraga, Ana Monteiro, João Monteiro, Liliana Ribeiro, Sofia Gonçalves, Stefan Reinhard and 3 others

Overview

Developed a scalable AI system to diagnose colorectal cancer from whole-slide images (WSIs)
Proposed a deep learning (DL) system that learns from weak labels
Implemented a sampling strategy to reduce training samples by 6x without compromising performance
Leveraged a small subset of fully annotated samples
Created a prototype with explainable predictions, active learning features, and parallelization
Evaluated on one of the largest WSI colorectal cancer datasets, with approximately 10,500 samples, 900 of which are testing samples
Assessed robustness on two external datasets (TCGA and PAIP) and a dataset from the proposed prototype

Plain English Explanation

The paper describes the development of an artificial intelligence (AI) system to help diagnose colorectal cancer using whole-slide images (WSIs) of tissue samples. Whole-slide imaging is a technique in which a tissue slide is scanned into a high-resolution digital image that can be analyzed by computers. The researchers used a deep learning (DL) approach, a type of AI that learns patterns from large datasets.

The key innovations of this system include:

Weak Labels: The DL system was trained using "weak labels," meaning that most of the training data carried only a coarse label rather than detailed spatial annotations of the tissue. This makes the system more scalable, as weakly labeled data is easier to obtain.
Sampling Strategy: The researchers developed a sampling strategy that reduced the number of training samples by 6 times without compromising the system's performance. This makes the training process more efficient.
Leveraging Annotated Samples: The system also leveraged a small subset of fully annotated samples to improve its performance.
Prototype Features: The researchers created a prototype of the system that can provide explainable predictions, allow for active learning (where the system learns from user feedback), and run in parallel to speed up the analysis.

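The system was evaluated on one of the largest WSI colorectal cancer datasets, with approximately 10,500 samples, 900 of which are testing samples. It was also tested on two external datasets (TCGA and PAIP) and on samples collected from the prototype to assess its robustness. On the internal dataset it reached an accuracy of 93.44% and a sensitivity of 0.996. Results on the external datasets varied; TCGA was the most challenging, with an overall accuracy of 84.91% and a sensitivity of 0.996.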

Technical Explanation

The researchers developed a deep learning (DL) system to classify colorectal cancer severity in whole-slide images (WSIs). The system learns to identify patterns in the images that correspond to different levels of cancer progression, similar to how deep learning-based systems can be used to predict breast cancer tumor characteristics from medical images.
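According to the paper's abstract, the method predicts a dysplasia-severity class for each patch-based tile and uses those predictions to classify the whole slide. The following is a minimal sketch of that idea, not the authors' implementation. The aggregation rule (label the slide with the class of its most severe tile, ranked by expected severity), the class ordering, and the names `expected_severity` and `classify_slide` are all assumptions made for illustration.

```python
from typing import List, Sequence

# Assumed class ordering, from least to most severe.
CLASSES = ("non-neoplastic", "low-grade dysplasia", "high-grade dysplasia")


def expected_severity(probs: Sequence[float]) -> float:
    """Probability-weighted class index (0 = least severe, 2 = most severe)."""
    return sum(index * p for index, p in enumerate(probs))


def classify_slide(tile_probs: List[Sequence[float]]) -> str:
    """Label a slide with the predicted class of its most severe tile.

    This aggregation rule is an illustrative assumption, not the paper's.
    """
    if not tile_probs:
        raise ValueError("a slide needs at least one tile")
    worst_tile = max(tile_probs, key=expected_severity)
    predicted_index = max(range(len(worst_tile)), key=lambda i: worst_tile[i])
    return CLASSES[predicted_index]


# Hypothetical per-tile class probabilities for one slide.
slide = [
    (0.90, 0.08, 0.02),
    (0.20, 0.70, 0.10),
    (0.85, 0.10, 0.05),
]
print(classify_slide(slide))  # low-grade dysplasia
```

Taking the most severe tile mirrors the clinical intuition that a single dysplastic region is enough to flag a slide, but it also makes the slide label sensitive to one noisy tile prediction.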

The key technical innovations include:

Weak Labels: The DL system was trained using "weak labels," meaning that most of the training data carried only a coarse label rather than detailed spatial annotations of the entire slide. This reduces the effort required to label the training data, making the system more scalable.
Sampling Strategy: The researchers developed a sampling strategy that reduced the number of training samples by a factor of 6 without compromising the system's performance. This was achieved by intelligently selecting the most informative samples for training, similar to how region of interest detection can be used to focus on the most relevant parts of whole-slide images.
Leveraging Annotated Samples: The system also leveraged a small subset of fully annotated samples to improve its performance. This is analogous to how knowledge-enhanced visual language models can leverage both weakly and strongly labeled data for computational pathology tasks.
Prototype Features: The researchers created a prototype of the system with several advanced features, including:
- Explainable Predictions: The system can provide explanations for its predictions, which can help pathologists understand and trust the AI's decisions.
- Active Learning: The system can learn from feedback provided by pathologists, allowing it to continuously improve its performance.
- Parallelization: The system is designed to run in parallel, which can speed up the analysis of large numbers of WSIs.
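The sampling idea in the list above can be sketched as a per-slide tile budget. This is a hedged illustration only: the summary does not say how tiles are scored or selected, so the informativeness score, the `reduction` parameter, and the function name `sample_tiles` are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Tile = Tuple[str, float]  # (tile identifier, assumed informativeness score)


def sample_tiles(slides: Dict[str, List[Tile]], reduction: int = 6) -> Dict[str, List[str]]:
    """Keep roughly 1/reduction of each slide's tiles, highest scores first."""
    if reduction < 1:
        raise ValueError("reduction must be at least 1")
    kept: Dict[str, List[str]] = {}
    for slide_id, tiles in slides.items():
        # Every non-empty slide keeps at least one tile.
        budget = math.ceil(len(tiles) / reduction) if tiles else 0
        ranked = sorted(tiles, key=lambda tile: tile[1], reverse=True)
        kept[slide_id] = [tile_id for tile_id, _ in ranked[:budget]]
    return kept


# Hypothetical slide with 12 scored tiles; a factor of 6 keeps the top 2.
slides = {"slide_a": [(f"tile_{i}", i / 12) for i in range(12)]}
kept = sample_tiles(slides)
print(kept["slide_a"])  # ['tile_11', 'tile_10']
```

In practice the score could come from an earlier version of the model, for example the predicted severity of each tile, which is one plausible reading of "most informative".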

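The system was evaluated on a large dataset of approximately 10,500 colorectal WSIs, 900 of which are testing samples. On this internal dataset it achieved an accuracy of 93.44% and a sensitivity of 0.996 between positive (low-grade and high-grade dysplasia) and non-neoplastic samples. Results on the external datasets (TCGA and PAIP) varied; TCGA was the most challenging, with an overall accuracy of 84.91% and a sensitivity of 0.996. These results support the robustness and scalability of the proposed approach.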
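The two reported metrics can be computed as in the sketch below. It assumes integer labels with 0 for non-neoplastic, 1 for low-grade dysplasia, and 2 for high-grade dysplasia, and it treats sensitivity as a binary measure in which both dysplasia grades count as positive, following the abstract's description. The function names and the example labels are hypothetical, not data from the paper.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of slides whose predicted class matches the true class."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def binary_sensitivity(y_true: Sequence[int], y_pred: Sequence[int], negative: int = 0) -> float:
    """Share of truly positive slides that are predicted as any positive class."""
    if len(y_true) != len(y_pred):
        raise ValueError("inputs must be of equal length")
    positives = [(t, p) for t, p in zip(y_true, y_pred) if t != negative]
    if not positives:
        raise ValueError("no positive samples to evaluate")
    detected = sum(p != negative for _, p in positives)
    return detected / len(positives)


# Made-up labels for eight slides.
y_true = [0, 0, 1, 1, 2, 2, 2, 0]
y_pred = [0, 1, 1, 2, 2, 2, 0, 0]
print(accuracy(y_true, y_pred))            # 0.625
print(binary_sensitivity(y_true, y_pred))  # 0.8
```

Note that a low-grade slide predicted as high-grade lowers accuracy but not this sensitivity, which is how a system can pair a sensitivity of 0.996 with a noticeably lower overall accuracy.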

Critical Analysis

The researchers have made several important contributions to the field of computational pathology. The use of weak labels and the intelligent sampling strategy are particularly noteworthy, as they address the challenge of obtaining and efficiently using large amounts of training data for whole-slide image analysis.

However, the paper does not provide a detailed discussion of the limitations of the proposed approach. For example, it would be helpful to understand how the system's performance might be affected by variations in tissue preparation, staining, or imaging protocols, which can vary across different clinical settings. Additionally, the paper does not explore how the system's performance might be impacted by rare or atypical cancer subtypes or other confounding factors, which could be important for real-world clinical deployment.

Furthermore, the researchers did not compare their approach to other state-of-the-art methods for whole-slide image analysis, such as those that leverage weakly supervised learning or active learning techniques. Such a comparison would help to situate the proposed method within the broader context of the field and highlight its unique contributions.

Overall, the paper presents a promising approach to colorectal cancer diagnosis using whole-slide images, but further research is needed to fully understand its limitations and potential real-world implications.

Conclusion

This paper describes the development of a scalable AI system for diagnosing colorectal cancer from whole-slide images. The key innovations include the use of weak labels, an intelligent sampling strategy, and the leveraging of a small subset of fully annotated samples to improve performance. The researchers also created a prototype with explainable predictions, active learning features, and parallelization capabilities.

The system was evaluated on a large dataset of approximately 10,500 colorectal WSIs, 900 of which are testing samples, as well as two external datasets. It reached an accuracy of 93.44% and a sensitivity of 0.996 on the internal dataset, and an accuracy of 84.91% with a sensitivity of 0.996 on TCGA, the most challenging external dataset.

While the paper presents a promising approach, further research is needed to fully understand the system's limitations and potential real-world implications. Comparing the proposed method to other state-of-the-art techniques and exploring its robustness to factors like variation in tissue preparation and rare cancer subtypes would be valuable next steps.

Overall, this work represents an important step forward in the development of scalable and reliable AI systems for computational pathology, with significant potential to improve cancer diagnosis and patient outcomes.

Related Papers

An interpretable machine learning system for colorectal cancer diagnosis from pathology slides

Pedro C. Neto, Diana Montezuma, Sara P. Oliveira, Domingos Oliveira, João Fraga, Ana Monteiro, João Monteiro, Liliana Ribeiro, Sofia Gonçalves, Stefan Reinhard, Inti Zlobec, Isabel M. Pinto, Jaime S. Cardoso

Considering the profound transformation affecting pathology practice, we aimed to develop a scalable artificial intelligence (AI) system to diagnose colorectal cancer from whole-slide images (WSI). For this, we propose a deep learning (DL) system that learns from weak labels, a sampling strategy that reduces the number of training samples by a factor of six without compromising performance, an approach to leverage a small subset of fully annotated samples, and a prototype with explainable predictions, active learning features and parallelisation. Noting some problems in the literature, this study is conducted with one of the largest WSI colorectal sample datasets, with approximately 10,500 WSIs. Of these samples, 900 are testing samples. Furthermore, the robustness of the proposed method is assessed with two additional external datasets (TCGA and PAIP) and a dataset of samples collected directly from the proposed prototype. Our proposed method predicts, for the patch-based tiles, a class based on the severity of the dysplasia and uses that information to classify the whole slide. It is trained with an interpretable mixed-supervision scheme to leverage the domain knowledge introduced by pathologists through spatial annotations. The mixed-supervision scheme allowed for an intelligent sampling strategy effectively evaluated in several different scenarios without compromising the performance. On the internal dataset, the method shows an accuracy of 93.44% and a sensitivity between positive (low-grade and high-grade dysplasia) and non-neoplastic samples of 0.996. On the external test samples, performance varied, with TCGA being the most challenging dataset, with an overall accuracy of 84.91% and a sensitivity of 0.996.

5/2/2024

Self-Contrastive Weakly Supervised Learning Framework for Prognostic Prediction Using Whole Slide Images

Saul Fuster, Farbod Khoraminia, Julio Silva-Rodríguez, Umay Kiraz, Geert J. L. H. van Leenders, Trygve Eftestøl, Valery Naranjo, Emiel A. M. Janssen, Tahlita C. M. Zuiverloon, Kjersti Engan

We present a pioneering investigation into the application of deep learning techniques to analyze histopathological images for addressing the substantial challenge of automated prognostic prediction. Prognostic prediction poses a unique challenge as the ground truth labels are inherently weak, and the model must anticipate future events that are not directly observable in the image. To address this challenge, we propose a novel three-part framework comprising a convolutional network based tissue segmentation algorithm for region of interest delineation, a contrastive learning module for feature extraction, and a nested multiple instance learning classification module. Our study explores the significance of various regions of interest within the histopathological slides and exploits diverse learning scenarios. The pipeline is initially validated on artificially generated data and a simpler diagnostic task. Transitioning to prognostic prediction, tasks become more challenging. Employing bladder cancer as a use case, our best models yield an AUC of 0.721 and 0.678 for recurrence and treatment outcome prediction, respectively.

5/27/2024

Exploring Explainable AI Techniques for Improved Interpretability in Lung and Colon Cancer Classification

Mukaffi Bin Moin, Fatema Tuj Johora Faria, Swarnajit Saha, Busra Kamal Rafa, Mohammad Shafiul Alam

Lung and colon cancer are serious worldwide health challenges that require early and precise identification to reduce mortality risks. However, diagnosis, which is mostly dependent on histopathologists' competence, presents difficulties and hazards when expertise is insufficient. While diagnostic methods like imaging and blood markers contribute to early detection, histopathology remains the gold standard, although time-consuming and vulnerable to inter-observer mistakes. Limited access to high-end technology further limits patients' ability to receive immediate medical care and diagnosis. Recent advances in deep learning have generated interest in its application to medical imaging analysis, specifically the use of histopathological images to diagnose lung and colon cancer. The goal of this investigation is to use and adapt existing pre-trained CNN-based models, such as Xception, DenseNet201, ResNet101, InceptionV3, DenseNet121, DenseNet169, ResNet152, and InceptionResNetV2, to enhance classification through better augmentation strategies. The results show tremendous progress, with all eight models reaching impressive accuracy ranging from 97% to 99%. Furthermore, attention visualization techniques such as GradCAM, GradCAM++, ScoreCAM, Faster Score-CAM, and LayerCAM, as well as Vanilla Saliency and SmoothGrad, are used to provide insights into the models' classification decisions, thereby improving interpretability and understanding of malignant and benign image classification.

5/15/2024

PD-L1 Classification of Weakly-Labeled Whole Slide Images of Breast Cancer

Giacomo Cignoni, Cristian Scatena, Chiara Frascarelli, Nicola Fusco, Antonio Giuseppe Naccarato, Giuseppe Nicolò Fanelli, Alina Sîrbu

Specific and effective breast cancer therapy relies on the accurate quantification of PD-L1 positivity in tumors, which appears in the form of brown stainings in high resolution whole slide images (WSIs). However, the retrieval and extensive labeling of PD-L1 stained WSIs is a time-consuming and challenging task for pathologists, resulting in low reproducibility, especially for borderline images. This study aims to develop and compare models able to classify PD-L1 positivity of breast cancer samples based on WSI analysis, relying only on WSI-level labels. The task consists of two phases: identifying regions of interest (ROI) and classifying tumors as PD-L1 positive or negative. For the latter, two model categories were developed, with different feature extraction methodologies. The first encodes images based on the colour distance from a base color. The second uses a convolutional autoencoder to obtain embeddings of WSI tiles, and aggregates them into a WSI-level embedding. For both model types, features are fed into downstream ML classifiers. Two datasets from different clinical centers were used in two different training configurations: (1) training on one dataset and testing on the other; (2) combining the datasets. We also tested the performance with or without human preprocessing to remove brown artefacts. Colour distance based models achieve the best performances on testing configuration (1) with artefact removal, while autoencoder-based models are superior in the remaining cases, which are prone to greater data variability.

4/17/2024