PD-L1 Classification of Weakly-Labeled Whole Slide Images of Breast Cancer

Read original: arXiv:2404.10175 - Published 4/17/2024 by Giacomo Cignoni, Cristian Scatena, Chiara Frascarelli, Nicola Fusco, Antonio Giuseppe Naccarato, Giuseppe Nicolò Fanelli, Alina Sîrbu

Overview

This paper presents a deep learning approach for classifying the programmed death-ligand 1 (PD-L1) status of breast cancer tumors using weakly-labeled whole slide images (WSIs).
PD-L1 is a protein that plays a key role in the immune system's response to cancer, and its expression levels are an important biomarker for immunotherapy.
The proposed method addresses the challenge of obtaining detailed, pixel-level PD-L1 annotations, which are time-consuming and expensive, by leveraging weakly-labeled WSI data.

Plain English Explanation

The paper focuses on determining the PD-L1 status of breast cancer tumors from whole slide images (WSIs), which are high-resolution digital scans of tissue samples. PD-L1 is a protein that tumor cells can use to evade the immune system: it binds the PD-1 receptor on T cells and dampens their attack. Knowing a tumor's PD-L1 level is important for deciding whether a patient might benefit from immunotherapies that block the PD-1/PD-L1 pathway.

Training a model to read PD-L1 levels would traditionally require detailed, manual annotation of the tissue samples at the pixel level, which is laborious and costly. The researchers in this study instead used "weakly-labeled" WSI data, where the PD-L1 status is known only for the whole slide, not for individual pixels. They developed and compared models that analyze these weakly-labeled WSIs and classify the tumor as PD-L1 positive or negative. This avoids the expensive and time-consuming manual annotation and makes it easier to assess PD-L1 expression in cancer patients.

Technical Explanation

The paper proposes a deep learning approach for classifying the PD-L1 status of breast cancer tumors using weakly-labeled whole slide images (WSIs). Obtaining detailed, pixel-level annotations of PD-L1 expression is challenging and resource-intensive, so the researchers leveraged weakly-labeled WSI data, where the PD-L1 status is known only at the whole-slide level.
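To make the weak-label setup concrete: each slide is cut into tiles, but only the slide as a whole carries a PD-L1 label. The sketch below is a minimal illustration under assumed names (`tile_grid`, `WeaklyLabeledSlide`) and an assumed tile size of 256 pixels; none of these come from the paper.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


def tile_grid(width: int, height: int, tile_size: int) -> List[Tuple[int, int]]:
    """Top-left corners of all full, non-overlapping tiles in a slide."""
    xs = range(0, width - tile_size + 1, tile_size)
    ys = range(0, height - tile_size + 1, tile_size)
    return [(x, y) for y in ys for x in xs]


@dataclass
class WeaklyLabeledSlide:
    slide_id: str
    pd_l1_positive: bool  # the only label available: one per slide
    width: int
    height: int
    tile_size: int = 256  # assumed tile size, not taken from the paper
    tiles: List[Tuple[int, int]] = field(init=False)

    def __post_init__(self) -> None:
        # Tiles carry coordinates only; none of them has its own label.
        self.tiles = tile_grid(self.width, self.height, self.tile_size)


# Hypothetical slide: 1000 x 500 pixels, labeled PD-L1 positive as a whole.
slide = WeaklyLabeledSlide("slide_001", pd_l1_positive=True, width=1000, height=500)
print(len(slide.tiles), slide.pd_l1_positive)
```

Any model trained on such data has to turn the many unlabeled tiles of a slide into a single prediction that can be compared with the one slide-level label.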

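According to the paper's abstract, the task has two phases: identifying regions of interest (ROI) in each WSI, then classifying the tumor as PD-L1 positive or negative. For the classification phase, the authors developed two model categories that differ in how features are extracted. The first encodes images by their colour distance from a base colour. The second uses a convolutional autoencoder to embed WSI tiles and aggregates the tile embeddings into a WSI-level embedding. In both cases the resulting features are fed into downstream machine learning classifiers.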
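One of the feature families named in the paper's abstract encodes images by their colour distance from a base colour, which fits the observation that PD-L1 positivity appears as brown staining. The following is a minimal sketch of that idea, not the authors' implementation: the reference colour `BASE_BROWN`, the Euclidean distance in RGB space, the `max_distance` threshold, and the mean pooling over tiles are all assumptions made for illustration.

```python
import math
from typing import Sequence, Tuple

RGB = Tuple[int, int, int]

# Hypothetical reference brown; the paper's actual base colour is not given here.
BASE_BROWN: RGB = (120, 72, 40)


def colour_distance(pixel: RGB, base: RGB = BASE_BROWN) -> float:
    """Euclidean distance between a pixel and the base colour in RGB space."""
    return math.dist(pixel, base)


def tile_feature(pixels: Sequence[RGB], max_distance: float = 60.0) -> float:
    """Fraction of a tile's pixels that lie within max_distance of the base colour."""
    if not pixels:
        return 0.0
    close = sum(1 for p in pixels if colour_distance(p) <= max_distance)
    return close / len(pixels)


def slide_feature(tile_features: Sequence[float]) -> float:
    """Mean of the tile features: one number per slide, matching the slide-level label."""
    return sum(tile_features) / len(tile_features) if tile_features else 0.0


# Toy tiles with a handful of pixels each (real tiles hold many thousands).
brownish_tile = [(120, 72, 40), (130, 80, 50), (255, 255, 255), (240, 240, 250)]
pale_tile = [(255, 255, 255), (230, 225, 240)]
features = [tile_feature(brownish_tile), tile_feature(pale_tile)]
print(features, slide_feature(features))
```

A slide-level number like this (or a vector of several such statistics) could then be passed to any standard classifier trained on the WSI-level labels.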

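Two datasets from different clinical centers were used in two training configurations: training on one dataset and testing on the other, and combining the two datasets. The authors also tested performance with and without human preprocessing to remove brown artefacts. Colour-distance models performed best in the cross-dataset configuration with artefact removal, while autoencoder-based models were superior in the remaining cases, which are prone to greater data variability. Because both model types rely only on WSI-level labels, they avoid the costly and time-consuming manual annotation of PD-L1 expression at the pixel level.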

The paper also discusses potential limitations, such as the need for larger and more diverse datasets to further improve the model's generalization, and the potential challenges of applying the approach to other cancer types or histological features beyond PD-L1.

Critical Analysis

The research presented in this paper addresses an important challenge in cancer diagnosis and treatment - the accurate assessment of PD-L1 expression in tumor samples. By leveraging weakly-labeled whole slide images, the proposed method offers a more scalable and cost-effective solution compared to traditional approaches that require detailed manual annotations.

One potential limitation is that the evaluation is restricted to breast cancer WSIs from two clinical centers. To further validate the method's robustness and generalizability, it would be valuable to evaluate its performance on additional datasets, potentially covering other cancer types or sources of clinical data. It would also be useful to examine which image regions or visual features drive the PD-L1 predictions, which could be an area for future investigation.

Despite these minor caveats, the overall approach presented in the paper represents a promising step forward in the development of AI-driven tools for supporting cancer diagnosis and treatment decisions. As the field of computational pathology continues to advance, methods that can leverage weakly-supervised data sources, like the one described in this paper, will likely play an increasingly important role in enabling more efficient and scalable analyses of histological samples.

Conclusion

This paper introduces a deep learning-based approach for classifying the PD-L1 status of breast cancer tumors using weakly-labeled whole slide images. By avoiding the need for expensive and time-consuming manual annotations of PD-L1 expression at the pixel level, the proposed method offers a more scalable and accessible solution for assessing this important biomarker.

The results demonstrate the effectiveness of the model in accurately predicting PD-L1 status, which could have significant implications for improving the diagnosis and treatment of breast cancer patients. As the field of computational pathology continues to evolve, techniques that can leverage weakly-supervised data sources, like the one described in this paper, will likely become increasingly important for driving advancements in cancer diagnosis and care.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

PD-L1 Classification of Weakly-Labeled Whole Slide Images of Breast Cancer

Giacomo Cignoni, Cristian Scatena, Chiara Frascarelli, Nicola Fusco, Antonio Giuseppe Naccarato, Giuseppe Nicolò Fanelli, Alina Sîrbu

Specific and effective breast cancer therapy relies on the accurate quantification of PD-L1 positivity in tumors, which appears in the form of brown staining in high-resolution whole slide images (WSIs). However, the retrieval and extensive labeling of PD-L1 stained WSIs is a time-consuming and challenging task for pathologists, resulting in low reproducibility, especially for borderline images. This study aims to develop and compare models able to classify PD-L1 positivity of breast cancer samples based on WSI analysis, relying only on WSI-level labels. The task consists of two phases: identifying regions of interest (ROI) and classifying tumors as PD-L1 positive or negative. For the latter, two model categories were developed, with different feature extraction methodologies. The first encodes images based on the colour distance from a base colour. The second uses a convolutional autoencoder to obtain embeddings of WSI tiles, and aggregates them into a WSI-level embedding. For both model types, features are fed into downstream ML classifiers. Two datasets from different clinical centers were used in two different training configurations: (1) training on one dataset and testing on the other; (2) combining the datasets. We also tested the performance with or without human preprocessing to remove brown artefacts. Colour distance based models achieve the best performances on testing configuration (1) with artefact removal, while autoencoder-based models are superior in the remaining cases, which are prone to greater data variability.

4/17/2024

Finding Regions of Interest in Whole Slide Images Using Multiple Instance Learning

Martim Afonso, Praphulla M. S. Bhawsar, Monjoy Saha, Jonas S. Almeida, Arlindo L. Oliveira

Whole Slide Images (WSI), obtained by high-resolution digital scanning of microscope slides at multiple scales, are the cornerstone of modern Digital Pathology. However, they represent a particular challenge to AI-based/AI-mediated analysis because pathology labeling is typically done at slide-level, instead of tile-level. It is not just that medical diagnostics is recorded at the specimen level, the detection of oncogene mutation is also experimentally obtained, and recorded by initiatives like The Cancer Genome Atlas (TCGA), at the slide level. This configures a dual challenge: a) accurately predicting the overall cancer phenotype and b) finding out what cellular morphologies are associated with it at the tile level. To address these challenges, a weakly supervised Multiple Instance Learning (MIL) approach was explored for two prevalent cancer types, Invasive Breast Carcinoma (TCGA-BRCA) and Lung Squamous Cell Carcinoma (TCGA-LUSC). This approach was explored for tumor detection at low magnification levels and TP53 mutations at various levels. Our results show that a novel additive implementation of MIL matched the performance of the reference implementation (AUC 0.96), and was only slightly outperformed by Attention MIL (AUC 0.97). More interestingly from the perspective of the molecular pathologist, these different AI architectures identify distinct sensitivities to morphological features (through the detection of Regions of Interest, RoI) at different magnification levels. Tellingly, TP53 mutation was most sensitive to features at the higher magnifications where cellular morphology is resolved.

4/12/2024

An interpretable machine learning system for colorectal cancer diagnosis from pathology slides

Pedro C. Neto, Diana Montezuma, Sara P. Oliveira, Domingos Oliveira, João Fraga, Ana Monteiro, João Monteiro, Liliana Ribeiro, Sofia Gonçalves, Stefan Reinhard, Inti Zlobec, Isabel M. Pinto, Jaime S. Cardoso

Considering the profound transformation affecting pathology practice, we aimed to develop a scalable artificial intelligence (AI) system to diagnose colorectal cancer from whole-slide images (WSI). For this, we propose a deep learning (DL) system that learns from weak labels, a sampling strategy that reduces the number of training samples by a factor of six without compromising performance, an approach to leverage a small subset of fully annotated samples, and a prototype with explainable predictions, active learning features and parallelisation. Noting some problems in the literature, this study is conducted with one of the largest colorectal WSI datasets, with approximately 10,500 WSIs. Of these samples, 900 are testing samples. Furthermore, the robustness of the proposed method is assessed with two additional external datasets (TCGA and PAIP) and a dataset of samples collected directly from the proposed prototype. Our proposed method predicts, for the patch-based tiles, a class based on the severity of the dysplasia and uses that information to classify the whole slide. It is trained with an interpretable mixed-supervision scheme to leverage the domain knowledge introduced by pathologists through spatial annotations. The mixed-supervision scheme allowed for an intelligent sampling strategy effectively evaluated in several different scenarios without compromising the performance. On the internal dataset, the method shows an accuracy of 93.44% and a sensitivity between positive (low-grade and high-grade dysplasia) and non-neoplastic samples of 0.996. Performance on the external test samples varied, with TCGA being the most challenging dataset with an overall accuracy of 84.91% and a sensitivity of 0.996.

5/2/2024

Multi-Stain Multi-Level Convolutional Network for Multi-Tissue Breast Cancer Image Segmentation

Akash Modi, Sumit Kumar Jha, Purnendu Mishra, Rajiv Kumar, Kiran Aatre, Gursewak Singh, Shubham Mathur

Digital pathology and microscopy image analysis are widely employed in the segmentation of digitally scanned IHC slides, primarily to identify cancer and pinpoint regions of interest (ROI) indicative of tumor presence. However, current ROI segmentation models are either stain-specific or suffer from the issues of stain and scanner variance due to different staining protocols or modalities across multiple labs. Also, tissues like Ductal Carcinoma in Situ (DCIS), acini, etc. are often classified as Tumors due to their structural similarities and color compositions. In this paper, we propose a novel convolutional neural network (CNN) based Multi-class Tissue Segmentation model for histopathology whole-slide breast images, which classifies tumors and segments other tissue regions such as Ducts, acini, DCIS, Squamous epithelium, Blood Vessels, Necrosis, etc. as separate classes. Our unique pixel-aligned non-linear merge across spatial resolutions empowers models with both local and global fields of view for accurate detection of various classes. Our proposed model is also able to separate bad regions such as folds, artifacts, blurry regions, bubbles, etc. from tissue regions using multi-level context from different resolutions of WSI. Multi-phase iterative training with context-aware augmentation and increasing noise was used to efficiently train a multi-stain generic model with partial and noisy annotations from 513 slides. Our training pipeline used 12 million patches generated using context-aware augmentations, which made our model stain and scanner invariant across data sources. To test stain and scanner invariance, our model was evaluated on 23000 patches from a completely new stain (Hematoxylin and Eosin), a completely new scanner (Motic), and a different lab. The mean IOU was 0.72, which is on par with model performance on other data sources and scanners.

6/11/2024