Transcriptomics-guided Slide Representation Learning in Computational Pathology

Read original: arXiv:2405.11618 - Published 5/21/2024 by Guillaume Jaume, Lukas Oldenburg, Anurag Vaidya, Richard J. Chen, Drew F. K. Williamson, Thomas Peeters, Andrew H. Song, Faisal Mahmood

Overview

• This research paper explores a novel approach to slide representation learning in computational pathology, which is the automated analysis of medical images for disease diagnosis and research.
• The key idea is to leverage transcriptomics data, which is information about gene expression patterns, to guide the learning of useful representations from pathology slide images.
• This transcriptomics-guided approach aims to capture biologically relevant features in the slide representations, potentially improving the performance of downstream tasks like disease classification.

Plain English Explanation

• In computational pathology, researchers use machine learning to analyze medical images, like tissue samples viewed under a microscope, to help diagnose diseases and further scientific understanding.
• To do this, the machine learning models need to learn useful "representations" of the slide images: ways of encoding the important visual features in a compact form that the model can work with.
• This paper proposes using information about gene expression, called transcriptomics data, to guide the learning of these slide representations.
• The hypothesis is that by incorporating this biological data, the representations will better capture the clinically relevant details in the images, leading to improved performance on tasks like disease classification.
• This approach aims to make the machine learning models more grounded in the underlying biology, rather than just finding statistical patterns in the images alone.

Technical Explanation

• The researchers developed Tangle, a slide and expression (S+E) pre-training strategy that learns slide representations jointly with gene expression profiles.
• Tangle uses modality-specific encoders, one for the whole-slide image and one for the expression profile, and aligns their outputs via contrastive learning.
• Pre-training requires no diagnostic labels, only paired slides and expression profiles: liver (n=6,597 S+E pairs), breast (n=1,020), and lung (n=1,012), drawn from two species (Homo sapiens and Rattus norvegicus).
• Across three independent test datasets (1,265 breast WSIs, 1,946 lung WSIs, and 4,584 liver WSIs), Tangle shows significantly better few-shot performance than supervised and SSL baselines, and a substantial improvement over all baselines in prototype-based classification and slide retrieval.
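The paper's abstract states that the outputs of a slide encoder and an expression encoder are aligned via contrastive learning. The following is a minimal sketch of one common form of that idea, a symmetric InfoNCE-style loss over a batch of paired embeddings. It is an illustration under stated assumptions, not the authors' implementation: the function names, the temperature value, and the toy 2-dimensional embeddings are all hypothetical, and the exact loss used by Tangle may differ.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def log_softmax(row):
    """Numerically stable log-softmax over a list of logits."""
    m = max(row)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - log_sum_exp for x in row]


def symmetric_contrastive_loss(slide_emb, expr_emb, temperature=0.1):
    """Symmetric InfoNCE-style loss for paired slide/expression embeddings.

    slide_emb[i] and expr_emb[i] are assumed to come from the same sample.
    The temperature of 0.1 is an illustrative assumption.
    """
    s = [l2_normalize(v) for v in slide_emb]
    e = [l2_normalize(v) for v in expr_emb]
    n = len(s)
    # logits[i][j]: scaled cosine similarity between slide i and expression j.
    logits = [
        [sum(a * b for a, b in zip(s[i], e[j])) / temperature for j in range(n)]
        for i in range(n)
    ]
    # Slide -> expression: each slide should pick out its own expression profile.
    loss_s2e = -sum(log_softmax(logits[i])[i] for i in range(n)) / n
    # Expression -> slide: same objective on the transposed similarity matrix.
    columns = [[logits[i][j] for i in range(n)] for j in range(n)]
    loss_e2s = -sum(log_softmax(columns[j])[j] for j in range(n)) / n
    return 0.5 * (loss_s2e + loss_e2s)


# Toy batch of two pairs: correctly paired vs. deliberately mismatched.
slides = [[1.0, 0.0], [0.0, 1.0]]
aligned = symmetric_contrastive_loss(slides, [[0.9, 0.1], [0.1, 0.9]])
shuffled = symmetric_contrastive_loss(slides, [[0.1, 0.9], [0.9, 0.1]])
```

In this toy batch the correctly paired embeddings yield a much lower loss than the mismatched ones, which is the signal that pushes each slide embedding toward its own expression profile and away from the others in the batch.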

Critical Analysis

• The paper acknowledges that obtaining high-quality transcriptomics data may be challenging in practice, as it requires complex wet-lab procedures.
• The authors suggest exploring alternative sources of biological information, such as immunohistochemistry or spatial transcriptomics, to guide the representation learning.
• Additionally, the experiments cover three organs (liver, breast, and lung), so further evaluation on more organs and more diverse cohorts would be valuable to assess the generalizability of the approach.

Conclusion

• This research presents a promising direction for incorporating rich biological information into the learning of slide representations for computational pathology.
• By leveraging transcriptomics data, the model can learn representations that are more closely aligned with the underlying biology, potentially leading to improved performance on clinically relevant tasks.
• While some practical challenges remain, this work highlights the value of multi-modal integration in computational pathology and suggests avenues for further research in this area.

Related Papers

Transcriptomics-guided Slide Representation Learning in Computational Pathology

Guillaume Jaume, Lukas Oldenburg, Anurag Vaidya, Richard J. Chen, Drew F. K. Williamson, Thomas Peeters, Andrew H. Song, Faisal Mahmood

Self-supervised learning (SSL) has been successful in building patch embeddings of small histology images (e.g., 224x224 pixels), but scaling these models to learn slide embeddings from the entirety of giga-pixel whole-slide images (WSIs) remains challenging. Here, we leverage complementary information from gene expression profiles to guide slide representation learning using multimodal pre-training. Expression profiles constitute highly detailed molecular descriptions of a tissue that we hypothesize offer a strong task-agnostic training signal for learning slide embeddings. Our slide and expression (S+E) pre-training strategy, called Tangle, employs modality-specific encoders, the outputs of which are aligned via contrastive learning. Tangle was pre-trained on samples from three different organs: liver (n=6,597 S+E pairs), breast (n=1,020), and lung (n=1,012) from two different species (Homo sapiens and Rattus norvegicus). Across three independent test datasets consisting of 1,265 breast WSIs, 1,946 lung WSIs, and 4,584 liver WSIs, Tangle shows significantly better few-shot performance compared to supervised and SSL baselines. When assessed using prototype-based classification and slide retrieval, Tangle also shows a substantial performance improvement over all baselines. Code available at https://github.com/mahmoodlab/TANGLE.
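The abstract evaluates slide embeddings with prototype-based classification and slide retrieval. A minimal sketch of both, assuming the embeddings have already been computed by a frozen encoder: each class prototype is the mean of its few labeled support embeddings, a query is assigned to the most similar prototype, and retrieval ranks stored slides by similarity to the query. The use of cosine similarity, the function names, the "tumor"/"normal" labels, and the 2-dimensional vectors are illustrative assumptions, not details taken from the paper.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def class_prototypes(embeddings, labels):
    """Average the support embeddings of each class into one prototype."""
    sums, counts = {}, {}
    for emb, label in zip(embeddings, labels):
        if label not in sums:
            sums[label] = [0.0] * len(emb)
            counts[label] = 0
        sums[label] = [s + x for s, x in zip(sums[label], emb)]
        counts[label] += 1
    return {label: [s / counts[label] for s in sums[label]] for label in sums}


def classify(query, prototypes):
    """Assign the query to the class whose prototype is most similar."""
    return max(prototypes, key=lambda label: cosine(query, prototypes[label]))


def retrieve(query, database, k=2):
    """Return indices of the k database embeddings most similar to the query."""
    ranked = sorted(
        range(len(database)),
        key=lambda i: cosine(query, database[i]),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical few-shot support set: two labeled slide embeddings per class.
support = [[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.2, 0.8]]
support_labels = ["tumor", "tumor", "normal", "normal"]

prototypes = class_prototypes(support, support_labels)
query = [0.8, 0.3]
predicted = classify(query, prototypes)    # "tumor"
top_hits = retrieve(query, support, k=2)   # [1, 0]
```

Neither procedure trains any parameters on top of the embeddings, which is why such evaluations are a fairly direct probe of how well the pre-trained representation separates classes.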

5/21/2024

Multistain Pretraining for Slide Representation Learning in Pathology

Guillaume Jaume, Anurag Vaidya, Andrew Zhang, Andrew H. Song, Richard J. Chen, Sharifa Sahai, Dandan Mo, Emilio Madrigal, Long Phi Le, Faisal Mahmood

Developing self-supervised learning (SSL) models that can learn universal and transferable representations of H&E gigapixel whole-slide images (WSIs) is becoming increasingly valuable in computational pathology. These models hold the potential to advance critical tasks such as few-shot classification, slide retrieval, and patient stratification. Existing approaches for slide representation learning extend the principles of SSL from small images (e.g., 224 x 224 patches) to entire slides, usually by aligning two different augmentations (or views) of the slide. Yet the resulting representation remains constrained by the limited clinical and biological diversity of the views. Instead, we postulate that slides stained with multiple markers, such as immunohistochemistry, can be used as different views to form a rich task-agnostic training signal. To this end, we introduce Madeleine, a multimodal pretraining strategy for slide representation learning. Madeleine is trained with a dual global-local cross-stain alignment objective on large cohorts of breast cancer samples (N=4,211 WSIs across five stains) and kidney transplant samples (N=12,070 WSIs across four stains). We demonstrate the quality of slide representations learned by Madeleine on various downstream evaluations, ranging from morphological and molecular classification to prognostic prediction, comprising 21 tasks using 7,299 WSIs from multiple medical centers. Code is available at https://github.com/mahmoodlab/MADELEINE.

8/7/2024

A self-supervised framework for learning whole slide representations

Xinhai Hou, Cheng Jiang, Akhil Kondepudi, Yiwei Lyu, Asadur Chowdury, Honglak Lee, Todd C. Hollon

Whole slide imaging is fundamental to biomedical microscopy and computational pathology. Previously, learning representations for gigapixel-sized whole slide images (WSIs) has relied on multiple instance learning with weak labels, which do not annotate the diverse morphologic features and spatial heterogeneity of WSIs. A high-quality self-supervised learning method for WSIs would provide transferable visual representations for downstream computational pathology tasks, without the need for dense annotations. We present Slide Pre-trained Transformers (SPT) for gigapixel-scale self-supervision of WSIs. Treating WSI patches as tokens, SPT combines data transformation strategies from language and vision modeling into a general and unified framework to generate views of WSIs for self-supervised pretraining. SPT leverages the inherent regional heterogeneity, histologic feature variability, and information redundancy within WSIs to learn high-quality whole slide representations. We benchmark SPT visual representations on five diagnostic tasks across three biomedical microscopy datasets. SPT significantly outperforms baselines for histopathologic diagnosis, cancer subtyping, and genetic mutation prediction. Finally, we demonstrate that SPT consistently improves whole slide representations when using off-the-shelf, in-domain, and foundational patch encoders for whole slide multiple instance learning.
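The abstract describes treating WSI patches as tokens and applying data transformations to generate views of a slide for self-supervised pretraining. A minimal sketch of that general idea follows, using a spatial crop followed by random token masking. These two transformations are stand-ins chosen for illustration, since the abstract does not name the specific transformations SPT uses; the function names, grid size, crop size, and keep ratio are likewise hypothetical. Tokens are represented here only by their grid coordinates, whereas a real pipeline would also carry a patch feature vector per token.

```python
import random


def crop_tokens(tokens, x_min, y_min, size):
    """Keep only tokens whose grid coordinates fall inside a square window."""
    return [
        (x, y)
        for (x, y) in tokens
        if x_min <= x < x_min + size and y_min <= y < y_min + size
    ]


def mask_tokens(tokens, keep_ratio, rng):
    """Randomly keep a fraction of the tokens, dropping the rest."""
    n_keep = max(1, int(len(tokens) * keep_ratio))
    return rng.sample(tokens, n_keep)


def make_view(tokens, grid, crop_size, keep_ratio, rng):
    """Build one view: random spatial crop, then random masking."""
    x_min = rng.randint(0, grid - crop_size)
    y_min = rng.randint(0, grid - crop_size)
    cropped = crop_tokens(tokens, x_min, y_min, crop_size)
    return mask_tokens(cropped, keep_ratio, rng)


rng = random.Random(0)  # fixed seed so the sketch is reproducible
grid = 8                # hypothetical slide laid out as an 8 x 8 grid of patches
slide_tokens = [(x, y) for x in range(grid) for y in range(grid)]

# Two independently sampled views of the same slide form a positive pair.
view_a = make_view(slide_tokens, grid, crop_size=4, keep_ratio=0.5, rng=rng)
view_b = make_view(slide_tokens, grid, crop_size=4, keep_ratio=0.5, rng=rng)
```

Two views drawn from the same slide can then serve as a positive pair for a self-supervised objective, while views from different slides act as negatives.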

5/27/2024

Multimodal contrastive learning for spatial gene expression prediction using histology images

Wenwen Min, Zhiceng Shi, Jun Zhang, Jun Wan, Changmiao Wang

In recent years, the advent of spatial transcriptomics (ST) technology has unlocked unprecedented opportunities for delving into the complexities of gene expression patterns within intricate biological systems. Despite its transformative potential, the prohibitive cost of ST technology remains a significant barrier to its widespread adoption in large-scale studies. An alternative, more cost-effective strategy involves employing artificial intelligence to predict gene expression levels using readily accessible whole-slide images (WSIs) stained with Hematoxylin and Eosin (H&E). However, existing methods have yet to fully capitalize on multimodal information provided by H&E images and ST data with spatial location. In this paper, we propose mclSTExp, a multimodal contrastive learning with Transformer and DenseNet-121 encoder for Spatial Transcriptomics Expression prediction. We conceptualize each spot as a word, integrating its intrinsic features with spatial context through the self-attention mechanism of a Transformer encoder. This integration is further enriched by incorporating image features via contrastive learning, thereby enhancing the predictive capability of our model. Our extensive evaluation of mclSTExp on two breast cancer datasets and a skin squamous cell carcinoma dataset demonstrates its superior performance in predicting spatial gene expression. Moreover, mclSTExp has shown promise in interpreting cancer-specific overexpressed genes, elucidating immune-related genes, and identifying specialized spatial domains annotated by pathologists. Our source code is available at https://github.com/shizhiceng/mclSTExp.

7/12/2024