Virchow 2: Scaling Self-Supervised Mixed Magnification Models in Pathology

Read original: arXiv:2408.00738 - Published 8/16/2024 by Eric Zimmermann, Eugene Vorontsov, Julian Viret, Adam Casson, Michal Zelechowski, George Shaikovski, Neil Tenenholtz, James Hall, David Klimstra, Razik Yousfi and 4 others

Overview

The paper presents Virchow 2, a self-supervised deep learning model for pathology images that is trained at mixed magnifications and at a larger scale than its predecessor.
The model aims to improve upon previous work in self-supervised learning for pathology images by incorporating multiple magnification levels.
Key contributions include scaling the model to larger and more diverse datasets, improving performance on downstream tasks, and offering insights into the importance of multi-scale representations.

Plain English Explanation

The paper introduces Virchow 2, an enhanced version of a previous deep learning model called Virchow. Virchow 2 is designed to work with pathology images, which are medical images of tissue samples examined under a microscope.

A key insight is the importance of multi-scale representations. Pathology images contain details at different levels of magnification, and Virchow 2 learns representations that capture information at multiple scales. This gives the model more of the surrounding context for each image.

Virchow 2 is a "self-supervised" model, meaning it can learn useful features from the data without being explicitly told what to look for. This is an advantage over traditional machine learning models that require large labeled datasets. By scaling Virchow 2 to larger and more diverse datasets, the researchers were able to improve the model's performance on downstream tasks like disease diagnosis.
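To make the idea of learning without labels concrete, here is a minimal sketch of one common family of self-supervised objectives, self-distillation, in which a "student" network is trained to match a "teacher" network's output on a different view of the same image. This summary does not state which objective Virchow 2 uses, so the function names, the temperatures, and the choice of objective are all assumptions made for illustration.

```python
import numpy as np

def log_softmax(logits, temperature):
    """Numerically stable log-softmax over the last axis."""
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))

def self_distillation_loss(student_logits, teacher_logits,
                           student_temp=0.1, teacher_temp=0.04):
    """Cross-entropy between teacher and student distributions.

    No labels are involved: the teacher's output on one view of an image
    is the target for the student's output on another view. The
    temperatures are illustrative assumptions, not values from the paper.
    """
    targets = np.exp(log_softmax(teacher_logits, teacher_temp))
    log_probs = log_softmax(student_logits, student_temp)
    return float(-(targets * log_probs).sum(axis=-1).mean())

# Hypothetical outputs for a batch of 8 images and 16 output dimensions.
rng = np.random.default_rng(0)
teacher = rng.normal(size=(8, 16))
loss_agree = self_distillation_loss(teacher, teacher)      # student matches teacher
loss_disagree = self_distillation_loss(-teacher, teacher)  # student contradicts teacher
print(loss_agree < loss_disagree)
```

The loss is small when the student agrees with the teacher and large when it does not, which is the signal that drives training in place of human annotations.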

The paper also discusses related work in self-supervised learning for pathology, highlighting how Virchow 2 builds upon and advances the state of the art in this area.

Technical Explanation

The key technical contributions of the Virchow 2 paper include:

Scaling to Larger Datasets: The researchers scaled the original Virchow model to work with larger and more diverse pathology image datasets, demonstrating its ability to learn robust and generalizable representations.
Multi-Magnification Learning: Virchow 2 incorporates multi-scale training, where the model learns to extract features from pathology images at different levels of magnification. This allows the model to capture important context at both the cellular and tissue levels.
Improved Downstream Performance: By leveraging the multi-scale representations learned by Virchow 2, the researchers showed improved performance on downstream tasks like disease classification compared to previous self-supervised approaches.
Insights into Multi-Scale Representations: The paper provides insights into the importance of multi-scale representations in pathology, highlighting how they can lead to better understanding of disease patterns and more accurate diagnoses.
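The multi-magnification idea in the list above can be illustrated with a small sketch. It assumes a slide is available as a NumPy array scanned at a base magnification, and it simulates lower magnifications by average-pooling a larger region down to a fixed tile size. The function name `extract_tile`, the 224-pixel tile size, and the 5x/10x/20x/40x levels are illustrative assumptions, not details taken from this summary.

```python
import numpy as np

def extract_tile(slide, center, magnification, base_magnification=40, tile_size=224):
    """Crop a tile around `center` and resample it to `magnification`.

    `slide` is an (H, W, C) array scanned at `base_magnification`.
    All defaults are illustrative assumptions.
    """
    if magnification < 1 or base_magnification % magnification != 0:
        raise ValueError("magnification must evenly divide base_magnification")
    factor = base_magnification // magnification  # e.g. 40x -> 10x gives 4
    region = tile_size * factor  # side of the source region in base-level pixels
    row, col = center
    top, left = row - region // 2, col - region // 2
    if (top < 0 or left < 0
            or top + region > slide.shape[0] or left + region > slide.shape[1]):
        raise ValueError("tile falls outside the slide")
    crop = slide[top:top + region, left:left + region].astype(np.float64)
    # Average-pool factor x factor blocks to simulate the lower magnification.
    return crop.reshape(tile_size, factor, tile_size, factor, -1).mean(axis=(1, 3))

# Hypothetical slide: random pixels standing in for real tissue.
rng = np.random.default_rng(0)
slide = rng.integers(0, 256, size=(2048, 2048, 3), dtype=np.uint8)
tiles = {m: extract_tile(slide, (1024, 1024), m) for m in (5, 10, 20, 40)}
print({m: t.shape for m, t in tiles.items()})
```

Every tile has the same pixel dimensions, but a lower-magnification tile covers a wider field of view: cellular detail dominates at 40x, while tissue-level structure dominates at 5x.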

The architecture of Virchow 2 builds upon recent advancements in self-supervised vision transformers, allowing the model to efficiently process pathology images of varying sizes and magnifications.
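As a rough illustration of how a vision transformer consumes an image, the sketch below splits a tile into non-overlapping patches and flattens each one into a token vector. The patch size of 14 and the tile sizes are assumptions chosen for the example; this summary does not give the model's actual configuration.

```python
import numpy as np

def patchify(image, patch_size=14):
    """Split an (H, W, C) image into flattened, non-overlapping patches.

    Returns an array of shape (num_patches, patch_size * patch_size * C).
    The default patch size is an illustrative assumption.
    """
    h, w, c = image.shape
    if h % patch_size or w % patch_size:
        raise ValueError("image sides must be multiples of patch_size")
    gh, gw = h // patch_size, w // patch_size
    patches = image.reshape(gh, patch_size, gw, patch_size, c)
    patches = patches.transpose(0, 2, 1, 3, 4)  # (gh, gw, patch, patch, C)
    return patches.reshape(gh * gw, patch_size * patch_size * c)

tokens_small = patchify(np.zeros((224, 224, 3)))
tokens_large = patchify(np.zeros((392, 392, 3)))
print(tokens_small.shape, tokens_large.shape)
```

A 224 x 224 tile yields 16 x 16 = 256 tokens, while a 392 x 392 tile yields 28 x 28 = 784. Because the transformer operates on a sequence of tokens, a change in input size changes the sequence length rather than the architecture.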

Critical Analysis

The paper evaluates Virchow 2 thoroughly and highlights its strengths, but it also acknowledges some potential limitations and areas for future research:

The researchers note that the model's performance still depends on the quality and diversity of the training data. Obtaining high-quality, labeled pathology datasets at scale remains a challenge.
While Virchow 2 demonstrates improved performance on downstream tasks, the paper does not provide a detailed analysis of the model's internal representations and how they contribute to better disease understanding.
Further research could explore ways to integrate Virchow 2 with other modalities, such as clinical metadata or genomic data, to create more comprehensive and clinically relevant pathology models.

Overall, the Virchow 2 paper presents a significant advancement in self-supervised learning for pathology, but there is still room for further refinement and integration with other relevant data sources to maximize the model's clinical utility.

Conclusion

The Virchow 2 paper demonstrates the potential of self-supervised deep learning to scale and improve pathology image analysis. By incorporating multi-scale representations, the model better captures the nuanced patterns and context present in pathology images, which leads to stronger performance on downstream tasks.

This work has important implications for the development of more robust and clinically useful pathology AI systems, which could ultimately lead to more accurate disease diagnoses and better patient outcomes.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Virchow 2: Scaling Self-Supervised Mixed Magnification Models in Pathology

Eric Zimmermann, Eugene Vorontsov, Julian Viret, Adam Casson, Michal Zelechowski, George Shaikovski, Neil Tenenholtz, James Hall, David Klimstra, Razik Yousfi, Thomas Fuchs, Nicolo Fusi, Siqi Liu, Kristen Severson

Foundation models are rapidly being developed for computational pathology applications. However, it remains an open question which factors are most important for downstream performance with data scale and diversity, model size, and training algorithm all playing a role. In this work, we propose algorithmic modifications, tailored for pathology, and we present the result of scaling both data and model size, surpassing previous studies in both dimensions. We introduce two new models: Virchow2, a 632 million parameter vision transformer, and Virchow2G, a 1.9 billion parameter vision transformer, each trained with 3.1 million histopathology whole slide images, with diverse tissues, originating institutions, and stains. We achieve state of the art performance on 12 tile-level tasks, as compared to the top performing competing models. Our results suggest that data diversity and domain-specific methods can outperform models that only scale in the number of parameters, but, on average, performance benefits from the combination of domain-specific methods, data scale, and model scale.

8/16/2024

Self-supervised Vision Transformer are Scalable Generative Models for Domain Generalization

Sebastian Doerrich, Francesco Di Salvo, Christian Ledig

Despite notable advancements, the integration of deep learning (DL) techniques into impactful clinical applications, particularly in the realm of digital histopathology, has been hindered by challenges associated with achieving robust generalization across diverse imaging domains and characteristics. Traditional mitigation strategies in this field such as data augmentation and stain color normalization have proven insufficient in addressing this limitation, necessitating the exploration of alternative methodologies. To this end, we propose a novel generative method for domain generalization in histopathology images. Our method employs a generative, self-supervised Vision Transformer to dynamically extract characteristics of image patches and seamlessly infuse them into the original images, thereby creating novel, synthetic images with diverse attributes. By enriching the dataset with such synthesized images, we aim to enhance its holistic nature, facilitating improved generalization of DL models to unseen domains. Extensive experiments conducted on two distinct histopathology datasets demonstrate the effectiveness of our proposed approach, outperforming the state of the art substantially, on the Camelyon17-wilds challenge dataset (+2%) and on a second epithelium-stroma dataset (+26%). Furthermore, we emphasize our method's ability to readily scale with increasingly available unlabeled data samples and more complex, higher parametric architectures. Source code is available at https://github.com/sdoerrich97/vits-are-generative-models .

7/4/2024

Towards Large-Scale Training of Pathology Foundation Models

kaiko.ai, Nanne Aben, Edwin D. de Jong, Ioannis Gatopoulos, Nicolas Kanzig, Mikhail Karasikov, Axel Lagré, Roman Moser, Joost van Doorn, Fei Tang

Driven by the recent advances in deep learning methods and, in particular, by the development of modern self-supervised learning algorithms, increased interest and efforts have been devoted to build foundation models (FMs) for medical images. In this work, we present our scalable training pipeline for large pathology imaging data, and a comprehensive analysis of various hyperparameter choices and training techniques for building pathology FMs. We release and make publicly available the first batch of our pathology FMs (https://github.com/kaiko-ai/towards_large_pathology_fms) trained on open-access TCGA whole slide images, a commonly used collection of pathology images. The experimental evaluation shows that our models reach state-of-the-art performance on various patch-level downstream tasks, ranging from breast cancer subtyping to colorectal nuclear segmentation. Finally, to unify the evaluation approaches used in the field and to simplify future comparisons of different FMs, we present an open-source framework (https://github.com/kaiko-ai/eva) designed for the consistent evaluation of pathology FMs across various downstream tasks.

4/24/2024

A Multimodal Knowledge-enhanced Whole-slide Pathology Foundation Model

Yingxue Xu, Yihui Wang, Fengtao Zhou, Jiabo Ma, Shu Yang, Huangjing Lin, Xin Wang, Jiguang Wang, Li Liang, Anjia Han, Ronald Cheong Kin Chan, Hao Chen

Remarkable strides in computational pathology have been made in the task-agnostic foundation model that advances the performance of a wide array of downstream clinical tasks. Despite the promising performance, there are still several challenges. First, prior works have resorted to either vision-only or vision-captions data, disregarding invaluable pathology reports and gene expression profiles which respectively offer distinct knowledge for versatile clinical applications. Second, the current progress in pathology FMs predominantly concentrates on the patch level, where the restricted context of patch-level pretraining fails to capture whole-slide patterns. Here we curated the largest multimodal dataset consisting of H&E diagnostic whole slide images and their associated pathology reports and RNA-Seq data, resulting in 26,169 slide-level modality pairs from 10,275 patients across 32 cancer types. To leverage these data for CPath, we propose a novel whole-slide pretraining paradigm which injects multimodal knowledge at the whole-slide context into the pathology FM, called Multimodal Self-TAught PRetraining (mSTAR). The proposed paradigm revolutionizes the workflow of pretraining for CPath, which enables the pathology FM to acquire the whole-slide context. To our knowledge, this is the first attempt to incorporate multimodal knowledge at the slide level for enhancing pathology FMs, expanding the modelling context from unimodal to multimodal knowledge and from patch-level to slide-level. To systematically evaluate the capabilities of mSTAR, extensive experiments including slide-level unimodal and multimodal applications, are conducted across 7 diverse types of tasks on 43 subtasks, resulting in the largest spectrum of downstream tasks. The average performance in various slide-level applications consistently demonstrates significant performance enhancements for mSTAR compared to SOTA FMs.

7/23/2024