Beyond Multiple Instance Learning: Full Resolution All-In-Memory End-To-End Pathology Slide Modeling

Read original: arXiv:2403.04865 - Published 5/24/2024 by Gabriele Campanella, Eugene Fluder, Jennifer Zeng, Chad Vanderbilt, Thomas J. Fuchs

Overview

Artificial Intelligence (AI) has great potential to improve healthcare by analyzing large clinical datasets
Computational Pathology, which uses microscopy image data, is a leading field for this development
Analyzing gigapixel pathology slides poses unique challenges due to their enormous size
Researchers propose a novel approach to jointly train both a tile encoder and a slide-aggregator fully in memory and end-to-end, overcoming previous limitations

Plain English Explanation

AI systems can analyze massive amounts of digital medical data, which could lead to significant improvements in healthcare. One area where this is particularly promising is Computational Pathology, which uses AI to examine microscope images of tissue samples. These pathology images are extremely detailed, often containing billions of pixels, which creates unique challenges for training AI models.

Typically, these gigapixel slides are divided into smaller "tiles" for analysis. This fragmentation disrupts the machine learning process, because the tile-level encoder and the slide-level aggregator have to be trained separately. To address this, the researchers propose a new approach that trains both the tile encoder and the slide aggregator jointly, end-to-end, so the two stages no longer need to be split.
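
To give a rough sense of scale, the short Python sketch below counts how many tiles are needed to cover one slide and how much memory its raw pixels would occupy. The slide dimensions, tile size, and pixel format are illustrative assumptions, not figures taken from the paper.

```python
# Hypothetical values chosen for illustration; none of them come from the paper.
SLIDE_WIDTH_PX = 100_000
SLIDE_HEIGHT_PX = 80_000
TILE_SIZE_PX = 224
BYTES_PER_PIXEL = 3  # assumes uncompressed 8-bit RGB


def tile_grid(width, height, tile):
    """Return (columns, rows) of non-overlapping tiles needed to cover the slide."""
    cols = -(-width // tile)   # integer ceiling division
    rows = -(-height // tile)
    return cols, rows


cols, rows = tile_grid(SLIDE_WIDTH_PX, SLIDE_HEIGHT_PX, TILE_SIZE_PX)
total_tiles = cols * rows
total_pixels = SLIDE_WIDTH_PX * SLIDE_HEIGHT_PX
raw_gb = total_pixels * BYTES_PER_PIXEL / 1e9

print(f"tile grid: {cols} x {rows} = {total_tiles} tiles")
print(f"pixels: {total_pixels:,} (about {raw_gb:.1f} GB uncompressed)")
```

This counts every grid position, including empty background. In practice, positions without tissue are usually discarded, so the number of tiles that reach the model is typically lower.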

While this approach is more computationally expensive, the researchers' validation suggests it could enable large-scale pre-training and fine-tuning of AI models for pathology, which may in turn advance areas like disease diagnosis and image analysis.

Technical Explanation

The paper proposes a novel approach to jointly train both a tile encoder and a slide-aggregator fully in memory and end-to-end at high-resolution, overcoming the challenges posed by the fragmentation of gigapixel pathology slides.

Typically, the training of tile-level encoders and slide-level aggregators is separated, leading to a discontinuity in the machine learning process. This results in the need to adopt weakly supervised learning strategies. The researchers' approach bridges this gap by training the models end-to-end, without the need for this separation.
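
The difference between the two regimes can be illustrated with a deliberately tiny toy model. The sketch below is not the paper's architecture: the linear tile encoder, mean-pooling aggregator, logistic slide head, and all numeric values are assumptions chosen so the gradients can be written by hand. It shows only that, with joint training, a single slide-level label updates the encoder weights as well as the aggregator.

```python
import math


def encode(tile, w_enc):
    """Toy tile encoder: one linear unit mapping tile features to a scalar."""
    return sum(w * x for w, x in zip(w_enc, tile))


def forward(tiles, w_enc, w_agg, b):
    """Encode every tile, mean-pool the embeddings, apply a logistic slide head."""
    pooled = sum(encode(t, w_enc) for t in tiles) / len(tiles)
    prob = 1.0 / (1.0 + math.exp(-(w_agg * pooled + b)))
    return pooled, prob


def train_step(tiles, label, w_enc, w_agg, b, lr=0.1, train_encoder=True):
    """One SGD step on binary cross-entropy using only the slide-level label."""
    pooled, prob = forward(tiles, w_enc, w_agg, b)
    d_logit = prob - label  # dLoss/dlogit for sigmoid + cross-entropy
    new_w_agg = w_agg - lr * d_logit * pooled
    new_b = b - lr * d_logit
    if not train_encoder:
        # Separated regime: the encoder is frozen, only the aggregator learns.
        return list(w_enc), new_w_agg, new_b
    # Joint regime: the chain rule carries the slide-level error back through
    # the mean pool into the encoder (d pooled / d w_enc = mean tile features).
    mean_tile = [sum(col) / len(tiles) for col in zip(*tiles)]
    new_w_enc = [w - lr * d_logit * w_agg * m for w, m in zip(w_enc, mean_tile)]
    return new_w_enc, new_w_agg, new_b


# Hypothetical slide with three tiles of four features each, labelled positive.
tiles = [[0.2, 0.4, 0.6, 0.8], [0.1, 0.3, 0.5, 0.7], [0.9, 0.7, 0.5, 0.3]]
w_enc0 = [0.1, -0.2, 0.3, 0.05]

frozen_enc, _, _ = train_step(tiles, 1.0, w_enc0, 0.5, 0.0, train_encoder=False)
joint_enc, joint_agg, joint_b = train_step(tiles, 1.0, w_enc0, 0.5, 0.0)

print("encoder changed when frozen:", frozen_enc != w_enc0)
print("encoder changed when joint: ", joint_enc != w_enc0)
```

In a real system the encoder would be a deep network applied to tens of thousands of tiles, which is where the memory cost mentioned in the paper comes from: activations for every tile must be kept for the backward pass.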

While more computationally expensive, the detailed quantitative validation conducted by the researchers shows promise for large-scale pre-training and fine-tuning of pathology foundation models. This could have significant impacts on areas like interpretable diagnostic systems, region of interest detection, and multimodal pre-training or slide representation learning.

Critical Analysis

The paper presents a promising approach to overcoming the challenges of training AI models on gigapixel pathology slides. However, the researchers acknowledge that their method is more computationally expensive than traditional approaches.

While the validation results are encouraging, the paper does not provide a detailed analysis of the trade-offs between the increased computational cost and the potential benefits of the end-to-end training approach. It would be helpful to understand the specific performance improvements and the circumstances in which the advantages of this method outweigh the increased resource requirements.

Additionally, the paper focuses on the technical aspects of the model architecture and training process, but does not delve into the potential implications for clinical practice or the broader impact on the field of Computational Pathology. Further research and discussion on these areas could help establish a more comprehensive understanding of the significance of this work.

Conclusion

This paper presents a novel approach to training AI models on gigapixel pathology slides, which addresses the challenges posed by the fragmentation of these large, detailed images. By jointly training the tile encoder and slide aggregator end-to-end, the researchers have developed a method that has the potential to enable large-scale pre-training and fine-tuning of pathology foundation models.

While the increased computational cost is a limitation, the validation results suggest that this approach could lead to significant advancements in areas like disease diagnosis, image analysis, and multimodal learning in Computational Pathology. As the field continues to evolve, this research represents an important step towards unlocking the full potential of AI in improving healthcare outcomes.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Beyond Multiple Instance Learning: Full Resolution All-In-Memory End-To-End Pathology Slide Modeling

Gabriele Campanella, Eugene Fluder, Jennifer Zeng, Chad Vanderbilt, Thomas J. Fuchs

Artificial Intelligence (AI) has great potential to improve health outcomes by training systems on vast digitized clinical datasets. Computational Pathology, with its massive amounts of microscopy image data and impact on diagnostics and biomarkers, is at the forefront of this development. Gigapixel pathology slides pose a unique challenge due to their enormous size and are usually divided into tens of thousands of smaller tiles for analysis. This results in a discontinuity in the machine learning process by separating the training of tile-level encoders from slide-level aggregators and the need to adopt weakly supervised learning strategies. Training models from entire pathology slides end-to-end has been largely unexplored due to its computational challenges. To overcome this problem, we propose a novel approach to jointly train both a tile encoder and a slide-aggregator fully in memory and end-to-end at high-resolution, bridging the gap between input and slide-level supervision. While more computationally expensive, detailed quantitative validation shows promise for large-scale pre-training and fine-tuning of pathology foundation models.

5/24/2024

A Multimodal Knowledge-enhanced Whole-slide Pathology Foundation Model

Yingxue Xu, Yihui Wang, Fengtao Zhou, Jiabo Ma, Shu Yang, Huangjing Lin, Xin Wang, Jiguang Wang, Li Liang, Anjia Han, Ronald Cheong Kin Chan, Hao Chen

Remarkable strides in computational pathology have been made in the task-agnostic foundation model that advances the performance of a wide array of downstream clinical tasks. Despite the promising performance, there are still several challenges. First, prior works have resorted to either vision-only or vision-captions data, disregarding invaluable pathology reports and gene expression profiles which respectively offer distinct knowledge for versatile clinical applications. Second, the current progress in pathology FMs predominantly concentrates on the patch level, where the restricted context of patch-level pretraining fails to capture whole-slide patterns. Here we curated the largest multimodal dataset consisting of H&E diagnostic whole slide images and their associated pathology reports and RNA-Seq data, resulting in 26,169 slide-level modality pairs from 10,275 patients across 32 cancer types. To leverage these data for CPath, we propose a novel whole-slide pretraining paradigm which injects multimodal knowledge at the whole-slide context into the pathology FM, called Multimodal Self-TAught PRetraining (mSTAR). The proposed paradigm revolutionizes the workflow of pretraining for CPath, which enables the pathology FM to acquire the whole-slide context. To our knowledge, this is the first attempt to incorporate multimodal knowledge at the slide level for enhancing pathology FMs, expanding the modelling context from unimodal to multimodal knowledge and from patch-level to slide-level. To systematically evaluate the capabilities of mSTAR, extensive experiments including slide-level unimodal and multimodal applications, are conducted across 7 diverse types of tasks on 43 subtasks, resulting in the largest spectrum of downstream tasks. The average performance in various slide-level applications consistently demonstrates significant performance enhancements for mSTAR compared to SOTA FMs.

7/23/2024

Pathology Foundation Models

Mieko Ochi, Daisuke Komura, Shumpei Ishikawa

Pathology has played a crucial role in the diagnosis and evaluation of patient tissue samples obtained from surgeries and biopsies for many years. The advent of Whole Slide Scanners and the development of deep learning technologies have significantly advanced the field, leading to extensive research and development in pathology AI (Artificial Intelligence). These advancements have contributed to reducing the workload of pathologists and supporting decision-making in treatment plans. Recently, large-scale AI models known as Foundation Models (FMs), which are more accurate and applicable to a wide range of tasks compared to traditional AI, have emerged, and expanded their application scope in the healthcare field. Numerous FMs have been developed in pathology, and there are reported cases of their application in various tasks, such as disease diagnosis, rare cancer diagnosis, patient survival prognosis prediction, biomarker expression prediction, and the scoring of immunohistochemical expression intensity. However, several challenges remain for the clinical application of FMs, which healthcare professionals, as users, must be aware of. Research is ongoing to address these challenges. In the future, it is expected that the development of Generalist Medical AI, which integrates pathology FMs with FMs from other medical domains, will progress, leading to the effective utilization of AI in real clinical settings to promote precision and personalized medicine.

8/7/2024

WsiCaption: Multiple Instance Generation of Pathology Reports for Gigapixel Whole-Slide Images

Pingyi Chen, Honglin Li, Chenglu Zhu, Sunyi Zheng, Zhongyi Shui, Lin Yang

Whole slide images are the foundation of digital pathology for the diagnosis and treatment of carcinomas. Writing pathology reports is laborious and error-prone for inexperienced pathologists. To reduce the workload and improve clinical automation, we investigate how to generate pathology reports given whole slide images. On the data end, we curated the largest WSI-text dataset (PathText). In specific, we collected nearly 10000 high-quality WSI-text pairs for visual-language models by recognizing and cleaning pathology reports which narrate diagnostic slides in TCGA. On the model end, we propose the multiple instance generative model (MI-Gen) which can produce pathology reports for gigapixel WSIs. We benchmark our model on the largest subset of TCGA-PathoText. Experimental results show our model can generate pathology reports which contain multiple clinical clues and achieve competitive performance on certain slide-level tasks. We observe that simple semantic extraction from the pathology reports can achieve the best performance (0.838 of F1 score) on BRCA subtyping surpassing previous state-of-the-art approaches. Our collected dataset and related code are available.

6/28/2024