PathoTune: Adapting Visual Foundation Model to Pathological Specialists

Read original: arXiv:2403.16497 - Published 7/16/2024 by Jiaxuan Lu, Fang Yan, Xiaofan Zhang, Yue Gao, Shaoting Zhang

Overview

• This paper presents PathoTune, a framework that adapts foundation models, either pathology-specific or general-purpose visual ones, to pathological image analysis tasks.

• The key idea is to "tune" a pre-trained vision model to perform well on pathological image tasks, like detecting cancer cells in histology slides, without requiring a large amount of specialized training data.

• The authors demonstrate that PathoTune can achieve strong performance on several pathological image benchmarks, potentially enabling more accurate and efficient computer-aided diagnosis tools for medical professionals.

Plain English Explanation

• PathoTune is a machine learning model that takes a general-purpose computer vision model and "tunes" it to become an expert at analyzing medical images, like those used in pathology.

• Pathology is the study of diseases, and pathologists often rely on analyzing microscopic images of tissue samples to diagnose conditions like cancer. However, collecting and annotating large datasets of these specialized medical images can be very difficult.

• The key insight behind PathoTune is that we can start with a powerful visual foundation model that has been trained on a huge variety of natural images, and then fine-tune or adapt it to become an expert at analyzing pathological images, without needing as much specialized training data.

• By leveraging this transfer learning approach, PathoTune can achieve high performance on pathology tasks, potentially helping pathologists and other medical professionals make more accurate diagnoses and discoveries.

Technical Explanation

• The authors start with a pre-trained foundation model, either a pathology-specific one or a visual model trained on a large dataset of natural images.

• They then adapt this model to a pathology task with multi-modal prompt tuning. Task-specific Visual Prompts and Task-specific Textual Prompts help identify task-relevant features, while Instance-specific Visual Prompts encode the features of each individual pathological image.

• Experiments on multiple datasets, at both the patch level and the whole-slide-image (WSI) level, show that PathoTune outperforms single-modality prompt tuning approaches. Natural visual foundation models adapted with PathoTune also outperform pathological foundation models used with simple linear probing.

• The authors also analyze the internal representations learned by PathoTune, showing that it captures meaningful pathological features that align with expert human annotations.
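The prompt-tuning idea described above can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the single frozen linear layer standing in for the backbone, the mean-pooling used to derive instance prompts, the projection matrices, and all sizes and names are assumptions chosen only to show how the three kinds of prompt tokens might be assembled in front of an image's patch tokens while the backbone stays frozen.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sizes (assumptions for illustration, not values from the paper).
EMBED_DIM = 16            # token width of the stand-in backbone
NUM_PATCHES = 9           # patch tokens per image
NUM_TASK_PROMPTS = 4      # learnable task-specific visual prompt tokens
NUM_INSTANCE_PROMPTS = 2  # prompt tokens derived from each image
TEXT_DIM = 8              # width of a hypothetical task-description embedding

# Frozen backbone: one linear layer stands in for the foundation model.
frozen_backbone = rng.normal(size=(EMBED_DIM, EMBED_DIM))

# Trainable parameters: the only values an optimizer would update.
trainable = {
    "task_visual_prompts": rng.normal(size=(NUM_TASK_PROMPTS, EMBED_DIM)),
    "text_projection": rng.normal(size=(TEXT_DIM, EMBED_DIM)),
    "instance_projection": rng.normal(
        size=(EMBED_DIM, NUM_INSTANCE_PROMPTS * EMBED_DIM)
    ),
}


def build_token_sequence(patch_tokens, text_embedding, params):
    """Prepend task-level and instance-level prompt tokens to the patch tokens."""
    # Task-specific visual prompts: shared by every image of the task.
    task_visual = params["task_visual_prompts"]
    # Task-specific textual prompt: a text embedding projected to token width.
    task_textual = (text_embedding @ params["text_projection"])[None, :]
    # Instance-specific visual prompts: computed from this image's own patches
    # (mean-pooling plus a projection is an assumption made for this sketch).
    pooled = patch_tokens.mean(axis=0)
    instance_visual = (pooled @ params["instance_projection"]).reshape(
        NUM_INSTANCE_PROMPTS, EMBED_DIM
    )
    return np.concatenate(
        [task_visual, task_textual, instance_visual, patch_tokens], axis=0
    )


def forward(patch_tokens, text_embedding, params):
    """Run the prompted token sequence through the frozen stand-in backbone."""
    tokens = build_token_sequence(patch_tokens, text_embedding, params)
    return np.tanh(tokens @ frozen_backbone)


patch_tokens = rng.normal(size=(NUM_PATCHES, EMBED_DIM))
text_embedding = rng.normal(size=(TEXT_DIM,))
features = forward(patch_tokens, text_embedding, trainable)
print(features.shape)  # (16, 16): 4 task + 1 text + 2 instance + 9 patch tokens
```

In this sketch the task-level prompts are identical for every image, while the instance-level prompts change with the input, which mirrors the distinction the summary draws between task-relevant features and single-image features.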

Critical Analysis

• The paper provides a compelling demonstration of how large-scale vision models can be efficiently adapted to specialized domains like computational pathology.

• However, the authors acknowledge that their approach still relies on having some labeled pathological images for fine-tuning, which may be a bottleneck in practice, especially for rare diseases.

• Additionally, the authors do not explore the interpretability or "trustworthiness" of PathoTune's predictions, which is a critical consideration for deploying such models in high-stakes medical applications.

• Further research could investigate ways to handle pathology tasks with even less labeled training data, or to make the model's decision-making more transparent and accountable.

Conclusion

• PathoTune demonstrates a promising approach for adapting powerful general-purpose vision models to the specialized domain of pathological image analysis, potentially enabling more accurate and efficient computer-aided diagnosis tools for medical professionals.

• By leveraging transfer learning from large-scale natural image models, PathoTune can achieve strong performance on pathology tasks without requiring massive labeled datasets, which are often difficult to obtain in the medical domain.

• While the paper highlights several compelling results, further research is needed to address challenges around data scarcity, model interpretability, and responsible deployment in high-stakes medical applications.

Related Papers

PathoTune: Adapting Visual Foundation Model to Pathological Specialists

Jiaxuan Lu, Fang Yan, Xiaofan Zhang, Yue Gao, Shaoting Zhang

As natural image understanding moves towards the pretrain-finetune era, research in pathology imaging is concurrently evolving. Despite the predominant focus on pretraining pathological foundation models, how to adapt foundation models to downstream tasks is little explored. For downstream adaptation, we propose the existence of two domain gaps, i.e., the Foundation-Task Gap and the Task-Instance Gap. To mitigate these gaps, we introduce PathoTune, a framework designed to efficiently adapt pathological or even visual foundation models to pathology-specific tasks via multi-modal prompt tuning. The proposed framework leverages Task-specific Visual Prompts and Task-specific Textual Prompts to identify task-relevant features, along with Instance-specific Visual Prompts for encoding single pathological image features. Results across multiple datasets at both patch-level and WSI-level demonstrate its superior performance over single-modality prompt tuning approaches. Significantly, PathoTune facilitates the direct adaptation of natural visual foundation models to pathological tasks, drastically outperforming pathological foundation models with simple linear probing. The code is available at https://github.com/openmedlab/PathoDuet.

7/16/2024

PuzzleTuning: Explicitly Bridge Pathological and Natural Image with Puzzles

Tianyi Zhang, Shangqing Lyu, Yanli Lei, Sicheng Chen, Nan Ying, Yufang He, Yu Zhao, Yunlu Feng, Hwee Kuan Lee, Guanglei Zhang

Pathological image analysis is a crucial field in computer vision. Due to the annotation scarcity in the pathological field, pre-training with self-supervised learning (SSL) is widely applied to learn on unlabeled images. However, the current SSL-based pathological pre-training: (1) does not explicitly explore the essential focuses of the pathological field, and (2) does not effectively bridge with and thus take advantage of the knowledge from natural images. To explicitly address them, we propose our large-scale PuzzleTuning framework, containing the following innovations. Firstly, we define three task focuses that can effectively bridge knowledge of pathological and natural domain: appearance consistency, spatial consistency, and restoration understanding. Secondly, we devise a novel multiple puzzle restoring task, which explicitly pre-trains the model regarding these focuses. Thirdly, we introduce an explicit prompt-tuning process to incrementally integrate the domain-specific knowledge. It builds a bridge to align the large domain gap between natural and pathological images. Additionally, a curriculum-learning training strategy is designed to regulate task difficulty, making the model adaptive to the puzzle restoring complexity. Experimental results show that our PuzzleTuning framework outperforms the previous state-of-the-art methods in various downstream tasks on multiple datasets. The code, demo, and pre-trained weights are available at https://github.com/sagizty/PuzzleTuning.

4/24/2024

Towards A Generalizable Pathology Foundation Model via Unified Knowledge Distillation

Jiabo Ma, Zhengrui Guo, Fengtao Zhou, Yihui Wang, Yingxue Xu, Yu Cai, Zhengjie Zhu, Cheng Jin, Yi Lin, Xinrui Jiang, Anjia Han, Li Liang, Ronald Cheong Kin Chan, Jiguang Wang, Kwang-Ting Cheng, Hao Chen

Foundation models pretrained on large-scale datasets are revolutionizing the field of computational pathology (CPath). The generalization ability of foundation models is crucial for the success in various downstream clinical tasks. However, current foundation models have only been evaluated on a limited type and number of tasks, leaving their generalization ability and overall performance unclear. To address this gap, we established a most comprehensive benchmark to evaluate the performance of off-the-shelf foundation models across six distinct clinical task types, encompassing a total of 39 specific tasks. Our findings reveal that existing foundation models excel at certain task types but struggle to effectively handle the full breadth of clinical tasks. To improve the generalization of pathology foundation models, we propose a unified knowledge distillation framework consisting of both expert and self knowledge distillation, where the former allows the model to learn from the knowledge of multiple expert models, while the latter leverages self-distillation to enable image representation learning via local-global alignment. Based on this framework, a Generalizable Pathology Foundation Model (GPFM) is pretrained on a large-scale dataset consisting of 190 million images from around 86,000 public H&E whole slides across 34 major tissue types. Evaluated on the established benchmark, GPFM achieves an impressive average rank of 1.36, with 29 tasks ranked 1st, while the second-best model, UNI, attains an average rank of 2.96, with only 4 tasks ranked 1st. The superior generalization of GPFM demonstrates its exceptional modeling capabilities across a wide range of clinical tasks, positioning it as a new cornerstone for feature representation in CPath.

8/6/2024

Towards Large-Scale Training of Pathology Foundation Models

kaiko.ai, Nanne Aben, Edwin D. de Jong, Ioannis Gatopoulos, Nicolas Kanzig, Mikhail Karasikov, Axel Lagré, Roman Moser, Joost van Doorn, Fei Tang

Driven by the recent advances in deep learning methods and, in particular, by the development of modern self-supervised learning algorithms, increased interest and efforts have been devoted to build foundation models (FMs) for medical images. In this work, we present our scalable training pipeline for large pathology imaging data, and a comprehensive analysis of various hyperparameter choices and training techniques for building pathology FMs. We release and make publicly available the first batch of our pathology FMs (https://github.com/kaiko-ai/towards_large_pathology_fms) trained on open-access TCGA whole slide images, a commonly used collection of pathology images. The experimental evaluation shows that our models reach state-of-the-art performance on various patch-level downstream tasks, ranging from breast cancer subtyping to colorectal nuclear segmentation. Finally, to unify the evaluation approaches used in the field and to simplify future comparisons of different FMs, we present an open-source framework (https://github.com/kaiko-ai/eva) designed for the consistent evaluation of pathology FMs across various downstream tasks.

4/24/2024