A Clinical Benchmark of Public Self-Supervised Pathology Foundation Models

Read original: arXiv:2407.06508 - Published 7/12/2024 by Gabriele Campanella, Shengjia Chen, Ruchika Verma, Jennifer Zeng, Aryeh Stock, Matt Croken, Brandon Veremis, Abdulkadir Elmas, Kuan-lin Huang, Ricky Kwan, Jane Houldsworth, Adam J. Schoenfeld, Chad Vanderbilt

Overview

This paper presents a benchmark for evaluating the performance of publicly available self-supervised pathology foundation models on clinically relevant tasks.
The models are evaluated on a collection of pathology datasets from two medical centers, comprising clinical slides tied to clinically relevant endpoints such as cancer diagnoses and a variety of biomarkers.
The benchmark provides insights into the capabilities and limitations of these foundation models, which can inform future model development and clinical deployment.

Plain English Explanation

In the field of computational pathology, researchers have been developing powerful foundation models that can be trained on large datasets of medical images and then used for a variety of tasks, such as disease detection or cancer diagnosis. These foundation models are trained using a technique called self-supervision, which allows them to learn useful features from the data without needing extensive manual labeling.

This paper presents a benchmark that evaluates several publicly available self-supervised pathology foundation models on a range of clinically relevant tasks. The researchers tested the models on clinical slides collected during standard hospital operation at two medical centers, with tasks spanning multiple organs and diseases and endpoints that include cancer diagnoses and a variety of biomarkers. This lets them compare the strengths and limitations of the models and gauge how well they might perform in real-world clinical settings.

The results of the benchmark provide valuable insights for researchers and clinicians who are interested in using these powerful AI tools to improve patient care. The findings can help guide the development of future foundation models and inform decisions about how to best deploy them in clinical practice.

Technical Explanation

The researchers created a benchmark to assess publicly available self-supervised pathology foundation models on clinically relevant tasks spanning multiple organs and diseases. The public models under comparison differ in size, training algorithm, and pretraining dataset. They are evaluated on clinical slides generated during standard hospital operation at two medical centers, each associated with a clinically relevant endpoint.

The benchmark tasks are built around clinically relevant endpoints, including cancer diagnoses and a variety of biomarkers. The models were evaluated using standard metrics such as accuracy, F1 score, and area under the receiver operating characteristic (ROC) curve. The results showed that the foundation models generally performed well on the benchmark tasks, with some variability in performance depending on the specific dataset and task.
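To make the metrics named above concrete, here is a minimal sketch of how accuracy, F1 score, and ROC AUC can be computed for a binary task. It is not the paper's evaluation code: the function names, the toy labels and scores, and the 0.5 decision threshold are all assumptions chosen for illustration. AUC is computed from its pairwise-ranking definition (the fraction of positive/negative pairs in which the positive case receives the higher score).

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def roc_auc(y_true, scores):
    """Probability that a random positive is scored above a random negative.

    Ties count as half a win. Quadratic in the number of samples, which is
    fine for a sketch but not for large cohorts.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical slide-level labels and model scores (not from the paper).
y_true = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.3, 0.2, 0.7]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5

print("accuracy:", round(accuracy(y_true, y_pred), 3))
print("F1:", round(f1_score(y_true, y_pred), 3))
print("AUC:", round(roc_auc(y_true, scores), 3))
```

Accuracy and F1 depend on the chosen threshold, while AUC is computed directly from the ranking of the scores. This is one reason AUC is commonly reported when comparing models whose scores are not calibrated in the same way.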

The researchers also analyzed the feature representations learned by the foundation models and found that they captured useful information related to clinical pathology. This suggests that these models can be effectively adapted and fine-tuned for a wide range of clinical applications.
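One common way to check whether frozen feature representations carry useful information is a linear probe: the foundation model is left untouched and only a small linear classifier is trained on its embeddings. The sketch below illustrates that idea under stated assumptions and is not taken from the paper. The 2-D "embeddings" are made-up stand-ins for the output of a pretrained encoder, and `train_linear_probe` and `predict_proba` are hypothetical helper names.

```python
import math


def predict_proba(w, b, x):
    """Logistic-regression probability of the positive class for one embedding."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))


def train_linear_probe(features, labels, lr=0.5, epochs=200):
    """Fit a logistic-regression probe with full-batch gradient descent.

    The encoder that produced `features` is assumed frozen; only the
    probe weights `w` and bias `b` are learned here.
    """
    dim = len(features[0])
    n = len(features)
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * dim
        grad_b = 0.0
        for x, y in zip(features, labels):
            err = predict_proba(w, b, x) - y  # gradient of the log loss w.r.t. z
            for j in range(dim):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


# Toy, made-up embeddings standing in for frozen foundation-model features.
features = [
    [1.0, 0.8], [0.9, 1.1], [1.2, 0.7],        # label 1 (e.g. tumor present)
    [-1.0, -0.9], [-0.8, -1.2], [-1.1, -0.6],  # label 0 (e.g. benign)
]
labels = [1, 1, 1, 0, 0, 0]

w, b = train_linear_probe(features, labels)
preds = [1 if predict_proba(w, b, x) >= 0.5 else 0 for x in features]
print("training predictions:", preds)
```

Because the probe has so few parameters, its performance mostly reflects the quality of the embeddings rather than the capacity of the classifier, which is why this kind of setup is a popular way to compare pretrained encoders.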

Critical Analysis

The benchmark presented in this paper provides a valuable starting point for evaluating the capabilities of self-supervised pathology foundation models. However, the authors acknowledge that the benchmark is limited to a relatively small set of datasets and tasks, and that further testing on a broader range of clinical scenarios would be necessary to fully understand the models' performance.

Additionally, the paper does not delve into the potential biases or limitations of the foundation models themselves, such as their ability to generalize to diverse patient populations or handle rare or atypical cases. These are important considerations that should be addressed in future research.

It would also be helpful to see more detailed comparisons between the different foundation models, as well as comparisons to human expert performance on the same tasks. This would provide a more comprehensive understanding of the strengths and weaknesses of the AI-based approaches.

Conclusion

This paper presents a valuable benchmark for evaluating the performance of publicly available self-supervised pathology foundation models on clinically relevant tasks. The results suggest that these models have the potential to be useful tools for a range of clinical applications, but also highlight the need for further research and testing to fully understand their capabilities and limitations.

The insights gained from this benchmark can inform the development of future foundation models and guide the integration of these AI-based tools into clinical practice. As the field of computational pathology continues to evolve, benchmarks like this will play an important role in ensuring the responsible and effective deployment of these technologies to improve patient outcomes.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

A Clinical Benchmark of Public Self-Supervised Pathology Foundation Models

Gabriele Campanella, Shengjia Chen, Ruchika Verma, Jennifer Zeng, Aryeh Stock, Matt Croken, Brandon Veremis, Abdulkadir Elmas, Kuan-lin Huang, Ricky Kwan, Jane Houldsworth, Adam J. Schoenfeld, Chad Vanderbilt

The use of self-supervised learning (SSL) to train pathology foundation models has increased substantially in the past few years. Notably, several models trained on large quantities of clinical data have been made publicly available in recent months. This will significantly enhance scientific research in computational pathology and help bridge the gap between research and clinical deployment. With the increase in availability of public foundation models of different sizes, trained using different algorithms on different datasets, it becomes important to establish a benchmark to compare the performance of such models on a variety of clinically relevant tasks spanning multiple organs and diseases. In this work, we present a collection of pathology datasets comprising clinical slides associated with clinically relevant endpoints including cancer diagnoses and a variety of biomarkers generated during standard hospital operation from two medical centers. We leverage these datasets to systematically assess the performance of public pathology foundation models and provide insights into best practices for training new foundation models and selecting appropriate pretrained models.

7/12/2024

Benchmarking foundation models as feature extractors for weakly-supervised computational pathology

Peter Neidlinger, Omar S. M. El Nahhas, Hannah Sophie Muti, Tim Lenz, Michael Hoffmeister, Hermann Brenner, Marko van Treeck, Rupert Langer, Bastian Dislich, Hans Michael Behrens, Christoph Rocken, Sebastian Foersch, Daniel Truhn, Antonio Marra, Oliver Lester Saldanha, Jakob Nikolas Kather

Advancements in artificial intelligence have driven the development of numerous pathology foundation models capable of extracting clinically relevant information. However, there is currently limited literature independently evaluating these foundation models on truly external cohorts and clinically-relevant tasks to uncover adjustments for future improvements. In this study, we benchmarked ten histopathology foundation models on 13 patient cohorts with 6,791 patients and 9,493 slides from lung, colorectal, gastric, and breast cancers. The models were evaluated on weakly-supervised tasks related to biomarkers, morphological properties, and prognostic outcomes. We show that a vision-language foundation model, CONCH, yielded the highest performance in 42% of tasks when compared to vision-only foundation models. The experiments reveal that foundation models trained on distinct cohorts learn complementary features to predict the same label, and can be fused to outperform the current state of the art. Creating an ensemble of complementary foundation models outperformed CONCH in 66% of tasks. Moreover, our findings suggest that data diversity outweighs data volume for foundation models. Our work highlights actionable adjustments to improve pathology foundation models.

8/29/2024

Towards Large-Scale Training of Pathology Foundation Models

kaiko.ai, Nanne Aben, Edwin D. de Jong, Ioannis Gatopoulos, Nicolas Kanzig, Mikhail Karasikov, Axel Lagré, Roman Moser, Joost van Doorn, Fei Tang

Driven by the recent advances in deep learning methods and, in particular, by the development of modern self-supervised learning algorithms, increased interest and efforts have been devoted to build foundation models (FMs) for medical images. In this work, we present our scalable training pipeline for large pathology imaging data, and a comprehensive analysis of various hyperparameter choices and training techniques for building pathology FMs. We release and make publicly available the first batch of our pathology FMs (https://github.com/kaiko-ai/towards_large_pathology_fms) trained on open-access TCGA whole slide images, a commonly used collection of pathology images. The experimental evaluation shows that our models reach state-of-the-art performance on various patch-level downstream tasks, ranging from breast cancer subtyping to colorectal nuclear segmentation. Finally, to unify the evaluation approaches used in the field and to simplify future comparisons of different FMs, we present an open-source framework (https://github.com/kaiko-ai/eva) designed for the consistent evaluation of pathology FMs across various downstream tasks.

4/24/2024

Adapting Self-Supervised Learning for Computational Pathology

Eric Zimmermann, Neil Tenenholtz, James Hall, George Shaikovski, Michal Zelechowski, Adam Casson, Fausto Milletari, Julian Viret, Eugene Vorontsov, Siqi Liu, Kristen Severson

Self-supervised learning (SSL) has emerged as a key technique for training networks that can generalize well to diverse tasks without task-specific supervision. This property makes SSL desirable for computational pathology, the study of digitized images of tissues, as there are many target applications and often limited labeled training samples. However, SSL algorithms and models have been primarily developed in the field of natural images and whether their performance can be improved by adaptation to particular domains remains an open question. In this work, we present an investigation of modifications to SSL for pathology data, specifically focusing on the DINOv2 algorithm. We propose alternative augmentations, regularization functions, and position encodings motivated by the characteristics of pathology images. We evaluate the impact of these changes on several benchmarks to demonstrate the value of tailored approaches.

5/6/2024