Zero-Shot Whole Slide Image Retrieval in Histopathology Using Embeddings of Foundation Models

Read original: arXiv:2409.04631 - Published 9/14/2024 by Saghir Alfasly, Ghazal Alabtah, Sobhan Hemati, Krishna Rani Kalari, H. R. Tizhoosh

Overview

Researchers have tested recently published foundation models for histopathology image retrieval.
They report the macro-averaged F1 score for top-1 retrieval, majority of top-3 retrievals, and majority of top-5 retrievals.
They perform zero-shot retrievals without altering embeddings or training any classifier.
The test data is from The Cancer Genome Atlas (TCGA), consisting of 23 organs and 117 cancer subtypes.
They used the Yottixel platform to perform whole slide image (WSI) search using patches.
The achieved F1 scores show low performance, e.g., for top-5 retrievals: 27% ± 13% (Yottixel-DenseNet), 42% ± 14% (Yottixel-UNI), 40% ± 13% (Yottixel-Virchow), 41% ± 13% (Yottixel-GigaPath), and 41% ± 14% (GigaPath WSI).
Running GigaPath at the whole-slide level required significant computational resources.

Plain English Explanation

The researchers have tested recently developed foundation models for their ability to retrieve relevant images from a large dataset of medical images, specifically histopathology slides. They did this without needing to fine-tune or train any new models, a process called "zero-shot" retrieval.

The test dataset consisted of diagnostic slides from The Cancer Genome Atlas (TCGA), which includes images from 23 different organs and 117 different cancer subtypes. The researchers used a platform called Yottixel to search through these whole slide images using smaller image "patches."

The researchers measured the performance of the models using a metric called the F1 score, which combines precision (how many of the retrieved images were relevant) and recall (how many of the relevant images were retrieved). They scored three settings: top-1, where only the single best match counts, and majority of top-3 and majority of top-5, where the most common diagnosis among the first three or first five matches is taken as the answer.

The results showed that the models generally had low performance, with top-5 F1 scores ranging from 27% to 42%. This suggests that while these foundation models can be used for histopathology image retrieval, they still have significant room for improvement. GigaPath applied at the whole-slide level (GigaPath WSI), which required significant computational resources to run, landed in the same range at 41% ± 14%.

Technical Explanation

The researchers evaluated the performance of recently published foundation models for the task of histopathology image retrieval. They used a zero-shot approach, meaning they did not fine-tune or train any new models, but rather used the pre-trained embeddings directly.
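To make the zero-shot setting concrete, here is a minimal sketch of retrieval over fixed embeddings. It assumes cosine similarity as the ranking measure, which this summary does not confirm, and the function names, toy vectors, and labels are hypothetical placeholders rather than anything from the paper.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def retrieve_top_k(query, index, k):
    """Return the labels of the k indexed items most similar to the query.

    `index` is a list of (embedding, label) pairs. The embeddings are used
    as-is: no fine-tuning and no classifier on top (the zero-shot setting).
    Cosine similarity is an assumption made for this sketch.
    """
    ranked = sorted(
        index,
        key=lambda item: cosine_similarity(query, item[0]),
        reverse=True,
    )
    return [label for _, label in ranked[:k]]

# Toy 3-dimensional vectors standing in for foundation-model embeddings
# (hypothetical values and labels, for illustration only).
index = [
    ([0.9, 0.1, 0.0], "lung adenocarcinoma"),
    ([0.8, 0.2, 0.1], "lung adenocarcinoma"),
    ([0.1, 0.9, 0.2], "breast carcinoma"),
    ([0.0, 0.2, 0.9], "glioblastoma"),
]
query = [1.0, 0.1, 0.0]
top3 = retrieve_top_k(query, index, 3)
print(top3)
```

The point of the sketch is that nothing is learned at search time: the quality of the result depends entirely on how well the pre-trained embeddings already separate the diagnoses.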

As test data, the researchers used diagnostic slides from The Cancer Genome Atlas (TCGA), which consists of 23 organs and 117 cancer subtypes. They used the Yottixel platform to perform whole slide image (WSI) search using image patches.
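Searching whole slides "using patches" means each slide is represented by a set of patch embeddings, and two slides are compared through those sets. The sketch below shows one plausible way to do that: for each query patch, find the nearest patch in the candidate slide, then take the median of those nearest distances. This summary does not describe Yottixel's internals, so the median-of-minimum rule, the Euclidean distance, and all names and values here are assumptions for illustration.

```python
import math
from statistics import median

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def slide_distance(query_patches, candidate_patches):
    """Median, over query patches, of the distance to the nearest candidate patch.

    Assumed aggregation rule for this sketch; not confirmed by the summary.
    """
    nearest = [
        min(euclidean(q, c) for c in candidate_patches)
        for q in query_patches
    ]
    return median(nearest)

def rank_slides(query_patches, archive):
    """Sort archive slide ids from most to least similar to the query slide."""
    return sorted(
        archive,
        key=lambda slide_id: slide_distance(query_patches, archive[slide_id]),
    )

# Hypothetical archive: slide id -> list of 2-dimensional patch embeddings.
archive = {
    "slide_A": [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1]],
    "slide_B": [[1.0, 1.0], [0.9, 1.1]],
    "slide_C": [[0.5, 0.5], [0.0, 1.0]],
}
query_patches = [[0.05, 0.05], [0.1, 0.1], [0.0, 0.0]]
ranking = rank_slides(query_patches, archive)
print(ranking)
```

Using the median rather than the mean keeps a few unusual patches from dominating the slide-level distance, which is one reason such a rule is attractive for large, heterogeneous slides.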

The researchers report the macro average of F1 score for top-1, majority of top-3, and majority of top-5 retrievals. The F1 score is a metric that considers both precision (how many of the retrieved images were relevant) and recall (how many of the relevant images were retrieved).
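The evaluation described above can be sketched as two steps: turn each query's top-k retrieved labels into one predicted label by majority vote, then compute a macro-averaged F1 over the classes. The tie-breaking rule (prefer the best-ranked label) and the choice to average over the classes present in the ground truth are assumptions of this sketch, and the labels are toy placeholders.

```python
from collections import Counter

def majority_label(retrieved_labels):
    """Most frequent label among the retrieved items.

    Ties go to the label that appears earliest in the ranking
    (an assumed tie-breaking rule, not taken from the paper).
    """
    counts = Counter(retrieved_labels)
    best = max(counts.values())
    for label in retrieved_labels:  # iterate in rank order
        if counts[label] == best:
            return label

def macro_f1(true_labels, predicted_labels):
    """Unweighted mean of per-class F1 over the classes in true_labels."""
    pairs = list(zip(true_labels, predicted_labels))
    scores = []
    for c in sorted(set(true_labels)):
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        if precision + recall:
            scores.append(2 * precision * recall / (precision + recall))
        else:
            scores.append(0.0)
    return sum(scores) / len(scores)

# Four toy queries with their true labels and top-3 retrieved labels.
true_labels = ["A", "A", "B", "B"]
top3_results = [
    ["A", "B", "A"],
    ["B", "B", "A"],
    ["B", "A", "B"],
    ["A", "B", "C"],
]
predicted = [majority_label(r) for r in top3_results]
score = macro_f1(true_labels, predicted)
print(predicted, score)
```

Because the macro average weights every class equally, a rare cancer subtype that is retrieved poorly pulls the score down as much as a common one, which matters with 117 subtypes of very different frequency.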

The results show relatively low performance, with top-5 F1 scores of 27% ± 13% (Yottixel-DenseNet), 42% ± 14% (Yottixel-UNI), 40% ± 13% (Yottixel-Virchow), 41% ± 13% (Yottixel-GigaPath), and 41% ± 14% (GigaPath WSI). The researchers note that processing with GigaPath at the whole-slide level required significant computational resources.

Critical Analysis

The researchers acknowledge the low performance of the tested foundation models for histopathology image retrieval: even with a majority vote over the top-5 retrievals, no configuration exceeds an F1 score of 42%. This suggests that while these models can be used for this task, they still have significant room for improvement.

One potential limitation of the study is the use of a single test dataset, TCGA, which may not be representative of all histopathology images. Additionally, the researchers note the computational cost of processing whole slide images with GigaPath, which could limit the broader applicability of the tested models.

Further research could explore ways to improve the performance of these foundation models, such as fine-tuning on larger or more diverse histopathology datasets, or investigating the use of multimodal approaches that incorporate additional data sources beyond just the images.

Conclusion

The researchers have evaluated the performance of recently published foundation models for the task of histopathology image retrieval using a zero-shot approach. The results show relatively low performance, with top-5 F1 scores ranging from 27% to 42%. While these models can be used for this task, the findings suggest that significant improvements are still needed to make them more effective for real-world clinical applications in computational pathology.

Related Papers

Zero-Shot Whole Slide Image Retrieval in Histopathology Using Embeddings of Foundation Models

Saghir Alfasly, Ghazal Alabtah, Sobhan Hemati, Krishna Rani Kalari, H. R. Tizhoosh

We have tested recently published foundation models for histopathology for image retrieval. We report macro average of F1 score for top-1 retrieval, majority of top-3 retrievals, and majority of top-5 retrievals. We perform zero-shot retrievals, i.e., we do not alter embeddings and we do not train any classifier. As test data, we used diagnostic slides of TCGA, The Cancer Genome Atlas, consisting of 23 organs and 117 cancer subtypes. As a search platform we used Yottixel that enabled us to perform WSI search using patches. Achieved F1 scores show low performance, e.g., for top-5 retrievals, 27% ± 13% (Yottixel-DenseNet), 42% ± 14% (Yottixel-UNI), 40% ± 13% (Yottixel-Virchow), 41% ± 13% (Yottixel-GigaPath), and 41% ± 14% (GigaPath WSI).

9/14/2024

Benchmarking Embedding Aggregation Methods in Computational Pathology: A Clinical Data Perspective

Shengjia Chen, Gabriele Campanella, Abdulkadir Elmas, Aryeh Stock, Jennifer Zeng, Alexandros D. Polydorides, Adam J. Schoenfeld, Kuan-lin Huang, Jane Houldsworth, Chad Vanderbilt, Thomas J. Fuchs

Recent advances in artificial intelligence (AI), in particular self-supervised learning of foundation models (FMs), are revolutionizing medical imaging and computational pathology (CPath). A constant challenge in the analysis of digital Whole Slide Images (WSIs) is the problem of aggregating tens of thousands of tile-level image embeddings to a slide-level representation. Due to the prevalent use of datasets created for genomic research, such as TCGA, for method development, the performance of these techniques on diagnostic slides from clinical practice has been inadequately explored. This study conducts a thorough benchmarking analysis of ten slide-level aggregation techniques across nine clinically relevant tasks, including diagnostic assessment, biomarker classification, and outcome prediction. The results yield the following key insights: (1) Embeddings derived from domain-specific (histological images) FMs outperform those from generic ImageNet-based models across aggregation methods. (2) Spatial-aware aggregators enhance the performance significantly when using ImageNet pre-trained models but not when using FMs. (3) No single model excels in all tasks and spatially-aware models do not show general superiority as it would be expected. These findings underscore the need for more adaptable and universally applicable aggregation techniques, guiding future research towards tools that better meet the evolving needs of clinical-AI in pathology. The code used in this work is available at https://github.com/fuchs-lab-public/CPath_SABenchmark.

7/11/2024

A Multimodal Knowledge-enhanced Whole-slide Pathology Foundation Model

Yingxue Xu, Yihui Wang, Fengtao Zhou, Jiabo Ma, Shu Yang, Huangjing Lin, Xin Wang, Jiguang Wang, Li Liang, Anjia Han, Ronald Cheong Kin Chan, Hao Chen

Remarkable strides in computational pathology have been made in the task-agnostic foundation model that advances the performance of a wide array of downstream clinical tasks. Despite the promising performance, there are still several challenges. First, prior works have resorted to either vision-only or vision-captions data, disregarding invaluable pathology reports and gene expression profiles which respectively offer distinct knowledge for versatile clinical applications. Second, the current progress in pathology FMs predominantly concentrates on the patch level, where the restricted context of patch-level pretraining fails to capture whole-slide patterns. Here we curated the largest multimodal dataset consisting of H&E diagnostic whole slide images and their associated pathology reports and RNA-Seq data, resulting in 26,169 slide-level modality pairs from 10,275 patients across 32 cancer types. To leverage these data for CPath, we propose a novel whole-slide pretraining paradigm which injects multimodal knowledge at the whole-slide context into the pathology FM, called Multimodal Self-TAught PRetraining (mSTAR). The proposed paradigm revolutionizes the workflow of pretraining for CPath, which enables the pathology FM to acquire the whole-slide context. To our knowledge, this is the first attempt to incorporate multimodal knowledge at the slide level for enhancing pathology FMs, expanding the modelling context from unimodal to multimodal knowledge and from patch-level to slide-level. To systematically evaluate the capabilities of mSTAR, extensive experiments including slide-level unimodal and multimodal applications, are conducted across 7 diverse types of tasks on 43 subtasks, resulting in the largest spectrum of downstream tasks. The average performance in various slide-level applications consistently demonstrates significant performance enhancements for mSTAR compared to SOTA FMs.

7/23/2024

PathAlign: A vision-language model for whole slide images in histopathology

Faruk Ahmed, Andrew Sellergren, Lin Yang, Shawn Xu, Boris Babenko, Abbi Ward, Niels Olson, Arash Mohtashamian, Yossi Matias, Greg S. Corrado, Quang Duong, Dale R. Webster, Shravya Shetty, Daniel Golden, Yun Liu, David F. Steiner, Ellery Wulczyn

Microscopic interpretation of histopathology images underlies many important diagnostic and treatment decisions. While advances in vision-language modeling raise new opportunities for analysis of such images, the gigapixel-scale size of whole slide images (WSIs) introduces unique challenges. Additionally, pathology reports simultaneously highlight key findings from small regions while also aggregating interpretation across multiple slides, often making it difficult to create robust image-text pairs. As such, pathology reports remain a largely untapped source of supervision in computational pathology, with most efforts relying on region-of-interest annotations or self-supervision at the patch-level. In this work, we develop a vision-language model based on the BLIP-2 framework using WSIs paired with curated text from pathology reports. This enables applications utilizing a shared image-text embedding space, such as text or image retrieval for finding cases of interest, as well as integration of the WSI encoder with a frozen large language model (LLM) for WSI-based generative text capabilities such as report generation or AI-in-the-loop interactions. We utilize a de-identified dataset of over 350,000 WSIs and diagnostic text pairs, spanning a wide range of diagnoses, procedure types, and tissue types. We present pathologist evaluation of text generation and text retrieval using WSI embeddings, as well as results for WSI classification and workflow prioritization (slide-level triaging). Model-generated text for WSIs was rated by pathologists as accurate, without clinically significant error or omission, for 78% of WSIs on average. This work demonstrates exciting potential capabilities for language-aligned WSI embeddings.

7/1/2024