WsiCaption: Multiple Instance Generation of Pathology Reports for Gigapixel Whole-Slide Images

Read original: arXiv:2311.16480 - Published 6/28/2024 by Pingyi Chen, Honglin Li, Chenglu Zhu, Sunyi Zheng, Zhongyi Shui, Lin Yang


Overview

Whole slide images (WSIs) are the foundation of digital pathology for diagnosing and treating carcinomas.
Pathology report writing is time-consuming and error-prone for inexperienced pathologists.
This research aims to generate pathology reports automatically from WSIs to reduce workload and improve clinical automation.
The study curated the largest WSI-text dataset, called PathText, and proposed a multiple instance generative model (MI-Gen) to produce pathology reports from gigapixel WSIs.

Plain English Explanation

Whole slide images (WSIs) are digital scans of tissue samples that pathologists use to diagnose and treat cancer. Writing detailed pathology reports based on these WSIs is a laborious and error-prone process, especially for less experienced pathologists. To address this challenge, the researchers in this study investigated how to automatically generate pathology reports from WSIs.

First, they built the largest dataset of high-quality WSI-text pairs, called PathText, by collecting and cleaning pathology reports that describe diagnostic slides from the Cancer Genome Atlas (TCGA) project. This dataset provides the necessary training data for developing AI models to generate pathology reports.

Next, the researchers proposed a new multiple instance generative model (MI-Gen) that can produce pathology reports for gigapixel WSIs. This model learns to associate visual patterns in the WSIs with the corresponding text in the pathology reports.

The researchers benchmarked their MI-Gen model on the largest subset of the TCGA-PathoText dataset. The results show that the model can generate pathology reports containing multiple clinically relevant findings. They also found that simply extracting information from the reports achieves the best reported performance on breast cancer (BRCA) subtyping, surpassing previous state-of-the-art approaches.

Technical Explanation

The researchers curated the largest WSI-text dataset, called PathText, by recognizing and cleaning pathology reports that describe diagnostic slides from the TCGA project. This dataset contains nearly 10,000 high-quality WSI-text pairs, providing the necessary training data for developing AI models to generate pathology reports.

To address the challenge of producing pathology reports from WSIs, the researchers proposed a multiple instance generative model (MI-Gen). This model learns to associate visual patterns in the WSIs with the corresponding text in the pathology reports. The MI-Gen architecture leverages both local and global information from the WSIs to generate coherent and clinically relevant pathology reports.
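The "multiple instance" framing can be illustrated with a small sketch. A gigapixel WSI is too large to process at once, so it is usually cut into patches, and the slide is treated as a bag of patch feature vectors. The code below shows generic attention pooling over such a bag. It is not the MI-Gen architecture, whose layers are not described in this summary; the function names, the `query` vector, and the toy dimensions are all hypothetical. A report generator would likely let a text decoder attend over the patch features in a similar weighted fashion rather than using a single pooled vector.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(patch_features, query):
    """Collapse a bag of patch feature vectors into one slide-level vector.

    patch_features: list of N vectors (one per tissue patch), each of length D.
    query: hypothetical learned vector of length D used to score each patch.
    Returns (slide_vector, attention_weights).
    """
    dim = len(query)
    # Scaled dot-product score of every patch against the query.
    scores = [
        sum(f * q for f, q in zip(feat, query)) / math.sqrt(dim)
        for feat in patch_features
    ]
    weights = softmax(scores)
    # The weighted sum of patch features is the bag (slide) representation.
    slide_vector = [
        sum(w * feat[d] for w, feat in zip(weights, patch_features))
        for d in range(dim)
    ]
    return slide_vector, weights


# Toy bag: 6 patches with 4-dimensional features (assumed sizes;
# real bags can hold thousands of patches with much larger features).
random.seed(0)
bag = [[random.gauss(0.0, 1.0) for _ in range(4)] for _ in range(6)]
query = [random.gauss(0.0, 1.0) for _ in range(4)]
slide_vector, weights = attention_pool(bag, query)
```

The attention weights sum to one, so they can also be read as a rough indication of which patches contributed most to the slide-level representation.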

The researchers benchmarked their MI-Gen model on the largest subset of the TCGA-PathoText dataset. Experimental results show that the model can generate pathology reports containing multiple clinical clues and achieve competitive performance on certain slide-level tasks. Notably, they found that simple semantic extraction from the pathology reports achieves the best performance on breast cancer (BRCA) subtyping, with an F1 score of 0.838, surpassing previous state-of-the-art approaches.
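To make "simple semantic extraction" concrete, here is a minimal sketch of how a subtype label might be read off a report by keyword matching and then scored with F1. Everything specific in it is an assumption for illustration: the two labels (invasive ductal vs. invasive lobular carcinoma), the keyword rules, the example reports, and the use of macro-averaged F1. The summary does not state which extraction rules or which F1 averaging the authors used.

```python
def extract_subtype(report):
    """Map a report to a subtype label via keyword matching (hypothetical rules)."""
    text = report.lower()
    if "lobular" in text:
        return "ILC"
    if "ductal" in text:
        return "IDC"
    return "unknown"


def f1_for_label(y_true, y_pred, label):
    """F1 = 2 * precision * recall / (precision + recall) for one label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of the per-label F1 scores."""
    return sum(f1_for_label(y_true, y_pred, l) for l in labels) / len(labels)


# Made-up report snippets standing in for model-generated text.
reports = [
    "Invasive ductal carcinoma, grade 2.",
    "Infiltrating lobular carcinoma with negative margins.",
    "Invasive ductal carcinoma with lymph node involvement.",
    "Invasive carcinoma, margins clear.",  # no subtype keyword present
]
truth = ["IDC", "ILC", "IDC", "ILC"]

pred = [extract_subtype(r) for r in reports]
score = macro_f1(truth, pred, ["IDC", "ILC"])
print(pred, round(score, 3))
```

The last report shows the main weakness of this kind of rule: when the generated text omits the subtype keyword, the extraction falls back to `unknown` and recall for that class drops.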

Critical Analysis

The researchers acknowledge several limitations of their study. First, the dataset they curated, PathText, may not capture the full diversity of pathology reports, as it is derived from a single source (TCGA). Expanding the dataset to include reports from multiple institutions could improve the generalizability of their model.

Additionally, the researchers do not provide a detailed analysis of the types of clinical clues and insights contained in the generated pathology reports. A more in-depth evaluation of the report quality and clinical relevance would help assess the practical utility of their approach.

Further research is also needed to understand the interpretability and explainability of the MI-Gen model's decision-making process. Interpretability techniques, such as visualizing which image regions the model attends to while generating each part of a report, could shed light on how it associates visual patterns with textual descriptions, which would be valuable for building trust in the system.

Conclusion

This research presents an approach to automatically generating pathology reports from whole slide images, a critical task in digital pathology. By curating the largest WSI-text dataset, PathText, and developing the multiple instance generative model (MI-Gen), the researchers have taken a step towards reducing the workload of pathology report writing and improving clinical automation.

The ability to generate clinically relevant pathology reports from WSIs has the potential to revolutionize the field of digital pathology, enabling faster and more consistent diagnoses, and ultimately leading to improved patient outcomes. While further research is needed to refine and validate the approach, this work represents an important step towards the goal of automating pathology report generation.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!


Related Papers


WsiCaption: Multiple Instance Generation of Pathology Reports for Gigapixel Whole-Slide Images

Pingyi Chen, Honglin Li, Chenglu Zhu, Sunyi Zheng, Zhongyi Shui, Lin Yang

Whole slide images are the foundation of digital pathology for the diagnosis and treatment of carcinomas. Writing pathology reports is laborious and error-prone for inexperienced pathologists. To reduce the workload and improve clinical automation, we investigate how to generate pathology reports given whole slide images. On the data end, we curated the largest WSI-text dataset (PathText). In specific, we collected nearly 10000 high-quality WSI-text pairs for visual-language models by recognizing and cleaning pathology reports which narrate diagnostic slides in TCGA. On the model end, we propose the multiple instance generative model (MI-Gen) which can produce pathology reports for gigapixel WSIs. We benchmark our model on the largest subset of TCGA-PathoText. Experimental results show our model can generate pathology reports which contain multiple clinical clues and achieve competitive performance on certain slide-level tasks. We observe that simple semantic extraction from the pathology reports can achieve the best performance (0.838 of F1 score) on BRCA subtyping surpassing previous state-of-the-art approaches. Our collected dataset and related code are available.

6/28/2024

PathAlign: A vision-language model for whole slide images in histopathology

Faruk Ahmed, Andrew Sellergren, Lin Yang, Shawn Xu, Boris Babenko, Abbi Ward, Niels Olson, Arash Mohtashamian, Yossi Matias, Greg S. Corrado, Quang Duong, Dale R. Webster, Shravya Shetty, Daniel Golden, Yun Liu, David F. Steiner, Ellery Wulczyn

Microscopic interpretation of histopathology images underlies many important diagnostic and treatment decisions. While advances in vision-language modeling raise new opportunities for analysis of such images, the gigapixel-scale size of whole slide images (WSIs) introduces unique challenges. Additionally, pathology reports simultaneously highlight key findings from small regions while also aggregating interpretation across multiple slides, often making it difficult to create robust image-text pairs. As such, pathology reports remain a largely untapped source of supervision in computational pathology, with most efforts relying on region-of-interest annotations or self-supervision at the patch-level. In this work, we develop a vision-language model based on the BLIP-2 framework using WSIs paired with curated text from pathology reports. This enables applications utilizing a shared image-text embedding space, such as text or image retrieval for finding cases of interest, as well as integration of the WSI encoder with a frozen large language model (LLM) for WSI-based generative text capabilities such as report generation or AI-in-the-loop interactions. We utilize a de-identified dataset of over 350,000 WSIs and diagnostic text pairs, spanning a wide range of diagnoses, procedure types, and tissue types. We present pathologist evaluation of text generation and text retrieval using WSI embeddings, as well as results for WSI classification and workflow prioritization (slide-level triaging). Model-generated text for WSIs was rated by pathologists as accurate, without clinically significant error or omission, for 78% of WSIs on average. This work demonstrates exciting potential capabilities for language-aligned WSI embeddings.

7/1/2024

PathGen-1.6M: 1.6 Million Pathology Image-text Pairs Generation through Multi-agent Collaboration

Yuxuan Sun, Yunlong Zhang, Yixuan Si, Chenglu Zhu, Zhongyi Shui, Kai Zhang, Jingxiong Li, Xingheng Lyu, Tao Lin, Lin Yang

Vision Language Models (VLMs) like CLIP have attracted substantial attention in pathology, serving as backbones for applications such as zero-shot image classification and Whole Slide Image (WSI) analysis. Additionally, they can function as vision encoders when combined with large language models (LLMs) to support broader capabilities. Current efforts to train pathology VLMs rely on pathology image-text pairs from platforms like PubMed, YouTube, and Twitter, which provide limited, unscalable data with generally suboptimal image quality. In this work, we leverage large-scale WSI datasets like TCGA to extract numerous high-quality image patches. We then train a large multimodal model to generate captions for these images, creating PathGen-1.6M, a dataset containing 1.6 million high-quality image-caption pairs. Our approach involves multiple agent models collaborating to extract representative WSI patches, generating and refining captions to obtain high-quality image-text pairs. Extensive experiments show that integrating these generated pairs with existing datasets to train a pathology-specific CLIP model, PathGen-CLIP, significantly enhances its ability to analyze pathological images, with substantial improvements across nine pathology-related zero-shot image classification tasks and three whole-slide image tasks. Furthermore, we construct 200K instruction-tuning data based on PathGen-1.6M and integrate PathGen-CLIP with the Vicuna LLM to create more powerful multimodal models through instruction tuning. Overall, we provide a scalable pathway for high-quality data generation in pathology, paving the way for next-generation general pathology models.

7/2/2024

WSI-VQA: Interpreting Whole Slide Images by Generative Visual Question Answering

Pingyi Chen, Chenglu Zhu, Sunyi Zheng, Honglin Li, Lin Yang

Whole slide imaging is routinely adopted for carcinoma diagnosis and prognosis. Abundant experience is required for pathologists to achieve accurate and reliable diagnostic results of whole slide images (WSI). The huge size and heterogeneous features of WSIs make the workflow of pathological reading extremely time-consuming. In this paper, we propose a novel framework (WSI-VQA) to interpret WSIs by generative visual question answering. WSI-VQA shows universality by reframing various kinds of slide-level tasks in a question-answering pattern, in which pathologists can achieve immunohistochemical grading, survival prediction, and tumor subtyping following human-machine interaction. Furthermore, we establish a WSI-VQA dataset which contains 8672 slide-level question-answering pairs with 977 WSIs. Besides the ability to deal with different slide-level tasks, our generative model which is named Wsi2Text Transformer (W2T) outperforms existing discriminative models in medical correctness, which reveals the potential of our model to be applied in the clinical scenario. Additionally, we also visualize the co-attention mapping between word embeddings and WSIs as an intuitive explanation for diagnostic results. The dataset and related code are available at https://github.com/cpystan/WSI-VQA.

7/9/2024