Can virtual staining for high-throughput screening generalize?

Read original: arXiv:2407.06979 - Published 8/14/2024 by Samuel Tonks, Cuong Nguyen, Steve Hood, Ryan Musso, Ceridwen Hopely, Steve Titus, Minh Doan, Iain Styles, Alexander Krull

Overview

This paper examines whether virtual staining, a technique that uses deep learning to predict stained images from unstained ones, generalizes across the experimental conditions encountered in high-throughput screening (HTS).
The researchers test how well models trained on one cell type or phenotype perform on others, with the aim of guiding how training data should be generated for pharmaceutical screening.

Plain English Explanation

The paper looks at a technique called virtual staining, which uses artificial intelligence (AI) to predict what a stained biological sample would look like, based on an unstained image. This is useful because staining samples is a time-consuming and expensive process in medical research and drug discovery.

The researchers wanted to see how well a virtual staining model trained under one set of conditions, such as a particular cell type or a non-toxic treatment, would work under conditions it had not seen. They studied this using image data of the kind produced when large numbers of compounds are screened during drug discovery.

This is an important question because if virtual staining can be applied broadly and used for high-throughput screening, it could significantly speed up and reduce the cost of developing new drugs and treatments.

Technical Explanation

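The paper evaluates how well virtual staining models trained under one set of experimental conditions generalize to others. According to the abstract, the dataset comprises 772,416 paired bright-field, cytoplasm, nuclei, and DNA-damage stain images covering three cell types (lung, ovarian, and breast) and two phenotypes (toxic and non-toxic conditions). Models are tested under three distribution shifts typical of high-throughput screening: unseen phenotypes, unseen cell types, and both combined. Performance is assessed at pixel-based, instance-wise, and biological-feature-based levels.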
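The paper's abstract names three distribution shifts: unseen phenotypes, unseen cell types, and both combined. A minimal sketch of how test conditions could be grouped by shift is given below. It assumes each model is trained on a single (cell type, phenotype) pair, which the abstract does not state explicitly, and the function name `make_shift_splits` and the split labels are hypothetical.

```python
from itertools import product

# Conditions named in the paper's abstract.
CELL_TYPES = ("lung", "ovarian", "breast")
PHENOTYPES = ("non-toxic", "toxic")


def make_shift_splits(train_cell, train_pheno):
    """Group every (cell type, phenotype) pair by its shift from the training pair.

    Assumption: one model is trained on exactly one pair. This is an
    illustrative protocol, not necessarily the one used by the authors.
    """
    splits = {
        "in_distribution": [],
        "unseen_phenotype": [],
        "unseen_cell_type": [],
        "unseen_both": [],
    }
    for cell, pheno in product(CELL_TYPES, PHENOTYPES):
        same_cell = cell == train_cell
        same_pheno = pheno == train_pheno
        if same_cell and same_pheno:
            key = "in_distribution"
        elif same_cell:
            key = "unseen_phenotype"
        elif same_pheno:
            key = "unseen_cell_type"
        else:
            key = "unseen_both"
        splits[key].append((cell, pheno))
    return splits


# Example: a model trained on ovarian cells under non-toxic conditions.
splits = make_shift_splits("ovarian", "non-toxic")
for name, conditions in splits.items():
    print(name, conditions)
```

Keeping the shift label alongside each test condition makes it straightforward to report metrics per shift rather than as one pooled score.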

They also investigated using virtual staining for high-throughput screening of drug compounds, where rapid screening of large numbers of compounds is crucial for identifying potential drug candidates.

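The results show that generalization depends on the conditions used for training. Virtual nuclei and cytoplasm models trained on non-toxic samples generalize to toxic samples and outperform models trained on toxic samples at all evaluation levels. Generalization to unseen cell types varies: models trained on ovarian or lung cells often perform well under other conditions, while models trained on breast cells consistently generalize poorly. The authors also found that while virtual staining can accelerate screening, it may not completely replace physical staining in all cases.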
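The abstract mentions a pixel-based evaluation level without naming specific metrics. As an illustration only, the sketch below computes three metrics commonly used to compare a predicted stain with its ground-truth image: mean absolute error, peak signal-to-noise ratio, and the Pearson correlation coefficient. The choice of metrics, the function name `pixel_metrics`, and the synthetic images are assumptions, not the authors' evaluation code.

```python
import numpy as np


def pixel_metrics(predicted, target, data_range=1.0):
    """Compare a predicted stain image with its ground truth, pixel by pixel.

    Assumption: both images are arrays of equal shape scaled to [0, data_range].
    """
    predicted = np.asarray(predicted, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)

    mae = float(np.mean(np.abs(predicted - target)))
    mse = float(np.mean((predicted - target) ** 2))
    # PSNR is undefined for identical images; report infinity in that case.
    psnr = float("inf") if mse == 0 else float(10 * np.log10(data_range**2 / mse))
    # Pearson correlation between the flattened images.
    pcc = float(np.corrcoef(predicted.ravel(), target.ravel())[0, 1])
    return {"mae": mae, "psnr": psnr, "pcc": pcc}


# Synthetic stand-ins for a real stain image and a model prediction.
rng = np.random.default_rng(0)
target = rng.random((64, 64))
predicted = np.clip(target + rng.normal(0, 0.05, target.shape), 0, 1)

metrics = pixel_metrics(predicted, target)
print(metrics)
```

Pixel-level scores alone can miss biologically relevant errors, which is presumably why the paper also evaluates at instance-wise and biological-feature-based levels.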

Critical Analysis

The paper provides a thorough evaluation of virtual staining's generalization capabilities, but it also acknowledges several limitations. The authors note that performance degrades when applying models to datasets with significant differences, suggesting that further research is needed to improve cross-dataset generalization.

Additionally, while virtual staining can speed up high-throughput screening, the authors caution that it may not fully replace physical staining in all cases. This highlights the need to carefully evaluate the trade-offs and limitations of virtual staining, particularly in mission-critical applications.

Overall, the research represents an important step in understanding the real-world applicability of virtual staining, but more work is needed to address the challenges of generalization and ensure the reliability of the technology for high-stakes use cases.

Conclusion

This paper explores how well virtual staining, a promising technique that uses AI to predict stained images from unstained samples, generalizes in high-throughput screening. The results suggest that virtual staining models can generalize across experimental conditions to some degree, but performance depends on the cell type and phenotype used for training.

The authors also demonstrate that virtual staining can accelerate drug discovery by enabling rapid screening of large compound libraries, but it may not completely replace physical staining in all cases. These findings highlight the need for continued research and careful evaluation of virtual staining's strengths and limitations as the technology evolves.

Overall, this work provides valuable insights into the real-world applicability of virtual staining and its potential to streamline critical processes in medical research and drug development.

Related Papers

Can virtual staining for high-throughput screening generalize?

Samuel Tonks, Cuong Nguyen, Steve Hood, Ryan Musso, Ceridwen Hopely, Steve Titus, Minh Doan, Iain Styles, Alexander Krull

The large volume and variety of imaging data from high-throughput screening (HTS) in the pharmaceutical industry present an excellent resource for training virtual staining models. However, the potential of models trained under one set of experimental conditions to generalize to other conditions remains underexplored. This study systematically investigates whether data from three cell types (lung, ovarian, and breast) and two phenotypes (toxic and non-toxic conditions) commonly found in HTS can effectively train virtual staining models to generalize across three typical HTS distribution shifts: unseen phenotypes, unseen cell types, and the combination of both. Utilizing a dataset of 772,416 paired bright-field, cytoplasm, nuclei, and DNA-damage stain images, we evaluate the generalization capabilities of models across pixel-based, instance-wise, and biological-feature-based levels. Our findings indicate that training virtual nuclei and cytoplasm models on non-toxic condition samples not only generalizes to toxic condition samples but leads to improved performance across all evaluation levels compared to training on toxic condition samples. Generalization to unseen cell types shows variability depending on the cell type; models trained on ovarian or lung cell samples often perform well under other conditions, while those trained on breast cell samples consistently show poor generalization. Generalization to unseen cell types and phenotypes shows good generalization across all levels of evaluation compared to addressing unseen cell types alone. This study represents the first large-scale, data-centric analysis of the generalization capability of virtual staining models trained on diverse HTS datasets, providing valuable strategies for experimental training data generation.

8/14/2024

Scalable, Trustworthy Generative Model for Virtual Multi-Staining from H&E Whole Slide Images

Mehdi Ounissi, Ilias Sarbout, Jean-Pierre Hugot, Christine Martinez-Vinson, Dominique Berrebi, Daniel Racoceanu

Chemical staining methods are dependable but require extensive time, expensive chemicals, and raise environmental concerns. These challenges highlight the need for alternative solutions like virtual staining, which accelerates the diagnostic process and enhances stain application flexibility. Generative AI technologies are pivotal in addressing these issues. However, the high-stakes nature of healthcare decisions, especially in computational pathology, complicates the adoption of these tools due to their opaque processes. Our work introduces the use of generative AI for virtual staining, aiming to enhance performance, trustworthiness, scalability, and adaptability in computational pathology. The methodology centers on a singular H&E encoder supporting multiple stain decoders. This design focuses on critical regions in the latent space of H&E, enabling precise synthetic stain generation. Our method, tested to generate 8 different stains from a single H&E slide, offers scalability by loading only necessary model components during production. We integrate label-free knowledge in training, using loss functions and regularization to minimize artifacts, thus improving paired/unpaired virtual staining accuracy. To build trust, we use real-time self-inspection with discriminators for each stain type, providing pathologists with confidence heat-maps. Automatic quality checks on new H&E slides ensure conformity to the trained distribution, ensuring accurate synthetic stains. Recognizing pathologists' challenges with new technologies, we have developed an open-source, cloud-based system, that allows easy virtual staining of H&E slides through a browser, addressing hardware/software issues and facilitating real-time user feedback. We also curated a novel dataset of 8 paired H&E/stains related to pediatric Crohn's disease, comprising 480 WSIs to further stimulate computational pathology research.

7/2/2024

Impact of Stain Variation and Color Normalization for Prognostic Predictions in Pathology

Siyu (Steven) Lin, Haowen Zhou, Richard J. Cote, Mark Watson, Ramaswamy Govindan, Changhuei Yang

In recent years, deep neural networks (DNNs) have demonstrated remarkable performance in pathology applications, potentially even outperforming expert pathologists due to their ability to learn subtle features from large datasets. One complication in preparing digital pathology datasets for DNN tasks is variation in tinctorial qualities. A common way to address this is to perform stain normalization on the images. In this study, we show that a well-trained DNN model trained on one batch of histological slides failed to generalize to another batch prepared at a different time from the same tissue blocks, even when stain normalization methods were applied. This study used sample data from a previously reported DNN that was able to identify patients with early stage non-small cell lung cancer (NSCLC) whose tumors did and did not metastasize, with high accuracy, based on training and then testing of digital images from H&E stained primary tumor tissue sections processed at the same time. In this study we obtained a new series of histologic slides from the adjacent recuts of the same tissue blocks processed in the same lab but at a different time. We found that the DNN trained on either batch of slides/images was unable to generalize and failed to predict progression in the other batch of slides/images (AUC_cross-batch = 0.52 - 0.53 compared to AUC_same-batch = 0.74 - 0.81). The failure to generalize did not improve even when tinctorial difference corrections were made through either traditional color-tuning or stain normalization with the help of a Cycle Generative Adversarial Network (CycleGAN) process. This highlights the need to develop an entirely new way to process and collect consistent microscopy images from histologic slides that can be used to both train and allow for the general application of predictive DNN algorithms.

9/16/2024

Efficient and generalizable prediction of molecular alterations in multiple cancer cohorts using H&E whole slide images

Kshitij Ingale, Sun Hae Hong, Qiyuan Hu, Renyu Zhang, Bo Osinski, Mina Khoshdeli, Josh Och, Kunal Nagpal, Martin C. Stumpe, Rohan P. Joshi

Molecular testing of tumor samples for targetable biomarkers is restricted by a lack of standardization, turnaround-time, cost, and tissue availability across cancer types. Additionally, targetable alterations of low prevalence may not be tested in routine workflows. Algorithms that predict DNA alterations from routinely generated hematoxylin and eosin (H&E)-stained images could prioritize samples for confirmatory molecular testing. Costs and the necessity of a large number of samples containing mutations limit approaches that train individual algorithms for each alteration. In this work, models were trained for simultaneous prediction of multiple DNA alterations from H&E images using a multi-task approach. Compared to biomarker-specific models, this approach performed better on average, with pronounced gains for rare mutations. The models reasonably generalized to independent temporal-holdout, externally-stained, and multi-site TCGA test sets. Additionally, whole slide image embeddings derived using multi-task models demonstrated strong performance in downstream tasks that were not a part of training. Overall, this is a promising approach to develop clinically useful algorithms that provide multiple actionable predictions from a single slide.

7/23/2024