Multimodal Analysis of White Blood Cell Differentiation in Acute Myeloid Leukemia Patients using a beta-Variational Autoencoder

Read original: arXiv:2408.06720 - Published 8/26/2024 by Gizem Mert, Ario Sadafi, Raheleh Salehi, Nassir Navab, Carsten Marr

Overview

Multimodal analysis of white blood cell differentiation in acute myeloid leukemia (AML) patients using a β-Variational Autoencoder (β-VAE)
Combines single-cell images from peripheral blood smears (morphology) with single-cell transcriptomic data to learn a multimodal embedding for white blood cells
Aims to improve understanding of the complex cellular dynamics in AML

Plain English Explanation

The paper explores a novel approach to analyzing white blood cell differentiation in patients with acute myeloid leukemia (AML), a type of blood cancer. The researchers combined two data sources, microscopy images of single cells from peripheral blood smears and single-cell RNA sequencing, to create a multimodal embedding: a shared representation that captures the relationships between different cell types.

By using a specialized type of neural network called a β-Variational Autoencoder (β-VAE), the team was able to learn this multimodal embedding and gain new insights into the cellular dynamics of AML. This is particularly important because AML is a challenging disease that involves major changes in the development and behavior of white blood cells.

The multimodal approach allowed the researchers to paint a more complete picture of the white blood cell landscape in AML patients, going beyond what could be gleaned from either the cell images or the RNA sequencing data alone. This could lead to a better understanding of the disease and potentially improved treatments in the future.

Technical Explanation

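The researchers used a β-VAE to learn a multimodal embedding of white blood cells from single-cell images of peripheral blood smears and the corresponding transcriptomic data. A β-VAE is a variational autoencoder whose loss weights the Kullback-Leibler regularization term by a factor β; it learns a compressed, low-dimensional latent representation that captures the underlying structure of complex, high-dimensional data. According to the paper's abstract, the model uses a customized loss function and incorporates an R-CNN architecture to separate each single cell from the background and to reduce interference from artifacts.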
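
As a rough illustration of the β-weighted objective described above, the following minimal sketch computes a generic β-VAE loss for one sample. It is not the paper's customized loss: the squared-error reconstruction term, the default `beta=4.0`, and all function names are assumptions chosen for illustration.

```python
import math


def kl_to_standard_normal(mu, logvar):
    """KL divergence from N(mu, diag(exp(logvar))) to N(0, I), summed over latent dimensions."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mu, logvar))


def squared_error(x, x_hat):
    """Sum of squared differences, used here as a stand-in reconstruction term (an assumption)."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat))


def beta_vae_loss(x, x_hat, mu, logvar, beta=4.0):
    """Reconstruction error plus beta times the KL regularizer.

    beta=4.0 is a hypothetical default, not a value taken from the paper.
    """
    return squared_error(x, x_hat) + beta * kl_to_standard_normal(mu, logvar)
```

Setting `beta` to 1 recovers the standard VAE objective; larger values put more pressure on the latent code to stay close to the prior, usually at some cost in reconstruction quality.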

By incorporating both the gene expression (transcriptome) and cell morphology (blood smear image) modalities, the β-VAE learns a multimodal embedding intended to better reflect the biological complexity of the white blood cell populations. This multimodal approach is reported to outperform either data source alone on downstream tasks such as cell type classification and trajectory inference.
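
One common way to train a single embedding on two modalities is to decode both from the same latent code and sum their reconstruction errors. The sketch below shows only that combination step, under stated assumptions: the per-feature averaging (so the higher-dimensional modality does not dominate), the optional weights, and the modality names are hypothetical and not taken from the paper.

```python
def mean_squared_error(x, x_hat):
    """Average squared difference per feature, so vectors of different lengths are comparable."""
    if len(x) != len(x_hat):
        raise ValueError("input and reconstruction must have the same length")
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)


def joint_reconstruction(inputs, outputs, weights=None):
    """Weighted sum of per-modality reconstruction errors.

    inputs and outputs map a modality name to a feature vector; weights maps a
    modality name to its weight and defaults to 1.0 for every modality.
    """
    if weights is None:
        weights = {name: 1.0 for name in inputs}
    return sum(
        weights[name] * mean_squared_error(inputs[name], outputs[name])
        for name in inputs
    )
```

For example, `joint_reconstruction({"modality_a": a, "modality_b": b}, {"modality_a": a_hat, "modality_b": b_hat})` would give one scalar to which a β-weighted KL term could then be added.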

The learned embedding allowed the researchers to visualize the relationships between different white blood cell types and their developmental trajectories. They were able to identify distinct subpopulations of cells, including some that were associated with specific AML subtypes. This provides new insights into the cellular heterogeneity and dysregulation that occurs in AML.
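
A simple way to quantify how cell types relate in a learned embedding is to compare the centroids of each labeled group in latent space. The sketch below assumes the latent vectors and cell-type labels are already available as plain Python lists; the function names and the use of Euclidean distance are illustrative choices, not the authors' analysis.

```python
import math
from collections import defaultdict


def class_centroids(latents, labels):
    """Mean latent vector for each label."""
    sums = {}
    counts = defaultdict(int)
    for z, label in zip(latents, labels):
        if label not in sums:
            sums[label] = [0.0] * len(z)
        sums[label] = [s + v for s, v in zip(sums[label], z)]
        counts[label] += 1
    return {label: [s / counts[label] for s in total] for label, total in sums.items()}


def centroid_distance(centroids, label_a, label_b):
    """Euclidean distance between the centroids of two labels."""
    return math.dist(centroids[label_a], centroids[label_b])
```

Groups whose centroids lie close together are candidates for neighboring stages of a differentiation trajectory, although centroid distance alone does not establish a trajectory.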

Critical Analysis

The authors acknowledge several limitations of their approach. First, the dataset used was relatively small, consisting of only 10 AML patients. Larger, more diverse datasets would be needed to fully capture the heterogeneity of AML and validate the generalizability of the findings.

Additionally, the study focused on white blood cell differentiation, but did not explore other cell types, such as red blood cells or platelets, which are also impacted in AML. Expanding the multimodal analysis to these additional cell populations could provide a more comprehensive understanding of the disease.

The authors also note that the β-VAE architecture and hyperparameters were not extensively tuned, and that further optimization of the model could potentially improve the quality of the learned multimodal embedding. Comparisons to other multimodal learning approaches would also help to better situate the strengths and weaknesses of the β-VAE in this context.

Conclusion

This study demonstrates the power of combining multiple data modalities, in this case single-cell morphology and transcriptomic data, to gain a more holistic understanding of cellular heterogeneity and differentiation in the context of a complex disease like AML. The multimodal embedding learned by the β-VAE provides new insights into the cellular dynamics of AML and could potentially inform the development of more targeted therapies in the future.

While the current implementation has some limitations, the general approach of leveraging complementary data sources through advanced machine learning techniques represents an exciting direction for biomedical research. As datasets and models continue to improve, multimodal analysis is poised to become an increasingly valuable tool for unveiling the complexities of human health and disease.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Multimodal Analysis of White Blood Cell Differentiation in Acute Myeloid Leukemia Patients using a beta-Variational Autoencoder

Gizem Mert, Ario Sadafi, Raheleh Salehi, Nassir Navab, Carsten Marr

Biomedical imaging and RNA sequencing with single-cell resolution improves our understanding of white blood cell diseases like leukemia. By combining morphological and transcriptomic data, we can gain insights into cellular functions and trajectories involved in blood cell differentiation. However, existing methodologies struggle with integrating morphological and transcriptomic data, leaving a significant research gap in comprehensively understanding the dynamics of cell differentiation. Here, we introduce an unsupervised method that explores and reconstructs these two modalities and uncovers the relationship between different subtypes of white blood cells from human peripheral blood smears in terms of morphology and their corresponding transcriptome. Our method is based on a beta-variational autoencoder (β-VAE) with a customized loss function, incorporating an R-CNN architecture to distinguish single-cell from background and to minimize any interference from artifacts. This implementation of β-VAE shows good reconstruction capability along with continuous latent embeddings, while maintaining clear differentiation between single-cell classes. Our novel approach is especially helpful to uncover the correlation of two latent features in complex biological processes such as formation of granules in the cell (granulopoiesis) with gene expression patterns. It thus provides a unique tool to improve the understanding of white blood cell maturation for biomedicine and diagnostics.

8/26/2024

Neural Cellular Automata for Lightweight, Robust and Explainable Classification of White Blood Cell Images

Michael Deutges, Ario Sadafi, Nassir Navab, Carsten Marr

Diagnosis of hematological malignancies depends on accurate identification of white blood cells in peripheral blood smears. Deep learning techniques are emerging as a viable solution to scale and optimize this process by automatic cell classification. However, these techniques face several challenges such as limited generalizability, sensitivity to domain shifts, and lack of explainability. Here, we introduce a novel approach for white blood cell classification based on neural cellular automata (NCA). We test our approach on three datasets of white blood cell images and show that we achieve competitive performance compared to conventional methods. Our NCA-based method is significantly smaller in terms of parameters and exhibits robustness to domain shifts. Furthermore, the architecture is inherently explainable, providing insights into the decision process for each classification, which helps to understand and validate model predictions. Our results demonstrate that NCA can be used for image classification, and that they address key challenges of conventional methods, indicating a high potential for applicability in clinical practice.

8/1/2024

A Large-scale Multi Domain Leukemia Dataset for the White Blood Cells Detection with Morphological Attributes for Explainability

Abdul Rehman, Talha Meraj, Aiman Mahmood Minhas, Ayisha Imran, Mohsen Ali, Waqas Sultani

Earlier diagnosis of Leukemia can save thousands of lives annually. The prognosis of leukemia is challenging without the morphological information of White Blood Cells (WBC) and relies on the accessibility of expensive microscopes and the availability of hematologists to analyze Peripheral Blood Samples (PBS). Deep Learning based methods can be employed to assist hematologists. However, these algorithms require a large amount of labeled data, which is not readily available. To overcome this limitation, we have acquired a realistic, generalized, and large dataset. To collect this comprehensive dataset for real-world applications, two microscopes from two different cost spectrums (high-cost HCM and low-cost LCM) are used for dataset capturing at three magnifications (100x, 40x, 10x) through different sensors (high-end camera for HCM, middle-level camera for LCM and mobile-phone camera for both). The high-sensor camera is 47 times more expensive than the middle-level camera and HCM is 17 times more expensive than LCM. In this collection, using HCM at high resolution (100x), experienced hematologists annotated 10.3k WBC types (14) and artifacts, having 55k morphological labels (Cell Size, Nuclear Chromatin, Nuclear Shape, etc.) from 2.4k images of several PBS leukemia patients. Later on, these annotations are transferred to other 2 magnifications of HCM, and 3 magnifications of LCM, and on each camera captured images. Along with the LeukemiaAttri dataset, we provide baselines over multiple object detectors and Unsupervised Domain Adaptation (UDA) strategies, along with morphological information-based attribute prediction. The dataset will be publicly available after publication to facilitate the research in this direction.

5/20/2024

Deep Generative Classification of Blood Cell Morphology

Simon Deltadahl, Julian Gilbey, Christine Van Laer, Nancy Boeckx, Mathie Leers, Tanya Freeman, Laura Aiken, Timothy Farren, Matthew Smith, Mohamad Zeina, BloodCounts! consortium, Concetta Piazzese, Joseph Taylor, Nicholas Gleadall, Carola-Bibiane Schonlieb, Suthesh Sivapalaratnam, Michael Roberts, Parashkev Nachev

Accurate classification of haematological cells is critical for diagnosing blood disorders, but presents significant challenges for machine automation owing to the complexity of cell morphology, heterogeneities of biological, pathological, and imaging characteristics, and the imbalance of cell type frequencies. We introduce CytoDiffusion, a diffusion-based classifier that effectively models blood cell morphology, combining accurate classification with robust anomaly detection, resistance to distributional shifts, interpretability, data efficiency, and superhuman uncertainty quantification. Our approach outperforms state-of-the-art discriminative models in anomaly detection (AUC 0.976 vs. 0.919), resistance to domain shifts (85.85% vs. 74.38% balanced accuracy), and performance in low-data regimes (95.88% vs. 94.95% balanced accuracy). Notably, our model generates synthetic blood cell images that are nearly indistinguishable from real images, as demonstrated by a Turing test in which expert haematologists achieved only 52.3% accuracy (95% CI: [50.5%, 54.2%]). Furthermore, we enhance model explainability through the generation of directly interpretable counterfactual heatmaps. Our comprehensive evaluation framework, encompassing these multiple performance dimensions, establishes a new benchmark for medical image analysis in haematology, ultimately enabling improved diagnostic accuracy in clinical settings. Our code is available at https://github.com/Deltadahl/CytoDiffusion.

8/20/2024