Pathology-genomic fusion via biologically informed cross-modality graph learning for survival analysis

arXiv: 2404.08023

Published 4/15/2024 by Zeyu Zhang, Yuanshen Zhao, Jingxian Duan, Yaou Liu, Hairong Zheng, Dong Liang, Zhenyu Zhang, Zhi-Cheng Li
Abstract

The diagnosis and prognosis of cancer are typically based on multi-modal clinical data, including histology images and genomic data, due to the complex pathogenesis and high heterogeneity of the disease. Despite advances in digital pathology and high-throughput genome sequencing, establishing effective multi-modal fusion models for survival prediction and revealing the potential association between histopathology and transcriptomics remain challenging. In this paper, we propose the Pathology-Genome Heterogeneous Graph (PGHG), which integrates whole slide images (WSI) and bulk RNA-Seq expression data with a heterogeneous graph neural network for cancer survival analysis. The PGHG consists of a biological knowledge-guided representation learning network and a pathology-genome heterogeneous graph. The representation learning network utilizes biological prior knowledge of intra-modal and inter-modal data associations to guide feature extraction. The node features of each modality are updated through an attention-based graph learning strategy. Unimodal features and bi-modal fused features are extracted via an attention pooling module and then used for survival prediction. We evaluate the model on low-grade glioma, glioblastoma, and kidney renal papillary cell carcinoma datasets from The Cancer Genome Atlas (TCGA) and the First Affiliated Hospital of Zhengzhou University (FAHZU). Extensive experimental results demonstrate that the proposed method outperforms both unimodal and other multi-modal fusion models. To demonstrate the model's interpretability, we also visualize attention heatmaps of the pathological images and use the integrated gradient algorithm to identify important tissue structures, biological pathways, and key genes.

Overview

  • Heterogeneous Graph Neural Network
  • Multi-modal Fusion
  • Survival Analysis

Plain English Explanation

This research paper presents a novel approach to analyzing and predicting cancer patient survival rates using a combination of pathology and genomic data. The key idea is to build a "heterogeneous graph" that captures the complex relationships between different types of biological data, and then use a specialized machine learning model called a "Graph Neural Network" to learn from this graph and make survival predictions.

The researchers recognized that cancer is a highly complex disease, with many interacting biological factors that influence a patient's prognosis. By integrating pathology data (whole slide images of tumor tissue) and genomic data (bulk RNA-Seq gene expression), they aimed to get a more comprehensive understanding of each patient's unique cancer profile. The heterogeneous graph they constructed allows the model to uncover hidden connections between these two data modalities.

Through this multi-modal data fusion approach, the researchers report that their model outperforms both unimodal models, which rely on a single data type, and other multi-modal fusion models. A graph-based neural network can capture complex, nonlinear relationships in the data that are difficult to model with simpler statistical techniques.

This work has important implications for personalized cancer care, as it demonstrates the potential of integrating diverse biomedical data to make more accurate prognoses and tailor treatments to individual patients' needs. By leveraging state-of-the-art machine learning techniques, the researchers have taken a step towards realizing the promise of precision oncology and personalized medicine.

Technical Explanation

The researchers developed the Pathology-Genome Heterogeneous Graph (PGHG), a heterogeneous graph neural network (HGNN) model that integrates whole slide images (WSIs) and bulk RNA-Seq expression data for cancer survival analysis. The model has two parts: a biological knowledge-guided representation learning network and the pathology-genome heterogeneous graph itself. The graph holds nodes from both modalities, and biological prior knowledge about intra-modal and inter-modal associations guides how their features are extracted.
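To make the idea of a heterogeneous graph concrete, here is a minimal sketch of a container with typed nodes and typed edges in plain Python. It is not the authors' implementation: the class name `HeteroGraph`, the node types `patch` and `gene`, the relation names, and the toy feature values are all assumptions chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class HeteroGraph:
    """Toy heterogeneous graph: typed nodes and typed edges (illustrative only)."""
    # node_type -> list of feature vectors
    nodes: dict = field(default_factory=dict)
    # (src_type, relation, dst_type) -> list of (src_index, dst_index) pairs
    edges: dict = field(default_factory=dict)

    def add_node(self, node_type, features):
        """Store a feature vector and return the node's index within its type."""
        self.nodes.setdefault(node_type, []).append(list(features))
        return len(self.nodes[node_type]) - 1

    def add_edge(self, src_type, relation, dst_type, src, dst):
        """Add a directed, typed edge between two existing nodes."""
        if src >= len(self.nodes.get(src_type, [])):
            raise IndexError("unknown source node")
        if dst >= len(self.nodes.get(dst_type, [])):
            raise IndexError("unknown destination node")
        key = (src_type, relation, dst_type)
        self.edges.setdefault(key, []).append((src, dst))

    def neighbors(self, dst_type, dst):
        """Return (src_type, relation, src_index) for every edge into a node."""
        found = []
        for (src_type, relation, edge_dst_type), pairs in self.edges.items():
            if edge_dst_type != dst_type:
                continue
            for src, edge_dst in pairs:
                if edge_dst == dst:
                    found.append((src_type, relation, src))
        return found


# Hypothetical toy graph: two image-patch nodes and two gene nodes.
graph = HeteroGraph()
p0 = graph.add_node("patch", [0.2, 0.7])
p1 = graph.add_node("patch", [0.9, 0.1])
g0 = graph.add_node("gene", [1.5])
g1 = graph.add_node("gene", [0.3])

graph.add_edge("patch", "adjacent_to", "patch", p1, p0)      # intra-modal
graph.add_edge("gene", "co_expressed_with", "gene", g1, g0)  # intra-modal
graph.add_edge("gene", "associated_with", "patch", g0, p0)   # inter-modal
```

Keying the edges by a (source type, relation, destination type) triple is one common way to let each relation be handled separately later, for example with its own attention weights.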

This graph structure allows the model to learn cross-modal representations that capture the complex, nonlinear interactions between the two types of data. According to the abstract, the node features of each modality are updated through an attention-based graph learning strategy that propagates information across the graph. An attention pooling module then extracts unimodal features and bi-modal fused features, which are used for survival prediction.
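As a rough illustration of attention-based aggregation on a graph, the sketch below updates one node from its neighbors with scaled dot-product attention. This is a generic stand-in, not the formulation used in the paper, which the abstract does not spell out; the function names and toy vectors are assumptions, and the learned projection matrices of a real model are omitted.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def attention_aggregate(query, candidates):
    """Weight each candidate vector by its similarity to the query.

    Returns (weights, aggregated_vector). With a node's own features as the
    query and its neighbors as candidates, this acts as a node update; with
    a single query vector over all nodes, the same routine acts as a simple
    attention pooling step.
    """
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, c) / scale for c in candidates])
    dim = len(candidates[0])
    aggregated = [
        sum(w * c[i] for w, c in zip(weights, candidates)) for i in range(dim)
    ]
    return weights, aggregated


# Toy example: the first neighbor is more similar to the target node,
# so it receives the larger attention weight.
target = [1.0, 0.0]
neighbor_features = [[1.0, 0.0], [0.0, 1.0]]
weights, updated = attention_aggregate(target, neighbor_features)
```

In a heterogeneous setting, one would typically run such an aggregation per edge type and then combine the results, so that patch-to-patch and gene-to-patch messages can be weighted differently.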

The researchers evaluated their approach on low-grade glioma, glioblastoma, and kidney renal papillary cell carcinoma datasets from The Cancer Genome Atlas (TCGA) and the First Affiliated Hospital of Zhengzhou University (FAHZU). They report that the model outperforms both unimodal models and other multi-modal fusion models, which suggests that pathology and genomics carry complementary prognostic information.
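Survival models are commonly compared with the concordance index (C-index), which measures how often the model ranks a pair of patients in the right order. The abstract does not name the metric used in the paper, so treat the following as a sketch of a typical evaluation step under that assumption; the function name and the toy cohort are hypothetical.

```python
def concordance_index(times, events, risks):
    """Harrell-style C-index for right-censored survival data.

    times  : observed follow-up time per patient
    events : 1 if the event (death) was observed, 0 if censored
    risks  : predicted risk score, where higher means shorter expected survival

    A pair (i, j) is comparable when patient i had an observed event
    strictly before patient j's follow-up time. The pair is concordant
    when the model gave patient i the higher risk; ties count as half.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if events[i] != 1:
            continue  # a censored patient cannot be the earlier member of a pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs; C-index is undefined")
    return concordant / comparable


# Hypothetical toy cohort: the third patient is censored at time 6.
toy_times = [2, 4, 6, 8]
toy_events = [1, 1, 0, 1]
toy_risks = [0.9, 0.7, 0.4, 0.1]
toy_c_index = concordance_index(toy_times, toy_events, toy_risks)
```

A value of 0.5 corresponds to random ranking and 1.0 to a perfect ranking, which is why the metric is convenient for comparing unimodal and multi-modal models on the same cohort.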

Critical Analysis

The results would benefit from further validation on larger and more diverse patient cohorts. On interpretability, the authors visualize attention heatmaps over the pathology images and use the integrated gradient algorithm to identify important tissue structures, biological pathways, and key genes. How readily domain experts can act on such attributions remains an open question.
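The abstract mentions the integrated gradient algorithm as one of the interpretability tools. The sketch below shows the general technique on a toy function, using a midpoint Riemann sum along the straight path from a baseline to the input and numerical gradients in place of automatic differentiation. The toy risk function, the zero baseline, and all names are assumptions for illustration and have no connection to the paper's trained model.

```python
def numerical_gradient(f, x, eps=1e-5):
    """Central-difference gradient of a scalar function f at point x."""
    grad = []
    for i in range(len(x)):
        up = list(x)
        down = list(x)
        up[i] += eps
        down[i] -= eps
        grad.append((f(up) - f(down)) / (2 * eps))
    return grad


def integrated_gradients(f, x, baseline, steps=50):
    """Approximate integrated gradients of f at x relative to a baseline.

    For each feature i the attribution is
        (x_i - baseline_i) * average over the path of df/dx_i,
    where the path runs in a straight line from the baseline to x.
    """
    diff = [xi - bi for xi, bi in zip(x, baseline)]
    grad_sum = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps  # midpoint of the k-th path segment
        point = [bi + alpha * di for bi, di in zip(baseline, diff)]
        grad = numerical_gradient(f, point)
        grad_sum = [s + g for s, g in zip(grad_sum, grad)]
    return [di * s / steps for di, s in zip(diff, grad_sum)]


def toy_risk(x):
    """Hypothetical risk score: quadratic in feature 0, linear in feature 1."""
    return x[0] ** 2 + 3.0 * x[1]


attributions = integrated_gradients(toy_risk, [2.0, 1.0], [0.0, 0.0])
```

A useful sanity check is the completeness property: the attributions should sum to the difference between the function's value at the input and at the baseline.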

Additionally, the construction of the heterogeneous graph relies on domain-specific knowledge to define the relationships between different data modalities. While this "biologically informed" approach is a strength of the work, it may also limit the generalizability of the method to other disease contexts where such prior knowledge is not available.

Future research could explore more automated ways of building the heterogeneous graph, or investigate alternative neural network architectures that can learn the graph structure from the data itself. Incorporating additional data sources, such as clinical variables or patient demographics, may also help to further improve the model's predictive performance and clinical utility.

Conclusion

This research represents an important step towards integrating diverse biomedical data for personalized cancer prognosis and treatment. By leveraging a novel Heterogeneous Graph Neural Network architecture, the authors have demonstrated the potential of multi-modal data fusion to unlock insights that may be inaccessible through traditional survival analysis methods.

The successful application of this approach to cancer datasets suggests that it could have broader implications for other complex diseases, where the integration of multiple data modalities may be crucial for understanding disease mechanisms and improving patient outcomes. As the field of precision medicine continues to evolve, techniques like the one presented in this paper will likely play an increasingly important role in translating biomedical discoveries into clinical practice.



Related Papers

Modeling Dense Multimodal Interactions Between Biological Pathways and Histology for Survival Prediction

Guillaume Jaume, Anurag Vaidya, Richard Chen, Drew Williamson, Paul Liang, Faisal Mahmood

Integrating whole-slide images (WSIs) and bulk transcriptomics for predicting patient survival can improve our understanding of patient prognosis. However, this multimodal task is particularly challenging due to the different nature of these data: WSIs represent a very high-dimensional spatial description of a tumor, while bulk transcriptomics represent a global description of gene expression levels within that tumor. In this context, our work aims to address two key challenges: (1) how can we tokenize transcriptomics in a semantically meaningful and interpretable way?, and (2) how can we capture dense multimodal interactions between these two modalities? Specifically, we propose to learn biological pathway tokens from transcriptomics that can encode specific cellular functions. Together with histology patch tokens that encode the different morphological patterns in the WSI, we argue that they form appropriate reasoning units for downstream interpretability analyses. We propose fusing both modalities using a memory-efficient multimodal Transformer that can model interactions between pathway and histology patch tokens. Our proposed model, SURVPATH, achieves state-of-the-art performance when evaluated against both unimodal and multimodal baselines on five datasets from The Cancer Genome Atlas. Our interpretability framework identifies key multimodal prognostic factors, and, as such, can provide valuable insights into the interaction between genotype and phenotype, enabling a deeper understanding of the underlying biological mechanisms at play. We make our code public at: https://github.com/ajv012/SurvPath.

4/16/2024

Embedding-based Multimodal Learning on Pan-Squamous Cell Carcinomas for Improved Survival Outcomes

Asim Waqas, Aakash Tripathi, Paul Stewart, Mia Naeini, Ghulam Rasool

Cancer clinics capture disease data at various scales, from genetic to organ level. Current bioinformatic methods struggle to handle the heterogeneous nature of this data, especially with missing modalities. We propose PARADIGM, a Graph Neural Network (GNN) framework that learns from multimodal, heterogeneous datasets to improve clinical outcome prediction. PARADIGM generates embeddings from multi-resolution data using foundation models, aggregates them into patient-level representations, fuses them into a unified graph, and enhances performance for tasks like survival analysis. We train GNNs on pan-Squamous Cell Carcinomas and validate our approach on Moffitt Cancer Center lung SCC data. Multimodal GNN outperforms other models in patient survival prediction. Converging individual data modalities across varying scales provides a more insightful disease view. Our solution aims to understand the patient's circumstances comprehensively, offering insights on heterogeneous data integration and the benefits of converging maximum data views.

6/14/2024

Multi-Scale Heterogeneity-Aware Hypergraph Representation for Histopathology Whole Slide Images

Minghao Han, Xukun Zhang, Dingkang Yang, Tao Liu, Haopeng Kuang, Jinghui Feng, Lihua Zhang

Survival prediction is a complex ordinal regression task that aims to predict the survival coefficient ranking among a cohort of patients, typically achieved by analyzing patients' whole slide images. Existing deep learning approaches mainly adopt multiple instance learning or graph neural networks under weak supervision. Most of them are unable to uncover the diverse interactions between different types of biological entities (e.g., cell cluster and tissue block) across multiple scales, while such interactions are crucial for patient survival prediction. In light of this, we propose a novel multi-scale heterogeneity-aware hypergraph representation framework. Specifically, our framework first constructs a multi-scale heterogeneity-aware hypergraph and assigns each node with its biological entity type. It then mines diverse interactions between nodes on the graph structure to obtain a global representation. Experimental results demonstrate that our method outperforms state-of-the-art approaches on three benchmark datasets. Code is publicly available at https://github.com/Hanminghao/H2GT.

5/1/2024

Multimodal Cross-Task Interaction for Survival Analysis in Whole Slide Pathological Images

Songhan Jiang, Zhengyu Gan, Linghan Cai, Yifeng Wang, Yongbing Zhang

Survival prediction, utilizing pathological images and genomic profiles, is increasingly important in cancer analysis and prognosis. Despite significant progress, precise survival analysis still faces two main challenges: (1) The massive pixels contained in whole slide images (WSIs) complicate the process of pathological images, making it difficult to generate an effective representation of the tumor microenvironment (TME). (2) Existing multimodal methods often rely on alignment strategies to integrate complementary information, which may lead to information loss due to the inherent heterogeneity between pathology and genes. In this paper, we propose a Multimodal Cross-Task Interaction (MCTI) framework to explore the intrinsic correlations between subtype classification and survival analysis tasks. Specifically, to capture TME-related features in WSIs, we leverage the subtype classification task to mine tumor regions. Simultaneously, multi-head attention mechanisms are applied in genomic feature extraction, adaptively performing genes grouping to obtain task-related genomic embedding. With the joint representation of pathological images and genomic data, we further introduce a Transport-Guided Attention (TGA) module that uses optimal transport theory to model the correlation between subtype classification and survival analysis tasks, effectively transferring potential information. Extensive experiments demonstrate the superiority of our approaches, with MCTI outperforming state-of-the-art frameworks on three public benchmarks. Code: https://github.com/jsh0792/MCTI.

6/26/2024