A Deep Learning Approach for Overall Survival Prediction in Lung Cancer with Missing Values

Read original: arXiv:2307.11465 - Published 7/2/2024 by Camillo Maria Caruso, Valerio Guarrasi, Sara Ramella, Paolo Soda

Overview

Developing an AI model to handle missing data in lung cancer survival analysis
Leveraging all available data, including censored and uncensored patients
Aiming to provide precise overall survival (OS) predictions for non-small cell lung cancer (NSCLC) patients

Plain English Explanation

When analyzing the overall survival of lung cancer patients, missing data can be a significant challenge. This research aims to develop an AI model that can effectively handle this missing information. The model also aims to use all the available data, including patients who have experienced the event of interest (uncensored) and those who have not (censored). By incorporating these capabilities, the researchers hope to provide accurate predictions of overall survival for NSCLC patients, overcoming important obstacles in this field.

The researchers adapt the transformer architecture to tabular data, such as the medical records used in this study. They modify the feature embedding and self-attention mechanisms so that missing values are masked out rather than filled in by an imputation strategy. The model is trained with custom loss functions that handle both censored and uncensored patients, as well as changes in risk over time.

Technical Explanation

The researchers developed a transformer-based model that is tailored for tabular data and can effectively handle missing values without the need for imputation. The model adapts the feature embedding and masked self-attention mechanisms of the transformer architecture to focus only on the available features, ignoring the missing ones.
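The idea of attending only to available features can be illustrated with a small sketch. This is not the authors' implementation: the function name `masked_attention_weights`, the plain-list representation, and the choice to return all-zero weights when every feature is missing are assumptions made for illustration. It shows a row-wise softmax over attention scores in which keys belonging to missing features receive exactly zero weight.

```python
import math


def masked_attention_weights(scores, available):
    """Row-wise softmax over attention scores, restricted to observed features.

    scores:    n x n list of raw attention scores (query i versus key j)
    available: list of n booleans, True where the feature was observed
    (Both names are hypothetical and chosen for this sketch.)
    """
    weights = []
    for row in scores:
        # Collect the scores whose key feature is actually observed.
        kept = [s for s, ok in zip(row, available) if ok]
        if not kept:
            # Assumption: with no observed feature, attend to nothing.
            weights.append([0.0] * len(row))
            continue
        top = max(kept)  # subtract the max for numerical stability
        exps = [math.exp(s - top) if ok else 0.0 for s, ok in zip(row, available)]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights


# Three features, the second one missing for this patient.
example = masked_attention_weights(
    [[1.0, 2.0, 3.0], [0.5, 0.5, 0.5], [3.0, 2.0, 1.0]],
    [True, False, True],
)
```

Each row of `example` sums to one over the two observed features, and the column for the missing feature is zero, so no imputed value ever enters the weighted sum.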

The researchers also designed specialized loss functions to train the model. These loss functions account for both censored and uncensored patients, as well as changes in risk over time. This lets the model learn from every patient, rather than discarding those whose outcome is censored.
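The summary does not give the exact form of these losses, so the following is only a sketch of one common way to use both kinds of patients: a discrete-time negative log-likelihood. It assumes the model outputs a probability mass over time bins, which may differ from the paper's formulation. The names `discrete_survival_nll`, `pmf`, `time_bin`, and `event_observed` are hypothetical.

```python
import math


def discrete_survival_nll(pmf, time_bin, event_observed, eps=1e-12):
    """Negative log-likelihood for one patient under a discrete-time model.

    pmf:            predicted probability of the event in each time bin
    time_bin:       bin of the event (uncensored) or of last follow-up (censored)
    event_observed: True if the event was observed, False if censored
    """
    if event_observed:
        # Uncensored: the event happened in this bin.
        likelihood = pmf[time_bin]
    else:
        # Censored: we only know the patient survived past this bin.
        likelihood = sum(pmf[time_bin + 1:])
    # eps guards against log(0) when the model assigns no mass.
    return -math.log(max(likelihood, eps))


pmf = [0.1, 0.2, 0.3, 0.4]
loss_uncensored = discrete_survival_nll(pmf, 1, True)   # uses pmf[1]
loss_censored = discrete_survival_nll(pmf, 1, False)    # uses pmf[2] + pmf[3]
```

A censored patient contributes the probability of surviving beyond the last follow-up, so the record still shapes the predicted distribution instead of being dropped.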

The researchers compared their method to state-of-the-art models for survival analysis, coupled with different imputation strategies. The evaluation was conducted over a 6-year period using different time granularities. The results show that the proposed model outperformed all the other methods, achieving a Ct-index (a time-dependent variant of the C-index) of 71.97, 77.58, and 80.72 for time units of 1 month, 1 year, and 2 years, respectively.
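To make the metric concrete, here is a minimal sketch of a time-dependent concordance index. It assumes the usual definition in which a pair is comparable when the first patient has an observed event strictly before the second patient's time, and concordant when the first patient's predicted survival at that event time is lower. The tie handling, the function name `time_dependent_concordance`, and the input layout are assumptions; the paper's exact computation may differ. The reported values appear to be on a 0 to 100 scale, whereas this sketch returns a fraction between 0 and 1.

```python
def time_dependent_concordance(surv_curves, time_bins, events):
    """Fraction of comparable patient pairs that the model ranks correctly.

    surv_curves: surv_curves[i][t] is the predicted probability that
                 patient i survives beyond time bin t
    time_bins:   event or censoring bin for each patient
    events:      True if the event was observed, False if censored
    """
    concordant = 0.0
    comparable = 0
    n = len(time_bins)
    for i in range(n):
        if not events[i]:
            continue  # a censored patient cannot anchor a pair
        t = time_bins[i]
        for j in range(n):
            if time_bins[j] <= t:
                continue  # patient j must still be at risk after t
            comparable += 1
            if surv_curves[i][t] < surv_curves[j][t]:
                concordant += 1.0
            elif surv_curves[i][t] == surv_curves[j][t]:
                concordant += 0.5  # assumption: ties count as half
    return concordant / comparable if comparable else float("nan")


curves = [
    [0.2, 0.1, 0.05],  # event in bin 0
    [0.8, 0.3, 0.10],  # event in bin 1
    [0.9, 0.8, 0.70],  # censored in bin 2
]
score = time_dependent_concordance(curves, [0, 1, 2], [True, True, False])
```

In this toy example every comparable pair is ordered correctly, so `score` is 1.0. The censored patient still takes part in pairs as the longer-surviving member.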

Critical Analysis

The paper presents a promising approach for handling missing data in survival analysis for cancer, particularly in the context of NSCLC. The researchers' use of a transformer-based model and custom loss functions to leverage all available data, including censored patients, is a notable contribution.

However, the paper does not discuss the potential limitations or caveats of the proposed method. For example, it would be helpful to understand how the model performs when the amount of missing data is particularly high or when the distribution of missing data is not random. Additionally, the paper could benefit from a more detailed discussion of the potential biases or confounding factors that may arise in the data and how the model addresses these challenges.

Further research could explore the model's performance on a wider range of cancer types or investigate the interpretability of the model's predictions, which is an important consideration in the medical domain.

Conclusion

This research presents a novel AI-based approach to handle missing data in the analysis of overall survival for NSCLC patients. By leveraging a transformer-based model and custom loss functions, the researchers have developed a method that can effectively use all available data, including censored patients, to provide accurate OS predictions.

The results demonstrate the potential of this approach to overcome significant challenges in the field of lung cancer research, where missing data is a prevalent issue. While the paper could benefit from a more thorough discussion of the method's limitations and potential biases, the proposed model represents an important step forward in the use of AI for survival analysis in the medical domain.

Related Papers

A Deep Learning Approach for Overall Survival Prediction in Lung Cancer with Missing Values

Camillo Maria Caruso, Valerio Guarrasi, Sara Ramella, Paolo Soda

In the field of lung cancer research, particularly in the analysis of overall survival (OS), artificial intelligence (AI) serves crucial roles with specific aims. Given the prevalent issue of missing data in the medical domain, our primary objective is to develop an AI model capable of dynamically handling this missing data. Additionally, we aim to leverage all accessible data, effectively analyzing both uncensored patients who have experienced the event of interest and censored patients who have not, by embedding a specialized technique within our AI model, not commonly utilized in other AI tasks. Through the realization of these objectives, our model aims to provide precise OS predictions for non-small cell lung cancer (NSCLC) patients, thus overcoming these significant challenges. We present a novel approach to survival analysis with missing values in the context of NSCLC, which exploits the strengths of the transformer architecture to account only for available features without requiring any imputation strategy. More specifically, this model tailors the transformer architecture to tabular data by adapting its feature embedding and masked self-attention to mask missing data and fully exploit the available ones. By making use of ad-hoc designed losses for OS, it is able to account for both censored and uncensored patients, as well as changes in risks over time. We compared our method with state-of-the-art models for survival analysis coupled with different imputation strategies. We evaluated the results obtained over a period of 6 years using different time granularities obtaining a Ct-index, a time-dependent variant of the C-index, of 71.97, 77.58 and 80.72 for time units of 1 month, 1 year and 2 years, respectively, outperforming all state-of-the-art methods regardless of the imputation method used.

7/2/2024

Improving Lung Cancer Diagnosis and Survival Prediction with Deep Learning and CT Imaging

Xiawei Wang, James Sharpnack, Thomas C. M. Lee

Lung cancer is a major cause of cancer-related deaths, and early diagnosis and treatment are crucial for improving patients' survival outcomes. In this paper, we propose to employ convolutional neural networks to model the non-linear relationship between the risk of lung cancer and the lungs' morphology revealed in the CT images. We apply a mini-batched loss that extends the Cox proportional hazards model to handle the non-convexity induced by neural networks, which also enables the training of large data sets. Additionally, we propose to combine mini-batched loss and binary cross-entropy to predict both lung cancer occurrence and the risk of mortality. Simulation results demonstrate the effectiveness of both the mini-batched loss with and without the censoring mechanism, as well as its combination with binary cross-entropy. We evaluate our approach on the National Lung Screening Trial data set with several 3D convolutional neural network architectures, achieving high AUC and C-index scores for lung cancer classification and survival prediction. These results, obtained from simulations and real data experiments, highlight the potential of our approach to improving the diagnosis and treatment of lung cancer.

8/20/2024

Deep Neural Networks for Predicting Recurrence and Survival in Patients with Esophageal Cancer After Surgery

Yuhan Zheng, Jessie A Elliott, John V Reynolds, Sheraz R Markar, Bartłomiej W. Papież, ENSURE study group

Esophageal cancer is a major cause of cancer-related mortality internationally, with high recurrence rates and poor survival even among patients treated with curative-intent surgery. Investigating relevant prognostic factors and predicting prognosis can enhance post-operative clinical decision-making and potentially improve patients' outcomes. In this work, we assessed prognostic factor identification and discriminative performances of three models for Disease-Free Survival (DFS) and Overall Survival (OS) using a large multicenter international dataset from ENSURE study. We first employed Cox Proportional Hazards (CoxPH) model to assess the impact of each feature on outcomes. Subsequently, we utilised CoxPH and two deep neural network (DNN)-based models, DeepSurv and DeepHit, to predict DFS and OS. The significant prognostic factors identified by our models were consistent with clinical literature, with post-operative pathologic features showing higher significance than clinical stage features. DeepSurv and DeepHit demonstrated comparable discriminative accuracy to CoxPH, with DeepSurv slightly outperforming in both DFS and OS prediction tasks, achieving C-index of 0.735 and 0.74, respectively. While these results suggested the potential of DNNs as prognostic tools for improving predictive accuracy and providing personalised guidance with respect to risk stratification, CoxPH still remains an adequately good prediction model, with the data used in this study.

9/4/2024

Survival Prediction Across Diverse Cancer Types Using Neural Networks

Xu Yan, Weimin Wang, MingXuan Xiao, Yufeng Li, Min Gao

Gastric cancer and Colon adenocarcinoma represent widespread and challenging malignancies with high mortality rates and complex treatment landscapes. In response to the critical need for accurate prognosis in cancer patients, the medical community has embraced the 5-year survival rate as a vital metric for estimating patient outcomes. This study introduces a pioneering approach to enhance survival prediction models for gastric and Colon adenocarcinoma patients. Leveraging advanced image analysis techniques, we sliced whole slide images (WSI) of these cancers, extracting comprehensive features to capture nuanced tumor characteristics. Subsequently, we constructed patient-level graphs, encapsulating intricate spatial relationships within tumor tissues. These graphs served as inputs for a sophisticated 4-layer graph convolutional neural network (GCN), designed to exploit the inherent connectivity of the data for comprehensive analysis and prediction. By integrating patients' total survival time and survival status, we computed C-index values for gastric cancer and Colon adenocarcinoma, yielding 0.57 and 0.64, respectively. Significantly surpassing previous convolutional neural network models, these results underscore the efficacy of our approach in accurately predicting patient survival outcomes. This research holds profound implications for both the medical and AI communities, offering insights into cancer biology and progression while advancing personalized treatment strategies. Ultimately, our study represents a significant stride in leveraging AI-driven methodologies to revolutionize cancer prognosis and improve patient outcomes on a global scale.

4/16/2024