Federated Diabetes Prediction in Canadian Adults Using Real-world Cross-Province Primary Care Data

Read original: arXiv:2408.12029 - Published 8/23/2024 by Guojun Tang, Jason E. Black, Tyler S. Williamson, Steve H. Drew


Overview

This paper explores the use of federated learning, a machine learning technique, to predict diabetes risk using electronic health records (EHR) data.
Federated learning allows for the development of predictive models without the need for centralized data storage, addressing privacy concerns.
The researchers apply federated learning to real-world clinical datasets from the Canadian Primary Care Sentinel Surveillance Network (CPCSSN) to predict diabetes risk.
The study compares the performance of federated learning models to province-based and centralized models, addressing class imbalance issues through downsampling techniques.

Plain English Explanation

Predicting diabetes risk using machine learning can lead to earlier detection and more effective treatment. However, centralizing sensitive healthcare data raises privacy concerns. Federated learning offers a solution by allowing predictive models to be developed without the need to share individual patient data.

In this study, the researchers used federated learning to predict diabetes risk using real clinical data from the Canadian Primary Care Sentinel Surveillance Network (CPCSSN). This approach allows for the creation of a predictive model without requiring the data to be stored in a central location, addressing privacy issues.

The researchers compared the performance of the federated learning model to models trained on data from individual provinces and a centralized model. They also addressed the challenge of having more non-diabetic patients than diabetic patients (known as class imbalance) by using downsampling techniques.

The results showed that the federated multilayer perceptron (MLP) model performed as well as or better than its centralized counterpart. However, the federated logistic regression model performed worse than its centralized counterpart. This suggests that the choice of machine learning algorithm matters when using federated learning for healthcare applications.

Technical Explanation

The paper introduces a federated learning approach to predict diabetes risk using electronic health records (EHR) data from the Canadian Primary Care Sentinel Surveillance Network (CPCSSN). Federated learning allows for the development of predictive models without the need for centralized data storage, addressing privacy concerns associated with centralized healthcare data.
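To make the idea concrete, here is a minimal sketch of one common way to combine models without pooling data. This summary does not say which aggregation algorithm the authors used, so the sketch assumes federated averaging (FedAvg) over a logistic regression model. The synthetic "sites", the feature count, and every function name and hyperparameter are hypothetical and do not come from the paper or from CPCSSN.

```python
import math
import random

def sigmoid(z):
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def local_update(weights, X, y, lr=0.1, epochs=5):
    """Run a few epochs of gradient descent on one site's private data."""
    w = list(weights)
    n = len(X)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            # w[0] is the bias term; the remaining weights pair with features.
            p = sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi)))
            err = p - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w

def fed_avg(site_weights, site_sizes):
    """Average the site models, weighting each by its number of records."""
    total = sum(site_sizes)
    dim = len(site_weights[0])
    return [
        sum(w[j] * n for w, n in zip(site_weights, site_sizes)) / total
        for j in range(dim)
    ]

def train_federated(sites, n_features, rounds=20):
    """sites: list of (X, y) pairs, one per site. Only weights leave a site."""
    global_w = [0.0] * (n_features + 1)
    for _ in range(rounds):
        updates = [local_update(global_w, X, y) for X, y in sites]
        global_w = fed_avg(updates, [len(X) for X, _ in sites])
    return global_w

def make_site(rng, n, shift):
    # Synthetic stand-in for one province's records (assumption, not real data).
    X = [[rng.gauss(shift, 1.0), rng.gauss(0.0, 1.0)] for _ in range(n)]
    y = [1 if x[0] + 0.5 * x[1] + rng.gauss(0, 0.3) > 0 else 0 for x in X]
    return X, y

rng = random.Random(0)
sites = [make_site(rng, 200, -0.5), make_site(rng, 300, 0.0), make_site(rng, 100, 0.5)]
global_model = train_federated(sites, n_features=2)
```

In each round only the weight vectors are exchanged, and the raw records stay inside `local_update`. Weighting the average by site size is the usual FedAvg choice, though a real deployment might weight sites differently.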

The researchers compared the performance of federated learning models with province-based and centralized models. Because the datasets contained more non-diabetic than diabetic patients, they used downsampling to address the class imbalance.
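Downsampling can be illustrated with a short sketch. The summary does not give the target class ratio or say whether the step was applied per province, so the example assumes plain random undersampling of the majority class to a 1:1 ratio. The record layout and the `diabetes` label key are hypothetical.

```python
import random

def downsample_majority(records, label_key="diabetes", seed=0):
    """Randomly drop majority-class records until both classes are equal in size."""
    rng = random.Random(seed)
    positives = [r for r in records if r[label_key] == 1]
    negatives = [r for r in records if r[label_key] == 0]
    # Order the two groups by size so the smaller one is kept in full.
    minority, majority = sorted([positives, negatives], key=len)
    kept = rng.sample(majority, len(minority))
    balanced = minority + kept
    rng.shuffle(balanced)
    return balanced

# Toy data: 1,000 synthetic records with a 10% positive rate (assumption).
records = [{"age": 40 + i % 30, "diabetes": 1 if i % 10 == 0 else 0} for i in range(1000)]
balanced = downsample_majority(records)
```

Undersampling discards data, so it is usually applied only to the training split. The evaluation split would typically keep the original class distribution.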

The experimental results showed that the federated multilayer perceptron (MLP) model performed as well as or better than the model trained with the centralized approach. However, the federated logistic regression model performed worse than its centralized counterpart.

Critical Analysis

The paper presents a promising approach to predicting diabetes risk using federated learning, which addresses the privacy concerns associated with centralized healthcare data storage. The use of real-world clinical datasets from the CPCSSN adds to the practical relevance of the research.

However, the study compares only two machine learning models, MLP and logistic regression. It would be valuable to evaluate other algorithms, such as decision trees or random forests, to determine which is most suitable for this application.

Additionally, the paper does not address the potential challenges of implementing federated learning in a real-world healthcare setting, such as the technical and organizational complexities of coordinating multiple participating institutions. Further research is needed to understand the practical barriers to deploying federated learning solutions in clinical practice.

Conclusion

This research demonstrates the potential of federated learning for data-driven diabetes prediction while addressing privacy concerns. Being able to develop predictive models without centralizing sensitive healthcare data could support earlier identification of high-risk patients, which the authors note may lead to more effective therapeutic strategies and reduced healthcare costs.

The findings suggest that the choice of machine learning algorithm is crucial when applying federated learning to healthcare applications. Further research is needed to explore the practical implementation challenges and expand the range of models evaluated to identify the most suitable approach for predicting diabetes risk using EHR data.



Related Papers


Federated Diabetes Prediction in Canadian Adults Using Real-world Cross-Province Primary Care Data

Guojun Tang, Jason E. Black, Tyler S. Williamson, Steve H. Drew

Integrating Electronic Health Records (EHR) and the application of machine learning present opportunities for enhancing the accuracy and accessibility of data-driven diabetes prediction. In particular, developing data-driven machine learning models can provide early identification of patients with high risk for diabetes, potentially leading to more effective therapeutic strategies and reduced healthcare costs. However, regulation restrictions create barriers to developing centralized predictive models. This paper addresses the challenges by introducing a federated learning approach, which amalgamates predictive models without centralized data storage and processing, thus avoiding privacy issues. This marks the first application of federated learning to predict diabetes using real clinical datasets in Canada extracted from the Canadian Primary Care Sentinel Surveillance Network (CPCSSN) without cross-province patient data sharing. We address class-imbalance issues through downsampling techniques and compare federated learning performance against province-based and centralized models. Experimental results show that the federated MLP model presents a similar or higher performance compared to the model trained with the centralized approach. However, the federated logistic regression model showed inferior performance compared to its centralized peer.

8/23/2024


Privacy-preserving federated prediction of pain intensity change based on multi-center survey data

Supratim Das, Mahdie Rafie, Paula Kammer, Søren T. Skou, Dorte T. Grønne, Ewa M. Roos, André Hajek, Hans-Helmut König, Md Shihab Ullah, Niklas Probul, Jan Baumbach, Linda Baumbach

Background: Patient-reported survey data are used to train prognostic models aimed at improving healthcare. However, such data are typically available multi-centric and, for privacy reasons, cannot easily be centralized in one data repository. Models trained locally are less accurate, robust, and generalizable. We present and apply privacy-preserving federated machine learning techniques for prognostic model building, where local survey data never leaves the legally safe harbors of the medical centers. Methods: We used centralized, local, and federated learning techniques on two healthcare datasets (GLA:D data from the five health regions of Denmark and international SHARE data of 27 countries) to predict two different health outcomes. We compared linear regression, random forest regression, and random forest classification models trained on local data with those trained on the entire data in a centralized and in a federated fashion. Results: In GLA:D data, federated linear regression (R² 0.34, RMSE 18.2) and federated random forest regression (R² 0.34, RMSE 18.3) models outperform their local counterparts (i.e., R² 0.32, RMSE 18.6, R² 0.30, RMSE 18.8) with statistical significance. We also found that centralized models (R² 0.34, RMSE 18.2, R² 0.32, RMSE 18.5, respectively) did not perform significantly better than the federated models. In SHARE, the federated model (AC 0.78, AUROC: 0.71) and centralized model (AC 0.84, AUROC: 0.66) perform significantly better than the local models (AC: 0.74, AUROC: 0.69). Conclusion: Federated learning enables the training of prognostic models from multi-center surveys without compromising privacy and with only minimal or no compromise regarding model performance.

9/14/2024


Federated learning model for predicting major postoperative complications

Yonggi Park, Yuanfang Ren, Benjamin Shickel, Ziyuan Guan, Ayush Patel, Yingbo Ma, Zhenhong Hu, Tyler J. Loftus, Parisa Rashidi, Tezcan Ozrazgat-Baslanti, Azra Bihorac

Background: The accurate prediction of postoperative complication risk using Electronic Health Records (EHR) and artificial intelligence shows great potential. Training a robust artificial intelligence model typically requires large-scale and diverse datasets. In reality, collecting medical data often encounters challenges surrounding privacy protection. Methods: This retrospective cohort study includes adult patients who were admitted to UFH Gainesville (GNV) (n = 79,850) and Jacksonville (JAX) (n = 28,636) for any type of inpatient surgical procedure. Using perioperative and intraoperative features, we developed federated learning models to predict nine major postoperative complications (i.e., prolonged intensive care unit stay and mechanical ventilation). We compared federated learning models with local learning models trained on a single site and central learning models trained on pooled dataset from two centers. Results: Our federated learning models achieved the area under the receiver operating characteristics curve (AUROC) values ranged from 0.81 for wound complications to 0.92 for prolonged ICU stay at UFH GNV center. At UFH JAX center, these values ranged from 0.73-0.74 for wound complications to 0.92-0.93 for hospital mortality. Federated learning models achieved comparable AUROC performance to central learning models, except for prolonged ICU stay, where the performance of federated learning models was slightly higher than central learning models at UFH GNV center, but slightly lower at UFH JAX center. In addition, our federated learning model obtained comparable performance to the best local learning model at each center, demonstrating strong generalizability. Conclusion: Federated learning is shown to be a useful tool to train robust and generalizable models from large scale data across multiple institutions where data protection barriers are high.

4/11/2024

Democratizing AI in Africa: FL for Low-Resource Edge Devices

Jorge Fabila, Víctor M. Campello, Carlos Martín-Isla, Johnes Obungoloch, Kinyera Leo, Amodoi Ronald, Karim Lekadir

Africa faces significant challenges in healthcare delivery due to limited infrastructure and access to advanced medical technologies. This study explores the use of federated learning to overcome these barriers, focusing on perinatal health. We trained a fetal plane classifier using perinatal data from five African countries: Algeria, Ghana, Egypt, Malawi, and Uganda, along with data from Spanish hospitals. To incorporate the lack of computational resources in the analysis, we considered a heterogeneous set of devices, including a Raspberry Pi and several laptops, for model training. We demonstrate comparative performance between a centralized and a federated model, despite the compute limitations, and a significant improvement in model generalizability when compared to models trained only locally. These results show the potential for a future implementation at a large scale of a federated learning platform to bridge the accessibility gap and improve model generalizability with very little requirements.

9/2/2024