A Novel Fusion Architecture for PD Detection Using Semi-Supervised Speech Embeddings

Read original: arXiv:2405.17206 - Published 5/28/2024 by Tariq Adnan, Abdelrahman Abdelkader, Zipei Liu, Ekram Hossain, Sooyong Park, MD Saiful Islam, Ehsan Hoque

Overview

Researchers developed a framework to detect Parkinson's disease (PD) by analyzing speech recordings from a diverse global dataset of over 1,300 participants, including 392 with PD.
They used embeddings from pretrained models such as Wav2Vec 2.0, WavLM, and ImageBind to represent the speech dynamics associated with PD.
Their novel fusion model outperformed standard concatenation-based fusion and other baselines, reaching an AUROC of 88.94% and an accuracy of 85.65% on a randomized data split, and it performed equitably across demographic subgroups and disease durations.
On two unseen clinical datasets the model maintained AUROC scores of 82.12% and 78.44%, which suggests it could enhance accessibility and equity in real-world PD detection.

Plain English Explanation

The researchers developed a way to detect Parkinson's disease (PD) by analyzing people's speech. They collected speech recordings from over 1,300 people around the world, including 392 with PD. They used advanced AI models like Wav2Vec 2.0 and WavLM to identify speech patterns associated with PD.

Their new fusion model aligns these speech features in a shared representation rather than simply stacking them, which allowed it to detect PD more accurately than the baselines. Importantly, the model performed well across different ages, sexes, and ethnicities, and it also held up on new speech recordings from clinical settings. This suggests the model could help make PD detection more accessible and equitable in the real world.

Technical Explanation

The researchers built a framework to recognize Parkinson's disease (PD) using speech recordings collected through a web application. Their diverse dataset included 1,306 participants, 392 of whom had been diagnosed with PD.

To capture the speech dynamics associated with PD, the researchers leveraged deep learning embeddings from semi-supervised models like Wav2Vec 2.0, WavLM, and ImageBind. Their novel fusion model aligned these different speech embeddings into a cohesive feature space, outperforming standard concatenation-based fusion and other baselines.
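To make the contrast between concatenation and alignment concrete, here is a minimal sketch. It is not the authors' architecture, which this summary does not describe in detail. The random matrices stand in for learned projection layers, averaging the projected vectors is an assumed way of combining them, and the embedding sizes, `shared_dim`, and all function names are hypothetical.

```python
import random


def make_projection(in_dim, out_dim, rng):
    """Random linear map standing in for a learned projection layer."""
    scale = 1.0 / in_dim ** 0.5
    return [[rng.gauss(0.0, scale) for _ in range(in_dim)] for _ in range(out_dim)]


def project(matrix, vector):
    """Multiply an (out_dim x in_dim) matrix by a vector of length in_dim."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def concat_fusion(embeddings):
    """Baseline: place all embeddings end to end in one long vector."""
    fused = []
    for emb in embeddings:
        fused.extend(emb)
    return fused


def aligned_fusion(embeddings, projections):
    """Project each embedding into one shared space, then average them.

    Averaging is an assumption made for this sketch only.
    """
    projected = [project(p, e) for p, e in zip(projections, embeddings)]
    count = len(projected)
    return [sum(values) / count for values in zip(*projected)]


rng = random.Random(0)

# Hypothetical embedding sizes; the real models produce much larger vectors.
dims = {"wav2vec2": 8, "wavlm": 8, "imagebind": 12}
shared_dim = 6

embeddings = [[rng.gauss(0.0, 1.0) for _ in range(d)] for d in dims.values()]
projections = [make_projection(d, shared_dim, rng) for d in dims.values()]

concat_vec = concat_fusion(embeddings)
aligned_vec = aligned_fusion(embeddings, projections)

print(len(concat_vec), len(aligned_vec))  # 28 6
```

In the concatenation baseline the fused vector grows with every added model and each model keeps its own coordinate system. In the aligned variant every model is mapped into the same `shared_dim`-sized space before a classifier would see it.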

In a randomized data split, the fusion model achieved an AUROC of 88.94% and an accuracy of 85.65% for PD classification. Statistical analysis confirmed the model's equitable performance across demographic subgroups and robustness to disease duration. When tested on two entirely unseen clinical datasets, the model maintained strong AUROC scores of 82.12% and 78.44%, demonstrating its potential for real-world deployment.
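For readers less familiar with these metrics, the sketch below computes AUROC and accuracy on toy data, then repeats the AUROC calculation per subgroup. The labels, scores, and group assignments are invented for illustration, the 0.5 threshold is an assumption, and the per-subgroup comparison is only a descriptive check, not the statistical analysis the paper reports.

```python
def auroc(labels, scores):
    """Probability that a random positive outscores a random negative.

    Ties count as half a win. This pairwise form is equivalent to the
    area under the ROC curve.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def accuracy(labels, scores, threshold=0.5):
    """Fraction of examples classified correctly at the given threshold."""
    predictions = [1 if s >= threshold else 0 for s in scores]
    correct = sum(1 for y, p in zip(labels, predictions) if y == p)
    return correct / len(labels)


def auroc_by_group(labels, scores, groups):
    """Compute AUROC separately for each subgroup label."""
    results = {}
    for group in sorted(set(groups)):
        idx = [i for i, g in enumerate(groups) if g == group]
        results[group] = auroc([labels[i] for i in idx], [scores[i] for i in idx])
    return results


# Toy data: 1 = PD, 0 = non-PD; scores are made-up model outputs.
labels = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.1, 0.7]
groups = ["F", "M", "F", "M", "F", "M", "F", "M"]

print(auroc(labels, scores))                   # 0.9375
print(accuracy(labels, scores))                # 0.75
print(auroc_by_group(labels, scores, groups))  # {'F': 0.75, 'M': 1.0}
```

AUROC does not depend on a decision threshold, while accuracy does, which is why the two numbers can differ noticeably on the same predictions.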

Critical Analysis

The researchers provide a comprehensive and rigorous evaluation of their PD classification framework, including testing on diverse, unseen datasets. This suggests the model's robustness and potential for practical application.

However, the paper does not delve into potential limitations or caveats. For example, it's unclear how the model would perform on speech recordings with significant background noise or on individuals with comorbidities. Additionally, the long-term stability and generalizability of the model's performance remain to be seen.

Further research could explore the model's interpretability, examining which specific speech features are most indicative of PD. This could offer insights into the underlying mechanisms of the disease and guide the development of more targeted diagnostic tools.

Overall, the researchers have made a promising step towards enhancing accessibility and equity in PD detection, but continued refinement and evaluation will be crucial for real-world deployment.

Conclusion

The researchers have developed a robust framework for recognizing Parkinson's disease (PD) through the analysis of speech recordings, leveraging advanced AI models to capture disease-associated speech dynamics. Their novel fusion model demonstrated superior performance and maintained high accuracy across diverse demographic groups and disease stages.

The strong results on unseen clinical datasets suggest the framework's potential to improve accessibility and health equity in PD detection. As the researchers continue to refine and evaluate the model, it could become a valuable tool for earlier diagnosis and personalized treatment, ultimately enhancing the quality of life for those affected by this debilitating condition.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

A Novel Fusion Architecture for PD Detection Using Semi-Supervised Speech Embeddings

Tariq Adnan, Abdelrahman Abdelkader, Zipei Liu, Ekram Hossain, Sooyong Park, MD Saiful Islam, Ehsan Hoque

We present a framework to recognize Parkinson's disease (PD) through an English pangram utterance speech collected using a web application from diverse recording settings and environments, including participants' homes. Our dataset includes a global cohort of 1306 participants, including 392 diagnosed with PD. Leveraging the diversity of the dataset, spanning various demographic properties (such as age, sex, and ethnicity), we used deep learning embeddings derived from semi-supervised models such as Wav2Vec 2.0, WavLM, and ImageBind representing the speech dynamics associated with PD. Our novel fusion model for PD classification, which aligns different speech embeddings into a cohesive feature space, demonstrated superior performance over standard concatenation-based fusion models and other baselines (including models built on traditional acoustic features). In a randomized data split configuration, the model achieved an Area Under the Receiver Operating Characteristic Curve (AUROC) of 88.94% and an accuracy of 85.65%. Rigorous statistical analysis confirmed that our model performs equitably across various demographic subgroups in terms of sex, ethnicity, and age, and remains robust regardless of disease duration. Furthermore, our model, when tested on two entirely unseen test datasets collected from clinical settings and from a PD care center, maintained AUROC scores of 82.12% and 78.44%, respectively. This affirms the model's robustness and its potential to enhance accessibility and health equity in real-world applications.

5/28/2024

Innovative Speech-Based Deep Learning Approaches for Parkinson's Disease Classification: A Systematic Review

Lisanne van Gelderen, Cristian Tejedor-García

Parkinson's disease (PD), the second most prevalent neurodegenerative disorder worldwide, frequently presents with early-stage speech impairments. Recent advancements in Artificial Intelligence (AI), particularly deep learning (DL), have significantly enhanced PD diagnosis through the analysis of speech data. Nevertheless, the progress of research is restricted by the limited availability of publicly accessible speech-based PD datasets, primarily due to privacy concerns. The goal of this systematic review is to explore the current landscape of speech-based DL approaches for PD classification, based on 33 scientific works published between January 2020 and March 2024. We discuss their available resources, capabilities, and potential limitations, and issues related to bias, explainability, and privacy. Furthermore, this review provides an overview of publicly accessible speech-based datasets and open-source material for PD. The DL approaches identified are categorized into end-to-end (E2E) learning, transfer learning (TL), and deep acoustic feature extraction (DAFE). Among E2E approaches, Convolutional Neural Networks (CNNs) are prevalent, though Transformers are increasingly popular. E2E approaches face challenges such as limited data and computational resources, especially with Transformers. TL addresses these issues by providing more robust PD diagnosis and better generalizability across languages. DAFE aims to improve the explainability and interpretability of results by examining the specific effects of deep features on both other DL approaches and more traditional machine learning (ML) methods. However, it often underperforms compared to E2E and TL approaches.

9/9/2024

Accessible, At-Home Detection of Parkinson's Disease via Multi-task Video Analysis

Md Saiful Islam, Tariq Adnan, Jan Freyberg, Sangwu Lee, Abdelrahman Abdelkader, Meghan Pawlik, Cathe Schwartz, Karen Jaffe, Ruth B. Schneider, E Ray Dorsey, Ehsan Hoque

Limited access to neurological care leads to missed diagnoses of Parkinson's disease (PD), leaving many individuals unidentified and untreated. We trained a novel neural network-based fusion architecture to detect PD by analyzing features extracted from webcam recordings of three tasks: finger tapping, facial expression (smiling), and speech (uttering a sentence containing all letters of the alphabet). Additionally, the model incorporated Monte Carlo Dropout to improve prediction accuracy by considering uncertainties. The study participants (n = 845, 272 with PD) were randomly split into three sets: 60% for training, 20% for model selection (hyper-parameter tuning), and 20% for final performance evaluation. The dataset consists of 1102 sessions, each session containing videos of all three tasks. Our proposed model achieved significantly better accuracy, area under the ROC curve (AUROC), and sensitivity at non-inferior specificity compared to any single-task model. Withholding uncertain predictions further boosted the performance, achieving 88.0% (95% CI: 87.7% - 88.4%) accuracy, 93.0% (92.8% - 93.2%) AUROC, 79.3% (78.4% - 80.2%) sensitivity, and 92.6% (92.3% - 92.8%) specificity, at the expense of not being able to predict for 2.3% (2.0% - 2.6%) of the data. Further analysis suggests that the trained model does not exhibit any detectable bias across sex and ethnic subgroups and is most effective for individuals aged between 50 and 80. This accessible, low-cost approach requiring only an internet-enabled device with a webcam and microphone paves the way for convenient PD screening at home, particularly in regions with limited access to clinical specialists.

6/24/2024

Exploiting Foundation Models and Speech Enhancement for Parkinson's Disease Detection from Speech in Real-World Operative Conditions

Moreno La Quatra, Maria Francesca Turco, Torbjørn Svendsen, Giampiero Salvi, Juan Rafael Orozco-Arroyave, Sabato Marco Siniscalchi

This work is concerned with devising a robust Parkinson's disease (PD) detector from speech in real-world operating conditions using (i) foundational models, and (ii) speech enhancement (SE) methods. To this end, we first fine-tune several foundational-based models on the standard PC-GITA (s-PC-GITA) clean data. Our results demonstrate superior performance to previously proposed models. Second, we assess the generalization capability of the PD models on the extended PC-GITA (e-PC-GITA) recordings, collected in real-world operative conditions, and observe a severe drop in performance moving from ideal to real-world conditions. Third, we align training and testing conditions by applying off-the-shelf SE techniques on e-PC-GITA, and a significant boost in performance is observed only for the foundational-based models. Finally, combining the two best foundational-based models trained on s-PC-GITA, namely WavLM Base and Hubert Base, yielded top performance on the enhanced e-PC-GITA.

6/25/2024