Detecting Throat Cancer from Speech Signals using Machine Learning: A Scoping Literature Review

Read original: arXiv:2307.09230 - Published 7/25/2024 by Mary Paterson, James Moor, Luisa Cutillo

Overview

Throat cancer cases are increasing globally, and early detection is crucial for improving survival rates.
Artificial intelligence (AI) and machine learning (ML) have the potential to detect throat cancer from patient speech, enabling earlier diagnosis and reducing healthcare system burdens.
However, no comprehensive review has explored the use of AI and ML for detecting throat cancer from speech.
This review aims to evaluate the performance of these technologies and identify issues that require further research.

Plain English Explanation

This research paper examines how artificial intelligence (AI) and machine learning (ML) can be used to detect throat cancer from a patient's speech. The number of throat cancer cases is increasing around the world, and catching the disease early is crucial for improving a person's chances of survival.

The researchers wanted to see how well AI and ML technologies could identify throat cancer by analyzing a patient's voice. This could lead to earlier diagnoses and reduce the strain on healthcare systems. However, no previous studies had thoroughly reviewed the research in this area.

The researchers searched three scientific databases for articles that used machine learning to classify speech data and explicitly included throat cancer patients in that data. They then looked at how the studies performed at binary classification (two categories, such as healthy vs. cancer) and multi-class classification (more than two categories).

The findings showed that there is no single method or feature that consistently outperforms others in detecting throat cancer from speech. The researchers also noted a lack of open-source code and data, which is important for other researchers to validate and build upon the results.

Technical Explanation

The researchers conducted a scoping literature review across three major databases (Scopus, Web of Science, and PubMed) to identify studies that used machine learning to classify speech data and included throat cancer patients.

They found 27 relevant articles: 12 performed binary classification (two classes, such as healthy vs. throat cancer), 13 performed multi-class classification (more than two classes), and 2 did both. The most common classification method was neural networks, and the most frequently extracted speech feature was the mel-spectrogram, a time-frequency representation of audio on the perceptual mel frequency scale.
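To make the mel-spectrogram feature more concrete, the sketch below shows only the mel frequency scale that underlies it. It is an illustration, not code from any reviewed study: the function names, the band count, and the 0 to 8000 Hz range are assumptions chosen for the example, and the conversion uses the common 2595 * log10(1 + f/700) form of the mel formula.

```python
import math


def hz_to_mel(freq_hz: float) -> float:
    """Convert a frequency in Hz to mels (2595 * log10(1 + f/700) variant)."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def mel_to_hz(mel: float) -> float:
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_band_centres(n_bands: int, f_min: float, f_max: float) -> list[float]:
    """Centre frequencies (Hz) of n_bands bands spaced evenly on the mel scale."""
    lo, hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (hi - lo) / (n_bands + 1)
    return [mel_to_hz(lo + step * (i + 1)) for i in range(n_bands)]


# Hypothetical settings: 8 bands between 0 Hz and 8000 Hz.
centres = mel_band_centres(n_bands=8, f_min=0.0, f_max=8000.0)
print([round(c) for c in centres])
```

Bands spaced evenly in mels sit closer together at low frequencies and further apart at high ones. A full mel-spectrogram would additionally need a short-time Fourier transform and triangular filters centred on these frequencies, which are omitted here.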

The researchers also documented the pre-processing methods and classifier performance reported in the studies. When they compared the articles against the TRIPOD-AI checklist, which evaluates the reporting quality of AI studies, they found a significant lack of open science: only one of the 27 articles shared code, and only three used open-access data.
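As a quick check on the scale of that gap, the counts reported in the review can be turned into proportions. The snippet below is a small arithmetic sketch; the variable and function names are illustrative, and the only inputs are the figures quoted above.

```python
# Counts as reported in the review.
TOTAL_ARTICLES = 27   # articles meeting the inclusion criteria
SHARED_CODE = 1       # articles that shared their code
USED_OPEN_DATA = 3    # articles that used open-access data


def percentage(part: int, whole: int) -> float:
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


code_pct = percentage(SHARED_CODE, TOTAL_ARTICLES)
data_pct = percentage(USED_OPEN_DATA, TOTAL_ARTICLES)

print(f"Shared code:     {code_pct:.1f}% of articles")
print(f"Open-access data: {data_pct:.1f}% of articles")
```

In other words, roughly 4% of the included articles shared code and about 11% used open-access data.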

Critical Analysis

The review highlights the need for more standardized methodologies and improved reproducibility in this field of research. The lack of open-source code and data is a significant limitation, as it prevents external validation and further development of the techniques.

Additionally, the review did not identify a single method or speech feature that consistently outperforms others in detecting throat cancer. This suggests that more research is needed to determine the most effective approach and to understand the specific speech characteristics that differentiate healthy individuals from those with throat cancer.
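Part of the difficulty in comparing methods is that the same classifier can look quite different depending on which metric is reported. The following sketch computes several common binary-classification metrics from a single confusion matrix. The counts are invented for illustration and do not come from any reviewed study; the class and function names are likewise hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    tp: int  # cancer recordings correctly flagged as cancer
    fp: int  # healthy recordings wrongly flagged as cancer
    tn: int  # healthy recordings correctly passed as healthy
    fn: int  # cancer recordings wrongly passed as healthy


def binary_metrics(c: ConfusionCounts) -> dict[str, float]:
    """Compute standard metrics for a two-class (healthy vs. cancer) classifier."""
    sensitivity = c.tp / (c.tp + c.fn)   # recall on the cancer class
    specificity = c.tn / (c.tn + c.fp)   # recall on the healthy class
    precision = c.tp / (c.tp + c.fp)
    accuracy = (c.tp + c.tn) / (c.tp + c.fp + c.tn + c.fn)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "accuracy": accuracy,
        "f1": f1,
    }


# Hypothetical test set: 50 cancer recordings and 100 healthy recordings.
example = ConfusionCounts(tp=40, fp=10, tn=90, fn=10)
metrics = binary_metrics(example)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these made-up counts, accuracy (about 0.87) is higher than sensitivity (0.80) because healthy recordings outnumber cancer recordings, which is one reason a study that reports only accuracy is hard to compare with one that reports sensitivity and specificity.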

The researchers also note that future studies should focus on improving the generalizability of the models, as many of the current studies have been conducted on relatively small and homogeneous datasets. Larger, more diverse datasets would help ensure the robustness and real-world applicability of AI- and ML-based throat cancer detection methods.
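One way to see why small datasets limit what can be concluded is to look at the uncertainty around a reported accuracy. The sketch below uses the Wilson score interval for a proportion; the sample sizes and the 90% accuracy are hypothetical values picked for the example, not figures from the review, and the function name is an assumption.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a proportion (z = 1.96 gives roughly 95% coverage)."""
    p = successes / n
    denom = 1.0 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half_width, centre + half_width


# Hypothetical: the same 90% observed accuracy on two test-set sizes.
small = wilson_interval(successes=45, n=50)
large = wilson_interval(successes=450, n=500)

print(f"n=50:  {small[0]:.3f} to {small[1]:.3f}")
print(f"n=500: {large[0]:.3f} to {large[1]:.3f}")
```

The interval for the smaller test set is noticeably wider, so an apparent difference between two methods evaluated on small datasets may not reflect a real difference in performance.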

Conclusion

This review highlights the potential of AI and ML to facilitate earlier diagnosis of throat cancer by analyzing patient speech. However, the field currently lacks standardized methodologies, open-source code and data, and a clear consensus on the most effective approach.

Future research should prioritize improving the reproducibility and generalizability of the techniques, as well as exploring the specific speech characteristics that are most indicative of throat cancer. By addressing these gaps, the research community can work towards AI- and ML-based tools that significantly improve early detection and, ultimately, outcomes for patients with throat cancer.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Detecting Throat Cancer from Speech Signals using Machine Learning: A Scoping Literature Review

Mary Paterson, James Moor, Luisa Cutillo

Introduction: Cases of throat cancer are rising worldwide. With survival decreasing significantly at later stages, early detection is vital. Artificial intelligence (AI) and machine learning (ML) have the potential to detect throat cancer from patient speech, facilitating earlier diagnosis and reducing the burden on overstretched healthcare systems. However, no comprehensive review has explored the use of AI and ML for detecting throat cancer from speech. This review aims to fill this gap by evaluating how these technologies perform and identifying issues that need to be addressed in future research. Materials and Methods: We conducted a scoping literature review across three databases: Scopus, Web of Science, and PubMed. We included articles that classified speech using machine learning and specified the inclusion of throat cancer patients in their data. Articles were categorized based on whether they performed binary or multi-class classification. Results: We found 27 articles fitting our inclusion criteria: 12 performing binary classification, 13 performing multi-class classification, and two performing both binary and multi-class classification. The most common classification method used was neural networks, and the most frequently extracted feature was mel-spectrograms. We also documented pre-processing methods and classifier performance. We compared each article against the TRIPOD-AI checklist, which showed a significant lack of open science, with only one article sharing code and only three using open-access data. Conclusion: Open-source code is essential for external validation and further development in this field. Our review indicates that no single method or specific feature consistently outperforms others in detecting throat cancer from speech. Future research should focus on standardizing methodologies and improving the reproducibility of results.

7/25/2024

Cervical Auscultation Machine Learning for Dysphagia Assessment

An An Chia, Stacy Lum, Michelle Boo, Rex Tan, Balamurali B T, Jer-Ming Chen

This study evaluates the use of machine learning, specifically the Random Forest Classifier, to differentiate normal and pathological swallowing sounds. Employing a commercially available wearable stethoscope, we recorded swallows from both healthy adults and patients with dysphagia. The analysis revealed statistically significant differences in acoustic features, such as spectral crest and zero-crossing rate, between normal and pathological swallows, while no discriminating differences were demonstrated between different fluid and diet consistencies. The system demonstrated fair sensitivity (mean ± SD: 74% ± 8%) and specificity (89% ± 6%) for dysphagic swallows. The model attained an overall accuracy of 83% ± 3% and an F1 score of 78% ± 5%. These results demonstrate that machine learning can be a valuable tool in non-invasive dysphagia assessment, although challenges such as sampling rate limitations and variability in sensitivity and specificity in discriminating between normal and pathological sounds are noted. The study underscores the need for further research to optimize these techniques for clinical use.

7/9/2024

COVID-19 Detection System: A Comparative Analysis of System Performance Based on Acoustic Features of Cough Audio Signals

Asmaa Shati, Ghulam Mubashar Hassan, Amitava Datta

A wide range of respiratory diseases, such as cold and flu, asthma, and COVID-19, affect people's daily lives worldwide. In medical practice, respiratory sounds are widely used to diagnose various respiratory illnesses and lung disorders. The traditional diagnosis of such sounds requires specialized knowledge, which can be costly and reliant on human expertise. However, recent advances, such as the analysis of cough audio recordings, have emerged as a means to automate the detection of respiratory conditions. This research therefore aims to explore various acoustic features that enhance the performance of machine learning (ML) models in detecting COVID-19 from cough signals. It investigates the efficacy of three feature extraction techniques, Mel Frequency Cepstral Coefficients (MFCC), Chroma, and Spectral Contrast features, when applied to two machine learning algorithms, Support Vector Machine (SVM) and Multilayer Perceptron (MLP), and proposes an efficient CovCepNet detection system. The proposed system provides a practical solution and demonstrates state-of-the-art classification performance, with an AUC of 0.843 on the COUGHVID dataset and 0.953 on the Virufy dataset for COVID-19 detection from cough audio signals.

6/21/2024

Exploring Speech Pattern Disorders in Autism using Machine Learning

Chuanbo Hu, Jacob Thrasher, Wenqi Li, Mindi Ruan, Xiangxu Yu, Lynn K Paul, Shuo Wang, Xin Li

Diagnosing autism spectrum disorder (ASD) by identifying abnormal speech patterns from examiner-patient dialogues presents significant challenges due to the subtle and diverse manifestations of speech-related symptoms in affected individuals. This study presents a comprehensive approach to identify distinctive speech patterns through the analysis of examiner-patient dialogues. Utilizing a dataset of recorded dialogues, we extracted 40 speech-related features, categorized into frequency, zero-crossing rate, energy, spectral characteristics, Mel Frequency Cepstral Coefficients (MFCCs), and balance. These features encompass various aspects of speech such as intonation, volume, rhythm, and speech rate, reflecting the complex nature of communicative behaviors in ASD. We employed machine learning for both classification and regression tasks to analyze these speech features. The classification model aimed to differentiate between ASD and non-ASD cases, achieving an accuracy of 87.75%. Regression models were developed to predict speech pattern related variables and a composite score from all variables, facilitating a deeper understanding of the speech dynamics associated with ASD. The effectiveness of machine learning in interpreting intricate speech patterns and the high classification accuracy underscore the potential of computational methods in supporting the diagnostic processes for ASD. This approach not only aids in early detection but also contributes to personalized treatment planning by providing insights into the speech and communication profiles of individuals with ASD.

5/9/2024