Optimising MFCC parameters for the automatic detection of respiratory diseases

Read original: arXiv:2408.07522 - Published 8/15/2024 by Yuyang Yan, Sami O. Simons, Loes van Bemmel, Lauren Reinders, Frits M. E. Franssen, Visara Urovi

Overview

Examines optimization of Mel-frequency cepstral coefficients (MFCC) parameters for automatic detection of respiratory diseases
Investigates impact of MFCC parameters on accuracy of respiratory disease classification
Aims to identify optimal MFCC parameter settings for best respiratory disease detection performance

Plain English Explanation

The paper focuses on optimizing the parameters used to extract Mel-frequency cepstral coefficients (MFCC) from audio signals, with the goal of improving the automatic detection of respiratory diseases. MFCC is a widely used feature in audio processing and analysis, and the authors investigate how adjusting the MFCC parameters can impact the accuracy of classifying respiratory conditions.

The key idea is that by finding the optimal MFCC parameter settings, the researchers can enhance the performance of machine learning models in detecting respiratory diseases from audio data, such as coughing or breathing sounds. This could lead to more accurate and reliable diagnosis tools for healthcare professionals.

Technical Explanation

The paper evaluates how different MFCC parameter settings affect respiratory disease classification. MFCC features are extracted from voice recordings in four datasets (the Cambridge COVID-19 Sound database, the Coswara dataset, the Saarbrucken Voice Disorders (SVD) database, and a TACTICAS dataset), and a Support Vector Machine (SVM) classifier is trained on those features.

The authors systematically vary three key MFCC extraction parameters: the number of coefficients, the frame length, and the hop length between frames. They then analyze how each change affects the classification accuracy of the SVM model.
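To make the role of these parameters concrete, the sketch below shows how frame length and hop (frame shift), given in milliseconds, translate into sample counts and into the size of the resulting MFCC matrix. The function names, the 16 kHz sample rate, and the 3-second recording are illustrative assumptions, not values taken from the paper, and the frame count assumes no padding at the signal edges.

```python
def ms_to_samples(duration_ms, sample_rate):
    """Convert a duration in milliseconds to a whole number of samples."""
    return int(round(duration_ms * sample_rate / 1000))


def frame_count(n_samples, frame_len, hop_len):
    """Number of full frames that fit in the signal (no padding assumed)."""
    if n_samples < frame_len:
        return 0
    return 1 + (n_samples - frame_len) // hop_len


def mfcc_shape(duration_s, sample_rate, frame_ms, hop_ms, n_coeffs):
    """Shape (frames, coefficients) of the MFCC matrix for one recording."""
    n_samples = int(duration_s * sample_rate)
    frame_len = ms_to_samples(frame_ms, sample_rate)
    hop_len = ms_to_samples(hop_ms, sample_rate)
    return frame_count(n_samples, frame_len, hop_len), n_coeffs


# Hypothetical 3-second recording at 16 kHz (values chosen for illustration).
short_frames = mfcc_shape(3.0, 16000, frame_ms=50, hop_ms=10, n_coeffs=30)
long_frames = mfcc_shape(3.0, 16000, frame_ms=500, hop_ms=10, n_coeffs=30)
print(short_frames, long_frames)  # (296, 30) (251, 30)
```

A longer frame or a larger hop yields fewer frames per recording, so each parameter changes both the time resolution and the amount of feature data the classifier sees. Audio libraries typically expect these values as sample counts rather than milliseconds, which is why the conversion step matters.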

The results indicate that accuracy decreases as hop length increases and that the optimal number of coefficients is approximately 30. The effect of frame length depends on the dataset: on the two COVID-19 datasets (the Cambridge COVID-19 Sound database and Coswara), performance declines with longer frames, whereas on the SVD dataset it improves as frame length grows from 50 ms to 500 ms. By identifying the parameter values that yield the highest classification performance, the researchers provide guidance on how to configure MFCC extraction for effective respiratory disease detection.
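One way to search for the best-performing parameter values is an exhaustive grid search over all combinations. The sketch below is a minimal illustration under stated assumptions: `grid_search` and `toy_evaluate` are hypothetical names, the candidate values are arbitrary, and the scoring function is a made-up placeholder standing in for a real train-and-evaluate step (for example, cross-validated classifier accuracy). It does not reproduce the authors' procedure or results.

```python
from itertools import product


def grid_search(evaluate, n_coeffs_options, frame_ms_options, hop_ms_options):
    """Score every combination; return (best, worst) as (score, params) pairs."""
    results = []
    for n_coeffs, frame_ms, hop_ms in product(
        n_coeffs_options, frame_ms_options, hop_ms_options
    ):
        params = {"n_coeffs": n_coeffs, "frame_ms": frame_ms, "hop_ms": hop_ms}
        results.append((evaluate(**params), params))
    best = max(results, key=lambda r: r[0])
    worst = min(results, key=lambda r: r[0])
    return best, worst


def toy_evaluate(n_coeffs, frame_ms, hop_ms):
    """Placeholder score, NOT real data: replace with actual model evaluation."""
    return 1.0 - abs(n_coeffs - 30) / 100 - hop_ms / 1000 - frame_ms / 10000


best, worst = grid_search(
    toy_evaluate,
    n_coeffs_options=[10, 20, 30, 40],
    frame_ms_options=[50, 100, 500],
    hop_ms_options=[10, 50, 100],
)
print("best:", best[1])
print("worst:", worst[1])
```

Tracking the worst combination alongside the best makes it easy to report how much accuracy is at stake in the parameter choice. In practice, `evaluate` would extract MFCCs with the given settings, train a classifier, and return a held-out accuracy.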

Critical Analysis

The paper provides a comprehensive investigation of MFCC parameter optimization for respiratory disease classification, but some limitations are worth noting:

The experiments use four datasets and a single classifier (SVM), which may limit the generalizability of the findings. Evaluating the approach on larger, more diverse datasets and with other classifiers would strengthen the conclusions.
The paper focuses on MFCC features, but other audio signal representations or combinations of features may also be worth exploring for respiratory disease detection.
The analysis is limited to a specific set of respiratory conditions. Expanding the scope to a wider range of diseases could further demonstrate the versatility of the optimization approach.
The paper does not delve into the underlying reasons why certain MFCC parameter settings perform better for specific respiratory conditions. Gaining a deeper understanding of the acoustic signatures and their relationship to the MFCC features could lead to additional insights.

Overall, the research provides a valuable contribution to the field of respiratory disease detection using audio analysis, but there are opportunities for further refinement and exploration.

Conclusion

This paper presents an in-depth investigation into the optimization of Mel-frequency cepstral coefficient (MFCC) parameters for the automatic detection of respiratory diseases. The findings demonstrate that adjusting the MFCC extraction parameters can substantially affect classification performance: the authors report accuracy improvements of roughly 15% to 20% over the worst parameter combination on three of the datasets.

By identifying well-performing MFCC parameter settings across several datasets, the researchers provide a practical guide for developing more accurate and reliable audio-based diagnostic tools. This work has the potential to contribute to the advancement of respiratory healthcare, enabling earlier detection and more effective treatment of various respiratory ailments.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Optimising MFCC parameters for the automatic detection of respiratory diseases

Yuyang Yan, Sami O. Simons, Loes van Bemmel, Lauren Reinders, Frits M. E. Franssen, Visara Urovi

Voice signals originating from the respiratory tract are utilized as valuable acoustic biomarkers for the diagnosis and assessment of respiratory diseases. Among the employed acoustic features, Mel Frequency Cepstral Coefficients (MFCC) is widely used for automatic analysis, with MFCC extraction commonly relying on default parameters. However, no comprehensive study has systematically investigated the impact of MFCC extraction parameters on respiratory disease diagnosis. In this study, we address this gap by examining the effects of key parameters, namely the number of coefficients, frame length, and hop length between frames, on respiratory condition examination. Our investigation uses four datasets: the Cambridge COVID-19 Sound database, the Coswara dataset, the Saarbrucken Voice Disorders (SVD) database, and a TACTICAS dataset. The Support Vector Machine (SVM) is employed as the classifier, given its widespread adoption and efficacy. Our findings indicate that the accuracy of MFCC decreases as hop length increases, and the optimal number of coefficients is observed to be approximately 30. The performance of MFCC varies with frame length across the datasets: for the COVID-19 datasets (Cambridge COVID-19 Sound database and Coswara dataset), performance declines with longer frame lengths, while for the SVD dataset, performance improves with increasing frame length (from 50 ms to 500 ms). Furthermore, we investigate the optimized combination of these parameters and observe substantial enhancements in accuracy. Compared to the worst combination, the SVM model achieves an accuracy of 81.1%, 80.6%, and 71.7%, with improvements of 19.6%, 16.10%, and 14.90% for the Cambridge COVID-19 Sound database, the Coswara dataset, and the SVD dataset respectively.

8/15/2024

COVID-19 Detection System: A Comparative Analysis of System Performance Based on Acoustic Features of Cough Audio Signals

Asmaa Shati, Ghulam Mubashar Hassan, Amitava Datta

A wide range of respiratory diseases, such as cold and flu, asthma, and COVID-19, affect people's daily lives worldwide. In medical practice, respiratory sounds are widely used in medical services to diagnose various respiratory illnesses and lung disorders. The traditional diagnosis of such sounds requires specialized knowledge, which can be costly and reliant on human expertise. Despite this, recent advancements, such as cough audio recordings, have emerged as a means to automate the detection of respiratory conditions. Therefore, this research aims to explore various acoustic features that enhance the performance of machine learning (ML) models in detecting COVID-19 from cough signals. It investigates the efficacy of three feature extraction techniques, including Mel Frequency Cepstral Coefficients (MFCC), Chroma, and Spectral Contrast features, when applied to two machine learning algorithms, Support Vector Machine (SVM) and Multilayer Perceptron (MLP), and therefore proposes an efficient CovCepNet detection system. The proposed system provides a practical solution and demonstrates state-of-the-art classification performance, with an AUC of 0.843 on the COUGHVID dataset and 0.953 on the Virufy dataset for COVID-19 detection from cough audio signals.

6/21/2024

Enhanced Classification of Heart Sounds Using Mel Frequency Cepstral Coefficients: A Comparative Study of Single and Ensemble Classifier Strategies

Amir Masoud Rahmani, Amir Haider, Mohammad Adeli, Olfa Mzoughi, Entesar Gemeay, Mokhtar Mohammadi, Hamid Alinejad-Rokny, Parisa Khoshvaght, Mehdi Hosseinzadeh

This paper explores the efficacy of Mel Frequency Cepstral Coefficients (MFCCs) in detecting abnormal heart sounds using two classification strategies: a single classifier and an ensemble classifier approach. Heart sounds were first pre-processed to remove noise and then segmented into S1, systole, S2, and diastole intervals, with thirteen MFCCs estimated from each segment, yielding 52 MFCCs per beat. Finally, MFCCs were used for heart sound classification. For that purpose, in the single classifier strategy, the MFCCs from nine consecutive beats were averaged to classify heart sounds by a single classifier (either a support vector machine (SVM), the k nearest neighbors (kNN), or a decision tree (DT)). Conversely, the ensemble classifier strategy employed nine classifiers (either nine SVMs, nine kNN classifiers, or nine DTs) to individually assess beats as normal or abnormal, with the overall classification based on the majority vote. Both methods were tested on a publicly available phonocardiogram database. The heart sound classification accuracy was 91.95% for the SVM, 91.9% for the kNN, and 87.33% for the DT in the single classifier strategy. Also, the accuracy was 93.59% for the SVM, 91.84% for the kNN, and 92.22% for the DT in the ensemble classifier strategy. Overall, the results demonstrated that the ensemble classifier strategy improved the accuracies of the DT and the SVM by 4.89% and 1.64%, establishing MFCCs as more effective than other features, including time, time-frequency, and statistical features, evaluated in similar studies.

7/2/2024

Automatic detection of Mild Cognitive Impairment using high-dimensional acoustic features in spontaneous speech

Cong Zhang, Wenxing Guo, Hongsheng Dai

This study addresses the TAUKADIAL challenge, focusing on the classification of speech from people with Mild Cognitive Impairment (MCI) and neurotypical controls. We conducted three experiments comparing five machine-learning methods: Random Forests, Sparse Logistic Regression, k-Nearest Neighbors, Sparse Support Vector Machine, and Decision Tree, utilizing 1076 acoustic features automatically extracted using openSMILE. In Experiment 1, the entire dataset was used to train a language-agnostic model. Experiment 2 introduced a language detection step, leading to separate model training for each language. Experiment 3 further enhanced the language-agnostic model from Experiment 1, with a specific focus on evaluating the robustness of the models using out-of-sample test data. Across all three experiments, results consistently favored models capable of handling high-dimensional data, such as Random Forest and Sparse Logistic Regression, in classifying speech from MCI and controls.

8/30/2024