Discriminant audio properties in deep learning based respiratory insufficiency detection in Brazilian Portuguese

Read original: arXiv:2405.17569 - Published 5/29/2024 by Marcelo Matheus Gauy, Larissa Cristina Berti, Arnaldo Cândido Jr, Augusto Camargo Neto, Alfredo Goldman, Anna Sara Shafferman Levin, Marcus Martins, Beatriz Raposo de Medeiros, Marcelo Queiroz, Ester Cerdeira Sabino and 2 others

Overview

This paper investigates the use of deep learning techniques for detecting respiratory insufficiency in Brazilian Portuguese audio data.
The research was supported by grants from FAPESP and CAPES, and carried out at the Center for Artificial Intelligence (C4AI-USP) with IBM's support.
The study explores discriminant audio properties that can be used to identify respiratory insufficiency, a condition where the lungs are unable to provide adequate oxygen to the body.

Plain English Explanation

This research paper explores how artificial intelligence (AI) can be used to detect respiratory insufficiency, a health condition where the lungs are not able to provide enough oxygen to the body, based on audio recordings in Brazilian Portuguese. The researchers used advanced machine learning techniques, known as deep learning, to analyze the audio data and identify patterns that are indicative of respiratory insufficiency.

The work was supported by funding from the São Paulo Research Foundation (FAPESP) and the Coordination for the Improvement of Higher Education Personnel (CAPES) in Brazil, and was carried out at the Center for Artificial Intelligence (C4AI-USP) with the help of IBM. The researchers were interested in finding specific audio features or characteristics that could be used to reliably detect respiratory insufficiency, which is an important medical condition that needs to be identified and treated early.

By using deep learning algorithms to analyze the audio data, the researchers hoped to develop a more accurate and automated way of screening for respiratory insufficiency, which could potentially lead to earlier diagnosis and better outcomes for patients. The findings of this study could have important implications for the development of voice-based health monitoring systems and respiratory sound analysis tools that can assist healthcare professionals in the diagnosis and management of respiratory conditions.

Technical Explanation

The researchers used deep learning to investigate which audio properties discriminate respiratory insufficiency in Brazilian Portuguese audio data. They employed a pre-trained multi-modal architecture called RENE, which was designed for the analysis of respiratory sounds.

The experiment design involved collecting audio recordings from both healthy individuals and those with respiratory insufficiency. The researchers then preprocessed the data, extracting various audio features such as spectrograms and Mel-frequency cepstral coefficients (MFCCs). These features were used as inputs to the RENE model, which was trained to classify the audio samples as either healthy or indicative of respiratory insufficiency.
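The feature-extraction step can be illustrated with a minimal, standard-library-only sketch. It computes a magnitude spectrogram by framing the signal, applying a Hann window, and taking a naive DFT of each frame; a widely used mel-scale formula is included because MFCCs are built on a mel-spaced filterbank. The frame length, hop size, sample rate, and function names are assumptions chosen for illustration, not the settings or code used in the paper.

```python
import cmath
import math


def hann(n):
    """Symmetric Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def magnitude_spectrogram(signal, frame_len=64, hop=32):
    """Return a list of frames; each frame lists DFT magnitudes for bins 0..frame_len//2."""
    window = hann(frame_len)
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + i] * window[i] for i in range(frame_len)]
        mags = []
        for k in range(frame_len // 2 + 1):
            # Naive DFT: fine for a sketch, far too slow for real recordings.
            acc = sum(
                frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                for n in range(frame_len)
            )
            mags.append(abs(acc))
        frames.append(mags)
    return frames


def hz_to_mel(freq_hz):
    """Widely used mel-scale conversion (the first step toward a mel filterbank)."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


# Toy input (illustrative values): a 1 kHz tone sampled at 8 kHz.
sample_rate = 8000
tone = [math.sin(2 * math.pi * 1000 * t / sample_rate) for t in range(512)]
spec = magnitude_spectrogram(tone)
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
peak_hz = peak_bin * sample_rate / 64  # 64 is the default frame_len above
```

In practice an FFT-based audio library would replace the naive DFT; the sketch only shows what a spectrogram frame contains.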

To improve the model's performance, the researchers also explored data augmentation techniques that introduced controlled distortions to the audio samples during training. This helped the model learn more robust representations and generalize better to unseen data.
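As a rough illustration of what "controlled distortions" can look like, the sketch below applies three common audio augmentations: additive white noise at a target signal-to-noise ratio, a random gain change, and a circular time shift. The summary does not say which augmentations the authors used, so the choice of distortions, their parameter ranges, and the function names here are all assumptions.

```python
import math
import random


def add_noise_at_snr(signal, snr_db, rng):
    """Add white Gaussian noise scaled so the signal-to-noise ratio is about snr_db."""
    signal_power = sum(x * x for x in signal) / len(signal)
    noise_power = signal_power / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power)
    return [x + rng.gauss(0.0, sigma) for x in signal]


def random_gain(signal, rng, low_db=-6.0, high_db=6.0):
    """Scale the signal by a gain drawn uniformly (in dB) from [low_db, high_db]."""
    gain = 10 ** (rng.uniform(low_db, high_db) / 20)
    return [x * gain for x in signal]


def time_shift(signal, rng, max_shift):
    """Circularly shift the signal by a random offset in [-max_shift, max_shift]."""
    shift = rng.randint(-max_shift, max_shift) % len(signal)
    return signal[-shift:] + signal[:-shift] if shift else list(signal)


# Toy usage with a fixed seed so the sketch is repeatable.
rng = random.Random(0)
clean = [math.sin(2 * math.pi * 5 * t / 4000) for t in range(4000)]
noisy = add_noise_at_snr(clean, snr_db=10.0, rng=rng)
scaled = random_gain(clean, rng)
shifted = time_shift(clean, rng, max_shift=400)
```

Each function returns a new list and leaves the input untouched, so the distortions can be chained or applied with some probability per training example.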

Through their analysis, the researchers identified several discriminant audio properties that were particularly useful for distinguishing between healthy and respiratory insufficiency samples. These included spectral and temporal features, as well as patterns in the energy distribution and modulation of the audio signals.
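To make these feature families concrete, here is a small sketch of one textbook descriptor from each: root-mean-square energy (energy), zero-crossing rate (temporal), and spectral centroid (spectral). These are generic examples chosen as assumptions for illustration; the summary does not name the specific properties the authors found discriminant.

```python
import cmath
import math


def frame_rms(frame):
    """Root-mean-square energy of one frame."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def spectral_centroid(frame, sample_rate):
    """Magnitude-weighted mean frequency (Hz) of one frame, using a naive DFT."""
    n = len(frame)
    weighted, total = 0.0, 0.0
    for k in range(n // 2 + 1):
        mag = abs(sum(frame[i] * cmath.exp(-2j * math.pi * k * i / n) for i in range(n)))
        weighted += (k * sample_rate / n) * mag
        total += mag
    return weighted / total if total > 0 else 0.0


# Toy frames (illustrative values): a 500 Hz and a 2 kHz tone at 8 kHz.
sample_rate = 8000
low = [math.sin(2 * math.pi * 500 * t / sample_rate + 0.3) for t in range(64)]
high = [math.sin(2 * math.pi * 2000 * t / sample_rate + 0.3) for t in range(64)]

rms_low = frame_rms(low)
zcr_low, zcr_high = zero_crossing_rate(low), zero_crossing_rate(high)
centroid_low = spectral_centroid(low, sample_rate)
centroid_high = spectral_centroid(high, sample_rate)
```

On these toy frames the higher tone yields both a higher zero-crossing rate and a higher spectral centroid, which is the kind of separation a classifier can exploit.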

The findings of this study suggest that deep learning-based approaches can be effective in detecting respiratory insufficiency from audio data, and that the identified discriminant audio properties may be valuable for the development of advanced respiratory sound analysis systems and multimodal health monitoring frameworks.

Critical Analysis

The researchers in this study have made a valuable contribution to the field of respiratory sound analysis and its applications in healthcare. By leveraging deep learning techniques and exploring discriminant audio properties, they have demonstrated the potential of using AI-powered tools for the early detection of respiratory insufficiency, a condition that can have serious consequences if left undiagnosed.

One of the strengths of this research is the use of a pre-trained, multi-modal architecture (RENE) that was specifically designed for respiratory sound analysis. This approach leverages the knowledge and insights gained from previous work in the field, which can help improve the model's performance and generalizability.

However, the study also has some limitations. The audio data was limited to Brazilian Portuguese, and it is unclear how well the findings would transfer to other languages or cultural contexts. Additionally, the study did not provide a detailed analysis of the deep learning model's performance on real-world clinical data, which is essential for assessing the practical applicability of the approach.

Further research could explore the robustness of the identified discriminant audio properties across diverse populations and settings, as well as investigate the integration of these techniques into multimodal health monitoring systems that combine audio data with other clinical information. Additionally, the potential for deepfake speech generation to impact the reliability of audio-based respiratory monitoring systems should be considered and addressed.

Overall, this study represents an important step forward in the application of deep learning to respiratory sound analysis and the early detection of respiratory insufficiency. The findings could have significant implications for the development of more accurate and accessible healthcare technologies, but continued research and validation will be necessary to fully realize the potential of this approach.

Conclusion

This research paper explores the use of deep learning techniques to detect respiratory insufficiency, a condition where the lungs are unable to provide adequate oxygen to the body, based on audio recordings in Brazilian Portuguese. The study was supported by grants from FAPESP and CAPES, and carried out at the Center for Artificial Intelligence (C4AI-USP) with the help of IBM.

The researchers used a pre-trained, multi-modal deep learning architecture called RENE to analyze the audio data and identify discriminant audio properties that could be used to distinguish between healthy individuals and those with respiratory insufficiency. Through their analysis, they were able to identify several spectral, temporal, and energy-related features that were particularly useful for this task.

The findings of this study have important implications for the development of advanced respiratory sound analysis systems and multimodal health monitoring frameworks that can assist healthcare professionals in the early detection and management of respiratory conditions. By leveraging AI-powered tools, the researchers have demonstrated the potential to improve the accuracy and accessibility of respiratory screening and diagnosis, which could lead to better outcomes for patients.

While the study has some limitations, such as the focus on Brazilian Portuguese audio data, the research represents an important step forward in the application of deep learning to respiratory sound analysis. Continued research and validation will be necessary to fully realize the potential of this approach, but the insights gained from this work could have far-reaching impacts on the field of healthcare technology.

This summary was produced with help from an AI and may contain inaccuracies; consult the original source documents.

Related Papers

Discriminant audio properties in deep learning based respiratory insufficiency detection in Brazilian Portuguese

Marcelo Matheus Gauy, Larissa Cristina Berti, Arnaldo Cândido Jr, Augusto Camargo Neto, Alfredo Goldman, Anna Sara Shafferman Levin, Marcus Martins, Beatriz Raposo de Medeiros, Marcelo Queiroz, Ester Cerdeira Sabino, Flaviane Romani Fernandes Svartman, Marcelo Finger

This work investigates Artificial Intelligence (AI) systems that detect respiratory insufficiency (RI) by analyzing speech audios, thus treating speech as a RI biomarker. Previous works collected RI data (P1) from COVID-19 patients during the first phase of the pandemic and trained modern AI models, such as CNNs and Transformers, which achieved 96.5% accuracy, showing the feasibility of RI detection via AI. Here, we collect RI patient data (P2) with several causes besides COVID-19, aiming at extending AI-based RI detection. We also collected control data from hospital patients without RI. We show that the considered models, when trained on P1, do not generalize to P2, indicating that COVID-19 RI has features that may not be found in all RI types.

5/29/2024

Contrasting Deep Learning Models for Direct Respiratory Insufficiency Detection Versus Blood Oxygen Saturation Estimation

Marcelo Matheus Gauy, Natalia Hitomi Koza, Ricardo Mikio Morita, Gabriel Rocha Stanzione, Arnaldo Candido Junior, Larissa Cristina Berti, Anna Sara Shafferman Levin, Ester Cerdeira Sabino, Flaviane Romani Fernandes Svartman, Marcelo Finger

We contrast high effectiveness of state of the art deep learning architectures designed for general audio classification tasks, refined for respiratory insufficiency (RI) detection and blood oxygen saturation (SpO2) estimation and classification through automated audio analysis. Recently, multiple deep learning architectures have been proposed to detect RI in COVID patients through audio analysis, achieving accuracy above 95% and F1-score above 0.93. RI is a condition associated with low SpO2 levels, commonly defined as the threshold SpO2 <92%. While SpO2 serves as a crucial determinant of RI, a medical doctor's diagnosis typically relies on multiple factors. These include respiratory frequency, heart rate, SpO2 levels, among others. Here we study pretrained audio neural networks (CNN6, CNN10 and CNN14) and the Masked Autoencoder (Audio-MAE) for RI detection, where these models achieve near perfect accuracy, surpassing previous results. Yet, for the regression task of estimating SpO2 levels, the models achieve root mean square error values exceeding the accepted clinical range of 3.5% for finger oximeters. Additionally, Pearson correlation coefficients fail to surpass 0.3. As deep learning models perform better in classification than regression, we transform SpO2-regression into a SpO2-threshold binary classification problem, with a threshold of 92%. However, this task still yields an F1-score below 0.65. Thus, audio analysis offers valuable insights into a patient's RI status, but does not provide accurate information about actual SpO2 levels, indicating a separation of domains in which voice and speech biomarkers may and may not be useful in medical diagnostics under current technologies.

7/31/2024

Rene: A Pre-trained Multi-modal Architecture for Auscultation of Respiratory Diseases

Pengfei Zhang, Zhihang Zheng, Shichen Zhang, Minghao Yang, Shaojun Tang

Compared with invasive examinations that require tissue sampling, respiratory sound testing is a non-invasive examination method that is safer and easier for patients to accept. In this study, we introduce Rene, a pioneering large-scale model tailored for respiratory sound recognition. Rene has been rigorously fine-tuned with an extensive dataset featuring a broad array of respiratory audio samples, targeting disease detection, sound pattern classification, and event identification. Our innovative approach applies a pre-trained speech recognition model to process respiratory sounds, augmented with patient medical records. The resulting multi-modal deep-learning framework addresses interpretability and real-time diagnostic challenges that have hindered previous respiratory-focused models. Benchmark comparisons reveal that Rene significantly outperforms existing models, achieving improvements of 10.27%, 16.15%, 15.29%, and 18.90% in respiratory event detection and audio classification on the SPRSound database. Disease prediction accuracy on the ICBHI database improved by 23% over the baseline in both mean average and harmonic scores. Moreover, we have developed a real-time respiratory sound discrimination system utilizing the Rene architecture. Employing state-of-the-art Edge AI technology, this system enables rapid and accurate responses for respiratory sound auscultation (https://github.com/zpforlove/Rene).

6/10/2024

COVID-19 Detection System: A Comparative Analysis of System Performance Based on Acoustic Features of Cough Audio Signals

Asmaa Shati, Ghulam Mubashar Hassan, Amitava Datta

A wide range of respiratory diseases, such as cold and flu, asthma, and COVID-19, affect people's daily lives worldwide. In medical practice, respiratory sounds are widely used in medical services to diagnose various respiratory illnesses and lung disorders. The traditional diagnosis of such sounds requires specialized knowledge, which can be costly and reliant on human expertise. Despite this, recent advancements, such as cough audio recordings, have emerged as a means to automate the detection of respiratory conditions. Therefore, this research aims to explore various acoustic features that enhance the performance of machine learning (ML) models in detecting COVID-19 from cough signals. It investigates the efficacy of three feature extraction techniques, including Mel Frequency Cepstral Coefficients (MFCC), Chroma, and Spectral Contrast features, when applied to two machine learning algorithms, Support Vector Machine (SVM) and Multilayer Perceptron (MLP), and therefore proposes an efficient CovCepNet detection system. The proposed system provides a practical solution and demonstrates state-of-the-art classification performance, with an AUC of 0.843 on the COUGHVID dataset and 0.953 on the Virufy dataset for COVID-19 detection from cough audio signals.

6/21/2024