Machine listening in a neonatal intensive care unit

Read original: arXiv:2409.11439 - Published 9/19/2024 by Modan Tailleur (LS2N, Nantes Univ - ECN, LS2N - équipe SIMS), Vincent Lostanlen (LS2N, LS2N - équipe SIMS, Nantes Univ - ECN), Jean-Philippe Rivière (Nantes Univ, Nantes Univ - UFR FLCE, LS2N, LS2N - équipe PACCE) and 1 other

Overview

Detecting common sound sources like oxygenators, alarms, and footsteps in hospitals has scientific value for environmental psychology.
However, this task comes with challenges around privacy preservation and limited labeled data.
The paper addresses these challenges using a combination of edge computing and cloud computing.

Plain English Explanation

The researchers in this paper looked at the task of detecting common hospital sounds like oxygenators, alarm devices, and footsteps. This has value for environmental psychology - understanding how sounds impact people in hospital settings.

But there were two main challenges the researchers had to address. First, they needed to protect patient privacy, since recording audio could raise privacy concerns. Second, they had limited labeled data to train machine learning models.

To solve these issues, the researchers combined edge computing (processing data locally on the sensors) with cloud computing. The sensors compute third-octave spectrograms (coarse summaries of how sound energy is spread across frequency bands over time) instead of recording raw audio, which preserves privacy. For machine learning, the researchers adapted a pre-trained audio neural network to work with their limited data.
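As a rough illustration of what computing a third-octave spectrogram on a sensor might involve, here is a minimal Python sketch. It is not the authors' sensor code: the sample rate, frame length, lowest band (25 Hz), number of bands, and the simple approach of summing FFT power between band edges are all assumptions made for this example.

```python
import numpy as np

def third_octave_spectrogram(x, sr=32000, frame_len=4096, hop=4096,
                             f_min=25.0, n_bands=29):
    """Return (band levels in dB, band centers in Hz).

    All defaults are illustrative assumptions, not values from the paper.
    The output has shape (n_frames, n_bands), with one coarse energy value
    per band per frame, so the waveform itself is never stored.
    """
    # Band centers spaced one third of an octave apart.
    centers = f_min * 2.0 ** (np.arange(n_bands) / 3.0)
    lower = centers * 2.0 ** (-1.0 / 6.0)   # lower band edges
    upper = centers * 2.0 ** (1.0 / 6.0)    # upper band edges

    freqs = np.fft.rfftfreq(frame_len, d=1.0 / sr)
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop

    energy = np.empty((n_frames, n_bands))
    for i in range(n_frames):
        frame = x[i * hop: i * hop + frame_len] * window
        power = np.abs(np.fft.rfft(frame)) ** 2
        for b in range(n_bands):
            # Sum the FFT power that falls inside this band.
            in_band = (freqs >= lower[b]) & (freqs < upper[b])
            energy[i, b] = power[in_band].sum()

    # Small constant avoids log(0) for empty or silent bands.
    return 10.0 * np.log10(energy + 1e-12), centers

# Usage: one second of a 1 kHz test tone.
sr = 32000
t = np.arange(sr) / sr
levels, centers = third_octave_spectrogram(np.sin(2 * np.pi * 1000 * t), sr=sr)
loudest_band_hz = centers[levels[0].argmax()]
```

With frames this long and only a few dozen bands per frame, the stored data is far coarser than the original waveform, which is the intuition behind using this representation for privacy.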

This allowed the researchers to detect hospital sounds while respecting privacy and working with the constraints of their dataset.

Technical Explanation

The paper describes a system for polyphonic machine listening in a hospital ward. The key components are:

Privacy-preserving acoustic sensors: These sensors compute third-octave spectrograms on the edge instead of recording raw audio waveforms. This protects patient privacy.
Sample-efficient machine learning: The researchers repurpose a pre-trained audio neural network (PANN) via spectral transcoding and label space adaptation. This allows them to train effective models with their limited labeled dataset.
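To make the idea of label space adaptation more concrete, here is a small Python sketch of one simple way it could work: grouping the classes of a pre-trained tagger into a handful of hospital-specific labels. The summary does not describe the authors' actual procedure, so the class names, the grouping, the max-pooling rule, and the threshold below are all hypothetical.

```python
import numpy as np

# Hypothetical subset of a pre-trained tagger's output classes.
SOURCE_CLASSES = ["Alarm", "Beep, bleep", "Walk, footsteps",
                  "Speech", "Mechanical fan"]

# Hypothetical grouping of source classes into target labels.
# Using "Mechanical fan" as a stand-in for oxygenators is an assumption.
TARGET_TO_SOURCE = {
    "alarm": ["Alarm", "Beep, bleep"],
    "footsteps": ["Walk, footsteps"],
    "oxygenator": ["Mechanical fan"],
}

def adapt_label_space(source_probs, threshold=0.5):
    """Map per-frame source probabilities to target-label detections.

    source_probs: array of shape (n_frames, len(SOURCE_CLASSES)).
    Returns a dict mapping each target label to a boolean array of length
    n_frames. Several labels can be active in the same frame (polyphony).
    """
    index = {name: i for i, name in enumerate(SOURCE_CLASSES)}
    detections = {}
    for target, sources in TARGET_TO_SOURCE.items():
        cols = [index[name] for name in sources]
        # A target is active if any of its source classes is confident enough.
        detections[target] = source_probs[:, cols].max(axis=1) >= threshold
    return detections

# Usage with made-up probabilities for two frames.
probs = np.array([[0.1, 0.9, 0.2, 0.0, 0.7],
                  [0.2, 0.1, 0.8, 0.9, 0.1]])
detections = adapt_label_space(probs)
```

Source classes that are not mapped to any target label (here, "Speech") are simply ignored.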

The researchers evaluated their system in a small-scale study in a neonatal intensive care unit (NICU). They found that the time series of detected sound events aligned with an independent measurement modality: electronic badges worn by parents and healthcare professionals. This supports the feasibility of their privacy-preserving, sample-efficient approach to machine listening in a hospital setting.
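One simple way to check whether two such time series align is to correlate them over matching time bins. The Python sketch below does this with a Pearson correlation on hourly counts. The summary does not say which statistic the authors used, so the choice of Pearson correlation is an assumption, and the numbers are synthetic values invented for the example.

```python
import numpy as np

def pearson_alignment(detected_counts, badge_counts):
    """Pearson correlation between two equally binned time series."""
    a = np.asarray(detected_counts, dtype=float)
    b = np.asarray(badge_counts, dtype=float)
    if a.shape != b.shape:
        raise ValueError("both series must cover the same time bins")
    return float(np.corrcoef(a, b)[0, 1])

# Synthetic example data (not from the paper): detected footstep events
# per hour, and number of badges present in the room during the same hours.
footsteps_per_hour = [2, 1, 0, 5, 9, 12, 8, 3]
badges_present_per_hour = [1, 1, 0, 2, 4, 5, 4, 1]

r = pearson_alignment(footsteps_per_hour, badges_present_per_hour)
print(f"correlation between detections and badge presence: {r:.2f}")
```

A value of r close to 1 would indicate that acoustic detections rise and fall together with badge-based presence, which is the kind of agreement the study reports.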

Critical Analysis

The paper makes a compelling case for the value of automatically detecting common hospital sounds, and the researchers have designed a clever solution to address the key challenges.

However, the small-scale nature of the NICU evaluation means the findings may not generalize to larger or more diverse hospital settings. Further research would be needed to validate the approach at scale.

Additionally, while the privacy-preserving sensor design is a strength, there could be potential issues around user trust and acceptance. Hospitals and patients may be wary of any audio sensing, even if raw waveforms are not recorded.

The machine learning approach also relies on a pre-trained model, which may limit its flexibility and its ability to adapt to new sound environments or tasks. Developing more robust self-supervised learning methods could be an area for future work.

Overall, this paper presents a promising technical solution, but additional research and real-world testing would be needed to fully assess its practicality and scalability in healthcare settings.

Conclusion

This paper introduces a system for detecting common hospital sounds like oxygenators, alarms, and footsteps, while addressing key challenges around privacy preservation and limited labeled data.

By combining edge computing and cloud-based machine learning, the researchers have demonstrated the feasibility of their approach in a neonatal intensive care unit. This could have important implications for environmental psychology research and, ultimately, for improving hospital experiences for patients, families, and staff.

However, further validation and development would be needed to fully realize the potential of this technology in real-world healthcare settings.

Related Papers

Machine listening in a neonatal intensive care unit

Modan Tailleur (LS2N, Nantes Univ - ECN, LS2N - équipe SIMS), Vincent Lostanlen (LS2N, LS2N - équipe SIMS, Nantes Univ - ECN), Jean-Philippe Rivière (Nantes Univ, Nantes Univ - UFR FLCE, LS2N, LS2N - équipe PACCE), Pierre Aumond

Oxygenators, alarm devices, and footsteps are some of the most common sound sources in a hospital. Detecting them has scientific value for environmental psychology but comes with challenges of its own: namely, privacy preservation and limited labeled data. In this paper, we address these two challenges via a combination of edge computing and cloud computing. For privacy preservation, we have designed an acoustic sensor which computes third-octave spectrograms on the fly instead of recording audio waveforms. For sample-efficient machine learning, we have repurposed a pretrained audio neural network (PANN) via spectral transcoding and label space adaptation. A small-scale study in a neonatological intensive care unit (NICU) confirms that the time series of detected events align with another modality of measurement: i.e., electronic badges for parents and healthcare professionals. Hence, this paper demonstrates the feasibility of polyphonic machine listening in a hospital ward while guaranteeing privacy by design.

9/19/2024

Voice EHR: Introducing Multimodal Audio Data for Health

James Anibal, Hannah Huth, Ming Li, Lindsey Hazen, Yen Minh Lam, Hang Nguyen, Phuc Hong, Michael Kleinman, Shelley Ost, Christopher Jackson, Laura Sprabery, Cheran Elangovan, Balaji Krishnaiah, Lee Akst, Ioan Lina, Iqbal Elyazar, Lenny Ekwati, Stefan Jansen, Richard Nduwayezu, Charisse Garcia, Jeffrey Plum, Jacqueline Brenner, Miranda Song, Emily Ricotta, David Clifton, C. Louise Thwaites, Yael Bensoussan, Bradford Wood

Large AI models trained on audio data may have the potential to rapidly classify patients, enhancing medical decision-making and potentially improving outcomes through early detection. Existing technologies depend on limited datasets using expensive recording equipment in high-income, English-speaking countries. This challenges deployment in resource-constrained, high-volume settings where audio data may have a profound impact. This report introduces a novel data type and a corresponding collection system that captures health data through guided questions using only a mobile/web application. This application ultimately results in an audio electronic health record (voice EHR) which may contain complex biomarkers of health from conventional voice/respiratory features, speech patterns, and language with semantic meaning - compensating for the typical limitations of unimodal clinical datasets. This report introduces a consortium of partners for global work, presents the application used for data collection, and showcases the potential of informative voice EHR to advance the scalability and diversity of audio AI.

6/4/2024

Sound Tagging in Infant-centric Home Soundscapes

Mohammad Nur Hossain Khan, Jialu Li, Nancy L. McElwain, Mark Hasegawa-Johnson, Bashima Islam

Certain environmental noises have been associated with negative developmental outcomes for infants and young children. Though classifying or tagging sound events in a domestic environment is an active research area, previous studies focused on data collected from a non-stationary microphone placed in the environment or from the perspective of adults. Further, many of these works ignore infants or young children in the environment or have data collected from only a single family where noise from the fixed sound source can be moderate at the infant's position or vice versa. Thus, despite the recent success of large pre-trained models for noise event detection, the performance of these models on infant-centric noise soundscapes in the home is yet to be explored. To bridge this gap, we have collected and labeled noises in home soundscapes from 22 families in an unobtrusive manner, where the data are collected through an infant-worn recording device. In this paper, we explore the performance of a large pre-trained model (Audio Spectrogram Transformer [AST]) on our noise-conditioned infant-centric environmental data as well as publicly available home environmental datasets. Utilizing different training strategies such as resampling, utilizing public datasets, mixing public and infant-centric training sets, and data augmentation using noise and masking, we evaluate the performance of a large pre-trained model on sparse and imbalanced infant-centric data. Our results show that fine-tuning the large pre-trained model by combining our collected dataset with public datasets increases the F1-score from 0.11 (public datasets) and 0.76 (collected datasets) to 0.84 (combined datasets) and Cohen's Kappa from 0.013 (public datasets) and 0.77 (collected datasets) to 0.83 (combined datasets) compared to only training with public or collected datasets, respectively.

6/26/2024

Extracting Urban Sound Information for Residential Areas in Smart Cities Using an End-to-End IoT System

Ee-Leng Tan, Furi Andi Karnapi, Linus Junjia Ng, Kenneth Ooi, Woon-Seng Gan

With rapid urbanization comes the increase of community, construction, and transportation noise in residential areas. The conventional approach of solely relying on sound pressure level (SPL) information to decide on the noise environment and to plan out noise control and mitigation strategies is inadequate. This paper presents an end-to-end IoT system that extracts real-time urban sound metadata using edge devices, providing information on the sound type, location and duration, rate of occurrence, loudness, and azimuth of a dominant noise in nine residential areas. The collected metadata on environmental sound is transmitted to and aggregated in a cloud-based platform to produce detailed descriptive analytics and visualization. Our approach to integrating different building blocks, namely, hardware, software, cloud technologies, and signal processing algorithms to form our real-time IoT system is outlined. We demonstrate how some of the sound metadata extracted by our system are used to provide insights into the noise in residential areas. A scalable workflow to collect and prepare audio recordings from nine residential areas to construct our urban sound dataset for training and evaluating a location-agnostic model is discussed. Some practical challenges of managing and maintaining a sensor network deployed at numerous locations are also addressed.

8/13/2024