ICSD: An Open-source Dataset for Infant Cry and Snoring Detection

Read original: arXiv:2408.10561 - Published 8/21/2024 by Qingyu Liu, Longfei Song, Dongxing Xu, Yanhua Long


Overview

The paper introduces ICSD (Infant Cry and Snoring Detection), an open-source dataset for detecting infant cry and snoring events in audio.
The dataset comprises three types of subsets: a real, strongly labeled subset with manually annotated event-based labels; a weakly labeled subset with only clip-level annotations; and a synthetic subset generated with strong annotations.
The authors also report baseline experiments and intend ICSD to serve as an open benchmark for developing and evaluating infant cry and snoring detection systems.

Plain English Explanation

The paper presents a new dataset called ICSD, which stands for "Infant Cry and Snoring Detection." This dataset is designed to help researchers and developers create systems that can automatically detect when babies are crying or snoring.

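According to the paper's abstract, the dataset is organized into three kinds of subsets. One contains real recordings in which each cry or snore has been marked by hand as an individual event (strong labels). A second contains clips that are only tagged with which sounds they contain, without timing (weak labels). A third is synthetic, generated by the authors together with strong labels. The authors motivate the work by noting that existing general-purpose sound event datasets do not offer enough strongly labeled data for these two sounds.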

The purpose of this dataset is to provide a standardized resource for developing and evaluating algorithms and models that can detect these important sounds. This could be useful for a variety of applications, such as monitoring infant health, improving sleep apnea diagnosis, or creating smart home devices that can respond to a baby's needs.

Technical Explanation

The paper describes the creation of the ICSD dataset, which comprises three types of subsets:

Real strongly labeled subset: recordings with event-based labels annotated manually.
Weakly labeled subset: clips with only clip-level event annotations.
Synthetic subset: generated audio labeled with strong annotations.

Together, these subsets target a gap the authors identify in existing general sound event detection datasets, which often lack sufficient strongly labeled data specific to infant cries and snoring.

The authors describe the dataset creation process, including the challenges encountered and the solutions adopted, and characterize the dataset's limitations and the key factors in using it. They also run experiments on ICSD to establish baseline systems.
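The difference between strong and weak labels is easier to see in code. The sketch below is illustrative only: the `StrongLabel` class, the class names `infant_cry` and `snoring`, and the helper functions are assumptions made for this example, not the dataset's actual file format or the authors' tooling. It shows how event-based (strong) labels can be collapsed into clip-level (weak) tags, and how they can be rasterized into frame-level training targets.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StrongLabel:
    """One annotated event: class name plus onset/offset in seconds (hypothetical schema)."""
    event: str
    onset: float
    offset: float


def to_weak_labels(events):
    """Collapse strong labels into clip-level tags; timing is discarded."""
    return sorted({e.event for e in events})


def to_frame_targets(events, classes, clip_dur, hop=1.0):
    """Rasterize strong labels into a frames x classes grid of 0/1 targets.

    A frame is marked active if the event overlaps it at all.
    """
    n_frames = int(round(clip_dur / hop))
    grid = [[0] * len(classes) for _ in range(n_frames)]
    for e in events:
        col = classes.index(e.event)
        for i in range(n_frames):
            start, end = i * hop, (i + 1) * hop
            if e.onset < end and e.offset > start:
                grid[i][col] = 1
    return grid


# A made-up 5-second clip with one cry and one snore.
clip = [
    StrongLabel("infant_cry", 1.0, 2.0),
    StrongLabel("snoring", 3.0, 4.0),
]
weak = to_weak_labels(clip)
frames = to_frame_targets(clip, ["infant_cry", "snoring"], clip_dur=5.0, hop=1.0)
```

The conversion only goes one way: weak labels can always be derived from strong ones, but event timing cannot be recovered from clip-level tags, which is why strongly labeled data is the scarcer resource.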

Critical Analysis

The paper makes a useful contribution by releasing a publicly available dataset with strongly labeled data for infant cry and snoring detection. This resource can help advance research in areas such as infant health monitoring, sleep apnea diagnosis, and smart home technologies.

The abstract states that the authors characterize the dataset and discuss its limitations, so readers should consult the paper for the specifics. Two questions are worth keeping in mind when using it. First, synthetic audio may not capture the full complexity of real cry and snoring events, so models that rely heavily on the synthetic subset may transfer imperfectly to real recordings. Second, the weakly labeled subset carries no event timing, so its value for detection, as opposed to tagging, depends on the training method.

Further research could explore the generalizability of models trained on the ICSD dataset to real-world scenarios, as well as investigate ways to expand the dataset with more diverse audio samples collected in various settings.
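One way such a generalization check could be run is to compare a model's predicted events against reference strong labels on held-out recordings. The sketch below computes a segment-based F1 score, a metric commonly used in sound event detection; it is not necessarily the metric used in the paper, and the tuple format `(event, onset, offset)`, the function names, and the example values are assumptions made for illustration.

```python
def active_segments(events, event_class, clip_dur, seg=1.0):
    """Return indices of fixed-length segments in which event_class is active."""
    n = int(round(clip_dur / seg))
    active = set()
    for name, onset, offset in events:
        if name != event_class:
            continue
        for i in range(n):
            # Segment i spans [i * seg, (i + 1) * seg).
            if onset < (i + 1) * seg and offset > i * seg:
                active.add(i)
    return active


def segment_f1(reference, predicted, event_class, clip_dur, seg=1.0):
    """Segment-based F1 for one event class on one clip."""
    ref = active_segments(reference, event_class, clip_dur, seg)
    pred = active_segments(predicted, event_class, clip_dur, seg)
    tp = len(ref & pred)   # segments active in both
    fp = len(pred - ref)   # predicted but not in the reference
    fn = len(ref - pred)   # in the reference but missed
    denom = 2 * tp + fp + fn
    # F1 is undefined when nothing is active in either; reported as 0.0 here.
    return 2 * tp / denom if denom else 0.0


# Made-up example: the prediction is shifted one second late.
reference = [("infant_cry", 0.0, 3.0)]
predicted = [("infant_cry", 1.0, 4.0)]
f1 = segment_f1(reference, predicted, "infant_cry", clip_dur=5.0)
```

In this example two of the three reference segments are matched and one extra segment is predicted, giving an F1 of 2/3. Comparing this score on clips from the training environment against clips from a new environment gives a rough measure of how well a model generalizes.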

Conclusion

The ICSD dataset provides a valuable resource for researchers and developers working on infant cry and snoring detection systems. By combining real strongly labeled, weakly labeled, and synthetic subsets with baseline systems, the authors offer an open benchmark for these areas of audio signal processing and machine learning. While the dataset has limitations, which the authors discuss, it is a step toward more robust and reliable detection of these sounds.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!


Related Papers

ICSD: An Open-source Dataset for Infant Cry and Snoring Detection

Qingyu Liu, Longfei Song, Dongxing Xu, Yanhua Long

The detection and analysis of infant cry and snoring events are crucial tasks within the field of audio signal processing. While existing datasets for general sound event detection are plentiful, they often fall short in providing sufficient, strongly labeled data specific to infant cries and snoring. To provide a benchmark dataset and thus foster the research of infant cry and snoring detection, this paper introduces the Infant Cry and Snoring Detection (ICSD) dataset, a novel, publicly available dataset specially designed for ICSD tasks. The ICSD comprises three types of subsets: a real strongly labeled subset with event-based labels annotated manually, a weakly labeled subset with only clip-level event annotations, and a synthetic subset generated and labeled with strong annotations. This paper provides a detailed description of the ICSD creation process, including the challenges encountered and the solutions adopted. We offer a comprehensive characterization of the dataset, discussing its limitations and key factors for ICSD usage. Additionally, we conduct extensive experiments on the ICSD dataset to establish baseline systems and offer insights into the main factors when using this dataset for ICSD research. Our goal is to develop a dataset that will be widely adopted by the community as a new open benchmark for future ICSD research.

8/21/2024

SimuSOE: A Simulated Snoring Dataset for Obstructive Sleep Apnea-Hypopnea Syndrome Evaluation during Wakefulness

Jie Lin, Xiuping Yang, Li Xiao, Xinhong Li, Weiyan Yi, Yuhong Yang, Weiping Tu, Xiong Chen

Obstructive Sleep Apnea-Hypopnea Syndrome (OSAHS) is a prevalent chronic breathing disorder caused by upper airway obstruction. Previous studies advanced OSAHS evaluation through machine learning-based systems trained on sleep snoring or speech signal datasets. However, constructing datasets for training a precise and rapid OSAHS evaluation system poses a challenge, since 1) it is time-consuming to collect sleep snores and 2) the speech signal is limited in reflecting upper airway obstruction. In this paper, we propose a new snoring dataset for OSAHS evaluation, named SimuSOE, in which a novel and time-effective snoring collection method is introduced for tackling the above problems. In particular, we adopt simulated snoring which is a type of snore intentionally emitted by patients to replace natural snoring. Experimental results indicate that the simulated snoring signal during wakefulness can serve as an effective feature in OSAHS preliminary screening.

7/11/2024

Sound Tagging in Infant-centric Home Soundscapes

Mohammad Nur Hossain Khan, Jialu Li, Nancy L. McElwain, Mark Hasegawa-Johnson, Bashima Islam

Certain environmental noises have been associated with negative developmental outcomes for infants and young children. Though classifying or tagging sound events in a domestic environment is an active research area, previous studies focused on data collected from a non-stationary microphone placed in the environment or from the perspective of adults. Further, many of these works ignore infants or young children in the environment or have data collected from only a single family where noise from the fixed sound source can be moderate at the infant's position or vice versa. Thus, despite the recent success of large pre-trained models for noise event detection, the performance of these models on infant-centric noise soundscapes in the home is yet to be explored. To bridge this gap, we have collected and labeled noises in home soundscapes from 22 families in an unobtrusive manner, where the data are collected through an infant-worn recording device. In this paper, we explore the performance of a large pre-trained model (Audio Spectrogram Transformer [AST]) on our noise-conditioned infant-centric environmental data as well as publicly available home environmental datasets. Utilizing different training strategies such as resampling, utilizing public datasets, mixing public and infant-centric training sets, and data augmentation using noise and masking, we evaluate the performance of a large pre-trained model on sparse and imbalanced infant-centric data. Our results show that fine-tuning the large pre-trained model by combining our collected dataset with public datasets increases the F1-score from 0.11 (public datasets) and 0.76 (collected datasets) to 0.84 (combined datasets) and Cohen's Kappa from 0.013 (public datasets) and 0.77 (collected datasets) to 0.83 (combined datasets) compared to only training with public or collected datasets, respectively.

6/26/2024

The Sounds of Home: A Speech-Removed Residential Audio Dataset for Sound Event Detection

Gabriel Bibbó, Thomas Deacon, Arshdeep Singh, Mark D. Plumbley

This paper presents a residential audio dataset to support sound event detection research for smart home applications aimed at promoting wellbeing for older adults. The dataset is constructed by deploying audio recording systems in the homes of 8 participants aged 55-80 years for a 7-day period. Acoustic characteristics are documented through detailed floor plans and construction material information to enable replication of the recording environments for AI model deployment. A novel automated speech removal pipeline is developed, using pre-trained audio neural networks to detect and remove segments containing spoken voice, while preserving segments containing other sound events. The resulting dataset consists of privacy-compliant audio recordings that accurately capture the soundscapes and activities of daily living within residential spaces. The paper details the dataset creation methodology, the speech removal pipeline utilizing cascaded model architectures, and an analysis of the vocal label distribution to validate the speech removal process. This dataset enables the development and benchmarking of sound event detection models tailored specifically for in-home applications.

9/18/2024