Automated Bioacoustic Monitoring for South African Bird Species on Unlabeled Data

Read original: arXiv:2406.13579 - Published 6/21/2024 by Michael Doell, Dominik Kuehn, Vanessa Suessle, Matthew J. Burnett, Colleen T. Downs, Andreas Weinmann, Elke Hergenroether

Overview

Biodiversity monitoring using passive acoustic monitoring (PAM) recordings is time-consuming and challenged by background noise
Existing sound event detection (SED) models only work on certain avian species, and developing new models requires labeled data
The developed framework automatically extracts labeled data from available platforms for selected avian species
The labeled data is embedded into recordings with environmental sounds and noise, and used to train convolutional recurrent neural network (CRNN) models
The models are evaluated on unprocessed real-world data recorded in urban KwaZulu-Natal habitats

Plain English Explanation

Monitoring biodiversity using recordings of animal sounds, known as passive acoustic monitoring (PAM), can be a valuable tool for conservation efforts. However, analyzing these recordings is a time-consuming process, and the presence of background noise makes it challenging. Existing models for detecting specific animal sounds, called sound event detection (SED) models, have only worked with certain bird species, and developing new models for other species requires collecting labeled data, which can be difficult.

The researchers developed a framework that can automatically extract labeled data for selected bird species from available online platforms. They then embedded this labeled data into recordings that also include environmental sounds and background noise, and used it to train a type of machine learning model called a convolutional recurrent neural network (CRNN). These trained models were then evaluated on real-world, unprocessed recordings made in urban areas of KwaZulu-Natal, South Africa.

The Adapted SED-CRNN model achieved an F1 score of 0.73 on these recordings, indicating that it can detect bird sounds in noisy, real-world conditions. Because the labeled data is extracted automatically for the chosen species, PAM can be easily adapted to other species and habitats, which could be valuable for future conservation projects.

Technical Explanation

The researchers developed a framework to automatically extract labeled data for selected avian species from available online platforms, such as Xeno-Canto and AnimálVoz. This labeled data was then embedded into recordings that included environmental sounds and background noise, and used to train convolutional recurrent neural network (CRNN) models for sound event detection (SED).
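The embedding step described above can be illustrated with a minimal sketch. It assumes the embedding amounts to adding a labeled call to a background recording at a chosen signal-to-noise ratio (SNR) and recording the onset and offset as the training label; the summary does not say how the authors mix the signals, so the function name `embed_call`, the SNR-based gain, and the synthetic signals are all hypothetical. Plain Python lists keep the sketch dependency-free; real audio code would more likely use NumPy arrays.

```python
import math
import random


def rms(samples):
    """Root-mean-square amplitude of a sequence of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def embed_call(background, call, offset, snr_db):
    """Mix `call` into `background` starting at sample index `offset`.

    Hypothetical helper: the call is scaled so that its RMS, relative to the
    RMS of the background region it overlaps, matches `snr_db` decibels.
    Returns the mixed signal and the (onset, offset) label in samples.
    """
    if offset < 0 or offset + len(call) > len(background):
        raise ValueError("call does not fit inside the background")
    region = background[offset:offset + len(call)]
    noise_rms = rms(region)
    call_rms = rms(call)
    if noise_rms == 0 or call_rms == 0:
        raise ValueError("cannot set an SNR against a silent signal")
    # Gain that brings the call to the requested level above the background.
    gain = noise_rms * 10 ** (snr_db / 20) / call_rms
    mixed = list(background)
    for i, sample in enumerate(call):
        mixed[offset + i] += gain * sample
    return mixed, (offset, offset + len(call))


# Synthetic stand-ins (assumptions): 1 s of Gaussian noise as the soundscape
# and a 0.25 s sine tone as the "bird call", both at 16 kHz.
sample_rate = 16000
rng = random.Random(0)
background = [rng.gauss(0, 0.1) for _ in range(sample_rate)]
call = [math.sin(2 * math.pi * 3000 * n / sample_rate)
        for n in range(sample_rate // 4)]

mixed, label = embed_call(background, call, offset=4000, snr_db=6.0)
print(label)
```

Because the position of each embedded call is known, the onset and offset labels needed for SED training come for free, which is the appeal of building training data this way.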

The trained models were evaluated on unprocessed real-world data recorded in urban KwaZulu-Natal habitats. The Adapted SED-CRNN model achieved an F1 score of 0.73, demonstrating its effectiveness in detecting bird sounds even in noisy, real-world conditions.
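For readers unfamiliar with the metric, the F1 score is the harmonic mean of precision and recall. The sketch below computes it over fixed-length segments, each marked as containing the target species or not. Whether the paper reports a segment-based or an event-based F1 is not stated in this summary, so the segment-based form, the function name `segment_f1`, and the toy labels are assumptions made for illustration only.

```python
def segment_f1(reference, predicted):
    """Return (precision, recall, F1) for two equal-length binary sequences.

    Each element marks whether the target species is active in one segment.
    """
    if len(reference) != len(predicted):
        raise ValueError("sequences must have the same length")
    pairs = list(zip(reference, predicted))
    tp = sum(1 for r, p in pairs if r and p)        # detected calls
    fp = sum(1 for r, p in pairs if not r and p)    # false alarms
    fn = sum(1 for r, p in pairs if r and not p)    # missed calls
    if tp == 0:
        return 0.0, 0.0, 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy example (made-up labels): 3 hits, 1 false alarm, 1 miss.
reference = [0, 1, 1, 1, 0, 0, 1, 0]
predicted = [0, 1, 1, 0, 0, 1, 1, 0]
print(segment_f1(reference, predicted))  # (0.75, 0.75, 0.75)
```

An F1 of 0.73 therefore means that misses and false alarms together remain a substantial minority of the detections, which is worth keeping in mind when interpreting the result.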

Critical Analysis

The researchers acknowledge that their approach relies on the availability of labeled data from existing platforms, which may not be comprehensive for all species and habitats. Additionally, the evaluation was limited to urban KwaZulu-Natal habitats, and the performance of the models in other environments or with different species may vary.

While the results are promising, further research is needed to address the scalability of the approach, as well as its robustness to rare events and ability to generalize to new species and habitats. Additionally, the impact of the automatically extracted labeled data on model performance compared to manually curated datasets could be further explored.

Conclusion

The framework, which automatically extracts labeled data for selected avian species and uses it to train CRNN models for sound event detection, is a promising approach to biodiversity monitoring with passive acoustic monitoring. The F1 score of 0.73 achieved by the Adapted SED-CRNN model on real-world, unprocessed data suggests that this technique could be valuable for future conservation efforts, enabling the easy adaptation of PAM to a variety of species and habitats.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Automated Bioacoustic Monitoring for South African Bird Species on Unlabeled Data

Michael Doell, Dominik Kuehn, Vanessa Suessle, Matthew J. Burnett, Colleen T. Downs, Andreas Weinmann, Elke Hergenroether

Analysis for biodiversity monitoring based on passive acoustic monitoring (PAM) recordings is time-consuming and challenged by the presence of background noise in recordings. Existing models for sound event detection (SED) worked only on certain avian species and the development of further models required labeled data. The developed framework automatically extracted labeled data from available platforms for selected avian species. The labeled data were embedded into recordings, including environmental sounds and noise, and were used to train convolutional recurrent neural network (CRNN) models. The models were evaluated on unprocessed real-world data recorded in urban KwaZulu-Natal habitats. The Adapted SED-CRNN model reached an F1 score of 0.73, demonstrating its efficiency under noisy, real-world conditions. The proposed approach to automatically extract labeled data for chosen avian species enables an easy adaptation of PAM to other species and habitats for future conservation projects.

6/21/2024

Towards Deep Active Learning in Avian Bioacoustics

Lukas Rauch, Denis Huseljic, Moritz Wirth, Jens Decke, Bernhard Sick, Christoph Scholz

Passive acoustic monitoring (PAM) in avian bioacoustics enables cost-effective and extensive data collection with minimal disruption to natural habitats. Despite advancements in computational avian bioacoustics, deep learning models continue to encounter challenges in adapting to diverse environments in practical PAM scenarios. This is primarily due to the scarcity of annotations, which requires labor-intensive efforts from human experts. Active learning (AL) reduces annotation cost and speeds up adaptation to diverse scenarios by querying the most informative instances for labeling. This paper outlines a deep AL approach, introduces key challenges, and conducts a small-scale pilot study.

6/28/2024

Domain-Invariant Representation Learning of Bird Sounds

Ilyass Moummad, Romain Serizel, Emmanouil Benetos, Nicolas Farrugia

Passive acoustic monitoring (PAM) is crucial for bioacoustic research, enabling non-invasive species tracking and biodiversity monitoring. Citizen science platforms like Xeno-Canto provide large annotated datasets from focal recordings, where the target species is intentionally recorded. However, PAM requires monitoring in passive soundscapes, creating a domain shift between focal and passive recordings, which challenges deep learning models trained on focal recordings. To address this, we leverage supervised contrastive learning to improve domain generalization in bird sound classification, enforcing domain invariance across same-class examples from different domains. We also propose ProtoCLR (Prototypical Contrastive Learning of Representations), which reduces the computational complexity of the SupCon loss by comparing examples to class prototypes instead of pairwise comparisons. Additionally, we present a new few-shot classification benchmark based on BirdSet, a large-scale bird sound dataset, and demonstrate the effectiveness of our approach in achieving strong transfer performance.

9/16/2024

Advanced Framework for Animal Sound Classification With Features Optimization

Qiang Yang, Xiuying Chen, Changsheng Ma, Carlos M. Duarte, Xiangliang Zhang

The automatic classification of animal sounds presents an enduring challenge in bioacoustics, owing to the diverse statistical properties of sound signals, variations in recording equipment, and prevalent low Signal-to-Noise Ratio (SNR) conditions. Deep learning models like Convolutional Neural Networks (CNN) and Long Short-Term Memory (LSTM) have excelled in human speech recognition but have not been effectively tailored to the intricate nature of animal sounds, which exhibit substantial diversity even within the same domain. We propose an automated classification framework applicable to general animal sound classification. Our approach first optimizes audio features from Mel-frequency cepstral coefficients (MFCC) including feature rearrangement and feature reduction. It then uses the optimized features for the deep learning model, i.e., an attention-based Bidirectional LSTM (Bi-LSTM), to extract deep semantic features for sound classification. We also contribute an animal sound benchmark dataset encompassing oceanic animals and birds. Extensive experimentation with real-world datasets demonstrates that our approach consistently outperforms baseline methods by over 25% in precision, recall, and accuracy, promising advancements in animal sound classification.

7/8/2024