BirdSet: A Multi-Task Benchmark for Classification in Computational Avian Bioacoustics

Read original: arXiv:2403.10380 - Published 6/18/2024 by Lukas Rauch, Raphael Schwinger, Moritz Wirth, René Heinrich, Denis Huseljic, Jonas Lange, Stefan Kahl, Bernhard Sick, Sven Tomforde, Christoph Scholz


Overview

Presents BirdSet, a new multi-task benchmark for classification in avian bioacoustics
Aims to address the limitations of existing benchmarks, which often focus on a single task, and to support more holistic evaluation of audio classification models
Includes tasks for species identification, vocalization type classification, and sound event detection

Plain English Explanation

BirdSet is a new benchmark designed to evaluate how well AI audio classification models perform on a variety of tasks related to bird sounds. The goal is to create a more comprehensive way to assess these models, rather than just focusing on a single task like identifying bird species.

The benchmark includes three main tasks:

Classifying the species of a bird based on its vocalizations
Determining the type of vocalization (e.g., song or call)
Detecting the presence of different sound events in a recording

By covering multiple tasks, the benchmark aims to better reflect the real-world challenges of working with bird audio data, where models must generalize across a wide range of species and environmental conditions. This could help drive the development of more robust and versatile audio classification systems.

Technical Explanation

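The BirdSet benchmark is composed of several datasets that collectively cover a diverse set of bird species, vocalizations, and sound events. According to the authors' abstract, it comprises approximately 520,000 global bird recordings for training and over 400 hours of passive acoustic monitoring (PAM) recordings for testing. The datasets were curated from multiple existing sources and annotated to enable the three primary tasks.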

For the species classification task, the benchmark provides audio recordings of individual birds along with their species labels. The vocalization type classification task uses the same audio but adds annotations for the type of vocalization (e.g., song or call). Finally, the sound event detection task operates on longer, multi-species recordings and requires models to identify both the presence and the timing of different sound events.
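To make the timing requirement of the sound event detection task concrete, here is a minimal sketch of onset-based event matching. It assumes events are (label, onset, offset) records in seconds and a hypothetical 0.2 s onset tolerance; the summary does not say how BirdSet scores event timing, and the `Event` class and `event_f1` function are illustrative names, not part of any BirdSet code. The greedy one-to-one matching is a simplification of what dedicated evaluation libraries such as sed_eval implement.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """A labelled sound event; onset and offset are in seconds."""
    label: str
    onset: float
    offset: float


def event_f1(reference, predicted, onset_tolerance=0.2):
    """Onset-based event F1 with greedy one-to-one matching.

    A predicted event counts as a true positive if an unmatched reference
    event has the same label and an onset within `onset_tolerance` seconds.
    Offsets are ignored in this simplified criterion.
    """
    matched = set()  # indices of reference events already paired
    true_positives = 0
    for pred in predicted:
        for i, ref in enumerate(reference):
            if i in matched or ref.label != pred.label:
                continue
            if abs(ref.onset - pred.onset) <= onset_tolerance:
                matched.add(i)
                true_positives += 1
                break
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical labels and times, for illustration only.
    reference = [Event("species_a", 1.0, 3.5), Event("species_b", 2.0, 4.0)]
    predicted = [Event("species_a", 1.1, 3.2), Event("species_b", 5.0, 6.0)]
    print(event_f1(reference, predicted))  # 0.5: one of two events matched
```

Greedy matching can undercount true positives when several candidate pairs compete for the same reference event; optimal (bipartite) matching avoids that at some extra cost.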

By evaluating models on this suite of interrelated tasks, the benchmark aims to provide a more holistic assessment of their capabilities. This contrasts with many existing avian bioacoustics benchmarks, which tend to focus on a single task such as species identification alone.

Critical Analysis

The BirdSet benchmark represents a valuable step forward in evaluating audio classification models for real-world applications. However, the authors acknowledge several limitations and areas for further research.

For example, the dataset is still relatively small compared to the vast diversity of bird species and vocalizations worldwide. Expanding the dataset coverage, particularly for less common species, could help make the benchmark more representative.

Additionally, the authors note that the sound event detection task may be challenging due to the complex acoustic environments and overlapping sounds present in the recordings. Developing more sophisticated event detection methods could be an interesting area for future work.
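Overlapping sounds are a key reason detection is often framed as a multi-label problem at the frame level. The sketch below assumes hypothetical (label, onset, offset) annotations and a 0.5 s frame hop chosen only for illustration; it is not BirdSet's preprocessing. It converts events into a frame-by-class multi-hot matrix, so frames where two events overlap get two active columns.

```python
import math


def events_to_frame_targets(events, labels, duration, hop=0.5):
    """Convert (label, onset, offset) events into a frame-level multi-hot matrix.

    Row t covers [t * hop, (t + 1) * hop) seconds; column k is 1 when an event
    labelled labels[k] overlaps that frame. Overlapping events of different
    classes set several columns in the same row.
    """
    n_frames = math.ceil(duration / hop)
    column = {label: k for k, label in enumerate(labels)}
    targets = [[0] * len(labels) for _ in range(n_frames)]
    for label, onset, offset in events:
        first = max(0, math.floor(onset / hop))
        last = min(n_frames, math.ceil(offset / hop))
        for t in range(first, last):
            targets[t][column[label]] = 1
    return targets


if __name__ == "__main__":
    # Hypothetical annotations: species_b starts before species_a ends.
    events = [("species_a", 0.0, 1.5), ("species_b", 1.0, 2.0)]
    for row in events_to_frame_targets(events, ["species_a", "species_b"], duration=2.0):
        print(row)
    # [1, 0]
    # [1, 0]
    # [1, 1]   <- the overlapping frame [1.0, 1.5) has both labels active
    # [0, 1]
```

A model trained on such targets typically uses an independent sigmoid per class rather than a softmax, so several species can be active in the same frame.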

Lastly, while the multi-task approach is a strength of BirdSet, further research is needed to understand how models can effectively leverage the relationships between the different tasks to improve overall performance.

Conclusion

Overall, the BirdSet benchmark provides a new, more holistic way to evaluate audio classification models in the context of avian bioacoustics. By incorporating multiple interrelated tasks, the benchmark aims to drive the development of more robust and versatile systems that can handle the complexities of real-world bird sound data.

While the benchmark has some limitations, it represents an important step forward in benchmarking research. As the field continues to evolve, benchmarks like BirdSet will play a crucial role in guiding the development of advanced audio classification technologies with broad applications in ecology, conservation, and beyond.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!


Related Papers

BirdSet: A Multi-Task Benchmark for Classification in Computational Avian Bioacoustics

Lukas Rauch, Raphael Schwinger, Moritz Wirth, René Heinrich, Denis Huseljic, Jonas Lange, Stefan Kahl, Bernhard Sick, Sven Tomforde, Christoph Scholz

Deep learning (DL) models have emerged as a powerful tool in avian bioacoustics to assess environmental health. To maximize the potential of cost-effective and minimally invasive passive acoustic monitoring (PAM), DL models must analyze bird vocalizations across a wide range of species and environmental conditions. However, data fragmentation challenges a comprehensive evaluation of generalization performance. Therefore, we introduce the BirdSet dataset, comprising approximately 520,000 global bird recordings for training and over 400 hours of PAM recordings for testing. Our benchmark offers baselines for several DL models to enhance comparability and consolidate research across studies, along with code implementations that include comprehensive training and evaluation protocols.

6/18/2024
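Multi-label evaluation on long PAM soundscapes is commonly summarized with class-wise mean average precision (cmAP). The following is a minimal pure-Python sketch; the abstract does not list the exact metrics used in the BirdSet baselines, so treat cmAP here as an illustrative assumption. Rows are audio windows, columns are species, and classes without any positive window are skipped.

```python
def average_precision(scores, targets):
    """Non-interpolated AP for one class: the mean of precision@rank taken at
    the rank of each positive, after sorting by descending score.
    Returns None if the class has no positives."""
    ranked = sorted(zip(scores, targets), key=lambda pair: pair[0], reverse=True)
    hits = 0
    precisions = []
    for rank, (_, is_positive) in enumerate(ranked, start=1):
        if is_positive:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions) if precisions else None


def class_wise_map(score_matrix, target_matrix):
    """cmAP: AP computed per class (column), then averaged over the classes
    that have at least one positive window. Rows are audio windows."""
    aps = []
    for k in range(len(score_matrix[0])):
        ap = average_precision([row[k] for row in score_matrix],
                               [row[k] for row in target_matrix])
        if ap is not None:
            aps.append(ap)
    if not aps:
        raise ValueError("no class has a positive example")
    return sum(aps) / len(aps)


if __name__ == "__main__":
    # Illustrative scores for three windows and two classes (not real results).
    scores = [[0.9, 0.1], [0.8, 0.7], [0.2, 0.6]]
    targets = [[1, 0], [0, 1], [1, 1]]
    print(class_wise_map(scores, targets))  # (5/6 + 1) / 2, about 0.9167
```

When no scores are tied, each per-class value should match scikit-learn's `average_precision_score`; with ties, the ordering in this sketch is arbitrary.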


Exploring Meta Information for Audio-based Zero-shot Bird Classification

Alexander Gebhard, Andreas Triantafyllopoulos, Teresa Bez, Lukas Christ, Alexander Kathan, Björn W. Schuller

Advances in passive acoustic monitoring and machine learning have led to the procurement of vast datasets for computational bioacoustic research. Nevertheless, data scarcity is still an issue for rare and underrepresented species. This study investigates how meta-information can improve zero-shot audio classification, utilising bird species as an example case study due to the availability of rich and diverse metadata. We investigate three different sources of metadata: textual bird sound descriptions encoded via (S)BERT, functional traits (AVONET), and bird life-history (BLH) characteristics. As audio features, we extract audio spectrogram transformer (AST) embeddings and project them to the dimension of the auxiliary information by adopting a single linear layer. Then, we employ the dot product as the compatibility function and a standard zero-shot learning ranking hinge loss to determine the correct class. The best results are achieved by concatenating the AVONET and BLH features, attaining a mean unweighted F1-score of 0.233 over five different test sets with 8 to 10 classes.

6/12/2024
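The scoring pipeline described in this abstract (a linear projection of the audio embedding, a dot-product compatibility function, and a ranking hinge loss) can be sketched in pure Python as follows. The embedding sizes, the margin of 1.0, and summing violations over all wrong classes are assumptions for illustration; the abstract does not specify them. The function names are hypothetical, and the sketch omits the training loop.

```python
def project(weights, audio_embedding):
    """Single linear layer (no bias): maps an audio embedding to the
    dimension of the auxiliary meta-information vectors."""
    return [sum(w * x for w, x in zip(row, audio_embedding)) for row in weights]


def compatibility(weights, audio_embedding, class_vector):
    """Dot-product compatibility F(x, y) = phi(y) . (W x)."""
    projected = project(weights, audio_embedding)
    return sum(p * c for p, c in zip(projected, class_vector))


def ranking_hinge_loss(weights, audio_embedding, class_vectors, true_class, margin=1.0):
    """Sum over wrong classes y of max(0, margin + F(x, y) - F(x, y_true))."""
    true_score = compatibility(weights, audio_embedding, class_vectors[true_class])
    loss = 0.0
    for label, vector in class_vectors.items():
        if label != true_class:
            violation = margin + compatibility(weights, audio_embedding, vector) - true_score
            loss += max(0.0, violation)
    return loss


def predict(weights, audio_embedding, class_vectors):
    """Zero-shot prediction: the class whose auxiliary vector is most compatible.
    Classes unseen in training work as long as they have a meta-information vector."""
    return max(class_vectors,
               key=lambda label: compatibility(weights, audio_embedding, class_vectors[label]))


if __name__ == "__main__":
    # Toy sizes: 3-dim audio embedding, 2-dim meta-information (hypothetical).
    W = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
    x = [2.0, 0.5, 9.0]
    classes = {"a": [1.0, 0.0], "b": [0.0, 1.0]}
    print(predict(W, x, classes))                  # a
    print(ranking_hinge_loss(W, x, classes, "b"))  # 2.5
```

Because only `W` is learned, a new species can be scored at test time by supplying its meta-information vector, which is what makes the setup zero-shot.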

Transfer Learning with Pseudo Multi-Label Birdcall Classification for DS@GT BirdCLEF 2024

Anthony Miyaguchi, Adrian Cheung, Murilo Gustineli, Ashley Kim

We present working notes for the DS@GT team on transfer learning with pseudo multi-label birdcall classification for the BirdCLEF 2024 competition, focused on identifying Indian bird species in recorded soundscapes. Our approach utilizes production-grade models such as the Google Bird Vocalization Classifier, BirdNET, and EnCodec to address representation and labeling challenges in the competition. We explore the distributional shift between this year's edition of unlabeled soundscapes representative of the hidden test set and propose a pseudo multi-label classification strategy to leverage the unlabeled data. Our highest post-competition public leaderboard score is 0.63 using BirdNET embeddings with Bird Vocalization pseudo-labels. Our code is available at https://github.com/dsgt-kaggle-clef/birdclef-2024

7/10/2024
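One simple way to turn a teacher model's outputs on unlabeled soundscapes into pseudo multi-labels is per-class thresholding. The sketch below is hypothetical: the 0.5 threshold, the window ids, and dropping windows with no confident class are illustrative choices, not details taken from the DS@GT notes, which may use soft labels or a different rule.

```python
def pseudo_label_windows(window_probs, threshold=0.5, drop_empty=True):
    """Turn teacher-model probabilities into multi-hot pseudo-labels.

    window_probs maps a window id to per-class probabilities. Every class at or
    above `threshold` is marked present, so one window can carry several
    species. Windows with no class above threshold are dropped if drop_empty.
    """
    pseudo = {}
    for window_id, probs in window_probs.items():
        multi_hot = [1 if p >= threshold else 0 for p in probs]
        if drop_empty and not any(multi_hot):
            continue
        pseudo[window_id] = multi_hot
    return pseudo


if __name__ == "__main__":
    # Hypothetical teacher outputs for two windows and three classes.
    teacher = {"w0": [0.9, 0.2, 0.6], "w1": [0.1, 0.3, 0.2]}
    print(pseudo_label_windows(teacher))  # {'w0': [1, 0, 1]}
```

Raising the threshold trades recall for cleaner pseudo-labels; the right value depends on how well calibrated the teacher is on the target soundscapes.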

TinyChirp: Bird Song Recognition Using TinyML Models on Low-power Wireless Acoustic Sensors

Zhaolan Huang, Adrien Tousnakhoff, Polina Kozyr, Roman Rehausen, Felix Bießmann, Robert Lachlan, Cedric Adjih, Emmanuel Baccelli

Monitoring biodiversity at scale is challenging. Detecting and identifying species in fine-grained taxonomies requires highly accurate machine learning (ML) methods. Training such models requires large, high-quality data sets. And deploying these models to low-power devices requires novel compression techniques and model architectures. While species classification methods have profited from novel data sets and advances in ML methods, in particular neural networks, deploying these state-of-the-art models to low-power devices remains difficult. Here we present a comprehensive empirical comparison of various tinyML neural network architectures and compression techniques for species classification. We focus on the example of bird song detection, more concretely a data set curated for studying the corn bunting bird species. The data set is released along with all code and experiments of this study. In our experiments we compare predictive performance, memory and time complexity of classical spectrogram-based methods and recent approaches operating on raw audio signals. Our results indicate that individual bird species can be robustly detected with relatively simple architectures that can be readily deployed to low-power devices.

9/12/2024
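As one concrete example of the kind of compression technique such comparisons cover, here is symmetric per-tensor int8 post-training quantization in pure Python. Whether TinyChirp uses this exact scheme is not stated in the abstract; the function names are hypothetical, and the block only illustrates how storing weights as 8-bit integers plus one float scale cuts float32 weight memory by roughly 4x.

```python
def quantize_int8(weights):
    """Symmetric per-tensor int8 quantization.

    scale = max|w| / 127 and q = round(w / scale), clamped to [-127, 127].
    Each weight then needs 1 byte instead of 4 (float32), plus one shared scale.
    """
    max_abs = max((abs(w) for w in weights), default=0.0)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Approximate reconstruction; per-weight error is about scale / 2 at most."""
    return [v * scale for v in q]


if __name__ == "__main__":
    w = [0.5, -1.27, 0.0]  # toy weights, not from any trained model
    q, scale = quantize_int8(w)
    print(q)  # [50, -127, 0]
    print(dequantize(q, scale))
```

Per-tensor scaling is the simplest variant; per-channel scales usually lose less accuracy at the cost of storing one scale per output channel.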