Feature Representations for Automatic Meerkat Vocalization Classification

Read original: arXiv:2408.15296 - Published 8/29/2024 by Imen Ben Mahmoud, Eklavya Sarkar, Marta Manser, Mathew Magimai.-Doss

Overview

This paper explores different feature representations for automatically classifying meerkat vocalizations.
Meerkats are small mammals that live in social groups and use a variety of vocal signals to communicate.
Automatically recognizing and classifying meerkat vocalizations can provide insights into their social behavior and ecology.

Plain English Explanation

The paper looks at ways to automatically classify the different sounds that meerkats make. Meerkats are small animals that live together in groups and use a variety of vocalizations to communicate with each other. Being able to recognize and categorize these meerkat calls can help researchers better understand the animals' social behavior and the way they interact.

The researchers tested different feature representations, which are mathematical descriptions of the acoustic properties of the sounds, to see which ones work best for automatically classifying meerkat vocalizations. They compared representations such as spectrograms, which show the frequency content of sounds over time, and mel-frequency cepstral coefficients (MFCCs), which capture the overall shape of the sound spectrum.
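To make these two representations concrete, here is a minimal Python sketch. It is not the authors' pipeline. It computes a power spectrum (one column of a spectrogram) and MFCCs for a single audio frame using only the standard library. The frame length, sample rate, filter count, and all function names are illustrative assumptions, and the input is a synthetic tone rather than a meerkat call. In practice a signal-processing library would normally do this work.

```python
import cmath
import math

def power_spectrum(frame):
    """Power spectrum of one Hann-windowed frame via a naive DFT (fine for short frames)."""
    n = len(frame)
    windowed = [x * (0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]
    spectrum = []
    for k in range(n // 2 + 1):
        s = sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(windowed))
        spectrum.append(abs(s) ** 2 / n)
    return spectrum

def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters whose edges are evenly spaced on the mel scale."""
    n_bins = n_fft // 2 + 1
    top = hz_to_mel(sample_rate / 2)
    mel_points = [top * i / (n_filters + 1) for i in range(n_filters + 2)]
    # Filter edges expressed in (fractional) DFT bin units.
    edges = [mel_to_hz(m) * n_fft / sample_rate for m in mel_points]
    bank = []
    for j in range(1, n_filters + 1):
        left, center, right = edges[j - 1], edges[j], edges[j + 1]
        row = []
        for b in range(n_bins):
            if left < b <= center:
                row.append((b - left) / (center - left))
            elif center < b < right:
                row.append((right - b) / (right - center))
            else:
                row.append(0.0)
        bank.append(row)
    return bank

def mfcc(frame, sample_rate, n_filters=20, n_coeffs=13):
    """MFCCs for one frame: power spectrum -> mel energies -> log -> DCT-II."""
    spec = power_spectrum(frame)
    bank = mel_filterbank(n_filters, len(frame), sample_rate)
    # Small floor avoids log(0) when a filter captures no energy.
    log_energies = [math.log(sum(w * p for w, p in zip(row, spec)) + 1e-10) for row in bank]
    return [
        sum(e * math.cos(math.pi * c * (m + 0.5) / n_filters)
            for m, e in enumerate(log_energies))
        for c in range(n_coeffs)
    ]

# Synthetic 1 kHz tone standing in for a real recording (assumption).
sample_rate = 16000
frame = [math.sin(2 * math.pi * 1000 * i / sample_rate) for i in range(256)]
spec = power_spectrum(frame)
coeffs = mfcc(frame, sample_rate)
print("peak bin:", spec.index(max(spec)), "| MFCC count:", len(coeffs))
```

The spectrum keeps one value per frequency bin (129 here), while the MFCC vector compresses the same frame into 13 numbers describing the spectral envelope. This is why the two can behave differently as classifier inputs.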

Technical Explanation

The paper evaluates different feature representations for automatically classifying meerkat vocalizations. The researchers collected audio recordings of meerkats and manually labeled the different call types, then tested the performance of various acoustic feature representations for automatically categorizing the calls.

They compared the classification accuracy of spectrograms, mel-frequency cepstral coefficients (MFCCs), and continuous wavelet transforms (CWTs) as feature representations. The results showed that MFCCs and CWTs outperformed spectrograms for this task.
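The comparison protocol described above (same classifier, different features, compare accuracy) can be sketched as follows. Everything here is a stand-in chosen for brevity. The "calls" are synthetic tones, the two features (zero-crossing rate and RMS energy) are not the representations evaluated in the paper, and the nearest-centroid classifier is an assumption. The sketch illustrates only the evaluation loop and says nothing about the paper's results.

```python
import math

def tone(freq, amp, sr=8000, n=400, phase=0.3):
    """Synthetic stand-in for a recorded call (assumption, not real data)."""
    return [amp * math.sin(2 * math.pi * freq * i / sr + phase) for i in range(n)]

def zero_crossing_rate(x):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(x, x[1:]) if (a >= 0) != (b >= 0))
    return [crossings / (len(x) - 1)]

def rms_energy(x):
    """Root-mean-square amplitude of the signal."""
    return [math.sqrt(sum(v * v for v in x) / len(x))]

def fit_centroids(features, labels):
    """Mean feature vector per class."""
    sums, counts = {}, {}
    for f, y in zip(features, labels):
        acc = sums.setdefault(y, [0.0] * len(f))
        for i, v in enumerate(f):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}

def predict(centroids, f):
    """Label of the closest centroid (squared Euclidean distance)."""
    return min(centroids, key=lambda y: sum((a - b) ** 2 for a, b in zip(centroids[y], f)))

def accuracy(feature_fn, train, test):
    """Train and score the same classifier on one feature representation."""
    centroids = fit_centroids([feature_fn(x) for x, _ in train], [y for _, y in train])
    hits = sum(predict(centroids, feature_fn(x)) == y for x, y in test)
    return hits / len(test)

# Two hypothetical call types that differ in pitch but not in loudness.
train = [(tone(500, 0.5), "low"), (tone(500, 1.0), "low"),
         (tone(2000, 0.5), "high"), (tone(2000, 1.0), "high")]
test = [(tone(550, 0.4), "low"), (tone(550, 0.9), "low"),
        (tone(1900, 0.6), "high"), (tone(1900, 1.1), "high")]

acc_zcr = accuracy(zero_crossing_rate, train, test)
acc_rms = accuracy(rms_energy, train, test)
print(f"zero-crossing rate: {acc_zcr:.2f} | RMS energy: {acc_rms:.2f}")
```

By construction, the toy classes differ in pitch and not in loudness, so the pitch-sensitive feature separates them while the energy feature carries no class information. Holding the classifier and the data split fixed in this way means any accuracy difference can be attributed to the representation.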

The authors also explored feature learning using convolutional neural networks to automatically extract useful features from the raw audio data, which further improved classification performance compared to the handcrafted features.
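As a rough illustration of what extracting features directly from raw audio means, the sketch below runs one convolutional layer over a waveform, applies a ReLU, and pools over time to produce one feature per filter. The layer layout, kernel values, and function names are assumptions for illustration and do not reproduce the paper's architecture. In a trained network the kernels would be learned from labeled calls rather than set by hand.

```python
def conv1d(signal, kernel, stride=1):
    """Valid (no padding) 1-D cross-correlation, as used in CNN layers."""
    k = len(kernel)
    return [
        sum(signal[start + j] * kernel[j] for j in range(k))
        for start in range(0, len(signal) - k + 1, stride)
    ]

def relu(values):
    """Element-wise max(0, v)."""
    return [max(0.0, v) for v in values]

def embed(waveform, kernels, stride=1):
    """One conv layer + ReLU + global max pooling: one feature per kernel."""
    return [max(relu(conv1d(waveform, k, stride))) for k in kernels]

# Tiny rising ramp standing in for raw audio samples (assumption).
waveform = [1.0, 2.0, 3.0, 4.0, 5.0]
# Hand-set kernels: the first responds to falling slopes, the second to rising ones.
kernels = [[1.0, 0.0, -1.0], [-1.0, 0.0, 1.0]]
features = embed(waveform, kernels)
print(features)  # [0.0, 2.0]: only the rising-slope kernel fires on a rising ramp
```

Global pooling makes the output length independent of the call duration, which is one common way to turn variable-length recordings into fixed-size inputs for a classifier.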

Critical Analysis

The paper provides a thorough evaluation of different acoustic feature representations for classifying meerkat vocalizations. The authors acknowledge that their dataset is relatively small, which could limit the generalizability of their findings. Collecting and annotating more audio data would help validate the results.

Additionally, while the paper focuses on meerkats, the feature representation techniques could potentially be applied to automated classification of other animal vocalizations. Further research is needed to explore the broader applicability of the methods.

Overall, this work contributes to the growing field of bioacoustics and animal behavior monitoring using machine learning techniques, which could have important implications for conservation and ecological research.

Conclusion

This paper investigates different acoustic feature representations for automatically classifying meerkat vocalizations. The results suggest that MFCCs and CWTs outperform spectrograms for this task, and that feature learning can further improve classification accuracy. This work contributes to the growing field of automated bioacoustic monitoring and could have applications in areas like animal behavior research and conservation.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Feature Representations for Automatic Meerkat Vocalization Classification

Imen Ben Mahmoud, Eklavya Sarkar, Marta Manser, Mathew Magimai.-Doss

Understanding evolution of vocal communication in social animals is an important research problem. In that context, beyond humans, there is an interest in analyzing vocalizations of other social animals such as, meerkats, marmosets, apes. While existing approaches address vocalizations of certain species, a reliable method tailored for meerkat calls is lacking. To that extent, this paper investigates feature representations for automatic meerkat vocalization analysis. Both traditional signal processing-based representations and data-driven representations facilitated by advances in deep learning are explored. Call type classification studies conducted on two data sets reveal that feature extraction methods developed for human speech processing can be effectively employed for automatic meerkat call analysis.

8/29/2024

On the Utility of Speech and Audio Foundation Models for Marmoset Call Analysis

Eklavya Sarkar, Mathew Magimai.-Doss

Marmoset monkeys encode vital information in their calls and serve as a surrogate model for neuro-biologists to understand the evolutionary origins of human vocal communication. Traditionally analyzed with signal processing-based features, recent approaches have utilized self-supervised models pre-trained on human speech for feature extraction, capitalizing on their ability to learn a signal's intrinsic structure independently of its acoustic domain. However, the utility of such foundation models remains unclear for marmoset call analysis in terms of multi-class classification, bandwidth, and pre-training domain. This study assesses feature representations derived from speech and general audio domains, across pre-training bandwidths of 4, 8, and 16 kHz for marmoset call-type and caller classification tasks. Results show that models with higher bandwidth improve performance, and pre-training on speech or general audio yields comparable results, improving over a spectral baseline.

7/25/2024

Advanced Framework for Animal Sound Classification With Features Optimization

Qiang Yang, Xiuying Chen, Changsheng Ma, Carlos M. Duarte, Xiangliang Zhang

The automatic classification of animal sounds presents an enduring challenge in bioacoustics, owing to the diverse statistical properties of sound signals, variations in recording equipment, and prevalent low Signal-to-Noise Ratio (SNR) conditions. Deep learning models like Convolutional Neural Networks (CNN) and Long Short-Term Memory (LSTM) have excelled in human speech recognition but have not been effectively tailored to the intricate nature of animal sounds, which exhibit substantial diversity even within the same domain. We propose an automated classification framework applicable to general animal sound classification. Our approach first optimizes audio features from Mel-frequency cepstral coefficients (MFCC) including feature rearrangement and feature reduction. It then uses the optimized features for the deep learning model, i.e., an attention-based Bidirectional LSTM (Bi-LSTM), to extract deep semantic features for sound classification. We also contribute an animal sound benchmark dataset encompassing oceanic animals and birds. Extensive experimentation with real-world datasets demonstrates that our approach consistently outperforms baseline methods by over 25% in precision, recall, and accuracy, promising advancements in animal sound classification.

7/8/2024

On Feature Learning for Titi Monkey Activity Detection

Aditya Ravuri, Jen Muir, Neil D. Lawrence

This paper, a technical summary of our preceding publication, introduces a robust machine learning framework for the detection of vocal activities of Coppery titi monkeys. Utilizing a combination of MFCC features and a bidirectional LSTM-based classifier, we effectively address the challenges posed by the small amount of expert-annotated vocal data available. Our approach significantly reduces false positives and improves the accuracy of call detection in bioacoustic research. Initial results demonstrate an accuracy of 95% on instance predictions, highlighting the effectiveness of our model in identifying and classifying complex vocal patterns in environmental audio recordings. Moreover, we show how call classification can be done downstream, paving the way for real-world monitoring.

7/2/2024