Data-Efficient Low-Complexity Acoustic Scene Classification in the DCASE 2024 Challenge

Read original: arXiv:2405.10018 - Published 7/19/2024 by Florian Schmid, Paul Primus, Toni Heittola, Annamaria Mesaros, Irene Martín-Morató, Khaled Koutini, Gerhard Widmer

Overview

This paper explores low-complexity acoustic scene classification for the DCASE 2024 challenge, which focuses on developing efficient models for real-world applications.
The authors propose a data-efficient approach to improve classification performance while reducing model complexity and computational requirements.
The research aims to address the challenge of building accurate and lightweight models for acoustic scene classification, which has applications in areas like smart home devices and autonomous systems.

Plain English Explanation

In this paper, the researchers are looking at ways to classify different types of sounds or "acoustic scenes" in an efficient and low-complexity way. This is important for real-world applications like smart home devices or self-driving cars, where the models need to be accurate but also small and fast.

The researchers are working on the DCASE 2024 challenge, which is a competition focused on developing these types of efficient acoustic scene classification models. Their approach is to find ways to train the models using less data, while still maintaining high accuracy. This "data-efficient" approach could help reduce the amount of training data needed, which can be costly and time-consuming to collect.

By making the models more efficient and lightweight, the researchers hope to create solutions that can be used in practical applications where computational resources may be limited, such as on embedded devices or mobile phones. The goal is to develop accurate acoustic scene classification without requiring a lot of processing power or memory.

Technical Explanation

The paper explores low-complexity acoustic scene classification for the DCASE 2024 challenge, which focuses on developing efficient models for real-world applications. The authors propose a data-efficient, low-complexity approach that aims to improve classification performance while reducing model complexity and computational requirements.

The research aims to address the challenge of building accurate and lightweight models for acoustic scene classification, which has applications in areas like smart home devices and autonomous systems. The authors investigate techniques to train models using less data, drawing on related work such as the tuning and analysis of audio classifier performance in clinical settings.

The paper also explores the use of visual-audio scene classification to enhance the acoustic scene classification task. Additionally, the authors consider the impact of adversarial attacks and countermeasures on the model's robustness.

Critical Analysis

The paper acknowledges the limitations of the data-efficient approach, noting that it may not be able to achieve the same performance as models trained on larger datasets. The authors also suggest that further research is needed to explore the generalization capabilities of the proposed techniques across different datasets and real-world scenarios.

Additionally, the paper does not address the potential biases or fairness issues that may arise from the data-efficient training approach, which is an important consideration for real-world deployment of these models. Exploring the impact of dataset biases and model performance in clinical settings could provide valuable insights in this regard.

Conclusion

This paper presents a data-efficient low-complexity approach for acoustic scene classification in the DCASE 2024 challenge. The researchers aim to develop accurate and lightweight models that can be deployed in resource-constrained environments, such as smart home devices and autonomous systems.

The proposed techniques leverage methods like data-efficient training and visual-audio integration to improve classification performance while reducing model complexity. While the paper highlights the potential benefits of this approach, it also acknowledges the need for further research to address its limitations and ensure the robustness and fairness of the developed solutions.

Overall, the work contributes to the ongoing efforts to create efficient and practical acoustic scene classification models, with implications for a wide range of real-world applications.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Data-Efficient Low-Complexity Acoustic Scene Classification in the DCASE 2024 Challenge

Florian Schmid, Paul Primus, Toni Heittola, Annamaria Mesaros, Irene Martín-Morató, Khaled Koutini, Gerhard Widmer

This article describes the Data-Efficient Low-Complexity Acoustic Scene Classification Task in the DCASE 2024 Challenge and the corresponding baseline system. The task setup is a continuation of previous editions (2022 and 2023), which focused on recording device mismatches and low-complexity constraints. This year's edition introduces an additional real-world problem: participants must develop data-efficient systems for five scenarios, which progressively limit the available training data. The provided baseline system is based on an efficient, factorized CNN architecture constructed from inverted residual blocks and uses Freq-MixStyle to tackle the device mismatch problem. The task received 37 submissions from 17 teams, with the large majority of systems outperforming the baseline. The top-ranked system's accuracy ranges from 54.3% on the smallest to 61.8% on the largest subset, corresponding to relative improvements of approximately 23% and 9% over the baseline system on the evaluation set.

7/19/2024
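The abstract above names Freq-MixStyle as the baseline's answer to recording device mismatch. As a rough illustration only, the sketch below shows the general idea as it is commonly described: normalize each frequency bin of a spectrogram with its own statistics, then rescale it with statistics mixed from a second, randomly paired spectrogram. It is a simplified pure-Python version, assuming single-channel spectrograms stored as nested lists; the function names, the `alpha` default, and the omission of an application probability are assumptions for this example, not details taken from the baseline system.

```python
import math
import random


def freq_stats(spec, eps=1e-6):
    """Per-frequency-bin mean and standard deviation over time for one spectrogram."""
    means, stds = [], []
    for row in spec:  # each row holds one frequency bin across all time frames
        m = sum(row) / len(row)
        var = sum((v - m) ** 2 for v in row) / len(row)
        means.append(m)
        stds.append(math.sqrt(var + eps))
    return means, stds


def freq_mixstyle(batch, alpha=0.3, rng=None):
    """Mix frequency-wise statistics between randomly paired spectrograms.

    batch: list of spectrograms, each a list of frequency rows (freq x time).
    alpha: Beta-distribution parameter (hypothetical default, not from the paper).
    """
    rng = rng or random.Random()
    perm = list(range(len(batch)))
    rng.shuffle(perm)  # perm[i] is the partner whose statistics are mixed into sample i
    stats = [freq_stats(spec) for spec in batch]
    mixed = []
    for i, spec in enumerate(batch):
        lam = rng.betavariate(alpha, alpha)  # mixing coefficient in [0, 1]
        mean_a, std_a = stats[i]
        mean_b, std_b = stats[perm[i]]
        out = []
        for f, row in enumerate(spec):
            mean_mix = lam * mean_a[f] + (1 - lam) * mean_b[f]
            std_mix = lam * std_a[f] + (1 - lam) * std_b[f]
            # Normalize with the sample's own statistics, then apply the mixed ones.
            out.append([(v - mean_a[f]) / std_a[f] * std_mix + mean_mix for v in row])
        mixed.append(out)
    return mixed
```

The content of each spectrogram is left in place while its per-frequency level and spread drift toward those of another recording, which is one way to imitate the coloration introduced by a different device.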

Data Efficient Acoustic Scene Classification using Teacher-Informed Confusing Class Instruction

Jin Jie Sean Yeo, Ee-Leng Tan, Jisheng Bai, Santi Peksi, Woon-Seng Gan

In this technical report, we describe the SNTL-NTU team's submission for Task 1 Data-Efficient Low-Complexity Acoustic Scene Classification of the detection and classification of acoustic scenes and events (DCASE) 2024 challenge. Three systems are introduced to tackle training splits of different sizes. For small training splits, we explored reducing the complexity of the provided baseline model by reducing the number of base channels. We introduce data augmentation in the form of mixup to increase the diversity of training samples. For the larger training splits, we use FocusNet to provide confusing class information to an ensemble of multiple Patchout faSt Spectrogram Transformer (PaSST) models and baseline models trained on the original sampling rate of 44.1 kHz. We use Knowledge Distillation to distill the ensemble model to the baseline student model. Training the systems on the TAU Urban Acoustic Scene 2022 Mobile development dataset yielded the highest average testing accuracy of (62.21, 59.82, 56.81, 53.03, 47.97)% on split (100, 50, 25, 10, 5)% respectively over the three systems.

9/19/2024
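The report above distills an ensemble teacher into the baseline student model. A minimal sketch of a typical knowledge distillation loss for a single example is given below, assuming the teacher's logits are already available (for an ensemble, they might be averaged beforehand). The `temperature` and `kd_weight` values are hypothetical defaults chosen for illustration; the report's actual settings are not stated here.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits, softened by a temperature."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, label,
                      temperature=2.0, kd_weight=0.9):
    """Weighted sum of label cross-entropy and teacher-student KL divergence."""
    # Hard-label term: cross-entropy of the student against the true class.
    hard = -math.log(softmax(student_logits)[label])
    # Soft-label term: KL(teacher || student) on temperature-softened distributions.
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    soft = sum(t * math.log(t / s) for t, s in zip(p_teacher, p_student) if t > 0)
    # Scaling by temperature squared keeps the soft term's gradient magnitude comparable.
    return (1 - kd_weight) * hard + kd_weight * temperature ** 2 * soft
```

When the student reproduces the teacher's logits exactly, the soft term vanishes and only the weighted cross-entropy remains.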

Low-Complexity Acoustic Scene Classification Using Parallel Attention-Convolution Network

Yanxiong Li, Jiaxin Tan, Guoqing Chen, Jialong Li, Yongjie Si, Qianhua He

This work is an improved system that we submitted to task 1 of DCASE2023 challenge. We propose a method of low-complexity acoustic scene classification by a parallel attention-convolution network which consists of four modules, including pre-processing, fusion, global and local contextual information extraction. The proposed network is computationally efficient to capture global and local contextual information from each audio clip. In addition, we integrate other techniques into our method, such as knowledge distillation, data augmentation, and adaptive residual normalization. When evaluated on the official dataset of DCASE2023 challenge, our method obtains the highest accuracy of 56.10% with parameter number of 5.21 kilo and multiply-accumulate operations of 1.44 million. It exceeds the top two systems of DCASE2023 challenge in accuracy and complexity, and obtains state-of-the-art result. Code is at: https://github.com/Jessytan/Low-complexity-ASC.

6/13/2024
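Complexity figures such as the parameter and multiply-accumulate (MAC) counts quoted in the abstract above are usually tallied layer by layer. The sketch below counts both for a standard convolution and for a depthwise-separable one, a common way to factorize a convolution. The layer sizes in the usage lines are hypothetical and are not taken from any of the systems described here; bias terms are ignored for simplicity.

```python
def conv_params(kernel, c_in, c_out):
    """Weights in a standard kernel x kernel convolution (bias ignored)."""
    return kernel * kernel * c_in * c_out


def conv_macs(kernel, c_in, c_out, h_out, w_out):
    """MACs of a standard convolution: one weight use per output position."""
    return conv_params(kernel, c_in, c_out) * h_out * w_out


def separable_params(kernel, c_in, c_out):
    """Weights in a depthwise (kernel x kernel) plus pointwise (1 x 1) pair."""
    depthwise = kernel * kernel * c_in
    pointwise = c_in * c_out
    return depthwise + pointwise


def separable_macs(kernel, c_in, c_out, h_out, w_out):
    """MACs of the depthwise-separable pair over the same output grid."""
    return separable_params(kernel, c_in, c_out) * h_out * w_out


# Hypothetical layer: 3x3 kernel, 32 input channels, 64 output channels, 16x16 output.
print(conv_params(3, 32, 64), conv_macs(3, 32, 64, 16, 16))
print(separable_params(3, 32, 64), separable_macs(3, 32, 64, 16, 16))
```

For this hypothetical layer the factorized version needs roughly one eighth of the weights and MACs, which illustrates why such factorizations are popular under tight complexity budgets.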

FMSG-JLESS Submission for DCASE 2024 Task4 on Sound Event Detection with Heterogeneous Training Dataset and Potentially Missing Labels

Yang Xiao, Han Yin, Jisheng Bai, Rohan Kumar Das

This report presents the systems developed and submitted by Fortemedia Singapore (FMSG) and Joint Laboratory of Environmental Sound Sensing (JLESS) for DCASE 2024 Task 4. The task focuses on recognizing event classes and their time boundaries, given that multiple events can be present and may overlap in an audio recording. The novelty this year is a dataset with two sources, making it challenging to achieve good performance without knowing the source of the audio clips during evaluation. To address this, we propose a sound event detection method using domain generalization. Our approach integrates features from bidirectional encoder representations from audio transformers and a convolutional recurrent neural network. We focus on three main strategies to improve our method. First, we apply mixstyle to the frequency dimension to adapt the mel-spectrograms from different domains. Second, we compute the training loss of our model separately for each dataset, over that dataset's corresponding classes. This independent learning framework helps the model extract domain-specific features effectively. Lastly, we use the sound event bounding boxes method for post-processing. Our proposed method shows superior macro-average pAUC and polyphonic SED score performance on the DCASE 2024 Challenge Task 4 validation dataset and public evaluation dataset.

7/2/2024