Improving Anomalous Sound Detection via Low-Rank Adaptation Fine-Tuning of Pre-Trained Audio Models

Read original: arXiv:2409.07016 - Published 9/12/2024 by Xinhu Zheng, Anbai Jiang, Bing Han, Yanmin Qian, Pingyi Fan, Jia Liu, Wei-Qiang Zhang

Overview

The paper applies Low-Rank Adaptation (LoRA) fine-tuning, called "Low-Rank Adaptation Fine-Tuning" in this summary, to pre-trained audio models in order to improve anomalous sound detection (ASD) when little fine-tuning data is available.
Instead of updating every weight, the method trains a low-rank update to the pre-trained weight matrices, so far fewer parameters have to be learned from the limited machine operation data.
On the DCASE2023 Task 2 dataset, the authors report 77.75% on the evaluation set, an improvement of 6.48% over the previous state of the art.

Plain English Explanation

The paper focuses on a common problem in machine learning: how to take a model pre-trained on one dataset and adapt it to work well on a new one. This is particularly challenging for anomalous sound detection, where the model must flag unusual sounds, for example from machines at a production site, and where data for adaptation is hard to collect.

The researchers' key insight is that rather than updating every parameter of the model, they can learn a "low-rank adaptation matrix": a small, compact correction that is added to the original weights. This allows the model to be adapted to the new task with far fewer trainable parameters, without having to start over completely.

The authors show that this "low-rank adaptation" approach outperforms traditional fine-tuning techniques, meaning the model can be more accurately adapted to the new anomalous sound detection task. This could be highly valuable in real-world applications, where pre-trained models often need to be adapted to specific environments or use cases.

Technical Explanation

The paper adapts pre-trained audio models to anomalous sound detection using Low-Rank Adaptation (LoRA) tuning, which this summary calls "Low-Rank Adaptation Fine-Tuning" (LRAFT). The key idea is to keep the pre-trained weights fixed and learn a low-rank update to them, rather than fine-tuning every parameter of the model.
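To make the "low-rank" idea concrete, the following sketch compares how many weights are trained for a single weight matrix under full fine-tuning and under a Low-Rank Adaptation (LoRA) style update, where the update is factored into two thin matrices of rank r. The layer size (768 x 768) and the rank (8) are assumptions chosen for illustration; they are not taken from the paper.

```python
def full_finetune_params(d_out: int, d_in: int) -> int:
    """Trainable weights when the whole d_out x d_in matrix is updated."""
    return d_out * d_in


def lora_params(d_out: int, d_in: int, rank: int) -> int:
    """Trainable weights for a low-rank update B @ A,
    where B is d_out x rank and A is rank x d_in."""
    return rank * (d_out + d_in)


# Hypothetical layer size and rank, chosen only for illustration.
d_out, d_in, rank = 768, 768, 8
full = full_finetune_params(d_out, d_in)
lora = lora_params(d_out, d_in, rank)

print(f"full fine-tuning: {full} trainable weights")    # 589824
print(f"low-rank update (r={rank}): {lora} trainable weights")  # 12288
print(f"ratio: {lora / full:.2%}")                      # 2.08%
```

Under these assumed sizes, the low-rank update trains roughly 2% of the weights that full fine-tuning would, which is why this kind of tuning is attractive when fine-tuning data is scarce.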

Specifically, LRAFT adds a trainable low-rank update to the weight matrices of the pre-trained model while the original weights stay fixed. This allows the model to adapt to the new task while preserving most of the representations learned during pre-training. The models are fine-tuned on machine operation data, with SpecAug as the data augmentation strategy, and the authors report that the resulting system outperforms the previous state of the art on the DCASE2023 Task 2 dataset.
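The mechanism described above can be sketched as a single linear layer with a low-rank update, following the general Low-Rank Adaptation (LoRA) formulation: output = x W0^T + (alpha / r) x A^T B^T, with W0 frozen and only A and B trained. This is a minimal NumPy illustration, not the authors' implementation; the class name `LoRALinear`, the layer sizes, the rank, and the scaling factor `alpha` are all assumptions, and no training loop is shown.

```python
import numpy as np


class LoRALinear:
    """Frozen linear layer W0 plus a low-rank update (alpha / r) * B @ A."""

    def __init__(self, w0: np.ndarray, rank: int, alpha: float = 1.0, seed: int = 0):
        rng = np.random.default_rng(seed)
        d_out, d_in = w0.shape
        self.w0 = w0                                        # pre-trained weights, kept frozen
        self.a = rng.normal(0.0, 0.01, size=(rank, d_in))   # trainable, small random init
        self.b = np.zeros((d_out, rank))                    # trainable, zero init: update starts at 0
        self.scale = alpha / rank

    def delta(self) -> np.ndarray:
        """The low-rank weight update that is added to W0."""
        return self.scale * (self.b @ self.a)

    def forward(self, x: np.ndarray) -> np.ndarray:
        # x: (batch, d_in) -> (batch, d_out); W0 + delta is never materialised here.
        return x @ self.w0.T + self.scale * (x @ self.a.T) @ self.b.T


rng = np.random.default_rng(1)
w0 = rng.normal(size=(6, 4))          # hypothetical pre-trained weights
layer = LoRALinear(w0, rank=2, alpha=4.0)
x = rng.normal(size=(3, 4))

# Before any training, B is zero, so the adapted layer matches the frozen one.
assert np.allclose(layer.forward(x), x @ w0.T)

# Stand-in for training: set B by hand and inspect the update.
layer.b = rng.normal(size=(6, 2))
print(layer.forward(x).shape)                  # (3, 6)
print(np.linalg.matrix_rank(layer.delta()))    # at most 2, the chosen rank
```

Initialising B to zero means fine-tuning starts exactly from the pre-trained model's behaviour, which matches the summary's point about preserving the pre-trained representations.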

The paper also includes an experimental evaluation that compares LRAFT against several baselines, along with ablation studies of the proposed scheme. The results indicate that LRAFT improves on full fine-tuning, particularly in low-data regimes where only a small amount of data is available for fine-tuning.
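The paper's abstract, reproduced under Related Papers, names SpecAug as the data augmentation used during fine-tuning, which is relevant to the low-data setting discussed here. The sketch below shows the masking part of SpecAugment-style augmentation on a spectrogram: one random frequency band and one random time span are zeroed. The function name, the mask widths, and the use of a single mask per axis are assumptions for illustration, time warping is omitted, and the paper's actual configuration is not given in this summary.

```python
import numpy as np


def spec_augment(spec: np.ndarray, max_freq_width: int = 8,
                 max_time_width: int = 20, rng=None) -> np.ndarray:
    """Return a copy of `spec` (freq_bins x time_frames) with one
    frequency band and one time span set to zero."""
    rng = np.random.default_rng() if rng is None else rng
    out = spec.copy()
    n_freq, n_time = out.shape

    # Frequency mask: width f in [0, max_freq_width], start f0 so the band fits.
    f = int(rng.integers(0, min(max_freq_width, n_freq) + 1))
    f0 = int(rng.integers(0, n_freq - f + 1))
    out[f0:f0 + f, :] = 0.0

    # Time mask: width t in [0, max_time_width], start t0 so the span fits.
    t = int(rng.integers(0, min(max_time_width, n_time) + 1))
    t0 = int(rng.integers(0, n_time - t + 1))
    out[:, t0:t0 + t] = 0.0
    return out


# Hypothetical 64-bin, 100-frame spectrogram filled with ones for easy inspection.
spec = np.ones((64, 100))
aug = spec_augment(spec, rng=np.random.default_rng(0))
print(aug.shape)                    # same shape as the input
print(int((aug == 0.0).sum()))      # number of masked cells
```

Each call produces a differently masked copy of the same recording, which gives the model more varied inputs without collecting new data.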

Critical Analysis

The paper presents a well-designed and carefully executed study, with a thorough experimental evaluation to validate the effectiveness of the proposed LRAFT approach. However, the authors do acknowledge some limitations and potential areas for future research.

One key limitation is that the LRAFT approach still relies on access to labeled data from the target domain, which may not always be available in real-world scenarios. The authors suggest exploring unsupervised or semi-supervised adaptation techniques as a possible direction for future work.

Additionally, the paper focuses on a single task (anomalous sound detection), and it would be interesting to see how the LRAFT approach generalizes to other audio-related tasks, such as audio classification or audio generation. Extending the method to handle multiple target tasks simultaneously could also be a valuable area of investigation.

Overall, the paper makes a strong contribution to the field of audio model adaptation, and the LRAFT approach offers a promising solution for improving the performance of pre-trained models on new datasets and tasks.

Conclusion

The paper presents a fine-tuning technique based on Low-Rank Adaptation (LoRA), called "Low-Rank Adaptation Fine-Tuning" (LRAFT) in this summary, that adapts pre-trained audio models to anomalous sound detection. By learning a low-rank update to the pre-trained weights, the method trains far fewer parameters than full fine-tuning, which helps when fine-tuning data is limited and leads to improved detection performance.

The authors' experimental results demonstrate the effectiveness of LRAFT on the DCASE2023 Task 2 dataset, showcasing its potential for real-world applications where pre-trained models need to be adapted to specific environments or use cases. While the method still relies on access to labeled target data, the paper lays the groundwork for future research exploring unsupervised or semi-supervised adaptation techniques, as well as the extension of LRAFT to a broader range of audio-related tasks.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Improving Anomalous Sound Detection via Low-Rank Adaptation Fine-Tuning of Pre-Trained Audio Models

Xinhu Zheng, Anbai Jiang, Bing Han, Yanmin Qian, Pingyi Fan, Jia Liu, Wei-Qiang Zhang

Anomalous Sound Detection (ASD) has gained significant interest through the application of various Artificial Intelligence (AI) technologies in industrial settings. Though possessing great potential, ASD systems can hardly be readily deployed in real production sites due to the generalization problem, which is primarily caused by the difficulty of data collection and the complexity of environmental factors. This paper introduces a robust ASD model that leverages audio pre-trained models. Specifically, we fine-tune these models using machine operation data, employing SpecAug as a data augmentation strategy. Additionally, we investigate the impact of utilizing Low-Rank Adaptation (LoRA) tuning instead of full fine-tuning to address the problem of limited data for fine-tuning. Our experiments on the DCASE2023 Task 2 dataset establish a new benchmark of 77.75% on the evaluation set, with a significant improvement of 6.48% compared with previous state-of-the-art (SOTA) models, including top-tier traditional convolutional networks and speech pre-trained models, which demonstrates the effectiveness of audio pre-trained models with LoRA tuning. Ablation studies are also conducted to showcase the efficacy of the proposed scheme.

9/12/2024

AnoPatch: Towards Better Consistency in Machine Anomalous Sound Detection

Anbai Jiang, Bing Han, Zhiqiang Lv, Yufeng Deng, Wei-Qiang Zhang, Xie Chen, Yanmin Qian, Jia Liu, Pingyi Fan

Large pre-trained models have demonstrated dominant performances in multiple areas, where the consistency between pre-training and fine-tuning is the key to success. However, few works reported satisfactory results of pre-trained models for the machine anomalous sound detection (ASD) task. This may be caused by the inconsistency of the pre-trained model and the inductive bias of machine audio, resulting in inconsistency in data and architecture. Thus, we propose AnoPatch which utilizes a ViT backbone pre-trained on AudioSet and fine-tunes it on machine audio. It is believed that machine audio is more related to audio datasets than speech datasets, and modeling it from patch level suits the sparsity of machine audio. As a result, AnoPatch showcases state-of-the-art (SOTA) performances on the DCASE 2020 ASD dataset and the DCASE 2023 ASD dataset. We also compare multiple pre-trained models and empirically demonstrate that better consistency yields considerable improvement.

6/18/2024

Stream-based Active Learning for Anomalous Sound Detection in Machine Condition Monitoring

Tuan Vu Ho, Kota Dohi, Yohei Kawaguchi

This paper introduces an active learning (AL) framework for anomalous sound detection (ASD) in machine condition monitoring system. Typically, ASD models are trained solely on normal samples due to the scarcity of anomalous data, leading to decreased accuracy for unseen samples during inference. AL is a promising solution to solve this problem by enabling the model to learn new concepts more effectively with fewer labeled examples, thus reducing manual annotation efforts. However, its effectiveness in ASD remains unexplored. To minimize update costs and time, our proposed method focuses on updating the scoring backend of ASD system without retraining the neural network model. Experimental results on the DCASE 2023 Challenge Task 2 dataset confirm that our AL framework significantly improves ASD performance even with low labeling budgets. Moreover, our proposed sampling strategy outperforms other baselines in terms of the partial area under the receiver operating characteristic score.

8/13/2024

Leveraging Pre-trained AudioLDM for Sound Generation: A Benchmark Study

Yi Yuan, Haohe Liu, Jinhua Liang, Xubo Liu, Mark D. Plumbley, Wenwu Wang

Deep neural networks have recently achieved breakthroughs in sound generation. Despite the outstanding sample quality, current sound generation models face issues on small-scale datasets (e.g., overfitting), significantly limiting performance. In this paper, we make the first attempt to investigate the benefits of pre-training on sound generation with AudioLDM, the cutting-edge model for audio generation, as the backbone. Our study demonstrates the advantages of the pre-trained AudioLDM, especially in data-scarcity scenarios. In addition, the baselines and evaluation protocol for sound generation systems are not consistent enough to compare different studies directly. Aiming to facilitate further study on sound generation tasks, we benchmark the sound generation task on various frequently-used datasets. We hope our results on transfer learning and benchmarks can provide references for further research on conditional sound generation.

7/30/2024