A Dual-Path Framework with Frequency-and-Time Excited Network for Anomalous Sound Detection

Read original: arXiv:2409.03610 - Published 9/6/2024 by Yucong Zhang, Juan Liu, Yao Tian, Haifeng Liu, Ming Li

Overview

Proposes a dual-path framework built around a Frequency-and-Time Excited Network (FTE-Net) for anomalous sound detection
Exploits both the consistent frequency characteristics and the temporal periodicity of machine sounds
Reports state-of-the-art performance on the DCASE 2023 Task 2 dataset

Plain English Explanation

The paper introduces a new approach for detecting unusual or abnormal machine sounds, an important task in areas like industrial monitoring and surveillance. The key idea is a dual-path framework that processes each recording in two parallel pathways: one learns salient patterns along both the frequency and time axes of the spectrogram, and the other applies a 1D convolutional network to the spectrum of the whole clip.

Unlike human speech, sounds from machines of the same type tend to have consistent frequency characteristics and a discernible temporal periodicity. By capturing both the frequency content and how the sound evolves over time, the model can identify anomalies that are not apparent from either view alone. The frequency-and-time excited network aims to learn a more comprehensive representation of the audio signal.

The researchers report that this dual-path approach achieves state-of-the-art results for anomalous sound detection on the DCASE 2023 Task 2 dataset. This suggests that both frequency and temporal information matter when distinguishing normal from abnormal sounds, with applications in predictive maintenance, safety monitoring, and other real-world scenarios.

Technical Explanation

The authors propose a dual-path framework for anomalous sound detection. One pathway uses a novel Frequency-and-Time Excited Network (FTE-Net), which combines a Frequency-and-Time Chunkwise Encoder (FTC-Encoder) with an excitation network; the other pathway is a 1D convolutional network. The two pathways process each recording in parallel.

The FTE-Net pathway takes the spectrogram of the audio and learns salient features along both its frequency and time axes. The second pathway operates on the utterance-level spectrum, that is, the spectrum of the whole clip rather than of individual frames, and uses 1D convolutions to model it.
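The paper's abstract describes the two pathway inputs as a spectrogram and an utterance-level spectrum. As a rough illustration of how those two views differ, the sketch below computes both for a synthetic tone using a naive DFT from the standard library. The signal, sample rate, frame length, and hop size are all hypothetical choices made for this example, not values from the paper.

```python
import cmath
import math


def magnitude_spectrum(samples):
    """Naive DFT magnitudes for bins 0..N/2 (fine for short illustrative signals)."""
    n = len(samples)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                  for t, x in enumerate(samples))
        mags.append(abs(acc))
    return mags


def spectrogram(samples, frame_len, hop):
    """Framewise magnitude spectra: one row per frame, one column per frequency bin."""
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frames.append(magnitude_spectrum(samples[start:start + frame_len]))
    return frames


# Synthetic "machine hum": a 50 Hz tone sampled at 400 Hz (hypothetical values).
sample_rate = 400
clip = [math.sin(2 * math.pi * 50 * t / sample_rate) for t in range(256)]

spec = spectrogram(clip, frame_len=64, hop=32)  # time-frequency view
clip_spectrum = magnitude_spectrum(clip)         # whole-clip (utterance-level) view

# The strongest bin of the whole-clip spectrum should sit at the tone's frequency.
peak_bin = max(range(len(clip_spectrum)), key=clip_spectrum.__getitem__)
peak_hz = peak_bin * sample_rate / len(clip)
print(len(spec), len(spec[0]), peak_hz)
```

The spectrogram keeps a time axis (7 frames of 33 bins here), while the whole-clip spectrum trades time resolution for finer frequency resolution (129 bins), which is one plausible reason to feed the two views to separate pathways.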

The outputs of the two paths are then concatenated and passed through additional layers to produce the final anomaly classification. The authors argue that this dual-path architecture lets the model learn complementary features from the spectrogram and the utterance-level spectrum, improving detection performance over single-path approaches.
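A minimal sketch of the fusion step, assuming each pathway has already produced a fixed-length embedding. The concatenation follows the description above; the scoring rule (cosine distance to the nearest embedding of a known-normal clip) is a common choice swapped in here for illustration and is not confirmed to be the paper's method. All embedding values and function names are hypothetical.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so neither pathway dominates the fusion."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else list(vec)


def fuse(path_a, path_b):
    """Concatenate the two per-pathway embeddings after normalizing each."""
    return l2_normalize(path_a) + l2_normalize(path_b)


def cosine_distance(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (nu * nv)


def anomaly_score(embedding, normal_bank):
    """Assumed scoring rule: distance to the closest known-normal embedding."""
    return min(cosine_distance(embedding, ref) for ref in normal_bank)


# Hypothetical embeddings: (spectrogram pathway, spectrum pathway) per clip.
normal_bank = [
    fuse([1.0, 0.1], [0.9, 0.2, 0.0]),
    fuse([0.9, 0.2], [1.0, 0.1, 0.1]),
]
normal_clip = fuse([0.95, 0.15], [0.95, 0.15, 0.05])
odd_clip = fuse([0.1, 1.0], [0.0, 0.3, 1.0])

# A clip resembling the normal bank should score lower than one that does not.
print(anomaly_score(normal_clip, normal_bank) < anomaly_score(odd_clip, normal_bank))
```

Normalizing each embedding before concatenation is a design choice made for this sketch: it keeps the two pathways on a comparable scale so that a larger-magnitude embedding does not swamp the other in the distance computation.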

The proposed framework is evaluated on the DCASE 2023 Task 2 dataset, where it demonstrates state-of-the-art results. The authors conduct ablation studies to analyze the contributions of the two paths, as well as the effect of different architectural choices.
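Anomalous sound detection benchmarks are commonly scored with the area under the ROC curve (AUC), which measures how well anomaly scores rank anomalous clips above normal ones. This summary does not state which metric the paper reports, so the following is only a sketch of that common metric, computed on made-up scores.

```python
def roc_auc(normal_scores, anomalous_scores):
    """Probability that a random anomalous clip outscores a random normal clip.

    Ties count as half a win, which matches the area under the ROC curve.
    """
    wins = 0.0
    for a in anomalous_scores:
        for n in normal_scores:
            if a > n:
                wins += 1.0
            elif a == n:
                wins += 0.5
    return wins / (len(anomalous_scores) * len(normal_scores))


# Hypothetical anomaly scores (higher means "more anomalous").
normal_scores = [0.10, 0.20, 0.35, 0.40]
anomalous_scores = [0.30, 0.60, 0.80]
print(roc_auc(normal_scores, anomalous_scores))  # about 0.83 (10 of 12 pairs ranked correctly)
```

An AUC of 1.0 means every anomalous clip scores above every normal clip, while 0.5 corresponds to random ranking. The pairwise loop is quadratic in the number of clips, which is acceptable for illustration but not for large test sets.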

Critical Analysis

The paper presents a well-designed experimental evaluation on the DCASE 2023 Task 2 dataset, with comparisons to several baseline methods. The authors analyze the model's performance and the relative importance of the two paths.

One potential limitation is that the approach may be computationally more expensive than single-path models, as it requires processing the input through two separate sub-networks. The authors do not provide any information on the training and inference time of their framework compared to the baselines.

On interpretability, the paper does provide visualizations of intermediate feature maps in the excitation network to illustrate the effectiveness of the method. It is less clear how well these visualizations explain individual predictions; more explicitly interpretable components could help users understand the model's decision-making process and potentially lead to further improvements.

Conclusion

The proposed dual-path framework with frequency-and-time excited network demonstrates strong performance on anomalous sound detection, as measured on the DCASE 2023 Task 2 dataset. By leveraging both frequency and temporal information, the model captures more comprehensive representations of the audio signals, leading to improved anomaly detection accuracy.

This work highlights the value of combining complementary views of the same input, here the spectrogram and the utterance-level spectrum, when designing machine learning models for real-world applications. The dual-path architecture could potentially be applied to other tasks beyond anomaly detection that require understanding both spectral and temporal aspects of the input.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

A Dual-Path Framework with Frequency-and-Time Excited Network for Anomalous Sound Detection

Yucong Zhang, Juan Liu, Yao Tian, Haifeng Liu, Ming Li

In contrast to human speech, machine-generated sounds of the same type often exhibit consistent frequency characteristics and discernible temporal periodicity. However, leveraging these dual attributes in anomaly detection remains relatively under-explored. In this paper, we propose an automated dual-path framework that learns prominent frequency and temporal patterns for diverse machine types. One pathway uses a novel Frequency-and-Time Excited Network (FTE-Net) to learn the salient features across frequency and time axes of the spectrogram. It incorporates a Frequency-and-Time Chunkwise Encoder (FTC-Encoder) and an excitation network. The other pathway uses a 1D convolutional network for utterance-level spectrum. Experimental results on the DCASE 2023 task 2 dataset show the state-of-the-art performance of our proposed method. Moreover, visualizations of the intermediate feature maps in the excitation network are provided to illustrate the effectiveness of our method.

9/6/2024

Machine Anomalous Sound Detection Using Spectral-temporal Modulation Representations Derived from Machine-specific Filterbanks

Kai Li, Khalid Zaman, Xingfeng Li, Masato Akagi, Masashi Unoki

Early detection of factory machinery malfunctions is crucial in industrial applications. In machine anomalous sound detection (ASD), different machines exhibit unique vibration-frequency ranges based on their physical properties. Meanwhile, the human auditory system is adept at tracking both temporal and spectral dynamics of machine sounds. Consequently, integrating the computational auditory models of the human auditory system with machine-specific properties can be an effective approach to machine ASD. We first quantified the frequency importances of four types of machines using the Fisher ratio (F-ratio). The quantified frequency importances were then used to design machine-specific non-uniform filterbanks (NUFBs), which extract the log non-uniform spectrum (LNS) feature. The designed NUFBs have a narrower bandwidth and higher filter distribution density in frequency regions with relatively high F-ratios. Finally, spectral and temporal modulation representations derived from the LNS feature were proposed. These proposed LNS feature and modulation representations are input into an autoencoder neural-network-based detector for ASD. The quantification results from the training set of the Malfunctioning Industrial Machine Investigation and Inspection dataset with a signal-to-noise (SNR) of 6 dB reveal that the distinguishing information between normal and anomalous sounds of different machines is encoded non-uniformly in the frequency domain. By highlighting these important frequency regions using NUFBs, the LNS feature can significantly enhance performance using the metric of AUC (area under the receiver operating characteristic curve) under various SNR conditions. Furthermore, modulation representations can further improve performance. Specifically, temporal modulation is effective for fans, pumps, and sliders, while spectral modulation is particularly effective for valves.

9/10/2024

Dual-path Frequency Discriminators for Few-shot Anomaly Detection

Yuhu Bai, Jiangning Zhang, Zhaofeng Chen, Yuhang Dong, Yunkang Cao, Guanzhong Tian

Few-shot anomaly detection (FSAD) plays a crucial role in industrial manufacturing. However, existing FSAD methods encounter difficulties leveraging a limited number of normal samples, frequently failing to detect and locate inconspicuous anomalies in the spatial domain. We have further discovered that these subtle anomalies would be more noticeable in the frequency domain. In this paper, we propose a Dual-Path Frequency Discriminators (DFD) network from a frequency perspective to tackle these issues. The original spatial images are transformed into multi-frequency images, making them more conducive to the tailored discriminators in detecting anomalies. Additionally, the discriminators learn a joint representation with forms of pseudo-anomalies. Extensive experiments conducted on MVTec AD and VisA benchmarks demonstrate that our DFD surpasses current state-of-the-art methods. The code is available at https://github.com/yuhbai/DFD.

8/23/2024

Frequency Tracking Features for Data-Efficient Deep Siren Identification

Stefano Damiano, Thomas Dietzen, Toon van Waterschoot

The identification of siren sounds in urban soundscapes is a crucial safety aspect for smart vehicles and has been widely addressed by means of neural networks that ensure robustness to both the diversity of siren signals and the strong and unstructured background noise characterizing traffic. Convolutional neural networks analyzing spectrogram features of incoming signals achieve state-of-the-art performance when enough training data capturing the diversity of the target acoustic scenes is available. In practice, data is usually limited and algorithms should be robust to adapt to unseen acoustic conditions without requiring extensive datasets for re-training. In this work, given the harmonic nature of siren signals, characterized by a periodically evolving fundamental frequency, we propose a low-complexity feature extraction method based on frequency tracking using a single-parameter adaptive notch filter. The features are then used to design a small-scale convolutional network suitable for training with limited data. The evaluation results indicate that the proposed model consistently outperforms the traditional spectrogram-based model when limited training data is available, achieves better cross-domain generalization and has a smaller size.

9/16/2024