Machine Anomalous Sound Detection Using Spectral-temporal Modulation Representations Derived from Machine-specific Filterbanks

Read original: arXiv:2409.05319 - Published 9/10/2024 by Kai Li, Khalid Zaman, Xingfeng Li, Masato Akagi, Masashi Unoki

Overview

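Presents an approach to machine anomalous sound detection (ASD) that uses spectral and temporal modulation representations derived from machine-specific filterbanks.
Quantifies the frequency importance of four machine types with the Fisher ratio (F-ratio) and uses it to design machine-specific non-uniform filterbanks (NUFBs).
Feeds the resulting log non-uniform spectrum (LNS) feature and its modulation representations into an autoencoder-based detector that learns the normal sound patterns of a target machine.
Reports improved AUC on the Malfunctioning Industrial Machine Investigation and Inspection (MIMII) dataset under various signal-to-noise ratio (SNR) conditions.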

Plain English Explanation

This research addresses a key challenge in factory automation: detecting when a machine is malfunctioning or behaving abnormally from the sounds it makes. The researchers developed an approach that uses machine-specific filterbanks, sets of frequency filters tuned to each machine type, to capture the acoustic characteristics of a target machine.

The filterbank outputs are turned into spectral and temporal modulation representations, which describe how the sound varies across frequency and over time. These are fed into an autoencoder model that is trained to recognize the normal sound patterns of the machine. When the model encounters sounds that deviate significantly from normal, it flags them as potential anomalies, allowing machine issues to be detected early.

The key innovation is the custom filterbank design, which places narrower and more densely packed filters in the frequency regions that best separate normal from anomalous sounds for each machine type. This data-driven approach improves upon generic sound analysis techniques and enables more accurate anomaly detection.

Technical Explanation

The paper presents a novel framework for machine anomalous sound detection that leverages spectral-temporal modulation representations derived from machine-specific filterbanks. The core components are:

Data-Driven Filterbank Design: The authors quantify the frequency importance of four machine types with the Fisher ratio (F-ratio) and use it to design machine-specific non-uniform filterbanks (NUFBs). The NUFBs have narrower bandwidths and a higher filter density in frequency regions with relatively high F-ratios, and they extract the log non-uniform spectrum (LNS) feature.
Spectral-Temporal Modulation Representations: Spectral and temporal modulation representations are derived from the LNS feature, encoding how the machine sound varies across frequency and over time.
Autoencoder-based Anomaly Detection: The LNS feature and the modulation representations are input to an autoencoder-based detector trained on normal sound data. Sounds that deviate significantly from the learned representation are flagged as potential anomalies.
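
The feature-side steps can be illustrated with a short Python sketch. The abstract describes quantifying frequency importance with the Fisher ratio (F-ratio) and placing narrower, denser filters where that ratio is high. The code below is a minimal sketch under stated assumptions, not the authors' implementation: `f_ratio` uses a common two-class form of the Fisher ratio (variance of the class means over the mean within-class variance), `band_edges` places edges so that each band holds an equal share of the importance, and `modulation_magnitudes` uses a plain DFT magnitude as a stand-in for a modulation representation. All function names, the `floor` parameter, and the toy data are hypothetical.

```python
import cmath
from statistics import fmean, pvariance


def f_ratio(normal, anomalous):
    """Per-bin Fisher ratio for two classes (assumed form, not taken from the paper).

    normal, anomalous: lists of spectra, each a list of per-bin log energies.
    """
    ratios = []
    for k in range(len(normal[0])):
        a = [frame[k] for frame in normal]
        b = [frame[k] for frame in anomalous]
        between = pvariance([fmean(a), fmean(b)])      # spread of the class means
        within = fmean([pvariance(a), pvariance(b)])   # average spread inside a class
        ratios.append(between / within if within > 0 else 0.0)
    return ratios


def band_edges(importance, n_bands, floor=1e-3):
    """Band edges (in fractional bin units) holding equal shares of importance.

    High-importance regions receive narrower, denser bands. The small `floor`
    (a hypothetical parameter) keeps low-importance regions covered.
    """
    density = [w + floor for w in importance]
    total = sum(density)
    cdf = [0.0]
    for d in density:
        cdf.append(cdf[-1] + d / total)
    edges, k = [], 0
    for b in range(n_bands + 1):
        q = b / n_bands
        while k < len(density) - 1 and cdf[k + 1] < q:
            k += 1
        # Linear interpolation inside bin k of the cumulative importance.
        edges.append(k + (q - cdf[k]) / (cdf[k + 1] - cdf[k]))
    return edges


def modulation_magnitudes(seq):
    """DFT magnitudes of a mean-removed sequence (naive O(n^2) DFT).

    Applied along time for one band it gives a temporal modulation spectrum;
    applied along bands for one frame it gives a spectral modulation spectrum.
    """
    n = len(seq)
    mean = fmean(seq)
    centred = [x - mean for x in seq]
    return [
        abs(sum(x * cmath.exp(-2j * cmath.pi * m * t / n) for t, x in enumerate(centred)))
        for m in range(n // 2 + 1)
    ]


# Toy usage: bins 2 and 3 carry most of the importance, so bands narrow there.
edges = band_edges([0.1, 0.1, 5, 5, 0.1, 0.1, 0.1, 0.1], n_bands=4)
print([round(e, 2) for e in edges])
```

Working with fractional bin positions keeps the band edges strictly increasing even when a single bin dominates the importance, which avoids empty bands in this sketch.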

The authors evaluate their approach on the Malfunctioning Industrial Machine Investigation and Inspection (MIMII) dataset and report that the LNS feature significantly improves AUC (area under the receiver operating characteristic curve) under various SNR conditions, with the modulation representations improving performance further. Temporal modulation is effective for fans, pumps, and sliders, while spectral modulation is particularly effective for valves.
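
How a detector of this kind can be scored is easy to show in a few lines. The abstract reports results as AUC, which is the probability that a randomly chosen anomalous clip receives a higher anomaly score than a randomly chosen normal clip. The sketch below assumes the anomaly score is the reconstruction error, a common choice for autoencoder detectors that this summary does not confirm for the paper. `toy_reconstruct` is a hypothetical stand-in that always returns the mean normal feature vector; it is not a neural network, and all names and data are illustrative.

```python
from statistics import fmean


def reconstruction_error(features, reconstruct):
    """Anomaly score: mean squared error between a vector and its reconstruction."""
    rebuilt = reconstruct(features)
    return fmean((x - y) ** 2 for x, y in zip(features, rebuilt))


def auc(normal_scores, anomalous_scores):
    """Fraction of (anomalous, normal) pairs ranked correctly; ties count half."""
    wins = 0.0
    for a in anomalous_scores:
        for n in normal_scores:
            if a > n:
                wins += 1.0
            elif a == n:
                wins += 0.5
    return wins / (len(anomalous_scores) * len(normal_scores))


# Toy stand-in for a trained autoencoder (assumption): it "reconstructs"
# every input as the mean of the normal training vectors.
normal_train = [[1.0, 2.0, 3.0], [1.2, 1.8, 3.1], [0.9, 2.1, 2.9]]
mean_vector = [fmean(column) for column in zip(*normal_train)]


def toy_reconstruct(features):
    return mean_vector


normal_test = [[1.1, 2.0, 3.0], [1.0, 1.9, 3.05]]
anomalous_test = [[1.0, 4.0, 3.0], [2.5, 2.0, 1.0]]

normal_scores = [reconstruction_error(x, toy_reconstruct) for x in normal_test]
anomalous_scores = [reconstruction_error(x, toy_reconstruct) for x in anomalous_test]
print(auc(normal_scores, anomalous_scores))
```

The pairwise form of AUC is quadratic in the number of clips, which is fine for illustration; a rank-based computation would be preferable for large test sets.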

Critical Analysis

The proposed framework represents a promising direction for advancing the state-of-the-art in machine anomaly detection. By tailoring the acoustic representation to the target machine, the system can more effectively learn the normal operating conditions and identify deviations.

However, the paper does not provide extensive details on the filterbank optimization procedure or the specific autoencoder architecture used. Additionally, the evaluation described in the abstract centers on a single benchmark, the MIMII dataset, and further testing in real-world factory environments would be valuable to assess the practicality and robustness of the approach.

It would also be interesting to explore how this framework could be extended to handle more complex acoustic scenes with multiple machines operating simultaneously. Incorporating additional contextual information, such as sensor data or machine operation metadata, may further improve the anomaly detection capabilities.

Conclusion

This research presents a novel approach for machine anomalous sound detection that leverages machine-specific acoustic representations and an autoencoder-based anomaly detection model. The custom filterbank design and spectral-temporal modulation features enable the system to effectively capture the unique sound characteristics of different machine types, leading to improved anomaly detection performance.

The proposed framework has the potential to enhance factory automation and maintenance by providing an early warning system for machine issues, ultimately reducing downtime and improving productivity. Further research is needed to evaluate the approach in real-world settings and explore extensions to handle more complex acoustic environments.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Machine Anomalous Sound Detection Using Spectral-temporal Modulation Representations Derived from Machine-specific Filterbanks

Kai Li, Khalid Zaman, Xingfeng Li, Masato Akagi, Masashi Unoki

Early detection of factory machinery malfunctions is crucial in industrial applications. In machine anomalous sound detection (ASD), different machines exhibit unique vibration-frequency ranges based on their physical properties. Meanwhile, the human auditory system is adept at tracking both temporal and spectral dynamics of machine sounds. Consequently, integrating the computational auditory models of the human auditory system with machine-specific properties can be an effective approach to machine ASD. We first quantified the frequency importances of four types of machines using the Fisher ratio (F-ratio). The quantified frequency importances were then used to design machine-specific non-uniform filterbanks (NUFBs), which extract the log non-uniform spectrum (LNS) feature. The designed NUFBs have a narrower bandwidth and higher filter distribution density in frequency regions with relatively high F-ratios. Finally, spectral and temporal modulation representations derived from the LNS feature were proposed. The proposed LNS feature and modulation representations are input into an autoencoder neural-network-based detector for ASD. The quantification results from the training set of the Malfunctioning Industrial Machine Investigation and Inspection dataset with a signal-to-noise ratio (SNR) of 6 dB reveal that the distinguishing information between normal and anomalous sounds of different machines is encoded non-uniformly in the frequency domain. By highlighting these important frequency regions using NUFBs, the LNS feature can significantly enhance performance using the metric of AUC (area under the receiver operating characteristic curve) under various SNR conditions. Furthermore, modulation representations can further improve performance. Specifically, temporal modulation is effective for fans, pumps, and sliders, while spectral modulation is particularly effective for valves.

9/10/2024

A Dual-Path Framework with Frequency-and-Time Excited Network for Anomalous Sound Detection

Yucong Zhang, Juan Liu, Yao Tian, Haifeng Liu, Ming Li

In contrast to human speech, machine-generated sounds of the same type often exhibit consistent frequency characteristics and discernible temporal periodicity. However, leveraging these dual attributes in anomaly detection remains relatively under-explored. In this paper, we propose an automated dual-path framework that learns prominent frequency and temporal patterns for diverse machine types. One pathway uses a novel Frequency-and-Time Excited Network (FTE-Net) to learn the salient features across frequency and time axes of the spectrogram. It incorporates a Frequency-and-Time Chunkwise Encoder (FTC-Encoder) and an excitation network. The other pathway uses a 1D convolutional network for utterance-level spectrum. Experimental results on the DCASE 2023 task 2 dataset show the state-of-the-art performance of our proposed method. Moreover, visualizations of the intermediate feature maps in the excitation network are provided to illustrate the effectiveness of our method.

9/6/2024

Stream-based Active Learning for Anomalous Sound Detection in Machine Condition Monitoring

Tuan Vu Ho, Kota Dohi, Yohei Kawaguchi

This paper introduces an active learning (AL) framework for anomalous sound detection (ASD) in machine condition monitoring systems. Typically, ASD models are trained solely on normal samples due to the scarcity of anomalous data, leading to decreased accuracy for unseen samples during inference. AL is a promising solution to this problem, enabling the model to learn new concepts more effectively with fewer labeled examples and thus reducing manual annotation efforts. However, its effectiveness in ASD remains unexplored. To minimize update costs and time, our proposed method focuses on updating the scoring backend of the ASD system without retraining the neural network model. Experimental results on the DCASE 2023 Challenge Task 2 dataset confirm that our AL framework significantly improves ASD performance even with low labeling budgets. Moreover, our proposed sampling strategy outperforms other baselines in terms of the partial area under the receiver operating characteristic score.

8/13/2024

Interpretable modulated differentiable STFT and physics-informed balanced spectrum metric for freight train wheelset bearing cross-machine transfer fault diagnosis under speed fluctuations

Chao He, Hongmei Shi, Ruixin Li, Jianbo Li, ZuJun Yu

The service condition of wheelset bearings, as key components, has a direct impact on the safe operation of railway heavy-haul freight trains. However, speed fluctuation of the trains and the scarcity of fault samples are the two main problems that restrict the accuracy of bearing fault diagnosis. Therefore, a cross-machine transfer diagnosis (pyDSN) network coupled with an interpretable modulated differentiable short-time Fourier transform (STFT) and a physics-informed balanced spectrum quality metric is proposed to learn domain-invariant and discriminative features under time-varying speeds. Firstly, because fixed windows are insufficient for extracting the frequency components of time-varying speed signals, a modulated differentiable STFT (MDSTFT) that is interpretable with STFT-informed theoretical support is proposed to extract the robust time-frequency spectrum (TFS). During the training process, multiple windows with different lengths change dynamically. Also, in addition to the classification metric and domain discrepancy metric, we creatively introduce a third kind of metric, referred to as the physics-informed metric, to enhance the transferable TFS. A physics-informed balanced spectrum quality (BSQ) regularization loss is devised to guide the optimization direction for the MDSTFT and the model. With it, not only can the model acquire a high-quality TFS, but a physics-restricted domain adaptation network can also be acquired, allowing it to learn real-world physics knowledge and ultimately diminish the domain discrepancy across different datasets. The experiment is conducted in the scenario of migrating from the laboratory datasets to the freight train dataset, indicating that the hybrid-driven pyDSN outperforms existing methods and has practical value.

6/19/2024