Unsupervised Industrial Anomaly Detection via Pattern Generative and Contrastive Networks

Read original: arXiv:2207.09792 - Published 8/16/2024 by Jianfeng Huang, Chenyang Li, Yimin Lin, Shiguo Lian

Overview

Collecting enough flaw images for training deep learning networks in industrial production is challenging.
Existing industrial anomaly detection methods use CNN-based unsupervised detection and localization networks.
These methods often fail when new inputs differ from the training data, because traditional end-to-end networks struggle to fit nonlinear models in high-dimensional spaces.
They also effectively act as a memory library built by clustering the features of normal images, which makes them sensitive to texture changes.

Plain English Explanation

The paper presents a Vision Transformer-based (VIT-based) unsupervised anomaly detection network to address the limitations of existing industrial anomaly detection methods. The key idea is to use a hierarchical task learning approach and incorporate human experience to enhance the model's interpretability.

The network consists of two main components: a pattern generation network and a comparison network. The pattern generation network uses two VIT-based encoder modules to extract features from consecutive image patches, and then a VIT-based decoder module to learn the human-designed style of these features and predict the third image patch. The comparison network, based on a Siamese architecture, computes the similarity between the generated image patch and the original image patch.

Finally, the method uses a bi-directional inference strategy to refine the anomaly localization. In comparison experiments on the public MVTec dataset, the authors report results that surpass previous state-of-the-art methods. They also show qualitative results on their own leather and cloth datasets.

Technical Explanation

The paper proposes a Vision Transformer-based (VIT-based) unsupervised anomaly detection network to address the challenges of industrial anomaly detection. The network consists of two main components:

Pattern Generation Network: This network uses two VIT-based encoder modules to extract features from two consecutive image patches. It then employs a VIT-based decoder module to learn the human-designed style of these features and predict the third image patch.
Comparison Network: This Siamese-based network computes the similarity between the generated image patch and the original image patch.
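
The two-stage flow above (generate the third patch from the two before it, then compare it with the real third patch) can be sketched in a few lines. This is only an illustration under loud assumptions: `predict_next` is a hypothetical stand-in for the ViT encoder/decoder (plain linear extrapolation), `cosine_similarity` is a hypothetical stand-in for the Siamese comparison network, and patches are tiny feature vectors rather than image tiles. None of these names come from the paper.

```python
import math

def predict_next(p1, p2):
    """Hypothetical stand-in for the ViT encoder/decoder: linear extrapolation."""
    return [2 * b - a for a, b in zip(p1, p2)]

def cosine_similarity(u, v):
    """Hypothetical stand-in for the Siamese comparison network."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def anomaly_scores(patches):
    """Score patches[2:] by how poorly each matches its generated counterpart."""
    scores = []
    for i in range(2, len(patches)):
        generated = predict_next(patches[i - 2], patches[i - 1])
        scores.append(1.0 - cosine_similarity(generated, patches[i]))
    return scores

# A perfectly regular texture (identical patches) with one corrupted patch at index 3.
normal = [1.0, 2.0, 3.0, 4.0]
patches = [normal[:] for _ in range(6)]
patches[3] = [4.0, 1.0, 4.0, 1.0]

scores = anomaly_scores(patches)  # scores[k] belongs to patches[k + 2]
for index, score in enumerate(scores, start=2):
    print(f"patch {index}: anomaly score {score:.3f}")
```

In this toy run the clean patch at index 2 scores about zero and the corrupted patch at index 3 scores clearly above zero. The patches after it also score high, because the corrupted patch is fed in as context for their predictions.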

The method also uses a bi-directional inference strategy to refine the anomaly localization.
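
The summary does not say how the two inference directions are combined, so the following is a guess at the general idea rather than the authors' procedure. It assumes each patch is reduced to one number, that a patch is predicted by linear extrapolation from the two patches before it in the scan direction, and that the forward and backward scores are fused with an element-wise minimum. All function names are hypothetical.

```python
def directional_scores(values, reverse=False):
    """Prediction error at each position, scanning left-to-right or right-to-left.

    The first two positions in the scan direction have no context and get None.
    """
    seq = values[::-1] if reverse else values
    scores = [None, None]
    for i in range(2, len(seq)):
        predicted = 2 * seq[i - 1] - seq[i - 2]  # assumed stand-in predictor
        scores.append(abs(seq[i] - predicted))
    return scores[::-1] if reverse else scores

def bidirectional_scores(values):
    """Fuse both scan directions; the minimum is an assumption, not the paper's rule."""
    forward = directional_scores(values)
    backward = directional_scores(values, reverse=True)
    fused = []
    for f, b in zip(forward, backward):
        if f is None:
            fused.append(b)
        elif b is None:
            fused.append(f)
        else:
            fused.append(min(f, b))
    return forward, backward, fused

# Constant texture with a single anomalous patch at index 4.
values = [1, 1, 1, 1, 5, 1, 1, 1, 1]
forward, backward, fused = bidirectional_scores(values)
print("forward :", forward)
print("backward:", backward)
print("fused   :", fused)
```

In this toy case each single direction also flags the patches scanned right after the anomaly, since the anomalous patch contaminates their context. Only index 4 is flagged from both sides, so the fused scores isolate it.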

In comparison experiments on the public MVTec dataset, the proposed method reaches an AUC of 99.8%, which the authors report surpasses previous state-of-the-art methods. The authors' own leather and cloth datasets are used for qualitative illustration only.
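
For readers unfamiliar with the metric, AUC here refers to the area under the ROC curve. It equals the probability that a randomly chosen anomalous sample receives a higher anomaly score than a randomly chosen normal one. The sketch below computes it directly from that definition. The labels and scores are made-up toy values, not results from the paper, and the summary does not state whether the reported 99.8% is measured per image or per pixel.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison (ties count as half)."""
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    if not positives or not negatives:
        raise ValueError("need at least one normal and one anomalous sample")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# Toy data: label 1 = anomalous, label 0 = normal.
labels = [0, 0, 0, 1, 1]
scores = [0.10, 0.40, 0.35, 0.80, 0.38]
auc = roc_auc(labels, scores)
print(f"AUC = {auc:.4f}")
```

The pairwise form is quadratic in the number of samples, which is fine for illustration; evaluation libraries typically use a sorting-based equivalent.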

Critical Analysis

The paper presents a novel approach to industrial anomaly detection that addresses the limitations of existing methods. The use of a hierarchical task learning approach and the incorporation of human experience to enhance the model's interpretability are promising ideas.

However, the paper does not provide a detailed analysis of the limitations of the proposed method. For example, it would be helpful to know how the method performs on datasets with more complex or varied anomalies, and how it compares with other unsupervised anomaly detection approaches on benchmarks other than MVTec.

Additionally, the paper could have explored the potential trade-offs between the method's accuracy and its computational complexity or inference time, which are important considerations for industrial applications.

Conclusion

The Vision Transformer-based unsupervised anomaly detection network proposed in this paper combines hierarchical task learning with human experience. The authors report 99.8% AUC on the public MVTec dataset, surpassing previous state-of-the-art methods, and show qualitative segmentation results on their own leather and cloth datasets.

The ability to accurately detect and localize anomalies without relying on large labeled datasets is a valuable capability for industrial applications, where flaw images can be scarce. The authors' work suggests that transformer-based architectures, combined with human-designed priors, can address the challenges of industrial anomaly detection.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Unsupervised Industrial Anomaly Detection via Pattern Generative and Contrastive Networks

Jianfeng Huang, Chenyang Li, Yimin Lin, Shiguo Lian

It is hard to collect enough flaw images for training deep learning networks in industrial production. Therefore, existing industrial anomaly detection methods prefer to use CNN-based unsupervised detection and localization networks for this task. However, these methods always fail when new signals vary, since traditional end-to-end networks struggle to fit nonlinear models in high-dimensional space. Moreover, they essentially build a memory library by clustering the features of normal images, which makes them not robust to texture changes. To this end, we propose a Vision Transformer based (VIT-based) unsupervised anomaly detection network. It utilizes hierarchical task learning and human experience to enhance its interpretability. Our network consists of pattern generation and comparison networks. The pattern generation network uses two VIT-based encoder modules to extract the features of two consecutive image patches, then uses a VIT-based decoder module to learn the human-designed style of these features and predict the third image patch. After this, we use a Siamese-based network to compute the similarity of the generated image patch and the original image patch. Finally, we refine the anomaly localization with a bi-directional inference strategy. Comparison experiments on the public MVTec dataset show our method achieves 99.8% AUC, which surpasses previous state-of-the-art methods. In addition, we give a qualitative illustration on our own leather and cloth datasets. The accurate segmentation results strongly prove the accuracy of our method in anomaly detection.

8/16/2024

Unsupervised Contrastive Analysis for Salient Pattern Detection using Conditional Diffusion Models

Cristiano Patrício, Carlo Alberto Barbano, Attilio Fiandrotti, Riccardo Renzulli, Marco Grangetto, Luis F. Teixeira, João C. Neves

Contrastive Analysis (CA) regards the problem of identifying patterns in images that allow distinguishing between a background (BG) dataset (i.e. healthy subjects) and a target (TG) dataset (i.e. unhealthy subjects). Recent works on this topic rely on variational autoencoders (VAE) or contrastive learning strategies to learn the patterns that separate TG samples from BG samples in a supervised manner. However, the dependency on target (unhealthy) samples can be challenging in medical scenarios due to their limited availability. Also, the blurred reconstructions of VAEs lack utility and interpretability. In this work, we redefine the CA task by employing a self-supervised contrastive encoder to learn a latent representation encoding only common patterns from input images, using samples exclusively from the BG dataset during training, and approximating the distribution of the target patterns by leveraging data augmentation techniques. Subsequently, we exploit state-of-the-art generative methods, i.e. diffusion models, conditioned on the learned latent representation to produce a realistic (healthy) version of the input image encoding solely the common patterns. Thorough validation on a facial image dataset and experiments across three brain MRI datasets demonstrate that conditioning the generative process of state-of-the-art generative methods with the latent representation from our self-supervised contrastive encoder yields improvements in the generated image quality and in the accuracy of image classification. The code is available at https://github.com/CristianoPatricio/unsupervised-contrastive-cond-diff.

6/5/2024

AnomalyFactory: Regard Anomaly Generation as Unsupervised Anomaly Localization

Ying Zhao

Recent advances in anomaly generation approaches alleviate the effect of data insufficiency on the task of anomaly localization. While effective, most of them learn multiple large generative models on different datasets and cumbersome anomaly prediction models for different classes. To address these limitations, we propose a novel scalable framework, named AnomalyFactory, that unifies unsupervised anomaly generation and localization with the same network architecture. It starts with a BootGenerator that combines the structure of a target edge map and the appearance of a reference color image with the guidance of a learned heatmap. Then, it proceeds with a FlareGenerator that receives supervision signals from the BootGenerator and reforms the heatmap to indicate anomaly locations in the generated image. Finally, it easily transforms the same network architecture to a BlazeDetector that localizes anomaly pixels with the learned heatmap by converting the anomaly images generated by the FlareGenerator to normal images. By manipulating the target edge maps and combining them with various reference images, AnomalyFactory generates authentic and diverse samples across domains. Comprehensive experiments carried out on 5 datasets, including MVTecAD, VisA, MVTecLOCO, MADSim and RealIAD, demonstrate that our approach is superior to competitors in generation capability and scalability.

8/20/2024

An Attention-Based Deep Generative Model for Anomaly Detection in Industrial Control Systems

Mayra Macas, Chunming Wu, Walter Fuertes

Anomaly detection is critical for the secure and reliable operation of industrial control systems. As our reliance on such complex cyber-physical systems grows, it becomes paramount to have automated methods for detecting anomalies, preventing attacks, and responding intelligently. This paper presents a novel deep generative model to meet this need. The proposed model follows a variational autoencoder architecture with a convolutional encoder and decoder to extract features from both spatial and temporal dimensions. Additionally, we incorporate an attention mechanism that directs focus towards specific regions, enhancing the representation of relevant features and improving anomaly detection accuracy. We also employ a dynamic threshold approach leveraging the reconstruction probability and make our source code publicly available to promote reproducibility and facilitate further research. Comprehensive experimental analysis is conducted on data from all six stages of the Secure Water Treatment (SWaT) testbed, and the experimental results demonstrate the superior performance of our approach compared to several state-of-the-art baseline techniques.

5/10/2024