Efficient Anomaly Detection with Budget Annotation Using Semi-Supervised Residual Transformer

Read original: arXiv:2306.03492 - Published 7/12/2024 by Hanxi Li, Jingqi Wu, Lin Yuanbo Wu, Hao Chen, Deyin Liu, Mingwen Wang, Peng Wang

Overview

Anomaly detection is challenging because training data usually includes only normal samples, so the detector must recognize anomalies it has never seen during training.
While deep learning approaches have helped, there is still progress to be made in developing industrial-class anomaly detectors for real-world applications.
Some tasks allow for a few labeled anomalous samples to achieve higher accuracy, but this comes at the cost of significant annotation effort, which is often impractical.
This paper presents a unified framework to address these two problems.

Plain English Explanation

The paper tackles two key challenges in anomaly detection. The first is that anomaly detectors are typically trained only on normal samples, so at test time they must recognize abnormal patterns they have never seen. Deep learning approaches have helped, but these systems still have a long way to go before they are ready for real-world industrial use.

The second challenge is that in some cases, you can label a few anomalous samples to improve accuracy. However, this manual labeling can be extremely time-consuming and costly, making it impractical in many situations. Some methods have tried to address this with cheaper annotations, such as marking anomalies with bounding boxes instead of pixel-accurate outlines.

The authors propose a unified framework to tackle both of these problems. First, they train a sliding vision transformer on residuals generated by a novel patch-matching technique. This helps the system learn what normal looks like and spot regions that deviate from it.

Second, they reframe the traditional pixel-wise segmentation problem as a block-wise classification task. This allows their sliding transformer to achieve even higher accuracy with less manual labeling. To further reduce labeling costs, they propose using just bounding boxes to mark anomalous regions, and then using semi-supervised learning to learn from the unlabeled areas.

The end result is a system that outperforms state-of-the-art approaches on standard benchmarks, whether in fully unsupervised or partially supervised settings. It's an impressive step forward in making anomaly detection more practical and accessible for real-world applications.

Technical Explanation

The core innovation of this work is a unified framework that combines a patch-matching-based anomaly detection approach with a semi-supervised learning scheme to address the dual challenges of anomaly detection.

First, the authors train a sliding vision transformer on the residuals generated by a novel position-constrained patch-matching technique. This allows the system to learn the characteristics of normal samples and effectively detect anomalies, building on the success of previous patch-matching-based anomaly detectors.
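To make the residual idea concrete, here is a minimal sketch of position-constrained patch matching. It rests on assumptions the summary does not spell out: every image is already a grid of patch feature vectors (for example from a pretrained backbone), the reference grids come from normal training images of the same size, and a hypothetical `radius` parameter restricts the search to reference patches near the query patch's own grid position. The residual is taken here as the query feature minus its nearest constrained match; the paper's exact residual definition may differ, and the names `position_constrained_residuals` and `sq_dist` are illustrative only.

```python
from typing import List, Optional, Sequence

Vector = List[float]
Grid = List[List[Vector]]  # grid[row][col] -> feature vector of one patch


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def position_constrained_residuals(
    query: Grid, references: List[Grid], radius: int = 1
) -> Grid:
    """Return, for every query patch, (query feature - nearest reference feature).

    Assumption for illustration: candidates are limited to reference patches whose
    (row, col) lies within `radius` of the query patch's position, and all grids
    share the same shape. `radius` is a hypothetical parameter.
    """
    if not references:
        raise ValueError("need at least one normal reference grid")
    rows, cols = len(query), len(query[0])
    residuals: Grid = []
    for i in range(rows):
        out_row: List[Vector] = []
        for j in range(cols):
            q = query[i][j]
            best: Optional[Vector] = None
            best_d = float("inf")
            for ref in references:
                # Only patches near the same spatial location are candidates.
                for ri in range(max(0, i - radius), min(rows, i + radius + 1)):
                    for rj in range(max(0, j - radius), min(cols, j + radius + 1)):
                        d = sq_dist(q, ref[ri][rj])
                        if d < best_d:
                            best, best_d = ref[ri][rj], d
            assert best is not None
            out_row.append([a - b for a, b in zip(q, best)])
        residuals.append(out_row)
    return residuals


# Toy data: one normal reference grid (2x2 patches, 2-D features) and a query
# whose bottom-right patch is anomalous.
normal = [[[0.0, 0.0], [1.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]]]
query = [[[0.0, 0.0], [1.0, 1.0]], [[0.0, 1.0], [5.0, 5.0]]]
res_strict = position_constrained_residuals(query, [normal], radius=0)
res_loose = position_constrained_residuals(query, [normal], radius=1)
print(res_strict[1][1], res_loose[1][1])  # [4.0, 5.0] [4.0, 4.0]
```

With `radius=0` the anomalous patch can only be compared with the patch at the same location, so its residual stays large; with `radius=1` it may match a similar-looking patch from elsewhere and its residual shrinks. A tight radius makes position-specific defects stand out, at the cost of penalizing normal parts that are slightly misaligned.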

Second, the authors reframe the traditional pixel-wise anomaly segmentation problem as a block-wise classification task. This enables their sliding transformer to achieve even higher accuracy with much less manual labeling effort, as it only needs to classify entire blocks rather than individual pixels.
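The switch from pixel-wise to block-wise targets can be illustrated by collapsing a binary pixel mask into one label per block. This is a sketch under an assumed rule: a block counts as anomalous when the fraction of anomalous pixels inside it exceeds a hypothetical `min_fraction` threshold (by default, a single anomalous pixel is enough). The summary does not state the authors' actual criterion or block size.

```python
from typing import List

Mask = List[List[int]]  # 1 = anomalous pixel, 0 = normal pixel


def mask_to_block_labels(mask: Mask, block: int, min_fraction: float = 0.0) -> Mask:
    """Collapse a pixel-wise binary mask into block-wise labels.

    A block is labelled anomalous (1) when the fraction of anomalous pixels in it
    is strictly greater than `min_fraction`. The rule and `min_fraction` are
    illustrative assumptions, not the paper's stated criterion.
    """
    h, w = len(mask), len(mask[0])
    labels: Mask = []
    for top in range(0, h, block):
        row: List[int] = []
        for left in range(0, w, block):
            cells = [
                mask[y][x]
                for y in range(top, min(top + block, h))
                for x in range(left, min(left + block, w))
            ]
            frac = sum(cells) / len(cells)
            row.append(1 if frac > min_fraction else 0)
        labels.append(row)
    return labels


# A 4x4 mask with one anomalous pixel in the top-right corner.
mask = [
    [0, 0, 0, 1],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
blocks = mask_to_block_labels(mask, block=2)
print(blocks)  # [[0, 1], [0, 0]]
pixel_labels = sum(len(r) for r in mask)    # 16 values a pixel-wise annotator fixes
block_labels = sum(len(r) for r in blocks)  # 4 values a block-wise annotator fixes
```

The annotation target shrinks by a factor of `block * block` (here from 16 values to 4), which is the intuition behind needing less labeling effort; the price is coarser localization inside each block.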

To further reduce labeling costs, the authors propose using just bounding boxes to mark anomalous regions, rather than full pixel-level annotations. They then employ a customized semi-supervised learning scheme, leveraging novel data augmentation techniques, to effectively learn from the unlabeled areas within these bounding boxes.
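The summary does not detail the customized semi-supervised scheme or its data augmentation, so the sketch below only shows one plausible way bounding boxes could yield partial block labels, followed by a generic stand-in for the semi-supervised step. Assumptions: blocks that lie outside every box are treated as known-normal, blocks that overlap a box are left unlabeled, and plain confidence-threshold pseudo-labeling (with hypothetical thresholds `hi` and `lo`) fills in the unlabeled blocks from model scores. This is not the authors' method and includes no augmentation.

```python
from typing import List, Optional, Sequence, Tuple

Box = Tuple[int, int, int, int]     # (x0, y0, x1, y1) in pixels, x1/y1 exclusive
Labels = List[List[Optional[int]]]  # 1 anomalous, 0 normal, None unlabeled


def boxes_to_block_labels(h: int, w: int, block: int, boxes: Sequence[Box]) -> Labels:
    """Assumed rule: blocks outside every box are normal (0); blocks that overlap
    any box stay unlabeled (None) for the semi-supervised stage."""
    labels: Labels = []
    for top in range(0, h, block):
        bottom = min(top + block, h)
        row: List[Optional[int]] = []
        for left in range(0, w, block):
            right = min(left + block, w)
            overlaps = any(
                x0 < right and left < x1 and y0 < bottom and top < y1
                for x0, y0, x1, y1 in boxes
            )
            row.append(None if overlaps else 0)
        labels.append(row)
    return labels


def pseudo_label(labels: Labels, scores: List[List[float]],
                 hi: float = 0.9, lo: float = 0.1) -> Labels:
    """Generic confidence-threshold pseudo-labelling (an illustrative stand-in,
    not the paper's customized scheme): an unlabeled block scoring >= hi becomes
    1, one scoring <= lo becomes 0, and anything in between stays None."""
    out: Labels = []
    for lab_row, score_row in zip(labels, scores):
        out_row: List[Optional[int]] = []
        for lab, s in zip(lab_row, score_row):
            if lab is not None:
                out_row.append(lab)   # known labels are never overwritten
            elif s >= hi:
                out_row.append(1)
            elif s <= lo:
                out_row.append(0)
            else:
                out_row.append(None)  # still too uncertain to use
        out.append(out_row)
    return out


# 4x4 image, 2x2 blocks, one box drawn around the top-right corner.
partial = boxes_to_block_labels(4, 4, 2, [(2, 0, 4, 2)])
print(partial)  # [[0, None], [0, 0]]
# Hypothetical per-block anomaly scores from a model after one training round.
print(pseudo_label(partial, [[0.2, 0.95], [0.1, 0.0]]))  # [[0, 1], [0, 0]]
```

In a full training loop, the newly labeled blocks would be added to the training set and the model retrained, repeating until few unlabeled blocks remain confident enough to label.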

The proposed "SemiREST" method outperforms all state-of-the-art approaches across various evaluation metrics, both in unsupervised and supervised settings. On the popular MVTec-AD dataset, SemiREST achieves an impressive 81.2% Average Precision (AP) in the unsupervised condition and 84.4% AP in the supervised setting. Remarkably, even with just bounding-box-based semi-supervision, SemiREST still surpasses fully supervised SOTA methods, achieving 83.8% AP on MVTec-AD.
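The AP figures above summarize a precision-recall curve. The summary does not say whether they are computed per pixel, per block, or per image, so this sketch only shows the metric itself on a flat list of binary labels and anomaly scores. When scores are distinct, it follows the same step-wise definition, AP = Σ (Rₙ − Rₙ₋₁) Pₙ, that scikit-learn's `average_precision_score` uses; the function name here is illustrative.

```python
from typing import Sequence


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Step-wise average precision: the mean of precision@k taken at the rank k
    of every positive example, i.e. sum_n (R_n - R_{n-1}) * P_n.

    Assumes distinct scores; tied scores would need to be grouped into a single
    threshold, which this sketch does not do.
    """
    total_pos = sum(1 for y in labels if y)
    if total_pos == 0:
        raise ValueError("AP is undefined without positive examples")
    # Rank examples from most to least anomalous.
    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    true_pos = 0
    precision_sum = 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y:
            true_pos += 1
            precision_sum += true_pos / rank  # precision at this recall step
    return precision_sum / total_pos


# Positives ranked 1st and 3rd, so AP = (1/1 + 2/3) / 2, roughly 0.833.
print(average_precision([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.1]))
```

Unlike ROC-AUC, AP is dominated by how cleanly the highest-scoring predictions separate anomalies from normal data, which matters when anomalous pixels are a tiny fraction of the total.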

Critical Analysis

The paper presents a well-designed and comprehensive approach to addressing the key challenges in anomaly detection. The authors' insights in reframing the problem and leveraging a combination of patch-matching, transformer-based learning, and semi-supervised techniques are quite innovative.

However, the paper does mention a few caveats and limitations. For example, the authors note that their method may not generalize as well to anomalies that are vastly different from the training data, as the patch-matching component relies heavily on similarities to normal samples. Additionally, the semi-supervised learning scheme, while effective, still requires some manual labeling effort, which may not be feasible in all real-world scenarios.

Further research could explore ways to make the anomaly detection even more robust to highly dissimilar anomalies, perhaps by incorporating more diverse data augmentation techniques or by exploring alternative architectural designs. Investigating methods to reduce the labeling burden even further, such as active learning strategies, could also be a fruitful direction.

Overall, this paper represents a significant advancement in the field of anomaly detection, particularly in its ability to achieve high performance with limited supervision. The authors' innovative approaches and the strong empirical results make this work a valuable contribution to the ongoing efforts to develop practical, industrial-grade anomaly detection systems.

Conclusion

This paper tackles two key challenges in anomaly detection: the need to detect anomalies without seeing them during training, and the high cost of manually labeling anomalous samples. The authors propose a unified framework that combines patch-matching, transformer-based learning, and semi-supervised techniques to address these problems.

The resulting "SemiREST" method outperforms state-of-the-art approaches on standard benchmarks, even when using just bounding box annotations for anomalous regions. This represents a significant step forward in making anomaly detection more practical and accessible for real-world industrial applications.

While the paper identifies some limitations, the authors' innovative ideas and the strong empirical results demonstrate the potential of this research to drive further advancements in the field of anomaly detection. As these techniques continue to evolve, they could have far-reaching implications for a wide range of industries and applications where the reliable detection of abnormalities is crucial.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Efficient Anomaly Detection with Budget Annotation Using Semi-Supervised Residual Transformer

Hanxi Li, Jingqi Wu, Lin Yuanbo Wu, Hao Chen, Deyin Liu, Mingwen Wang, Peng Wang

Recent advancements in industrial Anomaly Detection (AD) have shown that incorporating a few anomalous samples during training can significantly boost accuracy. However, this performance improvement comes at a high cost: extensive annotation efforts, which are often impractical in real-world applications. In this work, we propose a novel framework called Weakly-supervised RESidual Transformer (WeakREST), which aims to achieve high AD accuracy while minimizing the need for extensive annotations. First, we reformulate the pixel-wise anomaly localization task into a block-wise classification problem. By shifting the focus to the block-wise level, we can drastically reduce the amount of required annotations without compromising the accuracy of anomaly detection. Second, we design a residual-based transformer model, termed Positional Fast Anomaly Residuals (PosFAR), to classify the image blocks in real time. We further propose to label the anomalous regions using only bounding boxes or image tags as weaker labels, leading to a semi-supervised learning setting. On the benchmark dataset MVTec-AD, our proposed WeakREST framework achieves a remarkable Average Precision (AP) of 83.0%, significantly outperforming the previous best result of 75.8% in the unsupervised setting. In the supervised AD setting, WeakREST further improves performance, attaining an AP of 87.6% compared to the previous best of 78.6%. Notably, even when utilizing weaker labels based on bounding boxes, WeakREST surpasses recent leading methods that rely on pixel-wise supervision, achieving an AP of 87.1% against the prior best of 78.6% on MVTec-AD. This precision advantage is also consistently observed on other well-known AD datasets, such as BTAD and KSDD2.

7/12/2024

Domain-independent detection of known anomalies

Jonas Buhler, Jonas Fehrenbach, Lucas Steinmann, Christian Nauck, Marios Koulakis

One persistent obstacle in industrial quality inspection is the detection of anomalies. In real-world use cases, two problems must be addressed: anomalous data is sparse and the same types of anomalies need to be detected on previously unseen objects. Current anomaly detection approaches can be trained with sparse nominal data, whereas domain generalization approaches enable detecting objects in previously unseen domains. Utilizing those two observations, we introduce the hybrid task of domain generalization on sparse classes. To introduce an accompanying dataset for this task, we present a modification of the well-established MVTec AD dataset by generating three new datasets. In addition to applying existing methods for benchmark, we design two embedding-based approaches, Spatial Embedding MLP (SEMLP) and Labeled PatchCore. Overall, SEMLP achieves the best performance with an average image-level AUROC of 87.2 % vs. 80.4 % by MIRO. The new and openly available datasets allow for further research to improve industrial anomaly detection.

7/4/2024

Intelligent Anomaly Detection for Lane Rendering Using Transformer with Self-Supervised Pre-Training and Customized Fine-Tuning

Yongqi Dong, Xingmin Lu, Ruohan Li, Wei Song, Bart van Arem, Haneen Farah

The burgeoning navigation services using digital maps provide great convenience to drivers. Nevertheless, the presence of anomalies in lane rendering map images occasionally introduces potential hazards, as such anomalies can be misleading to human drivers and consequently contribute to unsafe driving conditions. In response to this concern and to accurately and effectively detect the anomalies, this paper transforms lane rendering image anomaly detection into a classification problem and proposes a four-phase pipeline consisting of data pre-processing, self-supervised pre-training with the masked image modeling (MiM) method, customized fine-tuning using cross-entropy based loss with label smoothing, and post-processing to tackle it leveraging state-of-the-art deep learning techniques, especially those involving Transformer models. Various experiments verify the effectiveness of the proposed pipeline. Results indicate that the proposed pipeline exhibits superior performance in lane rendering image anomaly detection, and notably, the self-supervised pre-training with MiM can greatly enhance the detection accuracy while significantly reducing the total training time. For instance, employing the Swin Transformer with Uniform Masking as self-supervised pretraining (Swin-Trans-UM) yielded a heightened accuracy at 94.77% and an improved Area Under The Curve (AUC) score of 0.9743 compared with the pure Swin Transformer without pre-training (Swin-Trans) with an accuracy of 94.01% and an AUC of 0.9498. The fine-tuning epochs were dramatically reduced to 41 from the original 280. In conclusion, the proposed pipeline, with its incorporation of self-supervised pre-training using MiM and other advanced deep learning techniques, emerges as a robust solution for enhancing the accuracy and efficiency of lane rendering image anomaly detection in digital navigation systems.

5/30/2024

Multi-feature Reconstruction Network using Crossed-mask Restoration for Unsupervised Anomaly Detection

Junpu Wang, Guili Xu, Chunlei Li, Guangshuai Gao, Yuehua Cheng

Unsupervised anomaly detection using only normal samples is of great significance for quality inspection in industrial manufacturing. Although existing reconstruction-based methods have achieved promising results, they still face two problems: the reconstructed images carry poorly distinguishable information, and the model's over-generalization lets it regenerate abnormal regions too well. To overcome these issues, we convert image reconstruction into a combination of parallel feature restorations and propose a multi-feature reconstruction network, MFRNet, using crossed-mask restoration. Specifically, a multi-scale feature aggregator is first developed to generate more discriminative hierarchical representations of the input images from a pre-trained model. Subsequently, a crossed-mask generator is adopted to randomly cover the extracted feature map, followed by a restoration network based on the transformer structure for high-quality repair of the missing regions. Finally, a hybrid loss that accounts for both pixel-level and structural similarity guides model training and anomaly estimation. Extensive experiments show that our method is highly competitive with or significantly outperforms other state-of-the-art methods on four publicly available datasets and one self-made dataset.

4/23/2024