AnoPLe: Few-Shot Anomaly Detection via Bi-directional Prompt Learning with Only Normal Samples

Read original: arXiv:2408.13516 - Published 8/27/2024 by Yujin Lee, Seoyoon Jang, Hyunsoo Yoon

Overview

This paper proposes AnoPLe, a multi-modal prompt learning method for few-shot anomaly detection that trains on a few normal samples and assumes no prior knowledge of real anomalies.
AnoPLe simulates anomalies and couples learnable textual and visual prompts in both directions so that the two modalities can interact.
According to the abstract, the method records 94.1% Image AUROC on MVTec-AD and 86.2% on VisA, around 1% behind the state of the art, despite never being exposed to true anomalies.

Plain English Explanation

The paper introduces a new technique called AnoPLe for detecting anomalies using only normal samples during training. Many earlier few-shot methods lean on annotations or real anomalous samples to improve detection. However, such cues are often unavailable because anomalous data is rare or difficult to obtain.

AnoPLe works around this by learning "prompts" - small trainable inputs that steer a multi-modal model - on both the text side and the image side. Because no real anomalies are available, it simulates them, so the model still has something to contrast the normal samples against.

The key innovation is that the two sets of prompts are coupled bi-directionally: the textual prompts inform the visual prompts, and the visual prompts inform the textual ones. According to the authors, this coupling facilitates deep interaction between the two modalities.

With this approach and only normal samples, AnoPLe comes within roughly 1% of the best reported results on two standard benchmarks, even though it is never shown a true anomaly.

Technical Explanation

The paper introduces AnoPLe, a multi-modal prompt learning framework for few-shot anomaly detection (FAD) that assumes no prior knowledge of anomalies. Instead of relying on annotations or true abnormal samples, it learns textual and visual prompts from a few normal images together with simulated anomalies.

The prompts are coupled bi-directionally: information flows from the textual prompts to the visual prompts and back again, which the authors describe as facilitating deep interaction between the two modalities.
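The abstract describes the bi-directional part as a coupling between textual and visual prompts, without giving the mechanism in this summary. As a rough illustration only, the sketch below assumes each modality has its own learnable prompt tokens and that two linear maps pass information in both directions. The sizes, the names (`couple_prompts`, `w_text_to_vis`, `w_vis_to_text`), and the additive mixing are all hypothetical and are not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: number of prompt tokens and per-modality widths.
N_PROMPTS, D_TEXT, D_VIS = 4, 512, 768

# Learnable prompt tokens for each modality (randomly initialised here).
text_prompts = rng.normal(scale=0.02, size=(N_PROMPTS, D_TEXT))
visual_prompts = rng.normal(scale=0.02, size=(N_PROMPTS, D_VIS))

# Hypothetical coupling weights; in training these would be learned.
w_text_to_vis = rng.normal(scale=0.02, size=(D_TEXT, D_VIS))
w_vis_to_text = rng.normal(scale=0.02, size=(D_VIS, D_TEXT))


def couple_prompts(text_p, vis_p, w_t2v, w_v2t):
    """Mix each modality's prompts with a projection of the other's."""
    coupled_text = text_p + vis_p @ w_v2t   # visual -> text direction
    coupled_vis = vis_p + text_p @ w_t2v    # text -> visual direction
    return coupled_text, coupled_vis


coupled_text, coupled_vis = couple_prompts(
    text_prompts, visual_prompts, w_text_to_vis, w_vis_to_text
)
print(coupled_text.shape, coupled_vis.shape)  # (4, 512) (4, 768)
```

The point of the sketch is only that both directions are present: a one-way design would update the visual prompts from the text prompts (or the reverse) but not both.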

Two further components target local and image-level understanding. A lightweight decoder with a learnable multi-view signal is trained on multi-scale images to enhance local semantic comprehension, and global and local semantics are aligned to enrich the image-level understanding of anomalies. Throughout training the model sees only normal samples and simulated anomalies, never true ones.
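The abstract states that AnoPLe simulates anomalies because true ones are not available, but this summary does not say how. The sketch below uses a simple cut-and-paste augmentation as a stand-in technique: a patch of a normal image is copied to another location to create a pseudo-anomaly and its mask. The function name `simulate_anomaly` and the `patch_frac` parameter are hypothetical, and the paper's actual simulation procedure may differ.

```python
import numpy as np


def simulate_anomaly(image, rng, patch_frac=0.2):
    """Paste a randomly chosen patch of `image` at another random location.

    Returns the altered image and a binary mask marking the pasted region.
    This is a generic cut-and-paste augmentation, not the paper's procedure.
    """
    h, w = image.shape[:2]
    ph, pw = max(1, int(h * patch_frac)), max(1, int(w * patch_frac))

    # Top-left corners of the source patch and of the destination.
    sy, sx = rng.integers(0, h - ph + 1), rng.integers(0, w - pw + 1)
    dy, dx = rng.integers(0, h - ph + 1), rng.integers(0, w - pw + 1)

    out = image.copy()
    out[dy:dy + ph, dx:dx + pw] = image[sy:sy + ph, sx:sx + pw]

    mask = np.zeros((h, w), dtype=np.uint8)
    mask[dy:dy + ph, dx:dx + pw] = 1
    return out, mask


rng = np.random.default_rng(0)
normal = rng.random((64, 64, 3))  # stand-in for a normal training image
pseudo_anomalous, mask = simulate_anomaly(normal, rng)
print(pseudo_anomalous.shape, int(mask.sum()))
```

Pairs of `(pseudo_anomalous, mask)` could then serve as negative examples during prompt learning, which is the role simulated anomalies play when no real defects exist.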

At inference time, a new image is scored as normal or anomalous using the learned prompts. The authors report 94.1% Image AUROC on MVTec-AD and 86.2% on VisA, which leaves a gap of only around 1% to the state of the art, despite AnoPLe never being exposed to true anomalies.
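To make the scoring and the reported metric concrete, here is a minimal sketch. It assumes a CLIP-style scheme in which an image embedding is compared by cosine similarity against one "normal" and one "abnormal" text embedding; this summary does not confirm that AnoPLe scores images exactly this way. The embeddings are toy values, the temperature of 0.07 is an assumed default, and `anomaly_score` and `auroc` are hypothetical helper names. Image AUROC, the metric quoted in the abstract, is computed here as the probability that a random anomalous image scores higher than a random normal one.

```python
import numpy as np


def l2_normalize(x):
    """Scale vectors along the last axis to unit length."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


def anomaly_score(image_emb, normal_text_emb, abnormal_text_emb, temperature=0.07):
    """Softmax probability assigned to the 'abnormal' prompt, per image."""
    img = l2_normalize(np.asarray(image_emb, dtype=float))
    txt = l2_normalize(np.asarray([normal_text_emb, abnormal_text_emb], dtype=float))
    logits = img @ txt.T / temperature            # scaled cosine similarities
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
    return probs[:, 1]


def auroc(scores, labels):
    """AUROC via pairwise comparison; ties count as half a win."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pos, neg = scores[labels], scores[~labels]
    diff = pos[:, None] - neg[None, :]
    return float((diff > 0).mean() + 0.5 * (diff == 0).mean())


# Toy 2-D embeddings: the first two images lean "normal", the last two "abnormal".
image_emb = [[1.0, 0.1], [0.9, 0.2], [0.2, 1.0], [0.5, 0.6]]
labels = [0, 0, 1, 1]
scores = anomaly_score(image_emb, normal_text_emb=[1.0, 0.0], abnormal_text_emb=[0.0, 1.0])
print(scores.round(3), auroc(scores, labels))
```

The pairwise AUROC is quadratic in the number of samples, which is fine for a sketch; a rank-based implementation would be the usual choice for full benchmark runs.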

Critical Analysis

The key strength of the AnoPLe method is that it performs competitively using only a few normal samples and simulated anomalies, which matters in settings where annotations or true abnormal samples are not accessible.

However, the paper does not extensively explore the limitations or potential downsides of the approach. For example, it's unclear how the method would scale to more complex or high-dimensional data, or how sensitive it is to the quality and diversity of the normal training samples.

Additionally, the paper could have provided more insight into the inner workings of the bi-directional prompt coupling and how it contributes to the model's performance. A deeper analysis of the learned prompts and their properties could also shed light on the strengths and weaknesses of the approach.

Overall, the AnoPLe method represents an innovative step forward in few-shot anomaly detection, but further research is needed to fully understand its capabilities, limitations, and potential areas for improvement.

Conclusion

The AnoPLe paper introduces a few-shot anomaly detection framework that uses bi-directional prompt learning to come within about 1% of state-of-the-art performance using only normal training samples. This sets it apart from methods that rely on annotations or true abnormal samples.

By learning coupled textual and visual prompts, AnoPLe demonstrates the potential of prompt-based approaches for challenging few-shot learning problems. The bi-directional coupling between the two modalities is a key design choice, alongside the lightweight decoder and the alignment of global and local semantics.

While the paper does not fully explore the limitations of the approach, the strong empirical results on benchmark datasets suggest that AnoPLe is a promising direction for further research in few-shot anomaly detection. As the field continues to evolve, techniques like AnoPLe that can leverage limited training data may become increasingly valuable for real-world applications where anomalous samples are scarce or difficult to obtain.

Related Papers

AnoPLe: Few-Shot Anomaly Detection via Bi-directional Prompt Learning with Only Normal Samples

Yujin Lee, Seoyoon Jang, Hyunsoo Yoon

Few-shot Anomaly Detection (FAD) poses significant challenges due to the limited availability of training samples and the frequent absence of abnormal samples. Previous approaches often rely on annotations or true abnormal samples to improve detection, but such textual or visual cues are not always accessible. To address this, we introduce AnoPLe, a multi-modal prompt learning method designed for anomaly detection without prior knowledge of anomalies. AnoPLe simulates anomalies and employs bidirectional coupling of textual and visual prompts to facilitate deep interaction between the two modalities. Additionally, we integrate a lightweight decoder with a learnable multi-view signal, trained on multi-scale images to enhance local semantic comprehension. To further improve performance, we align global and local semantics, enriching the image-level understanding of anomalies. The experimental results demonstrate that AnoPLe achieves strong FAD performance, recording 94.1% and 86.2% Image AUROC on MVTec-AD and VisA respectively, with only around a 1% gap compared to the SoTA, despite not being exposed to true anomalies. Code is available at https://github.com/YoojLee/AnoPLe.

8/27/2024

PromptAD: Learning Prompts with only Normal Samples for Few-Shot Anomaly Detection

Xiaofan Li, Zhizhong Zhang, Xin Tan, Chengwei Chen, Yanyun Qu, Yuan Xie, Lizhuang Ma

The vision-language model has brought great improvement to few-shot industrial anomaly detection, which usually requires designing hundreds of prompts through prompt engineering. For automated scenarios, we first use conventional prompt learning with many-class paradigm as the baseline to automatically learn prompts but found that it cannot work well in one-class anomaly detection. To address the above problem, this paper proposes a one-class prompt learning method for few-shot anomaly detection, termed PromptAD. First, we propose semantic concatenation which can transpose normal prompts into anomaly prompts by concatenating normal prompts with anomaly suffixes, thus constructing a large number of negative samples used to guide prompt learning in one-class setting. Furthermore, to mitigate the training challenge caused by the absence of anomaly images, we introduce the concept of explicit anomaly margin, which is used to explicitly control the margin between normal prompt features and anomaly prompt features through a hyper-parameter. For image-level/pixel-level anomaly detection, PromptAD achieves first place in 11/12 few-shot settings on MVTec and VisA.

7/17/2024

Learn Suspected Anomalies from Event Prompts for Video Anomaly Detection

Chenchen Tao, Xiaohao Peng, Chong Wang, Jiafei Wu, Puning Zhao, Jun Wang, Jiangbo Qian

Most models for weakly supervised video anomaly detection (WS-VAD) rely on multiple instance learning, aiming to distinguish normal and abnormal snippets without specifying the type of anomaly. However, the ambiguous nature of anomaly definitions across contexts may introduce inaccuracy in discriminating abnormal and normal events. To show the model what is anomalous, a novel framework is proposed to guide the learning of suspected anomalies from event prompts. Given a textual prompt dictionary of potential anomaly events and the captions generated from anomaly videos, the semantic anomaly similarity between them could be calculated to identify the suspected events for each video snippet. It enables a new multi-prompt learning process to constrain the visual-semantic features across all videos, as well as provides a new way to label pseudo anomalies for self-training. To demonstrate its effectiveness, comprehensive experiments and detailed ablation studies are conducted on four datasets, namely XD-Violence, UCF-Crime, TAD, and ShanghaiTech. Our proposed model outperforms most state-of-the-art methods in terms of AP or AUC (86.5%, 90.4%, 94.4%, and 97.4%). Furthermore, it shows promising performance in open-set and cross-dataset cases. The data, code, and models can be found at: https://github.com/shiwoaz/lap.

9/4/2024

FADE: Few-shot/zero-shot Anomaly Detection Engine using Large Vision-Language Model

Yuanwei Li, Elizaveta Ivanova, Martins Bruveris

Automatic image anomaly detection is important for quality inspection in the manufacturing industry. The usual unsupervised anomaly detection approach is to train a model for each object class using a dataset of normal samples. However, a more realistic problem is zero-/few-shot anomaly detection where zero or only a few normal samples are available. This makes the training of object-specific models challenging. Recently, large foundation vision-language models have shown strong zero-shot performance in various downstream tasks. While these models have learned complex relationships between vision and language, they are not specifically designed for the tasks of anomaly detection. In this paper, we propose the Few-shot/zero-shot Anomaly Detection Engine (FADE) which leverages the vision-language CLIP model and adjusts it for the purpose of industrial anomaly detection. Specifically, we improve language-guided anomaly segmentation 1) by adapting CLIP to extract multi-scale image patch embeddings that are better aligned with language and 2) by automatically generating an ensemble of text prompts related to industrial anomaly detection. 3) We use additional vision-based guidance from the query and reference images to further improve both zero-shot and few-shot anomaly detection. On the MVTec-AD (and VisA) dataset, FADE outperforms other state-of-the-art methods in anomaly segmentation with pixel-AUROC of 89.6% (91.5%) in zero-shot and 95.4% (97.5%) in 1-normal-shot. Code is available at https://github.com/BMVC-FADE/BMVC-FADE.

9/4/2024