PromptAD: Learning Prompts with only Normal Samples for Few-Shot Anomaly Detection

2404.05231

Published 4/9/2024 by Xiaofan Li, Zhizhong Zhang, Xin Tan, Chengwei Chen, Yanyun Qu, Yuan Xie, Lizhuang Ma

Abstract

Vision-language models have brought great improvement to few-shot industrial anomaly detection, which usually requires designing hundreds of prompts through prompt engineering. For automated scenarios, we first use conventional prompt learning with the many-class paradigm as a baseline to learn prompts automatically, but find that it does not work well in one-class anomaly detection. To address this problem, this paper proposes a one-class prompt learning method for few-shot anomaly detection, termed PromptAD. First, we propose semantic concatenation, which can transpose normal prompts into anomaly prompts by concatenating normal prompts with anomaly suffixes, thus constructing a large number of negative samples to guide prompt learning in the one-class setting. Furthermore, to mitigate the training challenge caused by the absence of anomaly images, we introduce the concept of an explicit anomaly margin, which explicitly controls the margin between normal prompt features and anomaly prompt features through a hyper-parameter. For image-level/pixel-level anomaly detection, PromptAD achieves first place in 11/12 few-shot settings on MVTec and VisA.
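
The semantic concatenation step in the abstract can be pictured with plain strings. This is only a minimal sketch: the prompt texts, the suffixes, and the helper name `build_anomaly_prompts` are hypothetical, and the paper works with learnable prompts rather than fixed strings, so treat it as an analogy and not the authors' implementation.

```python
from itertools import product


def build_anomaly_prompts(normal_prompts, anomaly_suffixes):
    """Pair every normal prompt with every anomaly suffix (hypothetical helper)."""
    return [f"{prompt} {suffix}" for prompt, suffix in product(normal_prompts, anomaly_suffixes)]


# Hypothetical prompts and suffixes, chosen for illustration only.
normal_prompts = ["a photo of a bottle", "a close-up photo of a bottle"]
anomaly_suffixes = ["with a crack", "with a scratch", "with contamination"]

anomaly_prompts = build_anomaly_prompts(normal_prompts, anomaly_suffixes)
for prompt in anomaly_prompts:
    print(prompt)
```

Two normal prompts and three suffixes give six anomaly prompts, which shows how a handful of normal prompts can produce many negative text samples without any anomaly images.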

Overview

  • This paper introduces PromptAD, a novel few-shot anomaly detection method that learns prompts using only normal (non-anomalous) samples.
  • Anomaly detection is the task of identifying data points that deviate significantly from the majority of the data, which is crucial for applications like fraud detection, system monitoring, and public health surveillance.
  • PromptAD aims to address the challenge of few-shot anomaly detection, where only a limited number of normal samples are available during training.

Plain English Explanation

PromptAD is a new technique for detecting anomalies (unusual or unexpected data points) in a dataset, even when you only have a small number of examples of "normal" data to work with during training. Anomaly detection is important for things like catching financial fraud, monitoring for problems in computer systems, and identifying public health issues.

The key insight behind PromptAD is that it can learn special "prompts" (instructions) that help a machine learning model identify anomalies, without needing many examples of what the anomalies actually look like. Instead, PromptAD just uses the normal, non-anomalous data samples to learn these prompts. This is useful because in many real-world situations, we may only have a few examples of normal data, but don't have any examples of the anomalies we want to detect.

By learning prompts from the normal data alone, PromptAD can then apply those prompts to classify new, unseen data as either normal or anomalous. This makes it a powerful tool for few-shot anomaly detection, where we don't have a lot of training data to work with.

Technical Explanation

PromptAD is a few-shot anomaly detection method that learns prompts using only normal (non-anomalous) samples. It builds on recent work on prompt learning for vision-language models.

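The key idea is to learn prompts for a pre-trained vision-language model that separate normal from anomalous data without using any anomalous images during training. PromptAD does this in two steps. First, semantic concatenation turns normal prompts into anomaly prompts by appending anomaly suffixes, which yields a large set of negative text samples for one-class prompt learning. Second, an explicit anomaly margin, set through a hyper-parameter, controls the gap between normal prompt features and anomaly prompt features, which compensates for the absence of anomaly images.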
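
One way to read the explicit anomaly margin described in the abstract is as a hinge (triplet-style) loss. The sketch below assumes that form, uses a normal image feature as the anchor, and takes cosine distance as the metric. The function names, the default `margin=0.1`, and the loss shape are all assumptions made for illustration and may differ from the paper's actual objective.

```python
import math


def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine_distance(a, b):
    """Return 1 - cosine similarity between two vectors."""
    a, b = normalize(a), normalize(b)
    return 1.0 - sum(x * y for x, y in zip(a, b))


def margin_loss(image_feat, normal_feat, anomaly_feat, margin=0.1):
    """Hinge loss (assumed form): the normal image feature should be closer
    to the normal prompt feature than to the anomaly prompt feature by at
    least `margin`."""
    d_normal = cosine_distance(image_feat, normal_feat)
    d_anomaly = cosine_distance(image_feat, anomaly_feat)
    return max(0.0, d_normal - d_anomaly + margin)


# Toy 2-D features: the image matches the normal prompt, so the loss is zero.
print(margin_loss([1.0, 0.0], [1.0, 0.0], [0.0, 1.0]))
# Swapping the prompt features violates the margin and gives a positive loss.
print(margin_loss([1.0, 0.0], [0.0, 1.0], [1.0, 0.0]))
```

In this reading, `margin` plays the role of the hyper-parameter: a larger value demands a wider gap between normal and anomaly prompt features before the loss reaches zero.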

The authors evaluate PromptAD on the MVTec and VisA benchmarks for both image-level and pixel-level anomaly detection. They report that PromptAD ranks first in 11 of 12 few-shot settings, which supports the effectiveness of the one-class prompt learning approach.
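
At test time, a prompt-based detector has to turn features into a score. A common pattern in CLIP-style detectors is a softmax over the similarities between an image feature and the normal and anomaly prompt features. The sketch below assumes that pattern. The function names and the temperature of 0.07 are illustrative assumptions, and PromptAD's actual scoring may differ or combine further terms.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def anomaly_score(image_feat, normal_feat, anomaly_feat, temperature=0.07):
    """Softmax probability assigned to the anomaly prompt (assumed scoring rule)."""
    s_normal = cosine_similarity(image_feat, normal_feat) / temperature
    s_anomaly = cosine_similarity(image_feat, anomaly_feat) / temperature
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(s_normal, s_anomaly)
    e_normal = math.exp(s_normal - m)
    e_anomaly = math.exp(s_anomaly - m)
    return e_anomaly / (e_normal + e_anomaly)


# Toy features: an image aligned with the normal prompt scores near 0,
# one aligned with the anomaly prompt scores near 1.
print(anomaly_score([1.0, 0.0], [1.0, 0.0], [0.0, 1.0]))
print(anomaly_score([0.0, 1.0], [1.0, 0.0], [0.0, 1.0]))
```

Applying the same function to one global image feature gives an image-level score, while applying it to each patch feature gives a coarse pixel-level map, again under the assumptions stated above.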

Critical Analysis

The PromptAD paper makes a compelling case for the utility of prompt-based learning in the context of few-shot anomaly detection. By avoiding the need for anomalous training samples, PromptAD sidesteps a key challenge in many real-world anomaly detection scenarios.

However, the paper does not extensively explore the limitations of the proposed approach. For example, it's unclear how well PromptAD would perform in cases where the distribution of normal data is highly complex or multimodal. The impact of prompts on zero-shot detection performance is also an open question.

Additionally, PromptAD relies on a pre-trained vision-language model, which may not always be available. Exploring methods to learn both the prompts and the base model jointly could be a valuable direction for future research.

Conclusion

The PromptAD paper introduces a few-shot anomaly detection method that identifies anomalies using only normal training samples. By building anomaly prompts from normal prompts and enforcing an explicit margin between normal and anomaly prompt features, PromptAD sidesteps the need for anomalous training data, which is a common bottleneck in many real-world anomaly detection scenarios.

The authors report strong performance on MVTec and VisA, where PromptAD ranks first in 11 of 12 few-shot settings, suggesting that prompt-based learning is a promising approach for few-shot anomaly detection. While the paper does not extensively explore the limitations of the method, it lays the groundwork for future research in this direction, which could benefit a wide range of applications that rely on robust anomaly detection.



This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Human-free Prompted Based Anomaly Detection: prompt optimization with Meta-guiding prompt scheme

Pi-Wei Chen, Jerry Chun-Wei Lin, Jia Ji, Feng-Hao Yeh, Chao-Chun Chen

Pre-trained vision-language models (VLMs) are highly adaptable to various downstream tasks through few-shot learning, making prompt-based anomaly detection a promising approach. Traditional methods depend on human-crafted prompts that require prior knowledge of specific anomaly types. Our goal is to develop a human-free prompt-based anomaly detection framework that optimally learns prompts through data-driven methods, eliminating the need for human intervention. The primary challenge in this approach is the lack of anomalous samples during the training phase. Additionally, the Vision Transformer (ViT)-based image encoder in VLMs is not ideal for pixel-wise anomaly segmentation due to a locality feature mismatch between the original image and the output feature map. To tackle the first challenge, we have developed the Object-Attention Anomaly Generation Module (OAGM) to synthesize anomaly samples for training. Furthermore, our Meta-Guiding Prompt-Tuning Scheme (MPTS) iteratively adjusts the gradient-based optimization direction of learnable prompts to avoid overfitting to the synthesized anomalies. For the second challenge, we propose Locality-Aware Attention, which ensures that each local patch feature attends only to nearby patch features, preserving the locality features corresponding to their original locations. This framework allows for the optimal prompt embeddings by searching in the continuous latent space via backpropagation, free from human semantic constraints. Additionally, the modified locality-aware attention improves the precision of pixel-wise anomaly segmentation.

6/27/2024

šŸ·ļø

Retrieval-Enhanced Visual Prompt Learning for Few-shot Classification

Jintao Rong, Hao Chen, Tianxiao Chen, Linlin Ou, Xinyi Yu, Yifan Liu

Prompt learning has become a popular approach for adapting large vision-language models, such as CLIP, to downstream tasks. Typically, prompt learning relies on a fixed prompt token or an input-conditional token to fit a small amount of data under full supervision. While this paradigm can generalize to a certain range of unseen classes, it may struggle when the domain gap increases, such as in fine-grained classification and satellite image segmentation. To address this limitation, we propose Retrieval-enhanced Prompt learning (RePrompt), which introduces retrieval mechanisms to cache the knowledge representations from downstream tasks. We first construct a retrieval database from training examples, or from external examples when available. We then integrate this retrieval-enhanced mechanism into various stages of a simple prompt learning baseline. By referencing similar samples in the training set, the enhanced model is better able to adapt to new tasks with few samples. Our extensive experiments over 15 vision datasets, including 11 downstream tasks in the few-shot setting and 4 domain generalization benchmarks, demonstrate that RePrompt achieves considerably improved performance. Our proposed approach provides a promising solution to the challenges faced by prompt learning when the domain gap increases. The code and models will be available.

6/19/2024

Enhancing Near OOD Detection in Prompt Learning: Maximum Gains, Minimal Costs

Myong Chol Jung, He Zhao, Joanna Dipnall, Belinda Gabbe, Lan Du

Prompt learning has been shown to be an efficient and effective fine-tuning method for vision-language models like CLIP. While numerous studies have focused on the generalisation of these models in few-shot classification, their capability in near out-of-distribution (OOD) detection has been overlooked. A few recent works have highlighted the promising performance of prompt learning in far OOD detection. However, the more challenging task of few-shot near OOD detection has not yet been addressed. In this study, we investigate the near OOD detection capabilities of prompt learning models and observe that commonly used OOD scores have limited performance in near OOD detection. To enhance the performance, we propose a fast and simple post-hoc method that complements existing logit-based scores, improving near OOD detection AUROC by up to 11.67% with minimal computational cost. Our method can be easily applied to any prompt learning model without changing the architecture or re-training the models. Comprehensive empirical evaluations across 13 datasets and 8 models demonstrate the effectiveness and adaptability of our method.

5/28/2024

Text Prompt with Normality Guidance for Weakly Supervised Video Anomaly Detection

Zhiwei Yang, Jing Liu, Peng Wu

Weakly supervised video anomaly detection (WSVAD) is a challenging task. Generating fine-grained pseudo-labels based on weak labels and then self-training a classifier is currently a promising solution. However, existing methods use only the RGB visual modality and neglect category text information, which limits the generation of more accurate pseudo-labels and affects the performance of self-training. Inspired by the manual labeling process based on the event description, in this paper, we propose a novel pseudo-label generation and self-training framework based on Text Prompt with Normality Guidance (TPWNG) for WSVAD. Our idea is to transfer the rich language-visual knowledge of the contrastive language-image pre-training (CLIP) model for aligning the video event description text and corresponding video frames to generate pseudo-labels. Specifically, we first fine-tune CLIP for domain adaptation by designing two ranking losses and a distributional inconsistency loss. Further, we propose a learnable text prompt mechanism with the assistance of a normality visual prompt to further improve the matching accuracy of video event description text and video frames. Then, we design a pseudo-label generation module based on the normality guidance to infer reliable frame-level pseudo-labels. Finally, we introduce a temporal context self-adaptive learning module to learn the temporal dependencies of different video events more flexibly and accurately. Extensive experiments show that our method achieves state-of-the-art performance on two benchmark datasets, UCF-Crime and XD-Viole

4/15/2024