LAPT: Label-driven Automated Prompt Tuning for OOD Detection with Vision-Language Models

Read original: arXiv:2407.08966 - Published 7/15/2024 by Yabin Zhang, Wenjie Zhu, Chenhang He, Lei Zhang

Overview

This paper introduces a new method called LAPT (Label-driven Automated Prompt Tuning) for improving out-of-distribution (OOD) detection using vision-language models.
LAPT automatically tunes the prompts used to guide vision-language models in classifying whether an input image is in-distribution or out-of-distribution.
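The key innovation is that LAPT learns the prompts in a label-driven manner: according to the paper's abstract, it needs only the in-distribution (ID) class names as input, mines negative labels automatically, and collects training images for those labels through image synthesis and retrieval.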

Plain English Explanation

LAPT: Label-driven Automated Prompt Tuning for OOD Detection with Vision-Language Models is a new technique that aims to help AI systems better identify when they are shown an image that is very different from the ones they were trained on. This is an important problem, as AI models can sometimes make mistakes or give unreliable results when shown something unexpected.

The core idea behind LAPT is to automatically adjust the "prompts" - the instructions or guidance given to the AI model - to make it better at detecting when an image is out-of-distribution (OOD), meaning it doesn't match the type of images the model was trained on. The prompts are learned from automatically gathered training images that cover both the known class names and automatically mined "negative" labels, so LAPT can tailor them to improve the model's OOD detection performance without hand-labeled data.

This is a clever approach that builds on recent advances in prompt learning and zero-shot OOD detection techniques. Rather than relying on hand-crafted prompts or trial-and-error, LAPT automatically optimizes the prompts in a way that is specifically targeted at improving the model's ability to accurately identify when it is being shown something unexpected.

Technical Explanation

The LAPT method learns prompts optimized for OOD detection with a vision-language model such as CLIP. According to the paper's abstract, the authors build distribution-aware prompts from the in-distribution (ID) class names together with negative labels that are mined automatically. Training images linked to these labels are collected without manual effort through image synthesis and retrieval, so the only input the framework requires is the list of ID class names.
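To make the role of the prompts concrete, here is a minimal sketch of how text embeddings for ID class names and negative labels could be turned into an OOD score. The paper's abstract mentions both kinds of labels, but this summary does not state the scoring rule, so the softmax-mass score below, the function name `id_score`, and the temperature value are all assumptions. Hand-written vectors stand in for real CLIP embeddings.

```python
import numpy as np

def l2_normalize(x, axis=-1):
    """Scale vectors to unit length so dot products become cosine similarities."""
    return x / np.linalg.norm(x, axis=axis, keepdims=True)

def id_score(image_emb, id_text_emb, neg_text_emb, temperature=0.01):
    """Share of softmax mass on the ID prompts (higher means more ID-like).

    image_emb:    (N, D) image embeddings
    id_text_emb:  (K, D) text embeddings of ID class prompts
    neg_text_emb: (M, D) text embeddings of negative-label prompts
    The scoring rule and temperature are illustrative assumptions.
    """
    img = l2_normalize(np.asarray(image_emb, dtype=float))
    txt = l2_normalize(np.concatenate([id_text_emb, neg_text_emb]).astype(float))
    logits = img @ txt.T / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(logits)
    probs = probs / probs.sum(axis=1, keepdims=True)
    return probs[:, : len(id_text_emb)].sum(axis=1)

# Toy 4-dimensional embeddings: two ID prompts and two negative prompts.
id_text = np.array([[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]])
neg_text = np.array([[0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0]])
images = np.array([[1.0, 0.0, 0.0, 0.1],   # close to the first ID prompt
                   [0.1, 0.0, 0.0, 1.0]])  # close to a negative prompt
scores = id_score(images, id_text, neg_text)
print(scores)
```

An image would then be flagged as OOD when its score falls below a threshold chosen on held-out ID data.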

The prompts are optimized with a simple cross-entropy loss. Two mixing strategies support the training: cross-modal mixing, which is intended to reduce the effect of noisy images, and cross-distribution mixing, which explores the intermediate space between the ID and negative distributions. This label-driven prompt tuning allows LAPT to discover prompts that are well-suited for the OOD detection task, without requiring manual prompt engineering.
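The paper's abstract names a cross-entropy loss plus cross-modal and cross-distribution mixing, but gives no formulas. The sketch below shows one plausible reading under stated assumptions: both mixing steps are written as mixup-style linear interpolation, and the function names, the mixing weight `lam`, and the feature shapes are hypothetical, not taken from the authors' code.

```python
import numpy as np

def softmax_cross_entropy(logits, soft_targets):
    """Mean cross-entropy between logits (N, C) and target distributions (N, C)."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return float(-(soft_targets * log_probs).sum(axis=1).mean())

def cross_modal_mix(image_feat, text_feat_of_label, lam):
    """Assumed form: blend an image feature with the text feature of its own label.

    The label is unchanged, so a noisy image is pulled toward its class prompt.
    """
    return lam * image_feat + (1.0 - lam) * text_feat_of_label

def cross_distribution_mix(id_feat, id_target, neg_feat, neg_target, lam):
    """Assumed form: mixup between an ID sample and a negative-label sample.

    Features and targets are interpolated with the same weight, giving a
    soft target that lies between the two distributions.
    """
    feat = lam * id_feat + (1.0 - lam) * neg_feat
    target = lam * id_target + (1.0 - lam) * neg_target
    return feat, target

# Toy setup: 2 ID classes + 1 negative label, 2-dimensional features.
prompts = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]])  # stand-in prompt embeddings
id_feat = np.array([[0.9, 0.1]])
neg_feat = np.array([[-0.8, 0.2]])
id_target = np.array([[1.0, 0.0, 0.0]])
neg_target = np.array([[0.0, 0.0, 1.0]])

mixed_feat, mixed_target = cross_distribution_mix(
    id_feat, id_target, neg_feat, neg_target, lam=0.6
)
loss = softmax_cross_entropy(mixed_feat @ prompts.T / 0.1, mixed_target)
print(loss)
```

In actual prompt learning the `prompts` matrix would come from a text encoder fed with learnable context tokens, and the loss would be minimized by gradient descent; this sketch only evaluates the loss once.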

The authors evaluate LAPT on a range of OOD detection benchmarks, including CIFAR-10/100, ImageNet, and Places365. According to the abstract, LAPT consistently outperforms manually crafted prompts, and it also improves ID classification accuracy and robustness to covariate shifts, which leads to strong performance on full-spectrum OOD detection.
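OOD detection benchmarks are commonly scored with AUROC and FPR95, the two metrics quoted in the related-paper abstracts further down this page. This summary does not say which metrics LAPT reports, so the following is only a generic sketch of how they can be computed from per-image scores where higher means more ID-like. The function names are illustrative, and the pairwise AUROC is meant for small arrays.

```python
import numpy as np

def auroc(id_scores, ood_scores):
    """Probability that a random ID sample outscores a random OOD sample.

    Ties count as half. Uses an (N_id, N_ood) comparison matrix, so it is
    only suitable for small inputs.
    """
    id_s = np.asarray(id_scores, dtype=float)[:, None]
    ood_s = np.asarray(ood_scores, dtype=float)[None, :]
    return float((id_s > ood_s).mean() + 0.5 * (id_s == ood_s).mean())

def fpr_at_95_tpr(id_scores, ood_scores):
    """Fraction of OOD samples accepted when about 95% of ID samples are accepted."""
    threshold = np.percentile(id_scores, 5)  # roughly 95% of ID scores lie at or above this
    return float((np.asarray(ood_scores, dtype=float) >= threshold).mean())

id_scores = [0.9, 0.8, 0.7, 0.6]
ood_scores = [0.1, 0.2, 0.3, 0.65]
print(auroc(id_scores, ood_scores))          # 15 of 16 ID/OOD pairs are ordered correctly
print(fpr_at_95_tpr(id_scores, ood_scores))  # one of four OOD scores clears the threshold
```

Higher AUROC and lower FPR95 both indicate better separation between ID and OOD inputs.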

Critical Analysis

One potential limitation of the LAPT approach is that it depends on automatically collected training data: the negative labels are mined and the images are synthesized or retrieved rather than hand-labeled, so the learned prompts may be affected by noise in that data. The abstract indicates that the cross-modal mixing strategy is intended to reduce this image noise, but how well it does so will depend on the quality of the synthesis and retrieval sources.

Additionally, while LAPT demonstrates strong performance on the evaluated OOD detection benchmarks, it would be valuable to see how the method generalizes to a wider range of OOD scenarios, such as detecting novel object classes or handling significant domain shifts. Expanding the evaluation to a more diverse set of OOD detection tasks could provide additional insights into the strengths and limitations of the LAPT approach.

Overall, the LAPT method represents a promising step forward in the field of OOD detection, leveraging the power of vision-language models and automated prompt tuning to enhance the robustness of AI systems. As the authors note, continued research in this area could lead to important advancements in the safety and reliability of AI-based applications.

Conclusion

LAPT: Label-driven Automated Prompt Tuning for OOD Detection with Vision-Language Models introduces a novel approach to improving out-of-distribution (OOD) detection using vision-language models. By automatically tuning the prompts used to guide the model's classification, LAPT can enhance the model's ability to accurately identify when it is being shown something that differs significantly from its training data.

The key innovation of LAPT is its label-driven prompt tuning: starting from the ID class names alone, it mines negative labels and gathers training images automatically, then optimizes the prompts for the OOD detection task. This data-driven approach allows LAPT to discover prompts that are well-suited for the problem at hand, without the need for manual prompt engineering.

The reported results show LAPT consistently outperforming manually crafted prompts, highlighting the potential of this approach to improve the reliability and safety of AI systems. As the field of OOD detection continues to evolve, techniques like LAPT are likely to play an increasingly important role in ensuring that AI models can robustly handle a wide range of inputs, including those that are unexpected or unfamiliar.

Related Papers

LAPT: Label-driven Automated Prompt Tuning for OOD Detection with Vision-Language Models

Yabin Zhang, Wenjie Zhu, Chenhang He, Lei Zhang

Out-of-distribution (OOD) detection is crucial for model reliability, as it identifies samples from unknown classes and reduces errors due to unexpected inputs. Vision-Language Models (VLMs) such as CLIP are emerging as powerful tools for OOD detection by integrating multi-modal information. However, the practical application of such systems is challenged by manual prompt engineering, which demands domain expertise and is sensitive to linguistic nuances. In this paper, we introduce Label-driven Automated Prompt Tuning (LAPT), a novel approach to OOD detection that reduces the need for manual prompt engineering. We develop distribution-aware prompts with in-distribution (ID) class names and negative labels mined automatically. Training samples linked to these class labels are collected autonomously via image synthesis and retrieval methods, allowing for prompt learning without manual effort. We utilize a simple cross-entropy loss for prompt optimization, with cross-modal and cross-distribution mixing strategies to reduce image noise and explore the intermediate space between distributions, respectively. The LAPT framework operates autonomously, requiring only ID class names as input and eliminating the need for manual intervention. With extensive experiments, LAPT consistently outperforms manually crafted prompts, setting a new standard for OOD detection. Moreover, LAPT not only enhances the distinction between ID and OOD samples, but also improves the ID classification accuracy and strengthens the generalization robustness to covariate shifts, resulting in outstanding performance in challenging full-spectrum OOD detection tasks. Codes are available at https://github.com/YBZh/LAPT.

7/15/2024

Enhancing Near OOD Detection in Prompt Learning: Maximum Gains, Minimal Costs

Myong Chol Jung, He Zhao, Joanna Dipnall, Belinda Gabbe, Lan Du

Prompt learning has shown to be an efficient and effective fine-tuning method for vision-language models like CLIP. While numerous studies have focused on the generalisation of these models in few-shot classification, their capability in near out-of-distribution (OOD) detection has been overlooked. A few recent works have highlighted the promising performance of prompt learning in far OOD detection. However, the more challenging task of few-shot near OOD detection has not yet been addressed. In this study, we investigate the near OOD detection capabilities of prompt learning models and observe that commonly used OOD scores have limited performance in near OOD detection. To enhance the performance, we propose a fast and simple post-hoc method that complements existing logit-based scores, improving near OOD detection AUROC by up to 11.67% with minimal computational cost. Our method can be easily applied to any prompt learning model without change in architecture or re-training the models. Comprehensive empirical evaluations across 13 datasets and 8 models demonstrate the effectiveness and adaptability of our method.

5/28/2024

Enhancing Outlier Knowledge for Few-Shot Out-of-Distribution Detection with Extensible Local Prompts

Fanhu Zeng, Zhen Cheng, Fei Zhu, Xu-Yao Zhang

Out-of-Distribution (OOD) detection, aiming to distinguish outliers from known categories, has gained prominence in practical scenarios. Recently, the advent of vision-language models (VLM) has heightened interest in enhancing OOD detection for VLM through few-shot tuning. However, existing methods mainly focus on optimizing global prompts, ignoring refined utilization of local information with regard to outliers. Motivated by this, we freeze global prompts and introduce a novel coarse-to-fine tuning paradigm to emphasize regional enhancement with local prompts. Our method comprises two integral components: global prompt guided negative augmentation and local prompt enhanced regional regularization. The former utilizes frozen, coarse global prompts as guiding cues to incorporate negative augmentation, thereby leveraging local outlier knowledge. The latter employs trainable local prompts and a regional regularization to capture local information effectively, aiding in outlier identification. We also propose regional-related metric to empower the enrichment of OOD detection. Moreover, since our approach explores enhancing local prompts only, it can be seamlessly integrated with trained global prompts during inference to boost the performance. Comprehensive experiments demonstrate the effectiveness and potential of our method. Notably, our method reduces average FPR95 by 5.17% against state-of-the-art method in 4-shot tuning on challenging ImageNet-1k dataset, even outperforming 16-shot results of previous methods.

9/10/2024

Zero-Shot Out-of-Distribution Detection with Outlier Label Exposure

Choubo Ding, Guansong Pang

As vision-language models like CLIP are widely applied to zero-shot tasks and gain remarkable performance on in-distribution (ID) data, detecting and rejecting out-of-distribution (OOD) inputs in the zero-shot setting have become crucial for ensuring the safety of using such models on the fly. Most existing zero-shot OOD detectors rely on ID class label-based prompts to guide CLIP in classifying ID images and rejecting OOD images. In this work we instead propose to leverage a large set of diverse auxiliary outlier class labels as pseudo OOD class text prompts to CLIP for enhancing zero-shot OOD detection, an approach we called Outlier Label Exposure (OLE). The key intuition is that ID images are expected to have lower similarity to these outlier class prompts than OOD images. One issue is that raw class labels often include noise labels, e.g., synonyms of ID labels, rendering raw OLE-based detection ineffective. To address this issue, we introduce an outlier prototype learning module that utilizes the prompt embeddings of the outlier labels to learn a small set of pivotal outlier prototypes for an embedding similarity-based OOD scoring. Additionally, the outlier classes and their prototypes can be loosely coupled with the ID classes, leading to an inseparable decision region between them. Thus, we also introduce an outlier label generation module that synthesizes our outlier prototypes and ID class embeddings to generate in-between outlier prototypes to further calibrate the detection in OLE. Despite its simplicity, extensive experiments show that OLE substantially improves detection performance and achieves new state-of-the-art performance in large-scale OOD and hard OOD detection benchmarks.

6/4/2024