EntropyStop: Unsupervised Deep Outlier Detection with Loss Entropy

2405.12502

Published 7/2/2024 by Yihong Huang, Yuang Zhang, Liping Wang, Fan Zhang, Xuemin Lin

Abstract

Unsupervised Outlier Detection (UOD) is an important data mining task. With the advance of deep learning, deep Outlier Detection (OD) has received broad interest. Most deep UOD models are trained exclusively on clean datasets to learn the distribution of the normal data, which requires huge manual efforts to clean the real-world data if possible. Instead of relying on clean datasets, some approaches directly train and detect on unlabeled contaminated datasets, leading to the need for methods that are robust to such conditions. Ensemble methods emerged as a superior solution to enhance model robustness against contaminated training sets. However, the training time is greatly increased by the ensemble. In this study, we investigate the impact of outliers on the training phase, aiming to halt training on unlabeled contaminated datasets before performance degradation. Initially, we noted that blending normal and anomalous data causes AUC fluctuations, a label-dependent measure of detection accuracy. To circumvent the need for labels, we propose a zero-label entropy metric named Loss Entropy for loss distribution, enabling us to infer optimal stopping points for training without labels. Meanwhile, we theoretically demonstrate negative correlation between entropy metric and the label-based AUC. Based on this, we develop an automated early-stopping algorithm, EntropyStop, which halts training when loss entropy suggests the maximum model detection capability. We conduct extensive experiments on ADBench (including 47 real datasets), and the overall results indicate that AutoEncoder (AE) enhanced by our approach not only achieves better performance than ensemble AEs but also requires under 2% of training time. Lastly, our proposed metric and early-stopping approach are evaluated on other deep OD models, exhibiting their broad potential applicability.

Overview

Unsupervised Outlier Detection (UOD) is an important data mining task.
Deep learning has led to the rise of deep Outlier Detection (OD) models.
Most deep UOD models are trained on clean datasets, which can be difficult to obtain in real-world scenarios.
Approaches that train on unlabeled contaminated datasets require methods robust to such conditions.
Ensemble methods have shown promise in enhancing model robustness, but increase training time.

Plain English Explanation

Identifying unusual or anomalous data points, known as outlier detection, is a crucial task in data analysis. With the advancements in deep learning, researchers have developed deep learning-based outlier detection models. <a href="https://aimodels.fyi/papers/arxiv/gradient-regularized-out-distribution-detection">These models are typically trained on clean, well-curated datasets</a> to learn the patterns of normal data. However, obtaining clean datasets in real-world scenarios can be challenging and time-consuming.

To address this issue, some researchers have explored training and detecting outliers directly on unlabeled, contaminated datasets. This approach presents the need for methods that can handle such noisy or imperfect data. <a href="https://aimodels.fyi/papers/arxiv/deep-metric-learning-based-out-distribution-detection">Ensemble methods, where multiple models are combined, have emerged as a promising solution to improve the robustness of outlier detection models against contaminated training data</a>. However, this approach significantly increases the training time.

In this study, the researchers aimed to investigate the impact of outliers on the training process and develop a way to stop training before performance degradation occurs, even when working with unlabeled, contaminated datasets.

Technical Explanation

The researchers first observed that when normal and anomalous data are mixed in the training set, the model's Area Under the Curve (AUC), a label-dependent measure of detection accuracy, fluctuates during training. Because AUC cannot be computed without labels, they propose a label-free metric called "Loss Entropy," defined as the entropy of the distribution of the model's loss values. <a href="https://aimodels.fyi/papers/arxiv/noisy-elephant-room-is-your-out-distribution">They theoretically demonstrate a negative correlation between the Loss Entropy metric and the label-based AUC</a>, allowing them to infer the optimal stopping point for training without requiring labeled data.
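
The summary does not state the formula for Loss Entropy, so the following is only a minimal sketch under an assumption: the per-sample losses are normalized to sum to one, and the Shannon entropy of the resulting distribution is computed. The function name `loss_entropy` and the example loss values are hypothetical and chosen for illustration.

```python
import math

def loss_entropy(losses):
    """Shannon entropy of per-sample losses after normalizing them to sum to 1.

    Assumption: this normalization is one natural reading of "entropy of the
    loss distribution"; the summary itself does not give the formula.
    """
    if any(loss < 0 for loss in losses):
        raise ValueError("losses must be non-negative")
    total = sum(losses)
    if total <= 0:
        raise ValueError("losses must have a positive sum")
    entropy = 0.0
    for loss in losses:
        p = loss / total
        if p > 0:  # the term 0 * log(0) is treated as 0
            entropy -= p * math.log(p)
    return entropy

# Uniform losses: no sample stands out, so entropy is at its maximum, log(n).
uniform = loss_entropy([1.0, 1.0, 1.0, 1.0])
# One sample dominates the total loss (a candidate outlier): entropy is lower.
skewed = loss_entropy([0.1, 0.1, 0.1, 5.0])
```

Under this reading, a lower value means the loss is concentrated on a few samples, which fits the negative correlation with AUC described above: the model separates outliers best when their losses stand out from the rest.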

Based on this insight, the researchers develop an automated early-stopping algorithm called "EntropyStop," which halts the training process when the Loss Entropy suggests the model has reached its maximum detection capability. They conduct extensive experiments on the ADBench benchmark, which includes 47 real-world datasets, and find that an AutoEncoder (AE) model enhanced by their approach not only outperforms ensemble AEs but also requires less than 2% of the training time.
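
The summary does not spell out the stopping rule, so the sketch below substitutes a plain patience-based criterion: keep the iteration with the lowest entropy seen so far, and stop once no new minimum has appeared for `patience` consecutive evaluations. This is an illustrative stand-in, not the paper's exact criterion; the function name `entropy_stop`, the `patience` parameter, and the entropy values are all hypothetical.

```python
def entropy_stop(entropy_curve, patience=3):
    """Return the index of the evaluation whose model should be kept.

    Simplified stand-in for an entropy-based early-stopping rule: training
    stops once `patience` evaluations pass without a new minimum entropy.
    """
    best_idx = 0
    best = float("inf")
    for i, h in enumerate(entropy_curve):
        if h < best:
            # New lowest entropy: remember this iteration as the best so far.
            best, best_idx = h, i
        elif i - best_idx >= patience:
            # No improvement for `patience` evaluations: stop training here.
            break
    return best_idx

# Hypothetical loss-entropy values recorded after each training iteration.
curve = [1.38, 1.20, 1.05, 0.98, 1.01, 1.04, 1.10, 0.90]
stop_at = entropy_stop(curve, patience=3)
```

With these values the rule keeps the model from index 3 and never reaches the later dip at index 7. That is the usual trade-off of a patience rule: a small `patience` saves training time but can miss a later minimum.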

<a href="https://aimodels.fyi/papers/arxiv/energy-based-hopfield-boosting-out-distribution-detection">The researchers also evaluate their proposed metric and early-stopping approach on other deep outlier detection models, demonstrating their broad applicability</a>.

Critical Analysis

The researchers have addressed an important challenge in the field of unsupervised outlier detection by developing a method that can effectively train deep learning models on unlabeled, contaminated datasets. The proposed Loss Entropy metric and EntropyStop algorithm provide a practical solution to the problem of performance degradation due to the presence of outliers in the training data.

One potential limitation of the study is that it primarily focuses on the AutoEncoder model, and it would be valuable to see how the approach performs on a wider range of deep outlier detection architectures. Additionally, the researchers mention that their method is broadly applicable, but further exploration of its performance on different types of datasets and real-world scenarios would strengthen the claims.

<a href="https://aimodels.fyi/papers/arxiv/toward-realistic-benchmark-out-distribution-detection">The development of realistic and comprehensive benchmark datasets for out-of-distribution detection is an important area for future research</a>. Such benchmarks would help to further validate the effectiveness of the proposed approach and enable more robust comparisons with other methods.

Conclusion

This study presents a novel approach to training deep outlier detection models on unlabeled, contaminated datasets. By introducing the Loss Entropy metric and the EntropyStop algorithm, the researchers have developed a practical way to stop training before outliers in the training data degrade performance. In their experiments on ADBench, an AutoEncoder enhanced by the approach not only outperforms ensemble AEs but also requires under 2% of the training time.

The broader implications of this work include the potential to enable more widespread adoption of deep outlier detection techniques in real-world applications, where obtaining clean datasets can be challenging. The proposed methods may also inspire further research into unsupervised anomaly detection and the development of more robust and efficient deep learning models.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Continual Unsupervised Out-of-Distribution Detection

Lars Doorenbos, Raphael Sznitman, Pablo Márquez-Neila

Deep learning models excel when the data distribution during training aligns with testing data. Yet, their performance diminishes when faced with out-of-distribution (OOD) samples, leading to great interest in the field of OOD detection. Current approaches typically assume that OOD samples originate from an unconcentrated distribution complementary to the training distribution. While this assumption is appropriate in the traditional unsupervised OOD (U-OOD) setting, it proves inadequate when considering the place of deployment of the underlying deep learning model. To better reflect this real-world scenario, we introduce the novel setting of continual U-OOD detection. To tackle this new setting, we propose a method that starts from a U-OOD detector, which is agnostic to the OOD distribution, and slowly updates during deployment to account for the actual OOD distribution. Our method uses a new U-OOD scoring function that combines the Mahalanobis distance with a nearest-neighbor approach. Furthermore, we design a confidence-scaled few-shot OOD detector that outperforms previous methods. We show our method greatly improves upon strong baselines from related fields.

6/5/2024

cs.CV cs.LG

Split-Ensemble: Efficient OOD-aware Ensemble via Task and Model Splitting

Anthony Chen, Huanrui Yang, Yulu Gan, Denis A Gudovskiy, Zhen Dong, Haofan Wang, Tomoyuki Okuno, Yohei Nakata, Kurt Keutzer, Shanghang Zhang

Uncertainty estimation is crucial for machine learning models to detect out-of-distribution (OOD) inputs. However, the conventional discriminative deep learning classifiers produce uncalibrated closed-set predictions for OOD data. More robust classifiers with uncertainty estimation typically require a potentially unavailable OOD dataset for outlier exposure training, or a considerable amount of additional memory and compute to build ensemble models. In this work, we improve on uncertainty estimation without extra OOD data or additional inference costs using an alternative Split-Ensemble method. Specifically, we propose a novel subtask-splitting ensemble training objective, where a common multiclass classification task is split into several complementary subtasks. Then, each subtask's training data can be considered as OOD to the other subtasks. Diverse submodels can therefore be trained on each subtask with OOD-aware objectives. The subtask-splitting objective enables us to share low-level features across submodels to avoid parameter and computational overheads. In particular, we build a tree-like Split-Ensemble architecture by performing iterative splitting and pruning from a shared backbone model, where each branch serves as a submodel corresponding to a subtask. This leads to improved accuracy and uncertainty estimation across submodels under a fixed ensemble computation budget. Empirical study with ResNet-18 backbone shows Split-Ensemble, without additional computation cost, improves accuracy over a single model by 0.8%, 1.8%, and 25.5% on CIFAR-10, CIFAR-100, and Tiny-ImageNet, respectively. OOD detection for the same backbone and in-distribution datasets surpasses a single model baseline by, correspondingly, 2.2%, 8.1%, and 29.6% mean AUROC.

5/28/2024

cs.LG cs.CV

Out-of-distribution Reject Option Method for Dataset Shift Problem in Early Disease Onset Prediction

Taisei Tosaki, Eiichiro Uchino, Ryosuke Kojima, Yohei Mineharu, Mikio Arita, Nobuyuki Miyai, Yoshinori Tamada, Tatsuya Mikami, Koichi Murashita, Shigeyuki Nakaji, Yasushi Okuno

Machine learning is increasingly used to predict lifestyle-related disease onset using health and medical data. However, the prediction effectiveness is hindered by dataset shift, which involves discrepancies in data distribution between the training and testing datasets, misclassifying out-of-distribution (OOD) data. To diminish dataset shift effects, this paper proposes the out-of-distribution reject option for prediction (ODROP), which integrates OOD detection models to preclude OOD data from the prediction phase. We investigated the efficacy of five OOD detection methods (variational autoencoder, neural network ensemble std, neural network ensemble epistemic, neural network energy, and neural network gaussian mixture based energy measurement) across two datasets, the Hirosaki and Wakayama health checkup data, in the context of three disease onset prediction tasks: diabetes, dyslipidemia, and hypertension. To evaluate the ODROP method, we trained disease onset prediction models and OOD detection models on Hirosaki data and used AUROC-rejection curve plots from Wakayama data. The variational autoencoder method showed superior stability and magnitude of improvement in Area Under the Receiver Operating Curve (AUROC) in five cases: AUROC in the Wakayama data was improved from 0.80 to 0.90 at a 31.1% rejection rate for diabetes onset and from 0.70 to 0.76 at a 34% rejection rate for dyslipidemia. We categorized dataset shifts into two types using SHAP clustering - those that considerably affect predictions and those that do not. We expect that this classification will help standardize measuring instruments. This study is the first to apply OOD detection to actual health and medical data, demonstrating its potential to substantially improve the accuracy and reliability of disease prediction models amidst dataset shift.

5/31/2024

cs.LG cs.AI

Fast Unsupervised Deep Outlier Model Selection with Hypernetworks

Xueying Ding, Yue Zhao, Leman Akoglu

Outlier detection (OD) finds many applications with a rich literature of numerous techniques. Deep neural network based OD (DOD) has seen a recent surge of attention thanks to the many advances in deep learning. In this paper, we consider a critical-yet-understudied challenge with unsupervised DOD, that is, effective hyperparameter (HP) tuning/model selection. While several prior works report the sensitivity of OD models to HPs, it becomes ever so critical for the modern DOD models that exhibit a long list of HPs. We introduce HYPER for tuning DOD models, tackling two fundamental challenges: (1) validation without supervision (due to lack of labeled anomalies), and (2) efficient search of the HP/model space (due to exponential growth in the number of HPs). A key idea is to design and train a novel hypernetwork (HN) that maps HPs onto optimal weights of the main DOD model. In turn, HYPER capitalizes on a single HN that can dynamically generate weights for many DOD models (corresponding to varying HPs), which offers significant speed-up. In addition, it employs meta-learning on historical OD tasks with labels to train a proxy validation function, likewise trained with our proposed HN efficiently. Extensive experiments on 35 OD tasks show that HYPER achieves high performance against 8 baselines with significant efficiency gains.

7/2/2024

cs.LG cs.AI