Frequency-Guided Multi-Level Human Action Anomaly Detection with Normalizing Flows

Read original: arXiv:2404.17381 - Published 4/29/2024 by Shun Maeda, Chunzhi Gu, Jun Yu, Shogo Tokai, Shangce Gao, Chao Zhang


Overview

The paper introduces a new task called "Human Action Anomaly Detection" (HAAD), which aims to identify anomalous human motions in an unsupervised manner, given only training samples from a pre-determined normal action category.
Unlike previous anomaly detection tasks that focus on unusual events in videos, HAAD involves learning specific action labels to recognize semantically anomalous human behaviors.
The authors propose a Normalizing Flow (NF)-based detection framework that leverages sample likelihood to indicate anomalies.
The framework incorporates extra encoding streams to model body subsets, enabling the discovery of both global and local motion anomalies.
The authors also apply the discrete cosine transform (DCT) to convert action samples from the temporal to the frequency domain, which mitigates the instability caused by jittery recordings.

Plain English Explanation

The paper focuses on a new problem called "Human Action Anomaly Detection" (HAAD). This task is about identifying unusual or abnormal human motions without ever seeing labeled examples of anomalies: the model is trained only on samples of an action category that has been designated as normal.

Previous work on anomaly detection in videos has mostly looked at detecting unusual events, like a car accident or a fight. HAAD is different: a specific action category is treated as normal, and the goal is to recognize motions that are semantically anomalous with respect to that category.

To address this task, the researchers developed a machine learning model based on a technique called "Normalizing Flow". This model learns what normal human actions look like, and then uses that knowledge to spot when an action is significantly different from the norm.

The model has some additional tricks up its sleeve. It looks not just at the overall body movement, but also at the motion of individual body parts. This helps it catch subtle anomalies that might be missed if you just look at the whole body.

The researchers also used a mathematical technique called the discrete cosine transform, which re-expresses each action as a set of frequency components instead of a sequence over time. This reduces the effect of the choppiness or jitter that can creep in while motion data is being recorded, making it easier for the model to identify unusual movements.

Overall, this research aims to push the boundaries of what's possible in terms of automatically detecting abnormal human behaviors, which could have applications in areas like surveillance, healthcare, and human-robot interaction.

Technical Explanation

The authors propose a novel task called "Human Action Anomaly Detection" (HAAD), which goes beyond detecting unusual events in videos to focus on identifying semantically anomalous human behaviors. To address this task, they develop a Normalizing Flow-based detection framework that leverages sample likelihood to indicate anomalies.
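
To make the likelihood idea concrete, here is a minimal sketch, not the authors' architecture (which this summary does not describe). It assumes the simplest possible flow, an element-wise affine map onto a standard Gaussian, fitted in closed form on normal samples only. The function names `fit_affine_flow`, `log_likelihood`, and `anomaly_score` are hypothetical.

```python
import math

def fit_affine_flow(samples):
    """Fit z = (x - mu) / sigma per dimension by maximum likelihood.

    Assumption: a one-layer affine flow stands in for a deep flow.
    """
    n = len(samples)
    dims = len(samples[0])
    mu = [sum(s[d] for s in samples) / n for d in range(dims)]
    sigma = []
    for d in range(dims):
        var = sum((s[d] - mu[d]) ** 2 for s in samples) / n
        sigma.append(math.sqrt(var) + 1e-6)  # small floor avoids division by zero
    return mu, sigma

def log_likelihood(x, mu, sigma):
    """Change of variables: log p(x) = log N(z; 0, I) + log|det dz/dx|."""
    total = 0.0
    for xd, m, s in zip(x, mu, sigma):
        z = (xd - m) / s
        log_base = -0.5 * (z * z + math.log(2.0 * math.pi))
        log_det = -math.log(s)  # dz/dx = 1 / sigma for an affine map
        total += log_base + log_det
    return total

def anomaly_score(x, mu, sigma):
    """Low likelihood under the flow means a high anomaly score."""
    return -log_likelihood(x, mu, sigma)

# Toy "normal" action features (made-up numbers for illustration only).
normal_samples = [[0.0, 1.0], [0.2, 0.9], [-0.1, 1.1], [0.1, 1.0]]
mu, sigma = fit_affine_flow(normal_samples)
typical = anomaly_score([0.05, 1.0], mu, sigma)
unusual = anomaly_score([3.0, -2.0], mu, sigma)
```

In this sketch a sample close to the training data gets a lower score than one far from it; a practical system would stack many learned invertible layers rather than a single affine map.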

Recognizing that action anomalies often occur in specific body parts, the authors incorporate extra encoding streams into their framework to model body subsets in addition to the full-body action feature learning. This enables their model to jointly discover global and local motion anomalies.
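
The multi-level idea can be sketched as follows, under loud assumptions: the joint grouping in `BODY_PARTS`, the six-joint toy skeleton, and the sum used to fuse the streams are all hypothetical, since the summary does not say how the paper defines body subsets or combines them. Each stream receives its own scoring function, standing in for a per-stream encoder and flow.

```python
# Hypothetical grouping for a 6-joint toy skeleton; real datasets differ.
BODY_PARTS = {
    "arms": [0, 1],
    "legs": [2, 3],
    "torso": [4, 5],
}

def select_joints(pose_sequence, joint_ids):
    """Keep only the listed joints in every frame."""
    return [[frame[j] for j in joint_ids] for frame in pose_sequence]

def multi_level_scores(pose_sequence, score_fns):
    """Score the full body plus each body subset with its own scorer."""
    scores = {"full": score_fns["full"](pose_sequence)}
    for part, joint_ids in BODY_PARTS.items():
        subset = select_joints(pose_sequence, joint_ids)
        scores[part] = score_fns[part](subset)
    return scores

def combined_score(scores):
    """Assumed fusion rule: plain sum of the per-stream scores."""
    return sum(scores.values())

def energy(pose_sequence):
    """Placeholder scorer (sum of squares); a real stream would use a flow."""
    return sum(v * v for frame in pose_sequence for v in frame)

score_fns = {name: energy for name in ["full", "arms", "legs", "torso"]}
# Two frames in which only joint 0 (an arm joint) moves.
sequence = [[1.0, 0.0, 0.0, 0.0, 0.0, 0.0],
            [2.0, 0.0, 0.0, 0.0, 0.0, 0.0]]
scores = multi_level_scores(sequence, score_fns)
```

Here the "arms" stream carries the entire deviation while "legs" and "torso" stay at zero, which illustrates why per-part streams can localize an anomaly that a full-body score only reports in aggregate.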

Furthermore, to handle potentially jittery data introduced during recording, the authors apply the discrete cosine transform (DCT). This converts the action samples from the temporal domain to the frequency domain, mitigating the issue of data instability.
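
As an illustration of the temporal-to-frequency conversion, the sketch below implements an orthonormal DCT-II for one joint coordinate tracked over time. Keeping only the first few coefficients (`low_frequency_features`) is an assumption made here to show how high-frequency jitter could be discarded; the summary does not state whether or how the paper truncates coefficients.

```python
import math

def dct_ii(signal):
    """Orthonormal DCT-II of a 1-D sequence (pure Python, O(n^2))."""
    n = len(signal)
    coeffs = []
    for k in range(n):
        total = sum(
            x * math.cos(math.pi * (i + 0.5) * k / n)
            for i, x in enumerate(signal)
        )
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        coeffs.append(scale * total)
    return coeffs

def low_frequency_features(signal, keep):
    """Assumed preprocessing: keep only the first `keep` DCT coefficients."""
    return dct_ii(signal)[:keep]

# One joint coordinate over 8 frames: a smooth ramp plus small jitter
# (made-up numbers for illustration only).
trajectory = [0.0, 1.1, 1.9, 3.2, 3.9, 5.1, 5.9, 7.0]
features = low_frequency_features(trajectory, keep=3)
```

Because the orthonormal DCT-II preserves signal energy, smooth motion concentrates in the low-index coefficients, while frame-to-frame jitter spreads into the higher ones that the truncation drops.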

The authors evaluate their framework on two human action datasets and demonstrate that it outperforms baselines adapted from state-of-the-art human activity anomaly detection approaches.

Critical Analysis

The paper presents a novel and promising approach to the task of HAAD, but it also raises some potential concerns and areas for further research.

One limitation is the reliance on pre-determined normal action categories for training. In real-world scenarios, the set of "normal" actions may not be well-defined or easily obtainable. The authors do not address how their framework would handle more open-ended or unconstrained environments.

Additionally, the paper does not provide a detailed analysis of the types of anomalies the model is able to detect. It would be helpful to understand the specific behavioral patterns or subtleties that the model is particularly adept at identifying, as well as any biases or blind spots.

The authors also do not discuss the potential for their framework to be used in sensitive applications, such as surveillance or healthcare monitoring, and the associated ethical considerations around privacy and fairness. Addressing these concerns would be an important next step.

Finally, while the discrete cosine transform helps mitigate issues with data instability, it is unclear how the model would perform on more diverse or challenging data, such as low-resolution footage, occluded subjects, or recordings with significant camera motion. Further testing on a broader range of scenarios would help validate the robustness of the approach.

Conclusion

The paper introduces an innovative task and framework for "Human Action Anomaly Detection," which aims to identify unusual human behaviors in an unsupervised manner. The proposed Normalizing Flow-based model, with its multi-level structure and frequency domain data processing, demonstrates promising results in detecting both global and local motion anomalies.

While the research represents an important step forward, there are still open questions and areas for improvement, such as handling more unconstrained environments, addressing ethical considerations, and validating the approach on diverse video data. By continuing to explore these challenges, the authors and the broader research community can further advance the capabilities of anomaly detection systems to better understand and model complex human behaviors.



Related Papers


Frequency-Guided Multi-Level Human Action Anomaly Detection with Normalizing Flows

Shun Maeda, Chunzhi Gu, Jun Yu, Shogo Tokai, Shangce Gao, Chao Zhang

We introduce the task of human action anomaly detection (HAAD), which aims to identify anomalous motions in an unsupervised manner given only the pre-determined normal category of training action samples. Compared to prior human-related anomaly detection tasks which primarily focus on unusual events from videos, HAAD involves the learning of specific action labels to recognize semantically anomalous human behaviors. To address this task, we propose a normalizing flow (NF)-based detection framework where the sample likelihood is effectively leveraged to indicate anomalies. As action anomalies often occur in some specific body parts, in addition to the full-body action feature learning, we incorporate extra encoding streams into our framework for a finer modeling of body subsets. Our framework is thus multi-level to jointly discover global and local motion anomalies. Furthermore, to show awareness of the potentially jittery data during recording, we resort to discrete cosine transformation by converting the action samples from the temporal to the frequency domain to mitigate the issue of data instability. Extensive experimental results on two human action datasets demonstrate that our method outperforms the baselines formed by adapting state-of-the-art human activity AD approaches to our task of HAAD.

4/29/2024

Hierarchical Gaussian Mixture Normalizing Flow Modeling for Unified Anomaly Detection

Xincheng Yao, Ruoqi Li, Zefeng Qian, Lu Wang, Chongyang Zhang

Unified anomaly detection (AD) is one of the most challenging settings for anomaly detection, where one unified model is trained with normal samples from multiple classes with the objective of detecting anomalies in these classes. For such a challenging task, popular normalizing flow (NF) based AD methods may fall into a homogeneous mapping issue, where the NF-based AD models are biased to generate similar latent representations for both normal and abnormal features, and thereby lead to a high missing rate of anomalies. In this paper, we propose a novel Hierarchical Gaussian mixture normalizing flow modeling method for accomplishing unified Anomaly Detection, which we call HGAD. Our HGAD consists of two key components: inter-class Gaussian mixture modeling and intra-class mixed class centers learning. Compared to the previous NF-based AD methods, the hierarchical Gaussian mixture modeling approach can bring stronger representation capability to the latent space of normalizing flows, so that even complex multi-class distribution can be well represented and learned in the latent space. In this way, we can avoid mapping different class distributions into the same single Gaussian prior, thus effectively avoiding or mitigating the homogeneous mapping issue. We further indicate that the more distinguishable the different class centers are, the more conducive this is to avoiding the bias issue. Thus, we further propose a mutual information maximization loss for better structuring the latent feature space. We evaluate our method on four real-world AD benchmarks, where we can significantly improve the previous NF-based AD methods and also outperform the SOTA unified AD methods.

7/8/2024


A brief introduction to a framework named Multilevel Guidance-Exploration Network

Guoqing Yang, Zhiming Luo, Jianzhe Gao, Yingxin Lai, Kun Yang, Yifan He, Shaozi Li

Human behavior anomaly detection aims to identify unusual human actions, playing a crucial role in intelligent surveillance and other areas. The current mainstream methods still adopt reconstruction or future frame prediction techniques. However, reconstructing or predicting low-level pixel features easily enables the network to achieve overly strong generalization ability, allowing anomalies to be reconstructed or predicted as effectively as normal data. Unlike these methods, and inspired by the Student-Teacher Network, we propose a novel framework called the Multilevel Guidance-Exploration Network (MGENet), which detects anomalies through the difference in high-level representation between the Guidance and Exploration networks. Specifically, we first utilize the pre-trained Normalizing Flow that takes skeletal keypoints as input to guide an RGB encoder, which takes unmasked RGB frames as input, to explore motion latent features. Then, the RGB encoder guides the mask encoder, which takes masked RGB frames as input, to explore the latent appearance feature. Additionally, we design a Behavior-Scene Matching Module (BSMM) to detect scene-related behavioral anomalies. Extensive experiments demonstrate that our proposed method achieves state-of-the-art performance on ShanghaiTech and UBnormal datasets.

6/11/2024

Unveiling Context-Related Anomalies: Knowledge Graph Empowered Decoupling of Scene and Action for Human-Related Video Anomaly Detection

Chenglizhao Chen, Xinyu Liu, Mengke Song, Luming Li, Xu Yu, Shanchen Pang

Detecting anomalies in human-related videos is crucial for surveillance applications. Current methods primarily include appearance-based and action-based techniques. Appearance-based methods rely on low-level visual features such as color, texture, and shape. They learn a large number of pixel patterns and features related to known scenes during training, making them effective in detecting anomalies within these familiar contexts. However, when encountering new or significantly changed scenes, i.e., unknown scenes, they often fail because existing SOTA methods do not effectively capture the relationship between actions and their surrounding scenes, resulting in low generalization. In contrast, action-based methods focus on detecting anomalies in human actions but are usually less informative because they tend to overlook the relationship between actions and their scenes, leading to incorrect detection. For instance, the normal event of running on the beach and the abnormal event of running on the street might both be considered normal due to the lack of scene information. In short, current methods struggle to integrate low-level visual and high-level action features, leading to poor anomaly detection in varied and complex scenes. To address this challenge, we propose a novel decoupling-based architecture for human-related video anomaly detection (DecoAD). DecoAD significantly improves the integration of visual and action features through the decoupling and interweaving of scenes and actions, thereby enabling a more intuitive and accurate understanding of complex behaviors and scenes. DecoAD supports fully supervised, weakly supervised, and unsupervised settings.

9/6/2024