High-Performance Inference Graph Convolutional Networks for Skeleton-Based Action Recognition

Read original: arXiv:2305.18710 - Published 6/19/2024 by Ziao Li, Junyi Wang, Bangli Liu, Haibin Cai, Mohamad Saada, Qinggang Meng


Overview

Researchers have made significant progress in skeleton-based human action recognition using graph convolutional networks (GCNs).
Current state-of-the-art (SOTA) models focus on complex higher-order connections between joint nodes, leading to slow inference speeds.
This paper introduces re-parameterization and over-parameterization techniques to GCNs to address the slow inference issue.
Two novel high-performance inference GCN models, HPI-GCN-RP and HPI-GCN-OP, are proposed.

Plain English Explanation

Skeleton-based human action recognition is the process of identifying what action a person is performing based on the positions of their skeletal joints. Graph convolutional networks (GCNs) have become very effective at this task, as they can capture the relationships between different joints.
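To make the idea of "capturing relationships between joints" concrete, here is a minimal sketch of a single graph convolution over a toy skeleton. It is an illustration only, not the layer used in the paper: the 3-joint arm, the symmetric normalization, and the helper names (`matmul`, `normalized_adjacency`, `graph_conv`) are all assumptions chosen for this example.

```python
import math

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def normalized_adjacency(num_joints, bones):
    """Build D^{-1/2} (A + I) D^{-1/2} for an undirected skeleton graph."""
    # Start from the identity so every joint keeps its own features.
    a = [[1.0 if i == j else 0.0 for j in range(num_joints)] for i in range(num_joints)]
    for i, j in bones:
        a[i][j] = a[j][i] = 1.0
    deg = [sum(row) for row in a]
    return [[a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(num_joints)]
            for i in range(num_joints)]

def graph_conv(features, adj, weight):
    """One graph convolution: aggregate neighbouring joints, then mix channels."""
    return matmul(matmul(adj, features), weight)

# Hypothetical 3-joint arm: shoulder (0) - elbow (1) - wrist (2).
bones = [(0, 1), (1, 2)]
adj = normalized_adjacency(3, bones)

features = [[0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]  # one (x, y) position per joint
weight = [[1.0, 0.0], [0.0, 1.0]]                # identity weights, for readability

out = graph_conv(features, adj, weight)
for joint, row in enumerate(out):
    print(joint, [round(v, 3) for v in row])
```

Each output row blends a joint's own features with those of the joints it is connected to by a bone, which is why the shoulder and wrist (not directly connected) do not influence each other after a single layer.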

However, the most advanced GCN models today tend to be quite complex, with many intricate connections between the joints. While this complexity helps the models reach high accuracy, it also makes them slow to run, which is a problem for real-time applications. The researchers in this paper wanted to find a way to maintain the high performance of these complex models while also making them faster to run.

To do this, they used two techniques called re-parameterization and over-parameterization. Re-parameterization allows them to transform the complex training model into a simpler, faster model for inference, without losing much accuracy. Over-parameterization introduces additional parameters to the inference model, which can boost performance even further, though at the cost of some speed.

The researchers tested their new models, called HPI-GCN-RP and HPI-GCN-OP, on standard skeleton-based action recognition datasets. They found that HPI-GCN-OP, the over-parameterized model, achieves performance comparable to the current state-of-the-art approaches, but runs about 5 times faster. This allows it to be used in real-time applications where speed is important, without sacrificing accuracy.

Technical Explanation

The paper focuses on improving the inference speed of state-of-the-art (SOTA) GCN models for skeleton-based human action recognition. These SOTA models typically use complex higher-order connections between joint nodes to capture detailed skeletal information, but this comes at the cost of slower inference speeds.

To address this, the authors introduce re-parameterization and over-parameterization techniques to GCNs. They propose two novel models:

HPI-GCN-RP: This model uses re-parameterization to transform the high-performance training model into a faster inference model through linear transformations, without significant loss in performance.
HPI-GCN-OP: This model further utilizes over-parameterization to achieve higher performance improvement by introducing additional inference parameters, although this results in a slight decrease in inference speed compared to HPI-GCN-RP.

The re-parameterization and over-parameterization techniques are applied after the model training process is complete, so the model parameters are fixed during inference.
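The general idea behind re-parameterization can be sketched in a few lines: once training is finished and the parameters are frozen, several parallel linear branches (and a frozen batch normalization) can be collapsed into one equivalent linear map. The example below is a generic structural re-parameterization sketch, not the paper's actual blocks; the three-branch layout, the toy 2-dimensional weights, and every function name are assumptions made for illustration.

```python
import math

EPS = 1e-5

def matvec(w, x):
    """Apply a weight matrix (list of rows) to a vector."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]

def fold_batchnorm(w, b, gamma, beta, mean, var, eps=EPS):
    """Fold a frozen BatchNorm into the linear layer that precedes it."""
    scale = [g / math.sqrt(v + eps) for g, v in zip(gamma, var)]
    w_f = [[s * wij for wij in row] for s, row in zip(scale, w)]
    b_f = [s * (bi - m) + be for s, bi, m, be in zip(scale, b, mean, beta)]
    return w_f, b_f

def merge_branches(branches, dim, with_identity=True):
    """Sum parallel linear branches (plus an optional identity shortcut) into one."""
    w = [[1.0 if (with_identity and i == j) else 0.0 for j in range(dim)]
         for i in range(dim)]
    b = [0.0] * dim
    for w_k, b_k in branches:
        for i in range(dim):
            b[i] += b_k[i]
            for j in range(dim):
                w[i][j] += w_k[i][j]
    return w, b

# Toy frozen parameters (assumed values, for illustration only).
W1, B1 = [[1.0, 2.0], [0.5, -1.0]], [0.1, -0.2]
GAMMA, BETA = [1.5, 0.5], [0.0, 1.0]
MEAN, VAR = [0.2, -0.1], [4.0, 0.25]
W2, B2 = [[0.0, 1.0], [1.0, 0.0]], [0.3, 0.3]

def train_forward(x):
    """Training-time structure: BN(linear1(x)) + linear2(x) + x."""
    h1 = [v + c for v, c in zip(matvec(W1, x), B1)]
    h1 = [g * (v - m) / math.sqrt(s + EPS) + be
          for v, g, be, m, s in zip(h1, GAMMA, BETA, MEAN, VAR)]
    h2 = [v + c for v, c in zip(matvec(W2, x), B2)]
    return [a + c + d for a, c, d in zip(h1, h2, x)]

# One-off conversion after training: a single weight matrix and bias.
fused_w, fused_b = merge_branches(
    [fold_batchnorm(W1, B1, GAMMA, BETA, MEAN, VAR), (W2, B2)], dim=2)

def infer_forward(x):
    """Inference-time structure: a single linear layer."""
    return [v + c for v, c in zip(matvec(fused_w, x), fused_b)]

x = [0.7, -1.3]
print(train_forward(x))
print(infer_forward(x))
```

The two printed vectors should agree up to floating-point rounding, while the inference path performs one matrix-vector product instead of two plus a normalization. This equivalence only holds because every merged operation is linear and its parameters no longer change.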

The authors evaluate their models on two popular skeleton-based action recognition datasets, NTU-RGB+D 60 and NTU-RGB+D 120. They report that HPI-GCN-OP achieves performance comparable to current SOTA models, with inference speeds five times faster. Specifically, HPI-GCN-OP achieves 93% accuracy on the cross-subject split of NTU-RGB+D 60 and 90.1% on the cross-subject benchmark of NTU-RGB+D 120.

Critical Analysis

The paper presents a novel approach to improving the inference speed of GCN-based models for skeleton-based action recognition, without significantly compromising their performance. The use of re-parameterization and over-parameterization techniques is an interesting and effective solution to the trade-off between model complexity and inference speed.

However, the paper does not provide a detailed analysis of the limitations of the proposed methods. For instance, it is unclear how the re-parameterization and over-parameterization techniques might perform on larger or more complex datasets, or how sensitive the models are to changes in the input data or network architecture.

Additionally, the paper does not explore the potential for further optimizations or the integration of the proposed models with other GCN-based approaches, which could lead to even greater performance improvements. Further research is needed to understand the broader implications and applications of these techniques.

Overall, the paper presents a promising approach to improving the practicality of GCN-based models for real-world skeleton-based action recognition tasks, but additional exploration and validation would help solidify the significance and impact of the research.

Conclusion

This paper introduces two novel GCN-based models, HPI-GCN-RP and HPI-GCN-OP, that leverage re-parameterization and over-parameterization techniques to achieve faster inference speeds while maintaining high performance on skeleton-based human action recognition tasks. The experimental results demonstrate the effectiveness of these approaches, with HPI-GCN-OP achieving accuracy comparable to current SOTA models on benchmark datasets while running inference five times faster.

The ability to balance model complexity and inference speed is crucial for deploying GCN-based action recognition systems in real-time applications. The techniques presented in this paper represent an important step towards making these advanced AI models more practical and accessible for a wider range of use cases.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!


Related Papers


High-Performance Inference Graph Convolutional Networks for Skeleton-Based Action Recognition

Ziao Li, Junyi Wang, Bangli Liu, Haibin Cai, Mohamad Saada, Qinggang Meng

Recently, significant achievements have been made in skeleton-based human action recognition with the emergence of graph convolutional networks (GCNs). However, the state-of-the-art (SOTA) models used for this task focus on constructing more complex higher-order connections between joint nodes to describe skeleton information, which leads to complex inference processes and high computational costs. To address the slow inference speed caused by overly complex model structures, we introduce re-parameterization and over-parameterization techniques to GCNs and propose two novel high-performance inference GCNs, namely HPI-GCN-RP and HPI-GCN-OP. After the completion of model training, model parameters are fixed. HPI-GCN-RP adopts the re-parameterization technique to transform a high-performance training model into a fast inference model through linear transformations, which achieves a higher inference speed with competitive model performance. HPI-GCN-OP further utilizes the over-parameterization technique to achieve higher performance improvement by introducing additional inference parameters, albeit with slightly decreased inference speed. The experimental results on the two skeleton-based action recognition datasets demonstrate the effectiveness of our approach. Our HPI-GCN-OP achieves performance comparable to the current SOTA models, with inference speeds five times faster. Specifically, our HPI-GCN-OP achieves an accuracy of 93% on the cross-subject split of the NTU-RGB+D 60 dataset, and 90.1% on the cross-subject benchmark of the NTU-RGB+D 120 dataset. Code is available at github.com/lizaowo/HPI-GCN.

6/19/2024


An Improved Graph Pooling Network for Skeleton-Based Action Recognition

Cong Wu, Xiao-Jun Wu, Tianyang Xu, Josef Kittler

Pooling is a crucial operation in computer vision, yet the unique structure of skeletons hinders the application of existing pooling strategies to skeleton graph modelling. In this paper, we propose an Improved Graph Pooling Network, referred to as IGPN. The main innovations include: Our method incorporates a region-awareness pooling strategy based on structural partitioning. The correlation matrix of the original feature is used to adaptively adjust the weight of information in different regions of the newly generated features, resulting in more flexible and effective processing. To prevent the irreversible loss of discriminative information, we propose a cross fusion module and an information supplement module to provide block-level and input-level information respectively. As a plug-and-play structure, the proposed operation can be seamlessly combined with existing GCN-based models. We conducted extensive evaluations on several challenging benchmarks, and the experimental results indicate the effectiveness of our proposed solutions. For example, in the cross-subject evaluation of the NTU-RGB+D 60 dataset, IGPN achieves a significant improvement in accuracy compared to the baseline while reducing Flops by nearly 70%; a heavier version has also been introduced to further boost accuracy.

4/26/2024

Skeleton-Based Action Recognition with Spatial-Structural Graph Convolution

Jingyao Wang, Emmanuel Bergeret, Issam Falih

Human Activity Recognition (HAR) is a field of study that focuses on identifying and classifying human activities. Skeleton-based Human Activity Recognition has received much attention in recent years, where Graph Convolutional Network (GCN) based methods are widely used and have achieved remarkable results. However, the representation of skeleton data and the issue of over-smoothing in GCNs still need to be studied. (1) Compared to central nodes, edge nodes can only aggregate limited neighbor information, and different edge nodes of the human body are always structurally related. However, the information from edge nodes is crucial for fine-grained activity recognition. (2) The Graph Convolutional Network suffers from a significant over-smoothing issue, causing nodes to become increasingly similar as the number of network layers increases. Based on these two ideas, we propose a two-stream graph convolution method called Spatial-Structural GCN (SpSt-GCN). Spatial GCN performs information aggregation based on the topological structure of the human body, and structural GCN performs differentiation based on the similarity of edge node sequences. The spatial connection is fixed, and the human skeleton naturally maintains this topology regardless of the actions performed by humans. However, the structural connection is dynamic and depends on the type of movement the human body is performing. Based on this idea, we also propose an entirely data-driven structural connection, which greatly increases flexibility. We evaluate our method on two large-scale datasets, i.e., NTU RGB+D and NTU RGB+D 120. The proposed method achieves good results while being efficient.

8/1/2024

GCN-DevLSTM: Path Development for Skeleton-Based Action Recognition

Lei Jiang, Weixin Yang, Xin Zhang, Hao Ni

Skeleton-based action recognition (SAR) in videos is an important but challenging task in computer vision. The recent state-of-the-art (SOTA) models for SAR are primarily based on graph convolutional neural networks (GCNs), which are powerful in extracting the spatial information of skeleton data. However, it is not yet clear whether such GCN-based models can effectively capture the temporal dynamics of human action sequences. To this end, we propose the G-Dev layer, which exploits the path development, a principled and parsimonious representation for sequential data that leverages the Lie group structure. By integrating the G-Dev layer, the hybrid G-DevLSTM module enhances the traditional LSTM to reduce the time dimension while retaining high-frequency information. It can be conveniently applied to any temporal graph data, complementing existing advanced GCN-based models. Our empirical studies on the NTU60, NTU120 and Chalearn2013 datasets demonstrate that our proposed GCN-DevLSTM network consistently improves the strong GCN baseline models and achieves SOTA results with superior robustness in SAR tasks. The code is available at https://github.com/DeepIntoStreams/GCN-DevLSTM.

5/28/2024