CLASH: Complementary Learning with Neural Architecture Search for Gait Recognition

Read original: arXiv:2407.03632 - Published 7/8/2024 by Huanzhang Dou, Pengyi Zhang, Yuhan Zhao, Lu Jin, Xi Li

CLASH: Complementary Learning with Neural Architecture Search for Gait Recognition

Overview

The paper introduces CLASH, a novel complementary learning approach combined with neural architecture search (NAS) for gait recognition.
Gait recognition is the task of identifying individuals based on their walking patterns, which can be useful for various applications like surveillance and healthcare.
CLASH leverages the complementary strengths of different modalities, such as camera and LiDAR sensors, to improve gait recognition performance.
The neural architecture search component optimizes the network architecture for the specific task and data, leading to improved accuracy.

Plain English Explanation

The researchers developed a new method called CLASH (Complementary Learning with Neural Architecture Search) to improve the accuracy of gait recognition. Gait recognition is the ability to identify people based on the way they walk, which can be useful for things like security and healthcare.

CLASH works by combining information from different types of sensors, like cameras and LiDAR (a technology that uses lasers to measure distances). These sensors provide complementary information about a person's walking pattern, which helps the system make more accurate identifications.

The researchers also used a technique called neural architecture search (NAS) to automatically design the best neural network architecture for the gait recognition task and the specific data being used. This allows the system to be fine-tuned and optimized, leading to even better performance.

By leveraging the strengths of multiple sensors and optimizing the neural network, CLASH is able to achieve higher accuracy in identifying people based on how they walk compared to previous methods.

Technical Explanation

The paper introduces a novel approach called CLASH (Complementary Learning with Neural Architecture Search) for gait recognition. Gait recognition is the task of identifying individuals based on their walking patterns, which can be useful for applications like surveillance and healthcare.

CLASH leverages the complementary information provided by different sensing modalities, such as camera and LiDAR, to improve gait recognition performance. The researchers hypothesize that fusing the complementary data from these modalities can lead to more robust and discriminative feature representations.

To optimize the neural network architecture for the specific gait recognition task and dataset, CLASH employs a neural architecture search (NAS) component. NAS automatically searches for the best-performing network design, going beyond manually designed architectures. This allows the system to be tailored to the problem at hand, leading to enhanced accuracy.

The proposed CLASH framework consists of three key components:

Complementary Learning: CLASH fuses the features extracted from camera and LiDAR data using an asymmetric fusion module. This allows the system to learn complementary representations that capture richer gait information compared to using a single modality.
Neural Architecture Search: The researchers leverage a NAS algorithm to automatically discover an optimal network architecture for the gait recognition task. This includes searching for the best layer types, connections, and hyperparameters.
Multi-Task Learning: CLASH employs a multi-task learning strategy, jointly optimizing for gait recognition and other auxiliary tasks, such as body part segmentation. This allows the model to learn more generalizable and discriminative features.

The paper evaluates CLASH on several gait recognition benchmarks and demonstrates its superior performance compared to state-of-the-art methods. The results highlight the benefits of complementary learning and the effectiveness of the NAS component in optimizing the network architecture for the task at hand.

Critical Analysis

The paper presents a compelling approach to gait recognition by leveraging complementary learning and neural architecture search. Some key strengths of the research include:

Multimodal Fusion: Combining camera and LiDAR data allows CLASH to capture richer gait characteristics, leading to improved recognition performance compared to unimodal approaches.
Neural Architecture Search: The use of NAS to automatically optimize the network architecture is a significant advantage, as it enables the system to be tailored to the specific problem and dataset.
Multi-Task Learning: The inclusion of auxiliary tasks, such as body part segmentation, helps the model learn more generalizable and discriminative features.

However, the paper also has some potential limitations:

Computational Complexity: The NAS component can be computationally intensive, which may limit the scalability of the approach, especially for real-time applications.
Sensor Availability: Relying on both camera and LiDAR sensors may not be feasible in all scenarios, as the availability and cost of these sensors can vary.
Real-World Robustness: The evaluation is primarily conducted on controlled datasets, and the performance in unconstrained, real-world environments with varying lighting, occlusions, and other challenges is not extensively explored.

Future research could focus on addressing these limitations, such as investigating more efficient NAS algorithms, exploring single-sensor gait recognition approaches, and evaluating the method's robustness in diverse real-world scenarios.

Conclusion

The CLASH framework presented in this paper represents a significant advancement in the field of gait recognition. By combining complementary learning from camera and LiDAR data with neural architecture search, the researchers have developed a highly effective system for accurately identifying individuals based on their walking patterns.

The key contributions of this work include the innovative fusion of multimodal sensor data, the use of NAS to optimize the network architecture, and the multi-task learning approach. These elements work together to push the boundaries of gait recognition performance, with potential applications in areas like surveillance, healthcare, and human-computer interaction.

While the paper highlights the strengths of the CLASH approach, further research is needed to address the computational complexity, sensor availability, and real-world robustness challenges. Nonetheless, this work stands as an important step forward in the development of advanced gait recognition systems that can be deployed in a wide range of practical scenarios.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Follow @aimodelsfyi on 𝕏 →

Related Papers

CLASH: Complementary Learning with Neural Architecture Search for Gait Recognition

Huanzhang Dou, Pengyi Zhang, Yuhan Zhao, Lu Jin, Xi Li

Gait recognition, which aims at identifying individuals by their walking patterns, has achieved great success based on silhouette. The binary silhouette sequence encodes the walking pattern within the sparse boundary representation. Therefore, most pixels in the silhouette are under-sensitive to the walking pattern since the sparse boundary lacks dense spatial-temporal information, which is suitable to be represented with dense texture. To enhance the sensitivity to the walking pattern while maintaining the robustness of recognition, we present a Complementary Learning with neural Architecture Search (CLASH) framework, consisting of walking pattern sensitive gait descriptor named dense spatial-temporal field (DSTF) and neural architecture search based complementary learning (NCL). Specifically, DSTF transforms the representation from the sparse binary boundary into the dense distance-based texture, which is sensitive to the walking pattern at the pixel level. Further, NCL presents a task-specific search space for complementary learning, which mutually complements the sensitivity of DSTF and the robustness of the silhouette to represent the walking pattern effectively. Extensive experiments demonstrate the effectiveness of the proposed methods under both in-the-lab and in-the-wild scenarios. On CASIA-B, we achieve rank-1 accuracy of 98.8%, 96.5%, and 89.3% under three conditions. On OU-MVLP, we achieve rank-1 accuracy of 91.9%. Under the latest in-the-wild datasets, we outperform the latest silhouette-based methods by 16.3% and 19.7% on Gait3D and GREW, respectively.

7/8/2024

GaitMA: Pose-guided Multi-modal Feature Fusion for Gait Recognition

Fanxu Min, Shaoxiang Guo, Fan Hao, Junyu Dong

Gait recognition is a biometric technology that recognizes the identity of humans through their walking patterns. Existing appearance-based methods utilize CNN or Transformer to extract spatial and temporal features from silhouettes, while model-based methods employ GCN to focus on the special topological structure of skeleton points. However, the quality of silhouettes is limited by complex occlusions, and skeletons lack dense semantic features of the human body. To tackle these problems, we propose a novel gait recognition framework, dubbed Gait Multi-model Aggregation Network (GaitMA), which effectively combines two modalities to obtain a more robust and comprehensive gait representation for recognition. First, skeletons are represented by joint/limb-based heatmaps, and features from silhouettes and skeletons are respectively extracted using two CNN-based feature extractors. Second, a co-attention alignment module is proposed to align the features by element-wise attention. Finally, we propose a mutual learning module, which achieves feature fusion through cross-attention, Wasserstein loss is further introduced to ensure the effective fusion of two modalities. Extensive experimental results demonstrate the superiority of our model on Gait3D, OU-MVLP, and CASIA-B.

7/23/2024

Causality-inspired Discriminative Feature Learning in Triple Domains for Gait Recognition

Haijun Xiong, Bin Feng, Xinggang Wang, Wenyu Liu

Gait recognition is a biometric technology that distinguishes individuals by their walking patterns. However, previous methods face challenges when accurately extracting identity features because they often become entangled with non-identity clues. To address this challenge, we propose CLTD, a causality-inspired discriminative feature learning module designed to effectively eliminate the influence of confounders in triple domains, ie, spatial, temporal, and spectral. Specifically, we utilize the Cross Pixel-wise Attention Generator (CPAG) to generate attention distributions for factual and counterfactual features in spatial and temporal domains. Then, we introduce the Fourier Projection Head (FPH) to project spatial features into the spectral space, which preserves essential information while reducing computational costs. Additionally, we employ an optimization method with contrastive learning to enforce semantic consistency constraints across sequences from the same subject. Our approach has demonstrated significant performance improvements on challenging datasets, proving its effectiveness. Moreover, it can be seamlessly integrated into existing gait recognition methods.

7/18/2024

GLGait: A Global-Local Temporal Receptive Field Network for Gait Recognition in the Wild

Guozhen Peng, Yunhong Wang, Yuwei Zhao, Shaoxiong Zhang, Annan Li

Gait recognition has attracted increasing attention from academia and industry as a human recognition technology from a distance in non-intrusive ways without requiring cooperation. Although advanced methods have achieved impressive success in lab scenarios, most of them perform poorly in the wild. Recently, some Convolution Neural Networks (ConvNets) based methods have been proposed to address the issue of gait recognition in the wild. However, the temporal receptive field obtained by convolution operations is limited for long gait sequences. If directly replacing convolution blocks with visual transformer blocks, the model may not enhance a local temporal receptive field, which is important for covering a complete gait cycle. To address this issue, we design a Global-Local Temporal Receptive Field Network (GLGait). GLGait employs a Global-Local Temporal Module (GLTM) to establish a global-local temporal receptive field, which mainly consists of a Pseudo Global Temporal Self-Attention (PGTA) and a temporal convolution operation. Specifically, PGTA is used to obtain a pseudo global temporal receptive field with less memory and computation complexity compared with a multi-head self-attention (MHSA). The temporal convolution operation is used to enhance the local temporal receptive field. Besides, it can also aggregate pseudo global temporal receptive field to a true holistic temporal receptive field. Furthermore, we also propose a Center-Augmented Triplet Loss (CTL) in GLGait to reduce the intra-class distance and expand the positive samples in the training stage. Extensive experiments show that our method obtains state-of-the-art results on in-the-wild datasets, $i.e.$, Gait3D and GREW. The code is available at https://github.com/bgdpgz/GLGait.

8/14/2024