Cross-Modality Gait Recognition: Bridging LiDAR and Camera Modalities for Human Identification

2404.04120

Published 4/8/2024 by Rui Wang, Chuanfu Shen, Manuel J. Marin-Jimenez, George Q. Huang, Shiqi Yu

Abstract

Current gait recognition research mainly focuses on identifying pedestrians captured by the same type of sensor, neglecting the fact that individuals may be captured by different sensors in order to adapt to various environments. A more practical approach should involve cross-modality matching across different sensors. Hence, this paper focuses on investigating the problem of cross-modality gait recognition, with the objective of accurately identifying pedestrians across diverse vision sensors. We present CrossGait inspired by the feature alignment strategy, capable of cross retrieving diverse data modalities. Specifically, we investigate the cross-modality recognition task by initially extracting features within each modality and subsequently aligning these features across modalities. To further enhance the cross-modality performance, we propose a Prototypical Modality-shared Attention Module that learns modality-shared features from two modality-specific features. Additionally, we design a Cross-modality Feature Adapter that transforms the learned modality-specific features into a unified feature space. Extensive experiments conducted on the SUSTech1K dataset demonstrate the effectiveness of CrossGait: (1) it exhibits promising cross-modality ability in retrieving pedestrians across various modalities from different sensors in diverse scenes, and (2) CrossGait not only learns modality-shared features for cross-modality gait recognition but also maintains modality-specific features for single-modality recognition.
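
The cross-modality retrieval setting described in the abstract can be illustrated with a minimal sketch. This is not the CrossGait implementation: it assumes that features from both sensors have already been mapped into a shared embedding space, and it only shows how a probe from one modality might be matched against a gallery from the other using cosine similarity and scored with Rank-k accuracy. The function names, the toy embeddings, and the noise level are all hypothetical.

```python
import numpy as np


def l2_normalize(x: np.ndarray) -> np.ndarray:
    """Scale each row to unit length so dot products equal cosine similarity."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)


def cross_modality_rank_k(probe_feats, probe_ids, gallery_feats, gallery_ids, k=1):
    """Fraction of probes whose identity appears among the k nearest gallery items."""
    sims = l2_normalize(probe_feats) @ l2_normalize(gallery_feats).T
    # Indices of the k most similar gallery items for each probe
    top_k = np.argsort(-sims, axis=1)[:, :k]
    hits = (gallery_ids[top_k] == probe_ids[:, None]).any(axis=1)
    return float(hits.mean())


# Toy data (assumption): one embedding per identity and per modality,
# modelled as a shared identity vector plus small modality-specific noise.
rng = np.random.default_rng(0)
identity_centers = rng.normal(size=(5, 16))
ids = np.arange(5)
lidar_feats = identity_centers + 0.01 * rng.normal(size=(5, 16))
camera_feats = identity_centers + 0.01 * rng.normal(size=(5, 16))

# LiDAR probes retrieved against a camera gallery
rank1 = cross_modality_rank_k(lidar_feats, ids, camera_feats, ids, k=1)
print(f"LiDAR-to-camera Rank-1 accuracy on toy data: {rank1:.2f}")
```

In this sketch the alignment step is assumed rather than learned; in practice the quality of retrieval depends on how well the two modality-specific feature extractors are brought into a common space.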

Overview

This document provides guidelines for authors on how to format their responses for a particular publication.
The guidelines cover topics such as the length of the response, formatting instructions, and other technical details.
The document is organized into several sections, including an introduction, instructions for formatting the response, and other relevant information.

Plain English Explanation

This technical paper outlines the formatting guidelines that authors should follow when submitting a response to a particular publication. It covers the expected length of the response, as well as specific instructions for how to structure and format the document using LaTeX. The goal is to ensure that all responses adhere to a consistent style and layout, making it easier for the publication to review and process the submissions. While the details may seem technical, the overall purpose is to provide clear and straightforward guidelines for authors to follow when preparing their responses.

Technical Explanation

The document begins with an introduction that provides an overview of the guidelines and their importance. It then delves into the specific requirements for the response length, stating that the response should be no more than a certain number of pages.

The bulk of the document focuses on formatting instructions for the response. This includes guidance on the proper use of LaTeX markup, such as section headings, paragraph structure, and citation formatting. The guidelines also cover the inclusion of figures, tables, and other visual elements.

Additionally, the document touches on other technical considerations, such as the use of fonts, spacing, and file naming conventions. The goal is to ensure that all responses are presented in a consistent and professional manner, making it easier for the publication to process and review the submissions.

Critical Analysis

The guidelines provided in this document are relatively straightforward and comprehensive, covering the key elements that authors need to consider when preparing their responses. However, there may be some limitations or areas for improvement:

The guidelines are specific to the use of LaTeX, which may not be the preferred or most accessible markup language for all authors. It could be beneficial to also provide guidance for other common document formats, such as Microsoft Word or Google Docs.
The guidelines do not address the content or structure of the response itself, beyond the formatting requirements. It may be helpful to provide some high-level guidance on the expected tone, structure, and level of detail that authors should aim for in their responses.
The guidelines could potentially be expanded to cover other relevant topics, such as the submission process, review timelines, or any specific requirements or restrictions imposed by the publication.

Overall, these guidelines appear to be a thorough and well-considered set of instructions for authors, but there may be opportunities to make them even more accessible and comprehensive.

Conclusion

The LaTeX Guidelines for Author Response document provides clear and detailed instructions for authors on how to format their responses for a particular publication. By ensuring a consistent and professional presentation across all submissions, the guidelines help to streamline the review and processing of the responses, ultimately contributing to a more efficient and effective publication process.

While the guidelines may seem technical, the underlying purpose is to support authors in effectively communicating their ideas and contributions to the research community. By following these guidelines, authors can focus on the content and substance of their responses, confident that the formatting will meet the publication's requirements.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

LiCAF: LiDAR-Camera Asymmetric Fusion for Gait Recognition

Yunze Deng, Haijun Xiong, Bin Feng

Gait recognition is a biometric technology that identifies individuals by using walking patterns. Due to the significant achievements of multimodal fusion in gait recognition, we consider employing LiDAR-camera fusion to obtain robust gait representations. However, existing methods often overlook intrinsic characteristics of modalities, and lack fine-grained fusion and temporal modeling. In this paper, we introduce a novel modality-sensitive network LiCAF for LiDAR-camera fusion, which employs an asymmetric modeling strategy. Specifically, we propose Asymmetric Cross-modal Channel Attention (ACCA) and Interlaced Cross-modal Temporal Modeling (ICTM) for cross-modal valuable channel information selection and powerful temporal modeling. Our method achieves state-of-the-art performance (93.9% in Rank-1 and 98.8% in Rank-5) on the SUSTech1K dataset, demonstrating its effectiveness.

6/19/2024

cs.CV

Gait Recognition in Large-scale Free Environment via Single LiDAR

Xiao Han, Yiming Ren, Peishan Cong, Yujing Sun, Jingya Wang, Lan Xu, Yuexin Ma

Human gait recognition is crucial in multimedia, enabling identification through walking patterns without direct interaction, enhancing the integration across various media forms in real-world applications like smart homes, healthcare and non-intrusive security. LiDAR's ability to capture depth makes it pivotal for robotic perception and holds promise for real-world gait recognition. In this paper, based on a single LiDAR, we present the Hierarchical Multi-representation Feature Interaction Network (HMRNet) for robust gait recognition. Prevailing LiDAR-based gait datasets primarily derive from controlled settings with predefined trajectory, remaining a gap with real-world scenarios. To facilitate LiDAR-based gait recognition research, we introduce FreeGait, a comprehensive gait dataset from large-scale, unconstrained settings, enriched with multi-modal and varied 2D/3D data. Notably, our approach achieves state-of-the-art performance on prior dataset (SUSTech1K) and on FreeGait. Code and dataset will be released upon publication of this paper.

4/29/2024

cs.CV

Research on Image Recognition Technology Based on Multimodal Deep Learning

Jinyin Wang, Xingchen Li, Yixuan Jin, Yihao Zhong, Keke Zhang, Chang Zhou

This project investigates the human multi-modal behavior identification algorithm utilizing deep neural networks. According to the characteristics of different modal information, different deep neural networks are used to adapt to different modal video information. Through the integration of various deep neural networks, the algorithm successfully identifies behaviors across multiple modalities. In this project, multiple cameras developed by Microsoft Kinect were used to collect corresponding bone point data based on acquiring conventional images. In this way, the motion features in the image can be extracted. Ultimately, the behavioral characteristics discerned through both approaches are synthesized to facilitate the precise identification and categorization of behaviors. The performance of the suggested algorithm was evaluated using the MSR3D data set. The findings from these experiments indicate that the accuracy in recognizing behaviors remains consistently high, suggesting that the algorithm is reliable in various scenarios. Additionally, the tests demonstrate that the algorithm substantially enhances the accuracy of detecting pedestrian behaviors in video footage.

5/7/2024

cs.CV cs.LG

Efficient Bilateral Cross-Modality Cluster Matching for Unsupervised Visible-Infrared Person ReID

De Cheng, Lingfeng He, Nannan Wang, Shizhou Zhang, Zhen Wang, Xinbo Gao

Unsupervised visible-infrared person re-identification (USL-VI-ReID) aims to match pedestrian images of the same identity from different modalities without annotations. Existing works mainly focus on alleviating the modality gap by aligning instance-level features of the unlabeled samples. However, the relationships between cross-modality clusters are not well explored. To this end, we propose a novel bilateral cluster matching-based learning framework to reduce the modality gap by matching cross-modality clusters. Specifically, we design a Many-to-many Bilateral Cross-Modality Cluster Matching (MBCCM) algorithm through optimizing the maximum matching problem in a bipartite graph. Then, the matched pairwise clusters utilize shared visible and infrared pseudo-labels during the model training. Under such a supervisory signal, a Modality-Specific and Modality-Agnostic (MSMA) contrastive learning framework is proposed to align features jointly at a cluster-level. Meanwhile, the cross-modality Consistency Constraint (CC) is proposed to explicitly reduce the large modality discrepancy. Extensive experiments on the public SYSU-MM01 and RegDB datasets demonstrate the effectiveness of the proposed method, surpassing state-of-the-art approaches by a large margin of 8.76% mAP on average.

5/28/2024

cs.CV cs.AI