Unsupervised Visible-Infrared ReID via Pseudo-label Correction and Modality-level Alignment

2404.06683

Published 4/11/2024 by Yexin Liu, Weiming Zhang, Athanasios V. Vasilakos, Lin Wang

Abstract

Unsupervised visible-infrared person re-identification (UVI-ReID) has recently gained great attention due to its potential for enhancing human detection in diverse environments without labeling. Previous methods utilize intra-modality clustering and cross-modality feature matching to achieve UVI-ReID. However, there exist two challenges: 1) noisy pseudo labels might be generated in the clustering process, and 2) the cross-modality feature alignment via matching the marginal distribution of visible and infrared modalities may misalign the different identities from two modalities. In this paper, we first conduct a theoretical analysis where an interpretable generalization upper bound is introduced. Based on the analysis, we then propose a novel unsupervised cross-modality person re-identification framework (PRAISE). Specifically, to address the first challenge, we propose a pseudo-label correction strategy that utilizes a Beta Mixture Model to predict the probability of mis-clustering based on the network's memory effect and rectifies the correspondence by adding a perceptual term to contrastive learning. Next, we introduce a modality-level alignment strategy that generates paired visible-infrared latent features and reduces the modality gap by aligning the labeling function of visible and infrared features to learn identity-discriminative and modality-invariant features. Experimental results on two benchmark datasets demonstrate that our method achieves state-of-the-art performance compared with unsupervised visible-ReID methods.

Overview

This paper proposes an unsupervised method for person re-identification (ReID) across visible and infrared (IR) modalities.
The key ideas are pseudo-label correction, which targets the noisy pseudo-labels produced by clustering, and modality-level alignment, which targets the identity misalignment that can result from matching only the marginal distributions of the two modalities.
According to the abstract, the proposed framework (PRAISE) achieves state-of-the-art performance on two benchmark datasets.

Plain English Explanation

Person re-identification (ReID) is the task of identifying the same person across different camera views or modalities, such as visible and infrared (IR) images. This can be useful in applications like security and surveillance. However, training ReID models typically requires a lot of labeled data, which can be time-consuming and expensive to collect.

This paper presents an unsupervised approach to visible-infrared ReID, which means it can learn to match people across these modalities without any labeled training data. The key innovations are:

Pseudo-label Correction: The model first groups similar images by clustering and treats the cluster assignments as "pseudo-labels". Because some of these labels are wrong, it estimates how likely each one is to be mistaken and corrects for that during training.
Modality-level Alignment: The model also aligns the feature representations of the visible and IR modalities, reducing the gap between them and allowing better cross-modal matching.

By addressing noisy pseudo-labels and the misalignment of identities across modalities, this unsupervised approach is reported to outperform other state-of-the-art methods on standard benchmarks. This is an important step towards more practical and widely applicable ReID systems that do not require expensive manual labeling.

Technical Explanation

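The paper first identifies two key challenges in unsupervised visible-infrared ReID. First, clustering within each modality can produce noisy pseudo-labels. Second, aligning the two modalities by matching only their marginal feature distributions can pair up different identities, because the overall distributions may agree even when individual people do not line up. The paper also gives a theoretical analysis that introduces an interpretable generalization upper bound, which motivates the design below.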
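The second challenge named in the abstract, that matching marginal distributions can misalign identities, can be seen in a toy case. The sketch below uses made-up one-dimensional features for two hypothetical identities, `A` and `B`; none of the values or function names come from the paper. The pooled features of the two modalities are identical, yet each identity's infrared features sit where the other identity's visible features are.

```python
from statistics import mean

# Hypothetical 1-D features for two identities in each modality (toy values).
visible = {"A": [-1.1, -0.9, -1.0], "B": [0.9, 1.1, 1.0]}
# Infrared features cover the same two regions, but with identities swapped.
infrared = {"A": [0.9, 1.1, 1.0], "B": [-1.1, -0.9, -1.0]}


def pooled(features):
    """All features of one modality with identity ignored (the marginal)."""
    return sorted(x for xs in features.values() for x in xs)


def centroid_gap(feats_a, feats_b):
    """Mean absolute distance between same-identity centroids."""
    return mean(abs(mean(feats_a[k]) - mean(feats_b[k])) for k in feats_a)


# The marginals agree exactly, so a marginal-matching loss would be satisfied.
marginals_match = pooled(visible) == pooled(infrared)
# Same-identity centroids are still far apart (roughly 2.0 in this toy setup).
identity_gap = centroid_gap(visible, infrared)

print(marginals_match, identity_gap)
```

A marginal-matching objective would report perfect alignment here, which is why the paper argues for aligning at the level of the labeling function instead.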

To address these challenges, the authors propose a framework (PRAISE) with two components:

Pseudo-label Correction: The model starts by generating pseudo-labels through clustering within each modality. Because clustering can assign samples to the wrong identity, the method fits a Beta Mixture Model to predict each sample's probability of being mis-clustered, relying on the network's memory effect. It then rectifies the correspondence by adding a perceptual term to the contrastive learning objective.
Modality-level Alignment: The model generates paired visible-infrared latent features and reduces the modality gap by aligning the labeling function of visible and infrared features, rather than matching only their marginal distributions. The aim is to learn features that are identity-discriminative and modality-invariant.
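The abstract states that a Beta Mixture Model predicts the probability of mis-clustering. A minimal sketch of that idea follows, assuming the model is fitted to per-sample losses normalised to [0, 1] and that mis-clustered samples tend to have higher loss (one common reading of the memory effect, in which networks fit correctly labelled samples first). The function names, initial parameters, method-of-moments update, and synthetic data are all assumptions for illustration, not the authors' implementation.

```python
import math
import random


def beta_pdf(x, a, b):
    """Density of Beta(a, b) at x in (0, 1), computed in log space."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(x) + (b - 1) * math.log(1 - x))


def fit_beta_mixture(losses, iters=20, eps=1e-4):
    """Fit a two-component Beta mixture with EM.

    Returns (params, weights) ordered by component mean, so index 1 is the
    high-loss component (assumed here to hold mis-clustered samples).
    """
    xs = [min(max(x, eps), 1 - eps) for x in losses]  # keep log() finite
    params = [(2.0, 5.0), (5.0, 2.0)]  # assumed initial (alpha, beta) pairs
    weights = [0.5, 0.5]
    for _ in range(iters):
        # E-step: responsibility of each component for each sample.
        resp = []
        for x in xs:
            p = [w * beta_pdf(x, a, b) for w, (a, b) in zip(weights, params)]
            total = sum(p) + 1e-12
            resp.append([pk / total for pk in p])
        # M-step: weighted method of moments (an approximation, not exact MLE).
        for k in range(2):
            r = [ri[k] for ri in resp]
            rsum = sum(r) + 1e-12
            m = sum(ri * x for ri, x in zip(r, xs)) / rsum
            v = sum(ri * (x - m) ** 2 for ri, x in zip(r, xs)) / rsum
            v = min(max(v, 1e-6), m * (1 - m) * 0.99)  # keeps alpha, beta > 0
            common = m * (1 - m) / v - 1
            params[k] = (m * common, (1 - m) * common)
            weights[k] = rsum / len(xs)
    order = sorted(range(2), key=lambda k: params[k][0] / sum(params[k]))
    return [params[k] for k in order], [weights[k] for k in order]


def noise_probability(x, params, weights, eps=1e-4):
    """Posterior probability that a loss value x came from the high-loss component."""
    x = min(max(x, eps), 1 - eps)
    p = [w * beta_pdf(x, a, b) for w, (a, b) in zip(weights, params)]
    return p[1] / (sum(p) + 1e-12)


random.seed(0)
# Synthetic stand-in for normalised per-sample losses (not real data):
# 300 low-loss "clean" samples and 100 high-loss "mis-clustered" samples.
losses = [random.betavariate(2, 8) for _ in range(300)]
losses += [random.betavariate(8, 2) for _ in range(100)]
params, weights = fit_beta_mixture(losses)

print(noise_probability(0.05, params, weights))  # low loss: small probability
print(noise_probability(0.95, params, weights))  # high loss: large probability
```

How PRAISE uses this probability inside its contrastive objective, including the form of the perceptual term, is not described in this summary, so the sketch stops at the probability estimate.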

The authors evaluate their approach on two visible-infrared ReID benchmarks, SYSU-MM01 and RegDB. They report that their unsupervised method outperforms other state-of-the-art unsupervised approaches, including FPL and AICL.

Critical Analysis

The paper provides a solid technical contribution to the field of unsupervised visible-infrared ReID. The proposed pseudo-label correction and modality-level alignment techniques appear to be effective at addressing the key challenges in this problem.

However, the authors do not discuss the computational complexity or training time of their approach, which could be important considerations for real-world deployment. Additionally, the paper does not explore the performance of the method on more diverse datasets or in the presence of occlusions, which are common challenges in ReID.

Further research could also investigate the robustness of the pseudo-label correction mechanism and explore alternative ways to align the visible and IR modalities, such as through cross-modal feature fusion or modality-specific feature learning.

Conclusion

This paper presents an unsupervised approach to visible-infrared person re-identification that tackles noisy pseudo-labels and cross-modality identity misalignment through pseudo-label correction and modality-level alignment. The proposed method is reported to outperform other state-of-the-art unsupervised techniques, which supports the effectiveness of these two ideas.

By reducing the need for expensive labeled data, this work represents an important step towards more practical and widely applicable ReID systems. Further research to improve the efficiency and robustness of the approach could lead to even more impactful real-world applications in areas like security and surveillance.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Efficient Bilateral Cross-Modality Cluster Matching for Unsupervised Visible-Infrared Person ReID

De Cheng, Lingfeng He, Nannan Wang, Shizhou Zhang, Zhen Wang, Xinbo Gao

Unsupervised visible-infrared person re-identification (USL-VI-ReID) aims to match pedestrian images of the same identity from different modalities without annotations. Existing works mainly focus on alleviating the modality gap by aligning instance-level features of the unlabeled samples. However, the relationships between cross-modality clusters are not well explored. To this end, we propose a novel bilateral cluster matching-based learning framework to reduce the modality gap by matching cross-modality clusters. Specifically, we design a Many-to-many Bilateral Cross-Modality Cluster Matching (MBCCM) algorithm through optimizing the maximum matching problem in a bipartite graph. Then, the matched pairwise clusters utilize shared visible and infrared pseudo-labels during the model training. Under such a supervisory signal, a Modality-Specific and Modality-Agnostic (MSMA) contrastive learning framework is proposed to align features jointly at a cluster-level. Meanwhile, the cross-modality Consistency Constraint (CC) is proposed to explicitly reduce the large modality discrepancy. Extensive experiments on the public SYSU-MM01 and RegDB datasets demonstrate the effectiveness of the proposed method, surpassing state-of-the-art approaches by a large margin of 8.76% mAP on average.

5/28/2024

cs.CV cs.AI

Learning Commonality, Divergence and Variety for Unsupervised Visible-Infrared Person Re-identification

Jiangming Shi, Xiangbo Yin, Yaoxing Wang, Xiaofeng Liu, Yuan Xie, Yanyun Qu

Unsupervised visible-infrared person re-identification (USVI-ReID) aims to match specified people in infrared images to visible images without annotation, and vice versa. USVI-ReID is a challenging yet under-explored task. Most existing methods address the USVI-ReID problem using cluster-based contrastive learning, which simply employs the cluster center as a representation of a person. However, the cluster center primarily focuses on shared information, overlooking disparity. To address the problem, we propose a Progressive Contrastive Learning with Multi-Prototype (PCLMP) method for USVI-ReID. In brief, we first generate the hard prototype by selecting the sample with the maximum distance from the cluster center. This hard prototype is used in the contrastive loss to emphasize disparity. Additionally, instead of rigidly aligning query images to a specific prototype, we generate the dynamic prototype by randomly picking samples within a cluster. This dynamic prototype is used to retain the natural variety of features while reducing instability in the simultaneous learning of both common and disparate information. Finally, we introduce a progressive learning strategy to gradually shift the model's attention towards hard samples, avoiding cluster deterioration. Extensive experiments conducted on the publicly available SYSU-MM01 and RegDB datasets validate the effectiveness of the proposed method. PCLMP outperforms the existing state-of-the-art method with an average mAP improvement of 3.9%. The source codes will be released.

5/28/2024

cs.CV

Robust Pseudo-label Learning with Neighbor Relation for Unsupervised Visible-Infrared Person Re-Identification

Xiangbo Yin, Jiangming Shi, Yachao Zhang, Yang Lu, Zhizhong Zhang, Yuan Xie, Yanyun Qu

Unsupervised Visible-Infrared Person Re-identification (USVI-ReID) presents a formidable challenge, which aims to match pedestrian images across visible and infrared modalities without any annotations. Recently, clustered pseudo-label methods have become predominant in USVI-ReID, although the inherent noise in pseudo-labels presents a significant obstacle. Most existing works primarily focus on shielding the model from the harmful effects of noise, neglecting to calibrate noisy pseudo-labels usually associated with hard samples, which will compromise the robustness of the model. To address this issue, we design a Robust Pseudo-label Learning with Neighbor Relation (RPNR) framework for USVI-ReID. To be specific, we first introduce a straightforward yet potent Noisy Pseudo-label Calibration module to correct noisy pseudo-labels. Due to the high intra-class variations, noisy pseudo-labels are difficult to calibrate completely. Therefore, we introduce a Neighbor Relation Learning module to reduce high intra-class variations by modeling potential interactions between all samples. Subsequently, we devise an Optimal Transport Prototype Matching module to establish reliable cross-modality correspondences. On that basis, we design a Memory Hybrid Learning module to jointly learn modality-specific and modality-invariant information. Comprehensive experiments conducted on two widely recognized benchmarks, SYSU-MM01 and RegDB, demonstrate that RPNR outperforms the current state-of-the-art GUR with an average Rank-1 improvement of 10.3%. The source codes will be released soon.

5/10/2024

cs.CV

Visible-Infrared Person Re-Identification via Patch-Mixed Cross-Modality Learning

Zhihao Qian, Yutian Lin, Bo Du

Visible-infrared person re-identification (VI-ReID) aims to retrieve images of the same pedestrian from different modalities, where the challenges lie in the significant modality discrepancy. To alleviate the modality gap, recent methods generate intermediate images by GANs, grayscaling, or mixup strategies. However, these methods could introduce extra data distribution, and the semantic correspondence between the two modalities is not well learned. In this paper, we propose a Patch-Mixed Cross-Modality framework (PMCM), where two images of the same person from two modalities are split into patches and stitched into a new one for model learning. A part-alignment loss is introduced to regularize representation learning, and a patch-mixed modality learning loss is proposed to align between the modalities. In this way, the model learns to recognize a person through patches of different styles, thereby the modality semantic correspondence can be inferred. In addition, with the flexible image generation strategy, the patch-mixed images freely adjust the ratio of different modality patches, which could further alleviate the modality imbalance problem. On two VI-ReID datasets, we report new state-of-the-art performance with the proposed method.

5/1/2024

cs.CV