Pose-Diversified Augmentation with Diffusion Model for Person Re-Identification

2406.16042

Published 6/26/2024 by Inès Hyeonsu Kim, JoungBin Lee, Soowon Son, Woojeong Jin, Kyusun Cho, Junyoung Seo, Min-Seop Kwak, Seokju Cho, JeongYeol Baek, Byeongwon Lee and 1 other

cs.CV


Abstract

Person re-identification (Re-ID) often faces challenges due to variations in human poses and camera viewpoints, which significantly affect the appearance of individuals across images. Existing datasets frequently lack diversity and scalability in these aspects, hindering the generalization of Re-ID models to new camera systems. Previous methods have attempted to address these issues through data augmentation; however, they rely on human poses already present in the training dataset, failing to effectively reduce the human pose bias in the dataset. We propose Diff-ID, a novel data augmentation approach that incorporates sparse and underrepresented human pose and camera viewpoint examples into the training data, addressing the limited diversity in the original training data distribution. Our objective is to augment a training dataset that enables existing Re-ID models to learn features unbiased by human pose and camera viewpoint variations. To achieve this, we leverage the knowledge of pre-trained large-scale diffusion models. Using the SMPL model, we simultaneously capture both the desired human poses and camera viewpoints, enabling realistic human rendering. The depth information provided by the SMPL model indirectly conveys the camera viewpoints. By conditioning the diffusion model on both the human pose and camera viewpoint concurrently through the SMPL model, we generate realistic images with diverse human poses and camera viewpoints. Qualitative results demonstrate the effectiveness of our method in addressing human pose bias and enhancing the generalizability of Re-ID models compared to other data augmentation-based Re-ID approaches. The performance gains achieved by training Re-ID models on our offline augmented dataset highlight the potential of our proposed framework in improving the scalability and generalizability of person Re-ID models.


Overview

This paper proposes Diff-ID, a pose-diversified data augmentation technique that uses a diffusion model to improve person re-identification (Re-ID) performance.
The key idea is to add synthetic images of people in human poses and camera viewpoints that are rare or missing in the original training data, so that Re-ID models learn features that are less biased by pose and viewpoint and generalize better to new camera systems.
Rather than reusing poses already present in the dataset, the method conditions a pre-trained large-scale diffusion model on the human pose and depth derived from the SMPL body model, producing realistic images in the desired pose and viewpoint for an offline augmented training set.

Plain English Explanation

The paper focuses on person re-identification (Re-ID): the task of recognizing the same person across different camera views or at different times. One challenge is that the same person can look very different depending on their pose and on the camera's viewpoint, and existing training datasets cover only a limited range of both, which makes it hard for models to generalize.

To address this, the researchers developed a data augmentation technique that uses a diffusion model to generate images of people in diverse poses and from diverse camera viewpoints. A diffusion model is a type of generative model that creates new images by starting from random noise and gradually removing that noise, step by step, until the result looks like a real image.
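As a rough illustration of the noise-to-image process, the sketch below implements DDPM-style ancestral sampling on a short vector instead of an image, using only the Python standard library. It is not Diff-ID's sampler: the summary does not say which sampler or noise schedule the authors use. `predict_noise` stands in for a trained denoising network, and the linear beta schedule and the `oracle` predictor in the usage lines are assumptions chosen for the demo.

```python
import math
import random
from typing import Callable, List

Vector = List[float]


def linear_beta_schedule(num_steps: int, beta_start: float = 1e-4, beta_end: float = 0.02) -> List[float]:
    """Noise variances beta_1..beta_T, linearly spaced (a common DDPM default; needs num_steps >= 2)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alpha_bars(betas: List[float]) -> List[float]:
    """alpha_bar_t = prod_{s<=t} (1 - beta_s): how much of the clean signal survives at step t."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def ddpm_sample(predict_noise: Callable[[Vector, int], Vector], dim: int,
                betas: List[float], rng: random.Random) -> Vector:
    """Start from pure Gaussian noise and denoise it one step at a time."""
    alpha_bars = cumulative_alpha_bars(betas)
    x = [rng.gauss(0.0, 1.0) for _ in range(dim)]  # x_T ~ N(0, I)
    for t in reversed(range(len(betas))):
        alpha_t = 1.0 - betas[t]
        eps = predict_noise(x, t)  # estimated noise contained in x_t
        coef = betas[t] / math.sqrt(1.0 - alpha_bars[t])
        mean = [(xi - coef * ei) / math.sqrt(alpha_t) for xi, ei in zip(x, eps)]
        if t > 0:
            sigma = math.sqrt(betas[t])  # simple choice sigma_t^2 = beta_t
            x = [m + sigma * rng.gauss(0.0, 1.0) for m in mean]
        else:
            x = mean  # the last step adds no noise
    return x


# Usage: an "oracle" predictor that knows the clean target recovers it
# (up to floating-point error), which checks that the update rule is wired correctly.
target = [0.5, -1.0, 2.0]
betas = linear_beta_schedule(50)
alpha_bars = cumulative_alpha_bars(betas)


def oracle(x: Vector, t: int) -> Vector:
    signal, noise = math.sqrt(alpha_bars[t]), math.sqrt(1.0 - alpha_bars[t])
    return [(xi - signal * x0) / noise for xi, x0 in zip(x, target)]


sample = ddpm_sample(oracle, len(target), betas, random.Random(0))
print([round(v, 6) for v in sample])
```

A real model replaces `oracle` with a neural network that has learned to predict the noise from data; in Diff-ID that network is also conditioned on the target pose and viewpoint.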

By adding these generated images to the training data, the researchers report improved performance for existing Re-ID models. The idea is that a model trained on more varied poses and viewpoints learns to be robust to them, making it better able to recognize people in real-world scenarios where poses and camera angles vary.
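The abstract describes the augmentation as offline: synthetic images are generated once and stored alongside the real ones before Re-ID training. The sketch below shows one plausible way to assemble such a dataset, assuming each generated image keeps the identity label of the source image it was generated from. `Sample`, `build_offline_augmented_set`, and the `generate` callback are hypothetical names; in Diff-ID the generator would be the pose- and viewpoint-conditioned diffusion model, replaced here by a stub.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass(frozen=True)
class Sample:
    image: str               # image path or identifier
    person_id: int           # identity label used as the Re-ID training target
    synthetic: bool = False  # True for generated images


def build_offline_augmented_set(
    train: Sequence[Sample],
    target_conditions: Sequence[str],
    generate: Callable[[Sample, str], str],
) -> List[Sample]:
    """Return the real samples plus one generated sample per (real sample, condition).

    Assumption: a generated image shows the same person as its source image,
    so it inherits the source's person_id.
    """
    augmented = list(train)
    for source in train:
        if source.synthetic:
            continue  # only real images are used as generation sources
        for condition in target_conditions:
            new_image = generate(source, condition)
            augmented.append(Sample(new_image, source.person_id, synthetic=True))
    return augmented


# Usage with a stub generator that only builds a file name.
def fake_generate(source: Sample, condition: str) -> str:
    return f"{source.image}.{condition}.png"


real = [Sample("p1_c1.jpg", 1), Sample("p2_c3.jpg", 2)]
augmented = build_offline_augmented_set(real, ["back_view", "side_view"], fake_generate)
print(len(augmented))
```

Generating offline keeps the Re-ID training loop unchanged: any existing model can be trained on the enlarged list exactly as it would be on the original dataset.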

This approach is particularly useful for applications such as surveillance or smart cities, where accurately identifying people across different camera views is important. Using a diffusion model to synthesize underrepresented poses and viewpoints is a practical way to tackle pose and viewpoint variation in person re-identification.

Technical Explanation

The paper targets pose and viewpoint bias in Re-ID training data. Previous augmentation methods reuse human poses that already appear in the training set, so they cannot supply poses that are rare or absent. Diff-ID instead adds sparse and underrepresented human pose and camera viewpoint examples to the training data, with the goal of producing an augmented dataset on which existing Re-ID models learn features unbiased by pose and viewpoint.
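The abstract says Diff-ID adds sparse and underrepresented pose and viewpoint examples, but this summary does not say how those are chosen. One simple strategy, shown here purely as an assumption, is to bin each training image by body orientation (yaw relative to the camera) and generate enough synthetic images to bring every bin up to the size of the largest one. `yaw_bin`, `generation_budget`, and the 8-bin default are hypothetical choices.

```python
from collections import Counter
from typing import Dict, Iterable


def yaw_bin(yaw_degrees: float, num_bins: int = 8) -> int:
    """Map a body yaw angle (0 = facing the camera) to one of num_bins equal sectors."""
    width = 360.0 / num_bins
    index = int((yaw_degrees % 360.0) // width)
    return min(index, num_bins - 1)  # guards against float edge cases such as -1e-20 % 360.0 == 360.0


def generation_budget(yaws: Iterable[float], num_bins: int = 8) -> Dict[int, int]:
    """Number of synthetic images to generate per yaw bin so all bins match the fullest one."""
    counts = Counter(yaw_bin(y, num_bins) for y in yaws)
    largest = max(counts.values(), default=0)
    return {b: largest - counts.get(b, 0) for b in range(num_bins)}


# Usage: a toy training set dominated by near-frontal views.
observed_yaws = [0, 10, 20, 90, 100, 180]
print(generation_budget(observed_yaws, num_bins=4))  # {0: 0, 1: 1, 2: 2, 3: 3}
```

Filling bins up to the largest count is only one balancing rule; sampling target viewpoints with probability inversely proportional to bin frequency would be an equally plausible alternative.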

To generate these examples, the method leverages the knowledge of a pre-trained large-scale diffusion model. The desired human pose and camera viewpoint are specified together through the SMPL parametric body model: the SMPL body provides the pose, and the depth information it provides indirectly conveys the camera viewpoint. Conditioning the diffusion model on both signals at once yields realistic images with diverse human poses and camera viewpoints, which are used to build an offline augmented training set.
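The abstract notes that SMPL depth indirectly conveys the camera viewpoint. Why depth helps can be seen with a simplified pinhole projection. The sketch below is not SMPL (which needs the SMPL body model and a mesh renderer); it projects a few hypothetical body-centred 3D points for a camera orbiting the person and returns 2D image coordinates plus depth. In a 90° side view the two shoulders fall in the same image column, and their depths (4.8 versus 5.2) indicate which one is nearer the camera. The joint coordinates, focal length, and principal point are illustrative assumptions.

```python
import math
from typing import List, Sequence, Tuple

Point3 = Tuple[float, float, float]


def rotate_about_vertical(point: Point3, yaw_degrees: float) -> Point3:
    """Rotate a body-centred point about the vertical (y) axis; equivalent to orbiting the camera."""
    a = math.radians(yaw_degrees)
    x, y, z = point
    return (math.cos(a) * x + math.sin(a) * z, y, -math.sin(a) * x + math.cos(a) * z)


def project(points: Sequence[Point3], yaw_degrees: float, distance: float,
            focal: float, cx: float, cy: float) -> List[Tuple[float, float, float]]:
    """Pinhole projection; the camera looks along +z at the body centre `distance` units away.

    Returns (u, v, depth) for each point: (u, v) is the 2D keypoint, depth the camera-space z.
    """
    projected = []
    for point in points:
        x, y, z = rotate_about_vertical(point, yaw_degrees)
        depth = z + distance
        if depth <= 0:
            raise ValueError("point is behind the camera")
        u = cx + focal * x / depth
        v = cy - focal * y / depth  # image rows grow downward, world y grows upward
        projected.append((u, v, depth))
    return projected


# Usage: hypothetical left and right shoulder positions (metres, relative to the pelvis).
shoulders = [(0.2, 0.5, 0.0), (-0.2, 0.5, 0.0)]
front = project(shoulders, yaw_degrees=0, distance=5.0, focal=500.0, cx=64.0, cy=128.0)
side = project(shoulders, yaw_degrees=90, distance=5.0, focal=500.0, cx=64.0, cy=128.0)
print(front)
print(side)
```

A full SMPL-based pipeline would render a dense depth map of the whole body surface rather than a handful of points, but the same geometry is what lets depth disambiguate viewpoints that 2D keypoints alone leave ambiguous.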

The authors conduct experiments on several popular Re-ID benchmarks, including Market-1501, DukeMTMC-reID, and CUHK03. They report that training Re-ID models on the augmented dataset improves Re-ID performance, outperforming other state-of-the-art augmentation approaches, including ones based on diffusion-based human image generators such as MagicPose and High-Fidelity Person-Centric.
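Re-ID results on benchmarks such as Market-1501 are conventionally reported as Rank-1 accuracy (the first point of the CMC curve) and mean average precision (mAP). This summary does not list the metrics the authors used, so the sketch below shows only the conventional computation from a query-by-gallery distance matrix. It follows the common rule of ignoring gallery images of the query's identity taken by the same camera, but simplifies the official protocols (for example, it does not handle junk or distractor images); `evaluate_reid` is a hypothetical helper name.

```python
from typing import List, Sequence, Tuple


def evaluate_reid(dist: Sequence[Sequence[float]],
                  q_pids: List[int], q_cams: List[int],
                  g_pids: List[int], g_cams: List[int]) -> Tuple[float, float]:
    """Return (rank1, mAP) from a query x gallery distance matrix (smaller = more similar)."""
    rank1_hits, ap_sum, valid_queries = 0, 0.0, 0
    for i, row in enumerate(dist):
        order = sorted(range(len(row)), key=lambda j: row[j])
        # Ignore gallery images of the same identity seen by the same camera as the query.
        kept = [j for j in order if not (g_pids[j] == q_pids[i] and g_cams[j] == q_cams[i])]
        matches = [g_pids[j] == q_pids[i] for j in kept]
        if not any(matches):
            continue  # no valid ground truth for this query
        valid_queries += 1
        if matches[0]:
            rank1_hits += 1
        hits, precision_sum = 0, 0.0
        for rank, is_match in enumerate(matches, start=1):
            if is_match:
                hits += 1
                precision_sum += hits / rank  # precision at each correct retrieval
        ap_sum += precision_sum / hits
    if valid_queries == 0:
        raise ValueError("no query has a valid match in the gallery")
    return rank1_hits / valid_queries, ap_sum / valid_queries


# Usage: two queries against a four-image gallery.
dist = [[0.1, 0.2, 0.05, 0.9],
        [0.3, 0.4, 0.1, 0.2]]
rank1, mean_ap = evaluate_reid(dist, q_pids=[1, 2], q_cams=[0, 0],
                               g_pids=[1, 2, 1, 2], g_cams=[1, 1, 0, 2])
print(rank1, mean_ap)
```

Rank-1 only checks the top retrieval, while mAP rewards ranking every correct gallery image early, so an augmentation method can move the two metrics by different amounts.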

Critical Analysis

The paper presents a well-motivated approach to the challenge of pose and viewpoint variation in person re-identification. Conditioning a pre-trained diffusion model on SMPL-derived pose and depth is a sensible way to synthesize realistic images in poses and viewpoints that are missing from the training data, instead of recombining poses the dataset already contains.

However, the paper does not provide a detailed analysis of the limitations of the proposed method. For example, it does not discuss how errors in the SMPL poses and depth used to condition the diffusion model affect the generated images, or the computational cost of generating the offline augmented dataset.

Additionally, the paper could have explored applying the technique to related tasks where pose variation is also a significant challenge, such as human action recognition or 3D human pose estimation.

Despite these minor limitations, the paper makes a valuable contribution to the field of person re-identification and sets a strong foundation for future research in this area.

Conclusion

In this paper, the authors propose Diff-ID, a pose-diversified data augmentation technique for person re-identification (Re-ID). It enriches the training data with synthetic images of people in underrepresented poses and camera viewpoints, helping models generalize to conditions that are rare in the original data.

The method relies on a pre-trained diffusion model conditioned on SMPL-derived pose and depth, which together specify the target pose and camera viewpoint, to produce realistic person images. Experiments on popular Re-ID benchmarks show that training on the resulting offline augmented dataset improves performance over other data augmentation-based approaches.

Using diffusion models to fill gaps in pose and viewpoint coverage may also benefit other computer vision tasks where pose variation is a major challenge. The paper opens a direction for further research that explores such applications and addresses the limitations of the current approach.

This summary was produced with help from an AI and may contain inaccuracies. Check the original source documents before relying on it.

Related Papers

Synthesizing Efficient Data with Diffusion Models for Person Re-Identification Pre-Training

Ke Niu, Haiyang Yu, Xuelin Qian, Teng Fu, Bin Li, Xiangyang Xue

Existing person re-identification (Re-ID) methods principally deploy the ImageNet-1K dataset for model initialization, which inevitably results in sub-optimal situations due to the large domain gap. One of the key challenges is that building large-scale person Re-ID datasets is time-consuming. Some previous efforts address this problem by collecting person images from the internet (e.g., LUPerson), but learning from such unlabeled, uncontrollable, and noisy data is difficult. In this paper, we present a novel paradigm Diffusion-ReID to efficiently augment and generate diverse images based on known identities without any data collection or annotation cost. Technically, this paradigm unfolds in two stages: generation and filtering. During the generation stage, we propose Language Prompts Enhancement (LPE) to ensure the ID consistency between the input image sequence and the generated images. In the diffusion process, we propose a Diversity Injection (DI) module to increase attribute diversity. To improve the quality of the generated data, we apply a Re-ID confidence threshold filter to further remove the low-quality images. Benefiting from our proposed paradigm, we first create a new large-scale person Re-ID dataset Diff-Person, which consists of over 777K images from 5,183 identities. Next, we build a stronger person Re-ID backbone pre-trained on our Diff-Person. Extensive experiments are conducted on four person Re-ID benchmarks in six widely used settings. Compared with other pre-training and self-supervised competitors, our approach shows significant superiority.

6/11/2024

cs.CV cs.AI

Diversifying Human Pose in Synthetic Data for Aerial-view Human Detection

Yi-Ting Shen, Hyungtae Lee, Heesung Kwon, Shuvra S. Bhattacharyya

We present a framework for diversifying human poses in a synthetic dataset for aerial-view human detection. Our method firstly constructs a set of novel poses using a pose generator and then alters images in the existing synthetic dataset to assume the novel poses while maintaining the original style using an image translator. Since images corresponding to the novel poses are not available in training, the image translator is trained to be applicable only when the input and target poses are similar, thus training does not require the novel poses and their corresponding images. Next, we select a sequence of target novel poses from the novel pose set, using Dijkstra's algorithm to ensure that poses closer to each other are located adjacently in the sequence. Finally, we repeatedly apply the image translator to each target pose in sequence to produce a group of novel pose images representing a variety of different limited body movements from the source pose. Experiments demonstrate that, regardless of how the synthetic data is used for training or the data size, leveraging the pose-diversified synthetic dataset in training generally presents remarkably better accuracy than using the original synthetic dataset on three aerial-view human detection benchmarks (VisDrone, Okutama-Action, and ICG) in the few-shot regime.

5/28/2024

cs.CV


Diffusion Deepfake

Chaitali Bhattacharyya, Hanxiao Wang, Feng Zhang, Sungho Kim, Xiatian Zhu

Recent progress in generative AI, primarily through diffusion models, presents significant challenges for real-world deepfake detection. The increased realism in image details, diverse content, and widespread accessibility to the general public complicates the identification of these sophisticated deepfakes. Acknowledging the urgency to address the vulnerability of current deepfake detectors to this evolving threat, our paper introduces two extensive deepfake datasets generated by state-of-the-art diffusion models as other datasets are less diverse and low in quality. Our extensive experiments also showed that our dataset is more challenging compared to the other face deepfake datasets. Our strategic dataset creation not only challenges the deepfake detectors but also sets a new benchmark for further evaluation. Our comprehensive evaluation reveals the struggle of existing detection methods, often optimized for specific image domains and manipulations, to effectively adapt to the intricate nature of diffusion deepfakes, limiting their practical utility. To address this critical issue, we investigate the impact of enhancing training data diversity on representative detection methods. This involves expanding the diversity of both manipulation techniques and image domains. Our findings underscore that increasing training data diversity results in improved generalizability. Moreover, we propose a novel momentum difficulty boosting strategy to tackle the additional challenge posed by training data heterogeneity. This strategy dynamically assigns appropriate sample weights based on learning difficulty, enhancing the model's adaptability to both easy and challenging samples. Extensive experiments on both existing and newly proposed benchmarks demonstrate that our model optimization approach surpasses prior alternatives significantly.

4/3/2024

cs.CV


MagicPose: Realistic Human Poses and Facial Expressions Retargeting with Identity-aware Diffusion

Di Chang, Yichun Shi, Quankai Gao, Jessica Fu, Hongyi Xu, Guoxian Song, Qing Yan, Yizhe Zhu, Xiao Yang, Mohammad Soleymani

In this work, we propose MagicPose, a diffusion-based model for 2D human pose and facial expression retargeting. Specifically, given a reference image, we aim to generate a person's new images by controlling the poses and facial expressions while keeping the identity unchanged. To this end, we propose a two-stage training strategy to disentangle human motions and appearance (e.g., facial expressions, skin tone and dressing), consisting of (1) the pre-training of an appearance-control block and (2) learning appearance-disentangled pose control. Our novel design enables robust appearance control over generated human images, including body, facial attributes, and even background. By leveraging the prior knowledge of image diffusion models, MagicPose generalizes well to unseen human identities and complex poses without the need for additional fine-tuning. Moreover, the proposed model is easy to use and can be considered as a plug-in module/extension to Stable Diffusion. The code is available at: https://github.com/Boese0601/MagicDance

5/7/2024

cs.CV