Human Image Generation: A Comprehensive Survey

2212.08896

Published 5/27/2024 by Zhen Jia, Zhang Zhang, Liang Wang, Tieniu Tan

Abstract

Image and video synthesis has become a blooming topic in computer vision and machine learning communities along with the developments of deep generative models, due to its great academic and application value. Many researchers have been devoted to synthesizing high-fidelity human images as one of the most commonly seen object categories in daily lives, where a large number of studies are performed based on various models, task settings and applications. Thus, it is necessary to give a comprehensive overview on these variant methods on human image generation. In this paper, we divide human image generation techniques into three paradigms, i.e., data-driven methods, knowledge-guided methods and hybrid methods. For each paradigm, the most representative models and the corresponding variants are presented, where the advantages and characteristics of different methods are summarized in terms of model architectures. Besides, the main public human image datasets and evaluation metrics in the literature are summarized. Furthermore, due to the wide application potentials, the typical downstream usages of synthesized human images are covered. Finally, the challenges and potential opportunities of human image generation are discussed to shed light on future research.

Overview

This paper provides a comprehensive overview of methods for generating high-fidelity human images using deep learning models.
The authors categorize these techniques into three main paradigms: data-driven methods, knowledge-guided methods, and hybrid methods.
The paper also covers the main human image datasets, evaluation metrics, and typical downstream applications of the generated images.
Finally, the paper discusses the challenges and potential future research directions in this field.

Plain English Explanation

Generating realistic human images has become an important area of research in computer vision and machine learning. Many researchers are developing techniques to create high-quality human images, and these images serve a wide range of applications.

The authors of this paper have organized the different approaches into three main categories:

Data-driven methods: These techniques rely on large datasets of human images to train deep learning models that can generate new, realistic-looking images.
Knowledge-guided methods: These methods incorporate additional information, such as 3D human models or anatomical knowledge, to guide the image generation process and improve the realism of the results.
Hybrid methods: These approaches combine elements of both data-driven and knowledge-guided techniques to leverage the strengths of each.

The paper also provides an overview of the publicly available datasets and evaluation metrics used to assess the quality of the generated human images. Additionally, it covers some of the common applications of these synthetic images, such as image-based virtual try-on and transfer learning for other computer vision tasks.

Finally, the authors discuss the remaining challenges in this field and suggest potential directions for future research, such as improving the realism and diversity of the generated images and developing techniques for 3D human reconstruction from in-the-wild images.

Technical Explanation

The paper begins by introducing the importance of human image generation in computer vision and machine learning, and the need for a comprehensive overview of the various techniques in this area.

The authors then present the three main paradigms for human image generation:

Data-driven methods: These techniques rely on large datasets of human images to train deep learning models, such as generative adversarial networks (GANs) and variational autoencoders (VAEs), to generate new, realistic-looking images. The models learn the underlying distributions of the training data and can then sample from these distributions to create new images.
Knowledge-guided methods: These approaches incorporate additional information, such as 3D human models or anatomical knowledge, to guide the image generation process and improve the realism of the results. For example, some methods use 3D human reconstruction techniques to generate accurate 3D representations of human bodies, which can then be used to synthesize 2D images.
Hybrid methods: These techniques combine elements of both data-driven and knowledge-guided approaches to leverage the strengths of each. For instance, some methods use deep learning models trained on large datasets but also incorporate 3D human priors to improve the realism and diversity of the generated images.
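
As a rough illustration of how the paradigms differ, the toy sketch below stands in a one-dimensional Gaussian for the deep generative model. Everything in it is an assumption made for illustration and none of it comes from the survey: the feature (a leg-length-to-height ratio), the training values, the plausible range, and the function names are all hypothetical. The data-driven sampler uses only the fitted distribution, while the hybrid sampler also rejects samples that violate a hand-specified prior constraint.

```python
import random
import statistics


def fit_gaussian(samples):
    """Data-driven step: estimate a mean and standard deviation from data.

    A real system would train a GAN or VAE; a 1-D Gaussian is a stand-in.
    """
    return statistics.fmean(samples), statistics.stdev(samples)


def sample_data_driven(mean, std, rng):
    """Draw a new value from the fitted distribution, with no extra knowledge."""
    return rng.gauss(mean, std)


def sample_hybrid(mean, std, rng, lower, upper, max_tries=1000):
    """Hybrid step: sample from the fitted distribution, but reject values
    outside a range supplied as prior knowledge (hypothetical bounds)."""
    for _ in range(max_tries):
        value = rng.gauss(mean, std)
        if lower <= value <= upper:
            return value
    # Fall back to the fitted mean clamped into the allowed range.
    return min(max(mean, lower), upper)


if __name__ == "__main__":
    # Made-up training values for a leg-length-to-height ratio.
    training_ratios = [0.45, 0.47, 0.50, 0.48, 0.46]
    rng = random.Random(0)
    mean, std = fit_gaussian(training_ratios)
    print("data-driven sample:", sample_data_driven(mean, std, rng))
    print("hybrid sample:", sample_hybrid(mean, std, rng, 0.44, 0.50))
```

Rejection sampling is only one simple way to combine a learned distribution with a prior. The hybrid methods summarized above are described as building 3D human priors into the model itself rather than filtering its outputs afterwards.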

The paper also provides an overview of the main public datasets used for human image generation, such as COCO, DeepFashion, and VoxCeleb, as well as the commonly used evaluation metrics, such as Inception Score and Fréchet Inception Distance.
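
To make the two metrics concrete, here is a minimal sketch in plain Python. The Inception Score is commonly defined as exp(E_x[KL(p(y|x) || p(y))]), where p(y|x) is a classifier's class distribution for a generated image and p(y) is the average of those distributions. The Fréchet Inception Distance compares Gaussian fits of real and generated feature statistics. In practice both rely on a pretrained Inception network; here the probabilities and statistics are passed in directly. The function names are hypothetical, and the diagonal-covariance simplification of the Fréchet distance is an assumption of this sketch: the standard metric uses full covariance matrices and a matrix square root.

```python
import math


def inception_score(probs):
    """Inception Score from per-image class probabilities p(y|x).

    probs: list of probability vectors, one per generated image.
    Returns exp of the mean KL divergence between p(y|x) and the marginal p(y).
    """
    n = len(probs)
    k = len(probs[0])
    # Marginal class distribution p(y), averaged over all images.
    marginal = [sum(p[j] for p in probs) / n for j in range(k)]
    kl_total = 0.0
    for p in probs:
        for j in range(k):
            if p[j] > 0:  # 0 * log(0) is treated as 0
                kl_total += p[j] * math.log(p[j] / marginal[j])
    return math.exp(kl_total / n)


def frechet_distance_diagonal(mu1, var1, mu2, var2):
    """Fréchet distance between two Gaussians with DIAGONAL covariances.

    Simplification for illustration: with diagonal covariances, the trace
    term Tr(S1 + S2 - 2 (S1 S2)^(1/2)) reduces to a per-dimension sum.
    """
    mean_term = sum((a - b) ** 2 for a, b in zip(mu1, mu2))
    cov_term = sum(v1 + v2 - 2.0 * math.sqrt(v1 * v2)
                   for v1, v2 in zip(var1, var2))
    return mean_term + cov_term


if __name__ == "__main__":
    # Confident and diverse predictions give a higher score than uniform ones.
    print(inception_score([[1.0, 0.0], [0.0, 1.0]]))
    print(inception_score([[0.5, 0.5], [0.5, 0.5]]))
    # Identical statistics give a distance of zero.
    print(frechet_distance_diagonal([0.0, 0.0], [1.0, 1.0],
                                    [0.0, 0.0], [1.0, 1.0]))
```

A higher Inception Score indicates predictions that are both confident per image and varied across images, while a lower Fréchet distance indicates that generated feature statistics are closer to the real ones.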

Furthermore, the authors discuss the wide range of applications for the synthesized human images, including image-based virtual try-on, transfer learning for other computer vision tasks, and deepfake generation and detection.

Critical Analysis

The paper provides a comprehensive and well-structured overview of the various techniques for human image generation, which is a valuable contribution to the field. The authors have done a commendable job of categorizing the methods into three distinct paradigms and highlighting the key characteristics and strengths of each approach.

However, the paper could have delved deeper into the limitations and potential issues with some of the presented techniques. For example, the authors could have discussed the challenges in ensuring the diversity and realism of the generated images, or the ethical considerations around the use of deepfake technology.

Additionally, the paper could have provided more critical analysis of the current state of the field and the potential risks or drawbacks associated with the widespread use of synthetic human images, such as the potential for misuse or abuse.

Overall, the paper is a valuable resource for researchers and practitioners working on human image generation, though a fuller treatment of these limitations and pitfalls would have strengthened it.

Conclusion

This paper provides a comprehensive overview of the various techniques for generating high-fidelity human images using deep learning models. The authors have categorized the methods into three main paradigms: data-driven, knowledge-guided, and hybrid approaches. The paper also covers the available datasets, evaluation metrics, and typical downstream applications of the synthesized images.

While the paper offers a thorough technical explanation of the different techniques, it could have delved deeper into the limitations and potential issues associated with these methods. Nonetheless, this work serves as a valuable resource for researchers and practitioners working in the rapidly evolving field of human image generation. It highlights the potential of these technologies, as well as the need for continued research and careful consideration of their ethical implications.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

A Survey on 3D Human Avatar Modeling -- From Reconstruction to Generation

Ruihe Wang, Yukang Cao, Kai Han, Kwan-Yee K. Wong

3D modeling has long been an important area in computer vision and computer graphics. Recently, thanks to the breakthroughs in neural representations and generative models, we witnessed a rapid development of 3D modeling. 3D human modeling, lying at the core of many real-world applications, such as gaming and animation, has attracted significant attention. Over the past few years, a large body of work on creating 3D human avatars has been introduced, forming a new and abundant knowledge base for 3D human modeling. The scale of the literature makes it difficult for individuals to keep track of all the works. This survey aims to provide a comprehensive overview of these emerging techniques for 3D human avatar modeling, from both reconstruction and generation perspectives. Firstly, we review representative methods for 3D human reconstruction, including methods based on pixel-aligned implicit function, neural radiance field, and 3D Gaussian Splatting, etc. We then summarize representative methods for 3D human generation, especially those using large language models like CLIP, diffusion models, and various 3D representations, which demonstrate state-of-the-art performance. Finally, we discuss our reflection on existing methods and open challenges for 3D human avatar modeling, shedding light on future research.

6/7/2024

cs.CV

3D Human Reconstruction in the Wild with Synthetic Data Using Generative Models

Yongtao Ge, Wenjia Wang, Yongfan Chen, Hao Chen, Chunhua Shen

In this work, we show that synthetic data created by generative models is complementary to computer graphics (CG) rendered data for achieving remarkable generalization performance on diverse real-world scenes for 3D human pose and shape estimation (HPS). Specifically, we propose an effective approach based on recent diffusion models, termed HumanWild, which can effortlessly generate human images and corresponding 3D mesh annotations. We first collect a large-scale human-centric dataset with comprehensive annotations, e.g., text captions and surface normal images. Then, we train a customized ControlNet model upon this dataset to generate diverse human images and initial ground-truth labels. At the core of this step is that we can easily obtain numerous surface normal images from a 3D human parametric model, e.g., SMPL-X, by rendering the 3D mesh onto the image plane. As there exists inevitable noise in the initial labels, we then apply an off-the-shelf foundation segmentation model, i.e., SAM, to filter negative data samples. Our data generation pipeline is flexible and customizable to facilitate different real-world tasks, e.g., ego-centric scenes and perspective-distortion scenes. The generated dataset comprises 0.79M images with corresponding 3D annotations, covering versatile viewpoints, scenes, and human identities. We train various HPS regressors on top of the generated data and evaluate them on a wide range of benchmarks (3DPW, RICH, EgoBody, AGORA, SSP-3D) to verify the effectiveness of the generated data. By exclusively employing generative models, we generate large-scale in-the-wild human images and high-quality annotations, eliminating the need for real-world data collection.

4/12/2024

cs.CV

Generative AI for Synthetic Data Across Multiple Medical Modalities: A Systematic Review of Recent Developments and Challenges

Mahmoud Ibrahim, Yasmina Al Khalil, Sina Amirrajab, Chang Sun, Marcel Breeuwer, Josien Pluim, Bart Elen, Gokhan Ertaylan, Michel Dumontier

This paper presents a comprehensive systematic review of generative models (GANs, VAEs, DMs, and LLMs) used to synthesize various medical data types, including imaging (dermoscopic, mammographic, ultrasound, CT, MRI, and X-ray), text, time-series, and tabular data (EHR). Unlike previous narrowly focused reviews, our study encompasses a broad array of medical data modalities and explores various generative models. Our search strategy queries databases such as Scopus, PubMed, and ArXiv, focusing on recent works from January 2021 to November 2023, excluding reviews and perspectives. This period emphasizes recent advancements beyond GANs, which have been extensively covered previously. The survey reveals insights from three key aspects: (1) Synthesis applications and purpose of synthesis, (2) generation techniques, and (3) evaluation methods. It highlights clinically valid synthesis applications, demonstrating the potential of synthetic data to tackle diverse clinical requirements. While conditional models incorporating class labels, segmentation masks and image translations are prevalent, there is a gap in utilizing prior clinical knowledge and patient-specific context, suggesting a need for more personalized synthesis approaches and emphasizing the importance of tailoring generative approaches to the unique characteristics of medical data. Additionally, there is a significant gap in using synthetic data beyond augmentation, such as for validation and evaluation of downstream medical AI models. The survey uncovers that the lack of standardized evaluation methodologies tailored to medical images is a barrier to clinical application, underscoring the need for in-depth evaluation approaches, benchmarking, and comparative studies to promote openness and collaboration.

7/2/2024

cs.LG cs.AI

Story Generation from Visual Inputs: Techniques, Related Tasks, and Challenges

Daniel A. P. Oliveira, Eugénio Ribeiro, David Martins de Matos

Creating engaging narratives from visual data is crucial for automated digital media consumption, assistive technologies, and interactive entertainment. This survey covers methodologies used in the generation of these narratives, focusing on their principles, strengths, and limitations. The survey also covers tasks related to automatic story generation, such as image and video captioning, and visual question answering, as well as story generation without visual inputs. These tasks share common challenges with visual story generation and have served as inspiration for the techniques used in the field. We analyze the main datasets and evaluation metrics, providing a critical perspective on their limitations.

6/6/2024

cs.CV cs.AI