Multi-Style Facial Sketch Synthesis through Masked Generative Modeling

Read original: arXiv:2408.12400 - Published 8/23/2024 by Bowen Sun, Guo Lu, Shibao Zheng

Overview

  • This paper presents a method for generating multi-style facial sketches using masked generative modeling.
  • The approach can produce diverse sketch outputs that match different artistic styles, while preserving key facial features.
  • Because artist-drawn sketches are scarce, the model is trained with semi-supervised learning.

Plain English Explanation

The researchers developed a system that can generate realistic sketches of human faces in different artistic styles. This could be useful for applications like character design, photo editing, and digital art.

The key idea is to use a masked generative model, which means the model learns to fill in missing parts of the sketch based on the surrounding information. This allows the system to produce diverse sketch outputs that capture different styles, while still maintaining the essential facial features.

Because artist-drawn sketches are scarce, the authors incorporate semi-supervised learning into training to make up for the limited data. Training teaches the system the relationship between real faces and their stylized sketch representations. At inference time, the model can then generate new sketches that match a desired artistic style.
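
As a rough illustration of how a masked generative model is trained to fill in missing parts, the sketch below hides a random subset of a tokenized sketch and records what the model would have to predict at each hidden position. The token ids, the `MASK` value, and the `mask_tokens` helper are hypothetical; the paper's actual tokenizer and masking schedule are not described in this summary.

```python
import random

MASK = -1  # hypothetical id reserved for "masked" positions


def mask_tokens(tokens, ratio, rng):
    """Hide a fraction of sketch tokens; return (corrupted, targets).

    targets maps each hidden position to the original token the model
    would be trained to predict there.
    """
    n_hide = max(1, round(ratio * len(tokens)))
    hidden = rng.sample(range(len(tokens)), n_hide)
    corrupted = list(tokens)
    targets = {}
    for pos in hidden:
        targets[pos] = tokens[pos]
        corrupted[pos] = MASK
    return corrupted, targets


rng = random.Random(0)
sketch_tokens = [7, 3, 3, 9, 1, 4, 4, 2]  # toy token ids for one sketch
corrupted, targets = mask_tokens(sketch_tokens, ratio=0.5, rng=rng)
```

Only the hidden positions contribute training targets; the visible tokens act as the surrounding context the model reads from.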

Technical Explanation

The paper introduces a lightweight, end-to-end model for multi-style facial sketch synthesis built around a masked generative transformer. The model takes a face photograph and a style embedding and outputs a sketch in that style, without supplementary inputs such as 3D geometry.

The key mechanism is iterative prediction of masked image tokens. The sketch is represented as image tokens, and the transformer fills in the masked tokens over several iterations, guided by the output of a feature extraction module and by the style embedding, rather than generating the entire sketch in one pass. To address the scarcity of artist-drawn data, the authors incorporate semi-supervised learning into the training process.
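
The fill-in process described above can be sketched as a confidence-based decoding loop. This is a toy illustration, not the authors' implementation: `toy_predictor`, the `MASK` id, the cosine schedule, and the integer `style_id` are all assumptions made for the example, with a deterministic stand-in where the trained network would be.

```python
import math
import random

MASK = -1  # hypothetical id for a not-yet-predicted token


def toy_predictor(tokens, style_id, rng):
    """Stand-in for the trained network.

    Returns {position: (token, confidence)} for every masked position.
    A real model would condition on photo features and a style embedding;
    here the token is a fixed function of position and style, and the
    confidence is random.
    """
    return {
        i: ((i + style_id) % 16, rng.random())
        for i, t in enumerate(tokens)
        if t == MASK
    }


def iterative_decode(num_tokens, style_id, steps, rng):
    """Start fully masked and commit the most confident guesses each step."""
    tokens = [MASK] * num_tokens
    for step in range(1, steps + 1):
        guesses = toy_predictor(tokens, style_id, rng)
        if not guesses:
            break  # nothing left to fill in
        # Cosine schedule: number of tokens that stay masked after this step.
        ratio = math.cos(math.pi / 2 * step / steps)
        keep_masked = max(0, math.floor(num_tokens * ratio))
        keep_masked = min(keep_masked, len(guesses) - 1)  # reveal at least one
        ranked = sorted(guesses, key=lambda i: guesses[i][1], reverse=True)
        for pos in ranked[: len(guesses) - keep_masked]:
            tokens[pos] = guesses[pos][0]
    return tokens


decoded = iterative_decode(num_tokens=16, style_id=3, steps=4, rng=random.Random(0))
```

Committing only the most confident tokens at each step lets later predictions condition on earlier ones, which is the usual motivation for iterative masked decoding over single-pass generation.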

The authors evaluate their approach on multiple benchmarks and report that it consistently outperforms previous algorithms, producing style-specific sketches that preserve important facial characteristics.

Critical Analysis

One potential limitation of the proposed approach is that it still depends on artist-drawn sketches, which are scarce and labor-intensive to collect. The paper uses semi-supervised learning to reduce that dependence, but the available data may still not capture the full diversity of real-world facial features and artistic styles.

Additionally, while the masked generative modeling technique allows for flexible style transfer, it may struggle to faithfully reproduce highly complex or abstract artistic styles. Further research could explore ways to incorporate more advanced sketch rendering techniques or to leverage additional modalities, such as text descriptions, to enhance the style diversity and realism of the generated sketches.

Conclusion

This paper presents a novel approach for generating multi-style facial sketches using masked generative modeling. The system can produce diverse sketch outputs that capture different artistic styles while preserving key facial features. The technical innovations and promising results suggest that this work could have a significant impact on applications such as digital art, character design, and photo editing.



This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Multi-Style Facial Sketch Synthesis through Masked Generative Modeling

Bowen Sun, Guo Lu, Shibao Zheng

The facial sketch synthesis (FSS) model, capable of generating sketch portraits from given facial photographs, holds profound implications across multiple domains, encompassing cross-modal face recognition, entertainment, art, media, among others. However, the production of high-quality sketches remains a formidable task, primarily due to the challenges and flaws associated with three key factors: (1) the scarcity of artist-drawn data, (2) the constraints imposed by limited style types, and (3) the deficiencies of processing input information in existing models. To address these difficulties, we propose a lightweight end-to-end synthesis model that efficiently converts images to corresponding multi-stylized sketches, obviating the necessity for any supplementary inputs (e.g., 3D geometry). In this study, we overcome the issue of data insufficiency by incorporating semi-supervised learning into the training process. Additionally, we employ a feature extraction module and style embeddings to proficiently steer the generative transformer during the iterative prediction of masked image tokens, thus achieving a continuous stylized output that retains facial features accurately in sketches. The extensive experiments demonstrate that our method consistently outperforms previous algorithms across multiple benchmarks, exhibiting a discernible disparity.

8/23/2024

PS-StyleGAN: Illustrative Portrait Sketching using Attention-Based Style Adaptation

Kushal Kumar Jain, Ankith Varun J, Anoop Namboodiri

Portrait sketching involves capturing identity specific attributes of a real face with abstract lines and shades. Unlike photo-realistic images, a good portrait sketch generation method needs selective attention to detail, making the problem challenging. This paper introduces Portrait Sketching StyleGAN (PS-StyleGAN), a style transfer approach tailored for portrait sketch synthesis. We leverage the semantic $W+$ latent space of StyleGAN to generate portrait sketches, allowing us to make meaningful edits, like pose and expression alterations, without compromising identity. To achieve this, we propose the use of Attentive Affine transform blocks in our architecture, and a training strategy that allows us to change StyleGAN's output without finetuning it. These blocks learn to modify style latent code by paying attention to both content and style latent features, allowing us to adapt the outputs of StyleGAN in an inversion-consistent manner. Our approach uses only a few paired examples ($\sim 100$) to model a style and has a short training time. We demonstrate PS-StyleGAN's superiority over the current state-of-the-art methods on various datasets, qualitatively and quantitatively.

9/4/2024

MagicFace: Training-free Universal-Style Human Image Customized Synthesis

Yibin Wang, Weizhong Zhang, Cheng Jin

Current state-of-the-art methods for human image customized synthesis typically require tedious training on large-scale datasets. In such cases, they are prone to overfitting and struggle to personalize individuals of unseen styles. Moreover, these methods extensively focus on single-concept human image synthesis and lack the flexibility needed for customizing individuals with multiple given concepts, thereby impeding their broader practical application. To this end, we propose MagicFace, a novel training-free method for universal-style human image personalized synthesis, enabling multi-concept customization by accurately integrating reference concept features into their latent generated region at the pixel level. Specifically, MagicFace introduces a coarse-to-fine generation pipeline, involving two sequential stages: semantic layout construction and concept feature injection. This is achieved by our Reference-aware Self-Attention (RSA) and Region-grouped Blend Attention (RBA) mechanisms. In the first stage, RSA enables the latent image to query features from all reference concepts simultaneously, extracting the overall semantic understanding to facilitate the initial semantic layout establishment. In the second stage, we employ an attention-based semantic segmentation method to pinpoint the latent generated regions of all concepts at each step. Following this, RBA divides the pixels of the latent image into semantic groups, with each group querying fine-grained features from the corresponding reference concept, which ensures precise attribute alignment and feature injection. Throughout the generation process, a weighted mask strategy is employed to ensure the model focuses more on the reference concepts. Extensive experiments demonstrate the superiority of MagicFace in both human-centric subject-to-image synthesis and multi-concept human image customization.

8/20/2024

Sketch-Guided Scene Image Generation

Tianyu Zhang, Xiaoxuan Xie, Xusheng Du, Haoran Xie

Text-to-image models are showcasing the impressive ability to create high-quality and diverse generative images. Nevertheless, the transition from freehand sketches to complex scene images remains challenging using diffusion models. In this study, we propose a novel sketch-guided scene image generation framework, decomposing the task of scene image generation from sketch inputs into object-level cross-domain generation and scene-level image construction. We employ pre-trained diffusion models to convert each single object drawing into an image of the object, inferring additional details while maintaining the sparse sketch structure. In order to maintain the conceptual fidelity of the foreground during scene generation, we invert the visual features of object images into identity embeddings for scene generation. In scene-level image construction, we generate the latent representation of the scene image using the separated background prompts, and then blend the generated foreground objects according to the layout of the sketch input. To ensure the foreground objects' details remain unchanged while naturally composing the scene image, we infer the scene image on the blended latent representation using a global prompt that includes the trained identity tokens. Through qualitative and quantitative experiments, we demonstrate that the proposed approach surpasses state-of-the-art approaches at generating scene images from hand-drawn sketches.

7/10/2024