Semi-supervised reference-based sketch extraction using a contrastive learning framework

Read original: arXiv:2407.14026 - Published 7/22/2024 by Chang Wook Seo, Amirsaman Ashtari, Junyong Noh

Overview

Sketches reflect the unique drawing styles of individual artists.
Most existing sketch extraction methods only work for a single style.
Prior attempts to generate multi-style sketches suffer from low-quality results and are difficult to train because they require a paired dataset.
This paper proposes a novel multi-modal sketch extraction method that can imitate the style of a given reference sketch using unpaired data training.
The method outperforms state-of-the-art approaches in both quantitative and qualitative evaluations.

Plain English Explanation

The way an artist sketches can reveal their unique drawing style. Most existing sketch extraction methods are only designed to work with a single style of sketches. Although there have been some attempts to generate sketches in different styles, the results are often low quality and the training process is difficult because it requires a paired dataset.

This research paper proposes a new multi-modal sketch extraction method that can imitate the style of a reference sketch, even when the training data is not directly paired. The method is trained in a semi-supervised way, meaning it learns from both labeled and unlabeled data.

The new method produces higher-quality results than existing state-of-the-art sketch extraction techniques and unpaired image translation approaches, according to both quantitative measurements and visual evaluations.

Technical Explanation

The key insight behind this work is that by leveraging a reference sketch, the model can learn to extract sketches that match the style of that reference, even without having perfectly paired training data.
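The paper's title mentions a contrastive learning framework, but this summary does not spell out the loss. As a rough illustration only, the sketch below uses a generic InfoNCE-style loss over style embeddings: the extracted sketch's embedding is pulled toward the reference sketch's embedding and pushed away from embeddings of sketches in other styles. The function names, the toy 2-D embeddings, and the temperature value are assumptions made for this example, not the authors' formulation.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce_loss(anchor, positive, negatives, temperature=0.1):
    """Generic InfoNCE loss (hypothetical stand-in for the paper's loss).

    anchor:    style embedding of the extracted sketch
    positive:  style embedding of the reference sketch
    negatives: style embeddings of sketches in other styles
    """
    pos_logit = cosine_similarity(anchor, positive) / temperature
    logits = [pos_logit] + [
        cosine_similarity(anchor, n) / temperature for n in negatives
    ]
    # Numerically stable log-sum-exp over all logits.
    max_logit = max(logits)
    log_sum_exp = max_logit + math.log(
        sum(math.exp(l - max_logit) for l in logits)
    )
    # Cross-entropy with the positive pair as the correct class.
    return log_sum_exp - pos_logit


# Toy embeddings: the output is close to the reference style.
output_style = [1.0, 0.0]
reference_style = [0.9, 0.1]
other_styles = [[0.0, 1.0], [-1.0, 0.0]]
loss = info_nce_loss(output_style, reference_style, other_styles)
```

The loss is small when the output's style embedding is nearest to the reference and grows when another style is a closer match, which is the behavior a reference-based extractor would want to encourage.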

The proposed method uses a multi-modal architecture that takes in both the input color image and the reference sketch. It then learns to generate a new sketch that imitates the style of the reference, in a semi-supervised manner using both labeled and unlabeled data.
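One common way to combine labeled and unlabeled data is to apply a supervised term only where a ground-truth sketch exists and an unsupervised term to every sample. The sketch below shows that pattern under stated assumptions: the dictionary keys, the L1 distance, and the `unsup_weight` parameter are all hypothetical choices for illustration, and the paper's actual objective may differ.

```python
def l1_distance(a, b):
    """Mean absolute difference between two equal-length vectors."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def semi_supervised_loss(samples, unsup_weight=0.5):
    """Average loss over a mixed batch of paired and unpaired samples.

    Each sample is a dict with (hypothetical) keys:
      'pred'       - flattened predicted sketch
      'pred_style' - style embedding of the predicted sketch
      'ref_style'  - style embedding of the reference sketch
      'target'     - ground-truth sketch; present only for paired samples
    """
    total = 0.0
    for sample in samples:
        # Unsupervised term: every sample should match the reference style.
        loss = unsup_weight * l1_distance(
            sample["pred_style"], sample["ref_style"]
        )
        # Supervised term: only when a ground-truth sketch is available.
        target = sample.get("target")
        if target is not None:
            loss += l1_distance(sample["pred"], target)
        total += loss
    return total / len(samples)


paired = {
    "pred": [0.0, 1.0],
    "target": [0.0, 0.0],
    "pred_style": [1.0, 1.0],
    "ref_style": [1.0, 0.0],
}
unpaired = {
    "pred": [0.0, 1.0],
    "pred_style": [1.0, 1.0],
    "ref_style": [1.0, 0.0],
}
batch_loss = semi_supervised_loss([paired, unpaired])
```

Unpaired samples still contribute a training signal through the style term, which is what lets this kind of objective use data that has no matching ground-truth sketch.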

The authors evaluate their method against state-of-the-art sketch extraction and unpaired image translation techniques. The reported results show improvements in both quantitative metrics and visual quality.

Critical Analysis

The paper makes a strong case for the proposed multi-modal sketch extraction method, demonstrating its advantages over prior approaches. However, a few potential limitations are worth noting:

The method still requires a reference sketch, so it may not be fully generalizable to cases where no such reference is available.
The authors do not explore the limits of the technique, for example how different the reference sketch can be before the method starts to break down.
The semi-supervised training process is complex, and the paper does not provide much insight into the specific challenges or trade-offs involved.

Nevertheless, the core idea of leveraging a reference sketch to guide the extraction of a new sketch in a matching style is novel and compelling. Further research could explore ways to make the technique more flexible and autonomous.

Conclusion

This paper presents a novel multi-modal sketch extraction method that can generate sketches imitating the style of a given reference, using semi-supervised training on unpaired data. The approach outperforms existing state-of-the-art techniques, both quantitatively and qualitatively.

While the method has some limitations, it represents a significant advance in the field of sketch extraction and could enable a wide range of applications that require stylistically consistent sketches, such as sketch-based 3D modeling or sketch-guided scene generation. The semi-supervised training approach is also a promising direction for improving the efficiency and flexibility of AI models in general.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Semi-supervised reference-based sketch extraction using a contrastive learning framework

Chang Wook Seo, Amirsaman Ashtari, Junyong Noh

Sketches reflect the drawing style of individual artists; therefore, it is important to consider their unique styles when extracting sketches from color images for various applications. Unfortunately, most existing sketch extraction methods are designed to extract sketches of a single style. Although there have been some attempts to generate various style sketches, the methods generally suffer from two limitations: low quality results and difficulty in training the model due to the requirement of a paired dataset. In this paper, we propose a novel multi-modal sketch extraction method that can imitate the style of a given reference sketch with unpaired data training in a semi-supervised manner. Our method outperforms state-of-the-art sketch extraction methods and unpaired image translation methods in both quantitative and qualitative evaluations.

7/22/2024

Sketch3D: Style-Consistent Guidance for Sketch-to-3D Generation

Wangguandong Zheng, Haifeng Xia, Rui Chen, Ming Shao, Siyu Xia, Zhengming Ding

Recently, image-to-3D approaches have achieved significant results with a natural image as input. However, it is not always possible to access these enriched color input samples in practical applications, where only sketches are available. Existing sketch-to-3D researches suffer from limitations in broad applications due to the challenges of lacking color information and multi-view content. To overcome them, this paper proposes a novel generation paradigm Sketch3D to generate realistic 3D assets with shape aligned with the input sketch and color matching the textual description. Concretely, Sketch3D first instantiates the given sketch in the reference image through the shape-preserving generation process. Second, the reference image is leveraged to deduce a coarse 3D Gaussian prior, and multi-view style-consistent guidance images are generated based on the renderings of the 3D Gaussians. Finally, three strategies are designed to optimize 3D Gaussians, i.e., structural optimization via a distribution transfer mechanism, color optimization with a straightforward MSE loss and sketch similarity optimization with a CLIP-based geometric similarity loss. Extensive visual comparisons and quantitative analysis illustrate the advantage of our Sketch3D in generating realistic 3D assets while preserving consistency with the input.

4/9/2024

SketchTriplet: Self-Supervised Scenarized Sketch-Text-Image Triplet Generation

Zhenbei Wu, Qiang Wang, Jie Yang

The scarcity of free-hand sketch presents a challenging problem. Despite the emergence of some large-scale sketch datasets, these datasets primarily consist of sketches at the single-object level. There continues to be a lack of large-scale paired datasets for scene sketches. In this paper, we propose a self-supervised method for scene sketch generation that does not rely on any existing scene sketch, enabling the transformation of single-object sketches into scene sketches. To accomplish this, we introduce a method for vector sketch captioning and sketch semantic expansion. Additionally, we design a sketch generation network that incorporates a fusion of multi-modal perceptual constraints, suitable for application in zero-shot image-to-sketch downstream task, demonstrating state-of-the-art performance through experimental validation. Finally, leveraging our proposed sketch-to-sketch generation method, we contribute a large-scale dataset centered around scene sketches, comprising highly semantically consistent text-sketch-image triplets. Our research confirms that this dataset can significantly enhance the capabilities of existing models in sketch-based image retrieval and sketch-controlled image synthesis tasks. We will make our dataset and code publicly available.

5/30/2024

ColorizeDiffusion: Adjustable Sketch Colorization with Reference Image and Text

Dingkun Yan, Liang Yuan, Erwin Wu, Yuma Nishioka, Issei Fujishiro, Suguru Saito

Diffusion models have recently demonstrated their effectiveness in generating extremely high-quality images and are now utilized in a wide range of applications, including automatic sketch colorization. Although many methods have been developed for guided sketch colorization, there has been limited exploration of the potential conflicts between image prompts and sketch inputs, which can lead to severe deterioration in the results. Therefore, this paper exhaustively investigates reference-based sketch colorization models that aim to colorize sketch images using reference color images. We specifically investigate two critical aspects of reference-based diffusion models: the distribution problem, which is a major shortcoming compared to text-based counterparts, and the capability in zero-shot sequential text-based manipulation. We introduce two variations of an image-guided latent diffusion model utilizing different image tokens from the pre-trained CLIP image encoder and propose corresponding manipulation methods to adjust their results sequentially using weighted text inputs. We conduct comprehensive evaluations of our models through qualitative and quantitative experiments as well as a user study.

7/4/2024