Style-NeRF2NeRF: 3D Style Transfer From Style-Aligned Multi-View Images

arXiv:2406.13393

Published 6/26/2024 by Haruo Fujiwara, Yusuke Mukuta, Tatsuya Harada

Abstract

We propose a simple yet effective pipeline for stylizing a 3D scene, harnessing the power of 2D image diffusion models. Given a NeRF model reconstructed from a set of multi-view images, we perform 3D style transfer by refining the source NeRF model using stylized images generated by a style-aligned image-to-image diffusion model. Given a target style prompt, we first generate perceptually similar multi-view images by leveraging a depth-conditioned diffusion model with an attention-sharing mechanism. Next, based on the stylized multi-view images, we propose to guide the style transfer process with the sliced Wasserstein loss based on the feature maps extracted from a pre-trained CNN model. Our pipeline consists of decoupled steps, allowing users to test various prompt ideas and preview the stylized 3D result before proceeding to the NeRF fine-tuning stage. We demonstrate that our method can transfer diverse artistic styles to real-world 3D scenes with competitive quality. Result videos are also available on our project page: https://haruolabs.github.io/style-n2n/

Overview

This paper presents Style-NeRF2NeRF, a pipeline for stylizing a 3D scene represented as a neural radiance field (NeRF) using 2D image diffusion models
Given a target style prompt, a depth-conditioned image-to-image diffusion model with an attention-sharing mechanism generates stylized multi-view images that are perceptually similar to one another
The source NeRF is then fine-tuned on these style-aligned images, guided by a sliced Wasserstein loss computed on feature maps from a pre-trained CNN

Plain English Explanation

The Style-NeRF2NeRF method lets you restyle an existing 3D scene by describing the look you want in a text prompt.

The process starts from a NeRF reconstructed from multiple camera views of a scene. Those views are then restyled with a 2D diffusion model according to the prompt, for example to look like a painting, cartoon, or sketch. The key requirement is that all views end up in the same consistent style, since they all depict the same underlying 3D scene.

The source NeRF is then refined using these style-aligned views, so that the 3D scene itself takes on the chosen style. Because image stylization and NeRF fine-tuning are separate steps, you can try several prompts and preview the stylized result before committing to the fine-tuning stage.

Potential applications include stylized 3D assets for virtual worlds, video games, or augmented reality experiences. The approach could also support new forms of 3D-aware digital art and more expressive 3D content creation.

Technical Explanation

The Style-NeRF2NeRF method combines neural radiance fields (NeRFs) with 2D image diffusion models to perform 3D style transfer from style-aligned multi-view images.

The key idea is a pipeline of decoupled steps rather than a single end-to-end network. It starts from a source NeRF reconstructed from a set of multi-view images, stylizes those images in 2D, and then refines the source NeRF using the stylized images.

First, given a target style prompt, a depth-conditioned image-to-image diffusion model with an attention-sharing mechanism generates stylized versions of the multi-view images. According to the abstract, this mechanism is what makes the generated views perceptually similar to one another, which is the sense in which they are style-aligned.
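The abstract describes an attention-sharing mechanism for generating perceptually similar views but does not spell out its exact form. As a rough illustration only, the sketch below assumes that sharing means letting each view's queries attend to the keys and values of a reference view in addition to its own. The function names (`attention`, `shared_attention`) and the dictionary layout are hypothetical, and plain Python lists stand in for the tensors a real diffusion model would use.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention over lists of equal-length vectors."""
    scale = 1.0 / math.sqrt(len(keys[0]))
    outputs = []
    for q in queries:
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
        weights = softmax(scores)
        outputs.append([
            sum(w * v[d] for w, v in zip(weights, values))
            for d in range(len(values[0]))
        ])
    return outputs


def shared_attention(view, reference):
    """Assumed sharing scheme: the view's queries attend over its own
    keys/values concatenated with those of a reference view."""
    keys = view["k"] + reference["k"]
    values = view["v"] + reference["v"]
    return attention(view["q"], keys, values)


# Toy tokens: one query/key/value per view, two feature dimensions.
view = {"q": [[1.0, 0.0]], "k": [[0.0, 1.0]], "v": [[0.0, 0.0]]}
reference = {"q": [[1.0, 0.0]], "k": [[10.0, 0.0]], "v": [[1.0, 1.0]]}

own_output = attention(view["q"], view["k"], view["v"])
shared_output = shared_attention(view, reference)
```

In this toy setup the view's query matches the reference key far better than its own key, so `shared_output` is pulled toward the reference value while `own_output` is not. Under the stated assumption, that is how appearance information could flow from one view to another.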

Next, the source NeRF is fine-tuned on the stylized multi-view images. The style transfer is guided by a sliced Wasserstein loss computed on feature maps extracted from a pre-trained CNN, so the comparison between NeRF renderings and stylized images happens in feature space rather than only pixel by pixel.
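The abstract names a sliced Wasserstein loss over CNN feature maps as the guidance signal for NeRF fine-tuning. The sketch below shows the generic sliced Wasserstein idea: project two sets of feature vectors onto random directions, sort each set of projections, and average the squared differences. It is not the authors' implementation. The function names, the number of projections, and the use of plain Python lists in place of real CNN feature maps are all assumptions made for illustration.

```python
import math
import random


def random_unit_vector(dim, rng):
    """Draw a random direction on the unit sphere in `dim` dimensions."""
    vec = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def sliced_wasserstein(feats_a, feats_b, num_projections=64, seed=0):
    """Approximate squared sliced Wasserstein distance between two sets of
    feature vectors (same number of vectors, same dimensionality)."""
    if len(feats_a) != len(feats_b):
        raise ValueError("this sketch assumes equally sized feature sets")
    rng = random.Random(seed)
    dim = len(feats_a[0])
    total = 0.0
    for _ in range(num_projections):
        direction = random_unit_vector(dim, rng)
        # Project every feature vector onto the direction, then sort:
        # in 1D, optimal transport simply matches sorted samples.
        proj_a = sorted(sum(x * d for x, d in zip(f, direction)) for f in feats_a)
        proj_b = sorted(sum(x * d for x, d in zip(f, direction)) for f in feats_b)
        total += sum((a - b) ** 2 for a, b in zip(proj_a, proj_b)) / len(proj_a)
    return total / num_projections


# Hypothetical "feature maps": each inner list is one spatial location's features.
rendered = [[0.0, 0.1, 0.2], [1.0, 0.9, 0.8], [0.5, 0.5, 0.5], [0.3, 0.7, 0.1]]
same_set_reordered = list(reversed(rendered))
shifted = [[x + 1.0 for x in f] for f in rendered]

loss_same = sliced_wasserstein(rendered, same_set_reordered)
loss_shifted = sliced_wasserstein(rendered, shifted)
```

Because the loss compares sorted projections, it ignores where each feature sits in the image and only looks at the overall feature distribution: `loss_same` is zero for a reordered copy, while `loss_shifted` is positive. In a real training loop this would be written with a differentiable tensor library so that gradients can reach the NeRF parameters.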

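Because the steps are decoupled, users can test various prompt ideas and preview the stylized result before proceeding to the NeRF fine-tuning stage. The authors report that the method transfers diverse artistic styles to real-world 3D scenes with competitive quality.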

Critical Analysis

The Style-NeRF2NeRF method is a practical contribution to 3D content creation and stylization. By building on a 2D image diffusion model and keeping its steps decoupled, it lets users iterate on style prompts before committing to NeRF fine-tuning.

However, the method does have some potential limitations. It requires a NeRF reconstructed from multiple views of the scene, which can be challenging or impractical to capture in some real-world scenarios. Fine-tuning the NeRF also adds computational cost on top of generating the stylized images.

Additionally, the quality and coherence of the stylized 3D outputs may be sensitive to how consistent the stylized views are with one another, as well as to the specific style prompt. Further research may be needed to improve the robustness and generalization of the technique.

Despite these caveats, Style-NeRF2NeRF represents an important step forward in bridging the gap between 2D and 3D creative expression. It opens up new possibilities for more expressive and visually engaging 3D content, with potential applications in areas like virtual reality, gaming, and digital art.

Conclusion

The Style-NeRF2NeRF method presented in this paper stylizes 3D neural radiance field (NeRF) scenes using stylized images generated by a 2D image diffusion model.

By generating style-aligned multi-view images from a target style prompt and fine-tuning the source NeRF on them with a sliced Wasserstein loss, the method carries the chosen style over to the 3D scene. The authors report competitive quality across diverse artistic styles on real-world scenes, with potential applications in virtual worlds, video games, and digital art.

While the technique has some limitations, it opens up new avenues for more expressive and visually engaging 3D experiences. As 3D technologies continue to evolve, methods like Style-NeRF2NeRF may help artists, designers, and creators apply 2D stylization tools to 3D scenes.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Multi-style Neural Radiance Field with AdaIN

Yu-Wen Pao, An-Jie Li

In this work, we propose a novel pipeline that combines AdaIN and NeRF for the task of stylized Novel View Synthesis. Compared to previous works, we make the following contributions: 1) We simplify the pipeline. 2) We extend the capabilities of the model to handle the multi-style task. 3) We modify the model architecture to perform well on styles with strong brush strokes. 4) We implement style interpolation on the multi-style model, allowing us to control the style between any two styles and the style intensity between the stylized output and the original scene, providing better control over the stylization strength.

6/10/2024

cs.CV cs.GR

ArtNeRF: A Stylized Neural Field for 3D-Aware Cartoonized Face Synthesis

Zichen Tang, Hongyu Yang

Recent advances in generative visual models and neural radiance fields have greatly boosted 3D-aware image synthesis and stylization tasks. However, previous NeRF-based work is limited to single scene stylization, training a model to generate 3D-aware cartoon faces with arbitrary styles remains unsolved. We propose ArtNeRF, a novel face stylization framework derived from 3D-aware GAN to tackle this problem. In this framework, we utilize an expressive generator to synthesize stylized faces and a triple-branch discriminator module to improve the visual quality and style consistency of the generated faces. Specifically, a style encoder based on contrastive learning is leveraged to extract robust low-dimensional embeddings of style images, empowering the generator with the knowledge of various styles. To smooth the training process of cross-domain transfer learning, we propose an adaptive style blending module which helps inject style information and allows users to freely tune the level of stylization. We further introduce a neural rendering module to achieve efficient real-time rendering of images with higher resolutions. Extensive experiments demonstrate that ArtNeRF is versatile in generating high-quality 3D-aware cartoon faces with arbitrary styles.

4/29/2024

cs.CV

Stylizing Sparse-View 3D Scenes with Hierarchical Neural Representation

Y. Wang, A. Gao, Y. Gong, Y. Zeng

Recently, a surge of 3D style transfer methods has been proposed that leverage the scene reconstruction power of a pre-trained neural radiance field (NeRF). To successfully stylize a scene this way, one must first reconstruct a photo-realistic radiance field from collected images of the scene. However, when only sparse input views are available, pre-trained few-shot NeRFs often suffer from high-frequency artifacts, which are generated as a by-product of high-frequency details for improving reconstruction quality. Is it possible to generate more faithful stylized scenes from sparse inputs by directly optimizing encoding-based scene representation with target style? In this paper, we consider the stylization of sparse-view scenes in terms of disentangling content semantics and style textures. We propose a coarse-to-fine sparse-view scene stylization framework, where a novel hierarchical encoding-based neural representation is designed to generate high-quality stylized scenes directly from implicit scene representations. We also propose a new optimization strategy with content strength annealing to achieve realistic stylization and better content preservation. Extensive experiments demonstrate that our method can achieve high-quality stylization of sparse-view scenes and outperforms fine-tuning-based baselines in terms of stylization quality and efficiency.

4/9/2024

cs.CV cs.GR

Dream-in-Style: Text-to-3D Generation using Stylized Score Distillation

Hubert Kompanowski, Binh-Son Hua

We present a method to generate 3D objects in styles. Our method takes a text prompt and a style reference image as input and reconstructs a neural radiance field to synthesize a 3D model with the content aligning with the text prompt and the style following the reference image. To simultaneously generate the 3D object and perform style transfer in one go, we propose a stylized score distillation loss to guide a text-to-3D optimization process to output visually plausible geometry and appearance. Our stylized score distillation is based on a combination of an original pretrained text-to-image model and its modified sibling with the key and value features of self-attention layers manipulated to inject styles from the reference image. Comparisons with state-of-the-art methods demonstrated the strong visual performance of our method, further supported by the quantitative results from our user study.

6/28/2024

cs.CV cs.GR