Plasticine3D: 3D Non-Rigid Editing with Text Guidance by Multi-View Embedding Optimization

Read original: arXiv:2312.10111 - Published 7/10/2024 by Yige Chen, Teng Hu, Yizhe Tang, Siyuan Chen, Ang Chen, Ran Yi

Plasticine3D: 3D Non-Rigid Editing with Text Guidance by Multi-View Embedding Optimization

Overview

This paper introduces "Plasticine3D", a new framework for non-rigid 3D object editing guided by text prompts.
The system allows users to manipulate the shape and appearance of 3D objects in an intuitive way using natural language instructions.
This builds on recent advances in text-guided image editing and text-guided 3D editing techniques.

Plain English Explanation

Plasticine3D is a new tool that lets you change the shape and look of 3D objects just by typing what you want. For example, you could type "make the chair taller and the legs thinner" and the system would automatically update the 3D chair model to match your instructions. This builds on recent advances in being able to edit 2D images and 3D scenes using text prompts, like GSEdit and MATE3D. The key idea is to make 3D modeling and editing much more intuitive and accessible for non-expert users by allowing them to use natural language to guide the changes, rather than having to learn complex 3D software tools.

Technical Explanation

The Plasticine3D framework uses a neural network that takes in a 3D mesh of an object and a text prompt describing the desired changes, and outputs an updated 3D mesh that matches the text instructions. The network is trained on a large dataset of 3D objects and associated text captions describing their properties and shapes.

During inference, the text prompt is encoded using a language model, and this is combined with the initial 3D mesh using a series of neural network layers. This allows the model to understand the semantic meaning of the text and how it maps to specific geometric transformations of the 3D shape. The output is a new 3D mesh that has been non-rigidly deformed to match the text prompt.

The authors evaluate Plasticine3D on a variety of 3D object editing tasks, showing that it can handle complex shape changes guided by free-form text much more effectively than prior approaches like Chat-Edit 3D and InstructHumans. The system is also able to generate plausible edited 3D meshes even for prompts that go beyond the training data.

Critical Analysis

The Plasticine3D framework represents an impressive advance in text-guided 3D editing, showing how natural language can be effectively leveraged to make 3D modeling more accessible. However, the paper does note some limitations:

The system is currently focused on simple non-rigid deformations, and may struggle with more complex topology changes or higher-level shape manipulation.
The training data is limited to a specific set of 3D object categories, so the model may not generalize well to completely novel object types or domains.
There are open questions around how to best interface the text-guided editing with other 3D interaction modalities, such as sketching or direct manipulation.

Further research could explore ways to address these limitations, such as by incorporating techniques from audio-based 3D editing with text prompts or developing more flexible neural network architectures. Overall, Plasticine3D represents an exciting step towards more natural and intuitive 3D content creation tools.

Conclusion

The Plasticine3D framework introduces a novel approach for 3D object editing driven by natural language instructions. By leveraging advances in text-to-image and text-to-3D translation, the system allows users to easily manipulate the shape and appearance of 3D models in an intuitive way. This has the potential to democratize 3D content creation, making it accessible to a much broader audience beyond expert 3D modelers. While the current system has some limitations, the core ideas represent an important step forward in combining language and 3D geometry in service of more natural and expressive 3D editing tools.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Follow @aimodelsfyi on 𝕏 →

Related Papers

Plasticine3D: 3D Non-Rigid Editing with Text Guidance by Multi-View Embedding Optimization

Yige Chen, Teng Hu, Yizhe Tang, Siyuan Chen, Ang Chen, Ran Yi

With the help of Score Distillation Sampling (SDS) and the rapid development of neural 3D representations, some methods have been proposed to perform 3D editing such as adding additional geometries, or overwriting textures. However, generalized 3D non-rigid editing task, which requires changing both the structure (posture or composition) and appearance (texture) of the original object, remains to be challenging in 3D editing field. In this paper, we propose Plasticine3D, a novel text-guided fine-grained controlled 3D editing pipeline that can perform 3D non-rigid editing with large structure deformations. Our work divides the editing process into a geometry editing stage and a texture editing stage to achieve separate control of structure and appearance. In order to maintain the details of the original object from different viewpoints, we propose a Multi-View-Embedding (MVE) Optimization strategy to ensure that the guidance model learns the features of the original object from various viewpoints. For the purpose of fine-grained control, we propose Embedding-Fusion (EF) to blend the original characteristics with the editing objectives in the embedding space, and control the extent of editing by adjusting the fusion rate. Furthermore, in order to address the issue of gradual loss of details during the generation process under high editing intensity, as well as the problem of insignificant editing effects in some scenarios, we propose Score Projection Sampling (SPS) as a replacement of score distillation sampling, which introduces additional optimization phases for editing target enhancement and original detail maintenance, leading to better editing quality. Extensive experiments demonstrate the effectiveness of our method on 3D non-rigid editing tasks

7/10/2024

GSEdit: Efficient Text-Guided Editing of 3D Objects via Gaussian Splatting

Francesco Palandra, Andrea Sanchietti, Daniele Baieri, Emanuele Rodol`a

We present GSEdit, a pipeline for text-guided 3D object editing based on Gaussian Splatting models. Our method enables the editing of the style and appearance of 3D objects without altering their main details, all in a matter of minutes on consumer hardware. We tackle the problem by leveraging Gaussian splatting to represent 3D scenes, and we optimize the model while progressively varying the image supervision by means of a pretrained image-based diffusion model. The input object may be given as a 3D triangular mesh, or directly provided as Gaussians from a generative model such as DreamGaussian. GSEdit ensures consistency across different viewpoints, maintaining the integrity of the original object's information. Compared to previously proposed methods relying on NeRF-like MLP models, GSEdit stands out for its efficiency, making 3D editing tasks much faster. Our editing process is refined via the application of the SDS loss, ensuring that our edits are both precise and accurate. Our comprehensive evaluation demonstrates that GSEdit effectively alters object shape and appearance following the given textual instructions while preserving their coherence and detail.

5/22/2024

MaTe3D: Mask-guided Text-based 3D-aware Portrait Editing

Kangneng Zhou, Daiheng Gao, Xuan Wang, Jie Zhang, Peng Zhang, Xusen Sun, Longhao Zhang, Shiqi Yang, Bang Zhang, Liefeng Bo, Yaxing Wang, Ming-Ming Cheng

3D-aware portrait editing has a wide range of applications in multiple fields. However, current approaches are limited due that they can only perform mask-guided or text-based editing. Even by fusing the two procedures into a model, the editing quality and stability cannot be ensured. To address this limitation, we propose textbf{MaTe3D}: mask-guided text-based 3D-aware portrait editing. In this framework, first, we introduce a new SDF-based 3D generator which learns local and global representations with proposed SDF and density consistency losses. This enhances masked-based editing in local areas; second, we present a novel distillation strategy: Conditional Distillation on Geometry and Texture (CDGT). Compared to exiting distillation strategies, it mitigates visual ambiguity and avoids mismatch between texture and geometry, thereby producing stable texture and convincing geometry while editing. Additionally, we create the CatMask-HQ dataset, a large-scale high-resolution cat face annotation for exploration of model generalization and expansion. We perform expensive experiments on both the FFHQ and CatMask-HQ datasets to demonstrate the editing quality and stability of the proposed method. Our method faithfully generates a 3D-aware edited face image based on a modified mask and a text prompt. Our code and models will be publicly released.

7/8/2024

Chat-Edit-3D: Interactive 3D Scene Editing via Text Prompts

Shuangkang Fang, Yufeng Wang, Yi-Hsuan Tsai, Yi Yang, Wenrui Ding, Shuchang Zhou, Ming-Hsuan Yang

Recent work on image content manipulation based on vision-language pre-training models has been effectively extended to text-driven 3D scene editing. However, existing schemes for 3D scene editing still exhibit certain shortcomings, hindering their further interactive design. Such schemes typically adhere to fixed input patterns, limiting users' flexibility in text input. Moreover, their editing capabilities are constrained by a single or a few 2D visual models and require intricate pipeline design to integrate these models into 3D reconstruction processes. To address the aforementioned issues, we propose a dialogue-based 3D scene editing approach, termed CE3D, which is centered around a large language model that allows for arbitrary textual input from users and interprets their intentions, subsequently facilitating the autonomous invocation of the corresponding visual expert models. Furthermore, we design a scheme utilizing Hash-Atlas to represent 3D scene views, which transfers the editing of 3D scenes onto 2D atlas images. This design achieves complete decoupling between the 2D editing and 3D reconstruction processes, enabling CE3D to flexibly integrate a wide range of existing 2D or 3D visual models without necessitating intricate fusion designs. Experimental results demonstrate that CE3D effectively integrates multiple visual models to achieve diverse editing visual effects, possessing strong scene comprehension and multi-round dialog capabilities. The code is available at https://sk-fun.fun/CE3D.

7/11/2024