Coin3D: Controllable and Interactive 3D Assets Generation with Proxy-Guided Conditioning

arXiv:2405.08054

Published 5/15/2024 by Wenqi Dong, Bangbang Yang, Lin Ma, Xiao Liu, Liyuan Cui, Hujun Bao, Yuewen Ma, Zhaopeng Cui


Abstract

As humans, we aspire to create media content that is both freely willed and readily controlled. Thanks to the prominent development of generative techniques, we now can easily utilize 2D diffusion methods to synthesize images controlled by raw sketch or designated human poses, and even progressively edit/regenerate local regions with masked inpainting. However, similar workflows in 3D modeling tasks are still unavailable due to the lack of controllability and efficiency in 3D generation. In this paper, we present a novel controllable and interactive 3D assets modeling framework, named Coin3D. Coin3D allows users to control the 3D generation using a coarse geometry proxy assembled from basic shapes, and introduces an interactive generation workflow to support seamless local part editing while delivering responsive 3D object previewing within a few seconds. To this end, we develop several techniques, including the 3D adapter that applies volumetric coarse shape control to the diffusion model, proxy-bounded editing strategy for precise part editing, progressive volume cache to support responsive preview, and volume-SDS to ensure consistent mesh reconstruction. Extensive experiments of interactive generation and editing on diverse shape proxies demonstrate that our method achieves superior controllability and flexibility in the 3D assets generation task.
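The abstract names volume-SDS without defining it. For background only, the standard score distillation sampling (SDS) gradient introduced in DreamFusion (Poole et al., 2022) is shown below. How Coin3D's volume-SDS changes this formulation is not described in this summary, so read it as context, not as the paper's equation. Here θ are the parameters of the 3D representation, x = g(θ) is a rendered image, z_t is its noised version at timestep t, ε̂_φ is the diffusion model's noise prediction conditioned on y, and w(t) is a timestep weighting.

```latex
\nabla_\theta \mathcal{L}_{\mathrm{SDS}}\big(\phi, x = g(\theta)\big)
  = \mathbb{E}_{t,\epsilon}\!\left[ w(t)\,\big(\hat{\epsilon}_\phi(z_t; y, t) - \epsilon\big)\,
    \frac{\partial x}{\partial \theta} \right],
\qquad z_t = \alpha_t x + \sigma_t \epsilon
```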


Overview

This paper introduces Coin3D, a novel AI-based system for generating and interactively controlling 3D assets.
Coin3D allows users to create 3D models by guiding the generative process with coarse shape proxies, enabling controllable and interactive 3D asset generation.
The system leverages a proxy-guided conditioning approach, which helps users achieve their desired 3D shapes more efficiently compared to traditional 3D modeling workflows.

Plain English Explanation

Coin3D is a new AI-powered tool that makes it easier for people to create and customize 3D models. Rather than starting from scratch, users can guide the AI by providing a rough "proxy" shape, and the AI will then generate a more detailed 3D model based on that input. This allows users to quickly iterate and fine-tune the 3D model to achieve their desired result, without needing advanced 3D modeling skills. The key innovation in Coin3D is this proxy-guided conditioning approach, which helps bridge the gap between a user's initial concept and the final 3D asset. This makes the 3D creation process more intuitive and accessible, opening up opportunities for a wider range of creators to bring their ideas to life in 3D.

Technical Explanation

The paper introduces the Coin3D system, which leverages a proxy-guided conditioning approach for controllable and interactive 3D asset generation. The system takes as input a coarse geometry proxy assembled from basic 3D shapes and generates a more detailed 3D model that aligns with the provided guidance.
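The abstract describes the proxy as coarse geometry assembled from basic shapes and says a 3D adapter applies volumetric control to the diffusion model. As a minimal sketch, assuming the proxy is a union of spheres and axis-aligned boxes and that the conditioning volume is a binary occupancy grid over [-1, 1]^3 (both assumptions; the paper's actual volume encoding is not described here), a proxy could be voxelized as follows. `Sphere`, `Box`, and `voxelize` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple, Union

Vec3 = Tuple[float, float, float]


@dataclass
class Sphere:
    center: Vec3
    radius: float

    def contains(self, p: Vec3) -> bool:
        # Inside if the squared distance to the center is within radius^2.
        return sum((a - c) ** 2 for a, c in zip(p, self.center)) <= self.radius ** 2


@dataclass
class Box:
    min_corner: Vec3
    max_corner: Vec3

    def contains(self, p: Vec3) -> bool:
        # Axis-aligned box test on each coordinate.
        return all(lo <= a <= hi for a, lo, hi in zip(p, self.min_corner, self.max_corner))


Primitive = Union[Sphere, Box]


def voxelize(proxy: Sequence[Primitive], resolution: int) -> List[List[List[int]]]:
    """Binary occupancy grid over [-1, 1]^3, indexed grid[x][y][z].

    A voxel is 1 if its center lies inside any primitive (union of basic shapes).
    """
    step = 2.0 / resolution
    centers = [-1.0 + (i + 0.5) * step for i in range(resolution)]
    return [
        [
            [1 if any(prim.contains((x, y, z)) for prim in proxy) else 0 for z in centers]
            for y in centers
        ]
        for x in centers
    ]


# Hypothetical proxy: body and head spheres plus a box "hat".
proxy = [
    Sphere((0.0, -0.3, 0.0), 0.5),
    Sphere((0.0, 0.45, 0.0), 0.3),
    Box((-0.15, 0.7, -0.15), (0.15, 0.95, 0.15)),
]
volume = voxelize(proxy, resolution=16)
occupied = sum(v for plane in volume for row in plane for v in row)
print(f"{occupied} of {16 ** 3} voxels occupied")
```

An occupancy grid is used here only because it is the simplest volumetric encoding of a union of primitives; a signed distance field would be an equally plausible choice.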

The key technical components of Coin3D include:

3D Adapter: Applies volumetric coarse-shape control from the proxy to the diffusion model, so the generated object follows the proxy's geometry.
Proxy-Bounded Editing: A strategy that uses the proxy to confine regeneration to a selected part, enabling precise local part editing.
Progressive Volume Cache: A caching mechanism that supports responsive 3D previews within a few seconds.
Volume-SDS: A score distillation sampling (SDS) technique that ensures consistent mesh reconstruction.
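The abstract pairs a proxy-bounded editing strategy with a progressive volume cache so that local part edits can be previewed quickly. The sketch below does not reproduce either mechanism; it only illustrates the general principle, assuming that per-voxel features from the previous pass are cached and that an edit is confined to the edited part's proxy bounding box plus a small margin. `VolumeCache`, `box_mask`, and the constant feature functions standing in for the generative model are all hypothetical.

```python
from typing import Callable, Dict, Iterable, Tuple

Index = Tuple[int, int, int]
FeatureFn = Callable[[Index], float]


def box_mask(lo: Index, hi: Index, margin: int = 1) -> Callable[[Index], bool]:
    """Edit region: the edited part's proxy bounding box, dilated by `margin` voxels."""

    def inside(v: Index) -> bool:
        return all(b_lo - margin <= c <= b_hi + margin for c, b_lo, b_hi in zip(v, lo, hi))

    return inside


class VolumeCache:
    """Hypothetical per-voxel feature cache kept from the previous generation pass."""

    def __init__(self) -> None:
        self.features: Dict[Index, float] = {}
        self.evaluations = 0  # number of feature computations, to show the savings

    def generate(self, voxels: Iterable[Index], feature_fn: FeatureFn) -> None:
        # Initial pass: compute and cache a feature for every voxel.
        for v in voxels:
            self.features[v] = feature_fn(v)
            self.evaluations += 1

    def edit(
        self, voxels: Iterable[Index], inside: Callable[[Index], bool], feature_fn: FeatureFn
    ) -> None:
        # Bounded edit: recompute only inside the mask; cached features elsewhere are reused.
        for v in voxels:
            if inside(v):
                self.features[v] = feature_fn(v)
                self.evaluations += 1


N = 8
voxels = [(i, j, k) for i in range(N) for j in range(N) for k in range(N)]

cache = VolumeCache()
cache.generate(voxels, lambda v: 0.0)  # stand-in for the initial generation
before = dict(cache.features)
cache.evaluations = 0

# The user edits one part whose proxy spans voxels x 2..3, y 5..6, z 2..3.
inside = box_mask((2, 5, 2), (3, 6, 3), margin=1)
cache.edit(voxels, inside, lambda v: 1.0)  # stand-in for regenerating that part
print(f"recomputed {cache.evaluations} of {len(voxels)} voxels")
```

The dilation margin is one simple way to give the regenerated region some overlap with its surroundings; whether and how Coin3D blends features at part boundaries is not covered in this summary.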

The proxy-guided conditioning approach helps users achieve their desired 3D shapes more efficiently compared to traditional 3D modeling workflows, which often require extensive manual effort and technical expertise. By bridging the gap between the initial concept and the final 3D asset, Coin3D democratizes the 3D creation process and enables a wider range of users to bring their ideas to life.

Critical Analysis

The paper presents a promising approach for improving the accessibility and controllability of 3D asset generation. However, some potential limitations and areas for further research are worth considering:

Proxy Representation: The paper focuses on coarse shape proxies, but exploring other kinds of guidance input, such as those used in FARM3D or COCOG, could further expand the range of user inputs and enable even more expressive 3D generation.
Interactive Editing: While the paper discusses interactive controls, the specific mechanisms and their usability could be evaluated more thoroughly, especially in comparison to existing interactive 3D generation tools like Interactive3D.
Scalability and Performance: As the complexity of generated 3D assets increases, the system's scalability and interactive performance should be assessed beyond the few-second previews the authors report, particularly for use cases that require rapid iteration and feedback, such as those targeted by IDEA 2:3D.
Generalization and Diversity: The paper's evaluation focuses on a limited set of 3D categories. Exploring the system's ability to generate diverse 3D assets across a wider range of domains, as well as its generalization capabilities, would be valuable.

Overall, the Coin3D system presents an intriguing approach to democratizing 3D asset creation, and the proposed proxy-guided conditioning technique holds promise for making 3D modeling more accessible and interactive for a broader audience.

Conclusion

The Coin3D system introduced in this paper represents a significant step towards more accessible and controllable 3D asset generation. By leveraging a proxy-guided conditioning approach, the system allows users to create and customize 3D models with greater ease and efficiency compared to traditional 3D modeling workflows. This innovation has the potential to empower a wider range of creators to bring their ideas to life in the digital 3D space, fostering increased creativity and experimentation. As the field of AI-driven 3D modeling continues to evolve, research like Coin3D demonstrates the value of making these powerful tools more intuitive and user-friendly for both professionals and hobbyists alike.

This summary was produced with help from an AI and may contain inaccuracies; consult the original paper (arXiv:2405.08054) for details.

Related Papers

Interactive3D: Create What You Want by Interactive 3D Generation

Shaocong Dong, Lihe Ding, Zhanpeng Huang, Zibin Wang, Tianfan Xue, Dan Xu

3D object generation has undergone significant advancements, yielding high-quality results. However, current methods fall short of achieving precise user control, often yielding results that do not align with user expectations, thus limiting their applicability. User-envisioning 3D object generation faces significant challenges in realizing its concepts using current generative models due to limited interaction capabilities. Existing methods mainly offer two approaches: (i) interpreting textual instructions with constrained controllability, or (ii) reconstructing 3D objects from 2D images. Both of them limit customization to the confines of the 2D reference and potentially introduce undesirable artifacts during the 3D lifting process, restricting the scope for direct and versatile 3D modifications. In this work, we introduce Interactive3D, an innovative framework for interactive 3D generation that grants users precise control over the generative process through extensive 3D interaction capabilities. Interactive3D is constructed in two cascading stages, utilizing distinct 3D representations. The first stage employs Gaussian Splatting for direct user interaction, allowing modifications and guidance of the generative direction at any intermediate step through (i) Adding and Removing components, (ii) Deformable and Rigid Dragging, (iii) Geometric Transformations, and (iv) Semantic Editing. Subsequently, the Gaussian splats are transformed into InstantNGP. We introduce a novel (v) Interactive Hash Refinement module to further add details and extract the geometry in the second stage. Our experiments demonstrate that Interactive3D markedly improves the controllability and quality of 3D generation. Our project webpage is available at https://interactive-3d.github.io/.

4/26/2024

cs.GR cs.CV

GaussianDreamerPro: Text to Manipulable 3D Gaussians with Highly Enhanced Quality

Taoran Yi, Jiemin Fang, Zanwei Zhou, Junjie Wang, Guanjun Wu, Lingxi Xie, Xiaopeng Zhang, Wenyu Liu, Xinggang Wang, Qi Tian

Recently, 3D Gaussian splatting (3D-GS) has achieved great success in reconstructing and rendering real-world scenes. To transfer the high rendering quality to generation tasks, a series of research works attempt to generate 3D-Gaussian assets from text. However, the generated assets have not achieved the same quality as those in reconstruction tasks. We observe that Gaussians tend to grow without control as the generation process may cause indeterminacy. Aiming at highly enhancing the generation quality, we propose a novel framework named GaussianDreamerPro. The main idea is to bind Gaussians to reasonable geometry, which evolves over the whole generation process. Along different stages of our framework, both the geometry and appearance can be enriched progressively. The final output asset is constructed with 3D Gaussians bound to mesh, which shows significantly enhanced details and quality compared with previous methods. Notably, the generated asset can also be seamlessly integrated into downstream manipulation pipelines, e.g., animation, composition, and simulation, greatly promoting its potential in wide applications. Demos are available at https://taoranyi.com/gaussiandreamerpro/.

6/27/2024

cs.CV cs.GR

CLAY: A Controllable Large-scale Generative Model for Creating High-quality 3D Assets

Longwen Zhang, Ziyu Wang, Qixuan Zhang, Qiwei Qiu, Anqi Pang, Haoran Jiang, Wei Yang, Lan Xu, Jingyi Yu

In the realm of digital creativity, our potential to craft intricate 3D worlds from imagination is often hampered by the limitations of existing digital tools, which demand extensive expertise and efforts. To narrow this disparity, we introduce CLAY, a 3D geometry and material generator designed to effortlessly transform human imagination into intricate 3D digital structures. CLAY supports classic text or image inputs as well as 3D-aware controls from diverse primitives (multi-view images, voxels, bounding boxes, point clouds, implicit representations, etc). At its core is a large-scale generative model composed of a multi-resolution Variational Autoencoder (VAE) and a minimalistic latent Diffusion Transformer (DiT), to extract rich 3D priors directly from a diverse range of 3D geometries. Specifically, it adopts neural fields to represent continuous and complete surfaces and uses a geometry generative module with pure transformer blocks in latent space. We present a progressive training scheme to train CLAY on an ultra large 3D model dataset obtained through a carefully designed processing pipeline, resulting in a 3D native geometry generator with 1.5 billion parameters. For appearance generation, CLAY sets out to produce physically-based rendering (PBR) textures by employing a multi-view material diffusion model that can generate 2K resolution textures with diffuse, roughness, and metallic modalities. We demonstrate using CLAY for a range of controllable 3D asset creations, from sketchy conceptual designs to production ready assets with intricate details. Even first time users can easily use CLAY to bring their vivid 3D imaginations to life, unleashing unlimited creativity.

6/21/2024

cs.CV

SynthForge: Synthesizing High-Quality Face Dataset with Controllable 3D Generative Models

Abhay Rawat, Shubham Dokania, Astitva Srivastava, Shuaib Ahmed, Haiwen Feng, Rahul Tallamraju

Recent advancements in generative models have unlocked the capabilities to render photo-realistic data in a controllable fashion. Trained on the real data, these generative models are capable of producing realistic samples with minimal to no domain gap, as compared to the traditional graphics rendering. However, using data generated by such models for training downstream tasks remains under-explored, mainly due to the lack of 3D consistent annotations. Moreover, controllable generative models are learned from massive data and their latent space is often too vast to obtain meaningful sample distributions for downstream tasks with limited generation. To overcome these challenges, we extract 3D consistent annotations from an existing controllable generative model, making the data useful for downstream tasks. Our experiments show competitive performance against state-of-the-art models using only generated synthetic data, demonstrating potential for solving downstream tasks. Project page: https://synth-forge.github.io

6/13/2024

cs.CV