DreamScene360: Unconstrained Text-to-3D Scene Generation with Panoramic Gaussian Splatting

2404.06903

Published 4/11/2024 by Shijie Zhou, Zhiwen Fan, Dejia Xu, Haoran Chang, Pradyumna Chari, Tejas Bharadwaj, Suya You, Zhangyang Wang, Achuta Kadambi

cs.CV cs.AI

DreamScene360: Unconstrained Text-to-3D Scene Generation with Panoramic Gaussian Splatting

Abstract

The increasing demand for virtual reality applications has highlighted the significance of crafting immersive 3D assets. We present a text-to-3D 360$^{circ}$ scene generation pipeline that facilitates the creation of comprehensive 360$^{circ}$ scenes for in-the-wild environments in a matter of minutes. Our approach utilizes the generative power of a 2D diffusion model and prompt self-refinement to create a high-quality and globally coherent panoramic image. This image acts as a preliminary flat (2D) scene representation. Subsequently, it is lifted into 3D Gaussians, employing splatting techniques to enable real-time exploration. To produce consistent 3D geometry, our pipeline constructs a spatially coherent structure by aligning the 2D monocular depth into a globally optimized point cloud. This point cloud serves as the initial state for the centroids of 3D Gaussians. In order to address invisible issues inherent in single-view inputs, we impose semantic and geometric constraints on both synthesized and input camera views as regularizations. These guide the optimization of Gaussians, aiding in the reconstruction of unseen regions. In summary, our method offers a globally consistent 3D scene within a 360$^{circ}$ perspective, providing an enhanced immersive experience over existing techniques. Project website at: http://dreamscene360.github.io/

Get summaries of the top AI research delivered straight to your inbox:

Overview

This paper presents a novel 3D scene generation system called DreamScene360 that can create panoramic 3D scenes from text descriptions.
The system uses a unique panoramic Gaussian splatting technique to generate 3D content that seamlessly integrates with the 360-degree environment.
DreamScene360 allows for unconstrained text-to-3D scene generation, enabling users to describe complex scenes without limitations.

Plain English Explanation

The DreamScene360 system allows users to describe a scene using regular text, and then automatically generates a corresponding 3D panoramic environment. For example, you could type "A cozy cabin in a snowy forest with a roaring fireplace," and the system would create a 3D scene with a cabin, trees, snow, and a fireplace, all viewable in 360 degrees.

This is made possible through a unique Gaussian splatting technique that smoothly integrates the generated 3D content into the surrounding panoramic view. Previous text-to-3D systems were often limited in the types of scenes they could create, but DreamScene360 allows for much more open-ended and detailed descriptions.

The ability to generate fully 3D panoramic scenes from text could have many applications, such as virtual tourism, game development, and architectural visualization. It allows users to easily create immersive 3D environments without the need for 3D modeling expertise.

Technical Explanation

DreamScene360 utilizes a two-stage pipeline to generate the 3D panoramic scenes. First, a text-to-3D model converts the input text description into a collection of 3D objects and their properties. Then, a panoramic Gaussian splatting module seamlessly integrates these 3D elements into a 360-degree environment.

The text-to-3D model is a large language model that has been fine-tuned on a dataset of text descriptions paired with 3D scene data. This allows it to learn the mapping between language and 3D content. The panoramic Gaussian splatting module uses a depth-aware splatting technique to blend the generated 3D objects into the surrounding environment, creating a cohesive and immersive panoramic scene.

Experiments show that DreamScene360 can generate high-quality 3D scenes from a wide range of text descriptions, outperforming previous methods in both qualitative and quantitative evaluations.

Critical Analysis

The DreamScene360 system represents a significant advancement in the field of text-to-3D scene generation, allowing for more open-ended and detailed scene descriptions. However, the paper does note some limitations, such as the system's ability to handle complex object interactions or dynamic scenes.

Additionally, while the panoramic Gaussian splatting technique is a novel contribution, it is not clear how well it would scale to very large or highly complex scenes. The paper also does not address potential biases or safety concerns that could arise from an unconstrained text-to-3D system.

Further research could explore ways to enhance the system's understanding of spatial relationships, object physics, and scene semantics to enable the generation of even more realistic and coherent 3D environments. Incorporating safety and bias mitigation measures would also be an important area for future work.

Conclusion

The DreamScene360 system represents a significant advance in the field of text-to-3D scene generation, enabling users to create immersive panoramic 3D environments from natural language descriptions. The novel panoramic Gaussian splatting technique allows for seamless integration of generated 3D content into the surrounding 360-degree view.

This technology could have widespread applications in areas such as virtual tourism, game development, and architectural visualization, empowering users to easily create detailed 3D scenes without specialized 3D modeling skills. While the system has some limitations, the overall approach demonstrates the potential of language-driven 3D content generation to transform how we interact with and experience virtual environments.

Related Papers

DreamScape: 3D Scene Creation via Gaussian Splatting joint Correlation Modeling

Xuening Yuan, Hongyu Yang, Yueming Zhao, Di Huang

Recent progress in text-to-3D creation has been propelled by integrating the potent prior of Diffusion Models from text-to-image generation into the 3D domain. Nevertheless, generating 3D scenes characterized by multiple instances and intricate arrangements remains challenging. In this study, we present DreamScape, a method for creating highly consistent 3D scenes solely from textual descriptions, leveraging the strong 3D representation capabilities of Gaussian Splatting and the complex arrangement abilities of large language models (LLMs). Our approach involves a 3D Gaussian Guide ($3{DG^2}$) for scene representation, consisting of semantic primitives (objects) and their spatial transformations and relationships derived directly from text prompts using LLMs. This compositional representation allows for local-to-global optimization of the entire scene. A progressive scale control is tailored during local object generation, ensuring that objects of different sizes and densities adapt to the scene, which addresses training instability issue arising from simple blending in the subsequent global optimization stage. To mitigate potential biases of LLM priors, we model collision relationships between objects at the global level, enhancing physical correctness and overall realism. Additionally, to generate pervasive objects like rain and snow distributed extensively across the scene, we introduce a sparse initialization and densification strategy. Experiments demonstrate that DreamScape offers high usability and controllability, enabling the generation of high-fidelity 3D scenes from only text prompts and achieving state-of-the-art performance compared to other methods.

4/16/2024

cs.CV

🌐

Text-to-3D using Gaussian Splatting

Zilong Chen, Feng Wang, Yikai Wang, Huaping Liu

Automatic text-to-3D generation that combines Score Distillation Sampling (SDS) with the optimization of volume rendering has achieved remarkable progress in synthesizing realistic 3D objects. Yet most existing text-to-3D methods by SDS and volume rendering suffer from inaccurate geometry, e.g., the Janus issue, since it is hard to explicitly integrate 3D priors into implicit 3D representations. Besides, it is usually time-consuming for them to generate elaborate 3D models with rich colors. In response, this paper proposes GSGEN, a novel method that adopts Gaussian Splatting, a recent state-of-the-art representation, to text-to-3D generation. GSGEN aims at generating high-quality 3D objects and addressing existing shortcomings by exploiting the explicit nature of Gaussian Splatting that enables the incorporation of 3D prior. Specifically, our method adopts a progressive optimization strategy, which includes a geometry optimization stage and an appearance refinement stage. In geometry optimization, a coarse representation is established under 3D point cloud diffusion prior along with the ordinary 2D SDS optimization, ensuring a sensible and 3D-consistent rough shape. Subsequently, the obtained Gaussians undergo an iterative appearance refinement to enrich texture details. In this stage, we increase the number of Gaussians by compactness-based densification to enhance continuity and improve fidelity. With these designs, our approach can generate 3D assets with delicate details and accurate geometry. Extensive evaluations demonstrate the effectiveness of our method, especially for capturing high-frequency components. Our code is available at https://github.com/gsgen3d/gsgen

4/3/2024

cs.CV

🛸

DreamScene: 3D Gaussian-based Text-to-3D Scene Generation via Formation Pattern Sampling

Haoran Li, Haolin Shi, Wenli Zhang, Wenjun Wu, Yong Liao, Lin Wang, Lik-hang Lee, Pengyuan Zhou

Text-to-3D scene generation holds immense potential for the gaming, film, and architecture sectors. Despite significant progress, existing methods struggle with maintaining high quality, consistency, and editing flexibility. In this paper, we propose DreamScene, a 3D Gaussian-based novel text-to-3D scene generation framework, to tackle the aforementioned three challenges mainly via two strategies. First, DreamScene employs Formation Pattern Sampling (FPS), a multi-timestep sampling strategy guided by the formation patterns of 3D objects, to form fast, semantically rich, and high-quality representations. FPS uses 3D Gaussian filtering for optimization stability, and leverages reconstruction techniques to generate plausible textures. Second, DreamScene employs a progressive three-stage camera sampling strategy, specifically designed for both indoor and outdoor settings, to effectively ensure object-environment integration and scene-wide 3D consistency. Last, DreamScene enhances scene editing flexibility by integrating objects and environments, enabling targeted adjustments. Extensive experiments validate DreamScene's superiority over current state-of-the-art techniques, heralding its wide-ranging potential for diverse applications. Code and demos will be released at https://dreamscene-project.github.io .

4/5/2024

cs.CV

RealmDreamer: Text-Driven 3D Scene Generation with Inpainting and Depth Diffusion

Jaidev Shriram, Alex Trevithick, Lingjie Liu, Ravi Ramamoorthi

We introduce RealmDreamer, a technique for generation of general forward-facing 3D scenes from text descriptions. Our technique optimizes a 3D Gaussian Splatting representation to match complex text prompts. We initialize these splats by utilizing the state-of-the-art text-to-image generators, lifting their samples into 3D, and computing the occlusion volume. We then optimize this representation across multiple views as a 3D inpainting task with image-conditional diffusion models. To learn correct geometric structure, we incorporate a depth diffusion model by conditioning on the samples from the inpainting model, giving rich geometric structure. Finally, we finetune the model using sharpened samples from image generators. Notably, our technique does not require video or multi-view data and can synthesize a variety of high-quality 3D scenes in different styles, consisting of multiple objects. Its generality additionally allows 3D synthesis from a single image.

4/11/2024

cs.CV cs.AI cs.GR cs.LG