PolyRoom: Room-aware Transformer for Floorplan Reconstruction

Read original: arXiv:2407.10439 - Published 7/16/2024 by Yuzhou Liu, Lingjie Zhu, Xiaodong Ma, Hanqiao Ye, Xiang Gao, Xianwei Zheng, Shuhan Shen

Overview

This paper introduces PolyRoom, a room-aware Transformer for reconstructing vectorized 2D floorplans from point clouds.
The key innovations are a uniform sampling floorplan representation, a room-aware query initialization scheme, and room-aware self-attention.
Experiments on two widely used datasets show that PolyRoom outperforms prior state-of-the-art methods both quantitatively and qualitatively.

Plain English Explanation

PolyRoom is a new AI model that takes a point cloud (a set of 3D points describing a building's interior) and reconstructs a detailed 2D floorplan from it. This is a challenging task: existing methods can miss corners or edges, misplace corners, or produce rooms that overlap or intersect themselves.

The core idea behind PolyRoom is to make the model aware of rooms as whole units. Rather than treating the floorplan as a loose collection of corners and walls, PolyRoom initializes its queries per room and uses "room-aware" self-attention. This helps it capture the full structure of the floorplan and avoid implausible layouts.

PolyRoom represents each room boundary as a polygon, using a uniform sampling representation that provides dense supervision during training. The aim is clean, closed room outlines rather than a jumble of detected walls.

Overall, PolyRoom represents an advance in using AI and machine learning to automatically reconstruct detailed 2D floorplans from visual inputs. This could have applications in areas like real estate, architecture, and interior design, where accurate digital floorplans are valuable.

Technical Explanation

The key technical innovations in PolyRoom are:

Uniform Sampling Representation: PolyRoom adopts a uniform sampling floorplan representation, which enables dense supervision during training and effective use of angle information.
Room-aware Query Initialization: The Transformer's queries are initialized in a room-aware way, which prevents the model from producing non-polygonal sequences.
Room-aware Self-Attention: Self-attention is made room-aware, which improves both memory efficiency and model performance.
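
To make the polygon representation concrete: the paper's abstract describes a uniform sampling floorplan representation. One plausible reading is that each room outline is resampled into a fixed number of points spaced evenly along its perimeter. The sketch below illustrates that idea only. The function name `sample_polygon_uniform`, the point count, and the example room are assumptions for illustration, not the authors' implementation.

```python
import math

def sample_polygon_uniform(vertices, num_points):
    """Return num_points points spaced evenly by arc length along a closed polygon."""
    vertices = list(vertices)
    # Pair each vertex with the next one, wrapping around to close the ring.
    edges = list(zip(vertices, vertices[1:] + vertices[:1]))
    lengths = [math.dist(a, b) for a, b in edges]
    step = sum(lengths) / num_points

    points = []
    edge_index = 0
    edge_start = 0.0  # arc length at which the current edge begins
    for k in range(num_points):
        target = k * step
        # Advance to the edge that contains the target arc length.
        while edge_index < len(edges) - 1 and target >= edge_start + lengths[edge_index]:
            edge_start += lengths[edge_index]
            edge_index += 1
        (ax, ay), (bx, by) = edges[edge_index]
        length = lengths[edge_index]
        t = (target - edge_start) / length if length > 0 else 0.0
        points.append((ax + t * (bx - ax), ay + t * (by - ay)))
    return points

# Hypothetical 4 m x 2 m rectangular room, resampled to 12 points.
room = [(0.0, 0.0), (4.0, 0.0), (4.0, 2.0), (0.0, 2.0)]
points = sample_polygon_uniform(room, 12)
```

A fixed-length sequence like this gives every room the same number of targets regardless of how many corners it has, which is one way a representation could support dense supervision.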

The authors evaluate PolyRoom on two widely used floorplan reconstruction datasets. They report that PolyRoom surpasses prior state-of-the-art methods both quantitatively and qualitatively.
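
This summary does not spell out the evaluation metrics. As a rough illustration of how reconstruction accuracy can be scored for vectorized floorplans, the sketch below computes corner precision and recall by greedily matching predicted corners to ground-truth corners within a distance threshold. The function name, the greedy matching rule, the threshold value, and the example coordinates are all assumptions, not the paper's protocol.

```python
import math

def corner_precision_recall(predicted, ground_truth, threshold):
    """Greedily match predicted corners to ground-truth corners within a threshold."""
    unmatched = list(ground_truth)
    matches = 0
    for p in predicted:
        if not unmatched:
            break
        # Nearest ground-truth corner that has not been matched yet.
        nearest = min(unmatched, key=lambda g: math.dist(p, g))
        if math.dist(p, nearest) <= threshold:
            matches += 1
            unmatched.remove(nearest)
    precision = matches / len(predicted) if predicted else 0.0
    recall = matches / len(ground_truth) if ground_truth else 0.0
    return precision, recall

# Hypothetical room: four true corners, five predictions (one spurious).
gt = [(0.0, 0.0), (4.0, 0.0), (4.0, 2.0), (0.0, 2.0)]
pred = [(0.1, 0.0), (4.0, 0.1), (3.9, 2.0), (2.0, 1.0), (0.0, 2.1)]
precision, recall = corner_precision_recall(pred, gt, threshold=0.25)
```

In this example every true corner is found, so recall is 1.0, while the spurious prediction in the middle of the room lowers precision to 0.8.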

Critical Analysis

One potential limitation of PolyRoom is that it takes a point cloud of the interior space as input. In real-world scenarios, floorplan reconstruction may need to handle more diverse inputs, such as images, sketches, or partial scans. The abstract does not indicate how PolyRoom would perform in these settings.

Additionally, this summary does not go into the training process or architectural details of the model. The specific design choices, hyperparameters, and training strategies behind PolyRoom's performance are best checked against the paper and the authors' released code at https://github.com/3dv-casia/PolyRoom/.

Finally, while the authors demonstrate PolyRoom's effectiveness on standard benchmarks, it would be valuable to see how the model performs on real-world floorplan data, potentially in collaboration with industry partners. This could uncover additional challenges or requirements not captured by the existing datasets.

Conclusion

PolyRoom represents an advance in automated floorplan reconstruction. By combining a uniform sampling representation with room-aware query initialization and room-aware self-attention, PolyRoom reconstructs vectorized floorplans from point clouds and surpasses prior state-of-the-art methods on two widely used datasets.

As AI continues to make progress in areas like computer vision and spatial reasoning, tools like PolyRoom could have significant real-world impact in industries like architecture, interior design, and real estate, where accurate digital floorplans are essential. Further research and collaboration with industry partners could help unlock the full potential of this technology.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

PolyRoom: Room-aware Transformer for Floorplan Reconstruction

Yuzhou Liu, Lingjie Zhu, Xiaodong Ma, Hanqiao Ye, Xiang Gao, Xianwei Zheng, Shuhan Shen

Reconstructing geometry and topology structures from raw unstructured data has always been an important research topic in indoor mapping research. In this paper, we aim to reconstruct the floorplan with a vectorized representation from point clouds. Despite significant advancements achieved in recent years, current methods still encounter several challenges, such as missing corners or edges, inaccuracies in corner positions or angles, self-intersecting or overlapping polygons, and potentially implausible topology. To tackle these challenges, we present PolyRoom, a room-aware Transformer that leverages uniform sampling representation, room-aware query initialization, and room-aware self-attention for floorplan reconstruction. Specifically, we adopt a uniform sampling floorplan representation to enable dense supervision during training and effective utilization of angle information. Additionally, we propose a room-aware query initialization scheme to prevent non-polygonal sequences and introduce room-aware self-attention to enhance memory efficiency and model performance. Experimental results on two widely used datasets demonstrate that PolyRoom surpasses current state-of-the-art methods both quantitatively and qualitatively. Our code is available at: https://github.com/3dv-casia/PolyRoom/.

7/16/2024

FRI-Net: Floorplan Reconstruction via Room-wise Implicit Representation

Honghao Xu, Juzhan Xu, Zeyu Huang, Pengfei Xu, Hui Huang, Ruizhen Hu

In this paper, we introduce a novel method called FRI-Net for 2D floorplan reconstruction from 3D point cloud. Existing methods typically rely on corner regression or box regression, which lack consideration for the global shapes of rooms. To address these issues, we propose a novel approach using a room-wise implicit representation with structural regularization to characterize the shapes of rooms in floorplans. By incorporating geometric priors of room layouts in floorplans into our training strategy, the generated room polygons are more geometrically regular. We have conducted experiments on two challenging datasets, Structured3D and SceneCAD. Our method demonstrates improved performance compared to state-of-the-art methods, validating the effectiveness of our proposed representation for floorplan reconstruction.

7/16/2024

Self-training Room Layout Estimation via Geometry-aware Ray-casting

Bolivar Solarte, Chin-Hsuan Wu, Jin-Cheng Jhang, Jonathan Lee, Yi-Hsuan Tsai, Min Sun

In this paper, we introduce a novel geometry-aware self-training framework for room layout estimation models on unseen scenes with unlabeled data. Our approach utilizes a ray-casting formulation to aggregate multiple estimates from different viewing positions, enabling the computation of reliable pseudo-labels for self-training. In particular, our ray-casting approach enforces multi-view consistency along all ray directions and prioritizes spatial proximity to the camera view for geometry reasoning. As a result, our geometry-aware pseudo-labels effectively handle complex room geometries and occluded walls without relying on assumptions such as Manhattan World or planar room walls. Evaluation on publicly available datasets, including synthetic and real-world scenarios, demonstrates significant improvements in current state-of-the-art layout models without using any human annotation.

7/23/2024

QueSTMaps: Queryable Semantic Topological Maps for 3D Scene Understanding

Yash Mehan, Kumaraditya Gupta, Rohit Jayanti, Anirudh Govil, Sourav Garg, Madhava Krishna

Understanding the structural organisation of 3D indoor scenes in terms of rooms is often accomplished via floorplan extraction. Robotic tasks such as planning and navigation require a semantic understanding of the scene as well. This is typically achieved via object-level semantic segmentation. However, such methods struggle to segment out topological regions like kitchen in the scene. In this work, we introduce a two-step pipeline. First, we extract a topological map, i.e., floorplan of the indoor scene using a novel multi-channel occupancy representation. Then, we generate CLIP-aligned features and semantic labels for every room instance based on the objects it contains using a self-attention transformer. Our language-topology alignment supports natural language querying, e.g., a place to cook locates the kitchen. We outperform the current state-of-the-art on room segmentation by ~20% and room classification by ~12%. Our detailed qualitative analysis and ablation studies provide insights into the problem of joint structural and semantic 3D scene understanding.

4/10/2024