BTS: Building Timeseries Dataset: Empowering Large-Scale Building Analytics

Read original: arXiv:2406.08990 - Published 6/19/2024 by Arian Prabowo, Xiachong Lin, Imran Razzak, Hao Xue, Emily W. Yap, Matthew Amos, Flora D. Salim

Overview

This paper introduces BTS (Building TimeSeries), a dataset of real-world timeseries data covering multiple building operations.
BTS aims to enable large-scale building analytics, with potential applications such as building energy management, fault detection, and occupancy prediction.
According to the paper's abstract, the dataset covers three buildings over a three-year period, comprising more than ten thousand timeseries data points with hundreds of unique ontologies, and its metadata is standardized using the Brick schema.

Plain English Explanation

The BTS (Building TimeSeries) dataset is a collection of real-world data recorded in three buildings over a three-year period. It contains more than ten thousand timeseries data points, each tracking some aspect of how a building operates over time. The dataset is designed to help researchers and companies improve the way buildings are managed and operated, which can save energy and make buildings more comfortable for the people inside.

By having access to this large and diverse dataset, researchers can develop new machine learning models and analytics tools to better understand building performance, detect problems, and predict future energy needs. This could lead to more efficient buildings that are better for the environment and more comfortable for the people who use them.

The dataset covers three buildings over a three-year period and spans hundreds of unique ontologies, meaning hundreds of distinct kinds of measurement. This variety is important because it lets the insights gained from the data apply more broadly, rather than to just one narrow kind of measurement.

Overall, the BTS dataset represents a valuable resource for advancing the field of building energy management and architectural design, with the potential to drive significant improvements in the sustainability and livability of the built environment.

Technical Explanation

The BTS (Building TimeSeries) dataset is designed to support advanced analytics on real-world building operations. According to the paper's abstract, it covers three buildings over a three-year period and comprises more than ten thousand timeseries data points with hundreds of unique ontologies.

The key contribution of BTS is its breadth and accessibility: the abstract notes that building analytics research has been hampered by a lack of accessible, comprehensive real-world datasets on multiple building operations. By providing several years of data across hundreds of ontologies, the dataset supports the development of more robust and generalizable machine learning models and analytics tools.

The data collection process involved deploying a network of sensors in the participating buildings, which continuously recorded the various environmental and operational parameters over an extended period. The resulting timeseries data was then curated, harmonized, and anonymized to protect the privacy of building occupants.
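The harmonization steps are not spelled out in this summary. As one illustration only, a common step when preparing building timeseries is to align irregularly timed readings onto a fixed grid. The sketch below averages readings into fixed-width bins; the function name `resample_mean`, the 600-second interval, and the sample readings are all assumptions made for this example and are not taken from the paper.

```python
from collections import defaultdict


def resample_mean(readings, step_s):
    """Average (timestamp_s, value) readings into fixed-width bins.

    Returns a list of (bin_start_s, mean) pairs from the first to the
    last occupied bin. Bins with no readings get None instead of a mean.
    """
    if not readings:
        return []
    buckets = defaultdict(list)
    for ts, value in readings:
        # Integer timestamps are floored to the start of their bin.
        buckets[ts // step_s * step_s].append(value)
    first, last = min(buckets), max(buckets)
    out = []
    for start in range(first, last + step_s, step_s):
        vals = buckets.get(start)
        out.append((start, sum(vals) / len(vals) if vals else None))
    return out


# Hypothetical readings: (seconds since start, temperature in Celsius).
readings = [(0, 20.0), (240, 22.0), (610, 21.0), (1900, 23.0)]
grid = resample_mean(readings, step_s=600)
for bin_start, mean_value in grid:
    print(bin_start, mean_value)
```

Leaving empty bins as `None` rather than interpolating keeps gaps visible, so a later step can decide how to treat missing data.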

In addition to the raw timeseries, BTS includes metadata describing each building and its data points. The abstract states that this metadata is standardized using the Brick schema, an ontology for describing building equipment, points, and the relationships between them. This metadata can be used to investigate how building characteristics relate to energy performance.
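The paper's abstract says the BTS metadata is standardized using the Brick schema, which describes buildings as a graph of typed entities and relationships. The sketch below shows the general idea with plain Python tuples standing in for graph triples. The point and equipment identifiers (`bldg:ts_001`, `bldg:vav_3`, and so on) are hypothetical, the class and relationship names are written in Brick's naming style for illustration, and nothing here reflects the actual contents or file layout of BTS.

```python
# Toy metadata graph as (subject, predicate, object) triples.
# All identifiers under "bldg:" are made up for this example.
TRIPLES = [
    ("bldg:ts_001", "rdf:type", "brick:Zone_Air_Temperature_Sensor"),
    ("bldg:ts_001", "brick:isPointOf", "bldg:vav_3"),
    ("bldg:ts_002", "rdf:type", "brick:Supply_Air_Temperature_Sensor"),
    ("bldg:ts_002", "brick:isPointOf", "bldg:ahu_1"),
    ("bldg:ts_003", "rdf:type", "brick:Zone_Air_Temperature_Sensor"),
    ("bldg:ts_003", "brick:isPointOf", "bldg:vav_7"),
]


def points_of_class(triples, brick_class):
    """Return the sorted IDs of all points typed with the given class."""
    return sorted(
        s for s, p, o in triples if p == "rdf:type" and o == brick_class
    )


def equipment_for(triples, point):
    """Return the equipment that a point is attached to."""
    return [o for s, p, o in triples if s == point and p == "brick:isPointOf"]


zone_sensors = points_of_class(TRIPLES, "brick:Zone_Air_Temperature_Sensor")
print(zone_sensors)
print(equipment_for(TRIPLES, "bldg:ts_002"))
```

Because every point carries a class label from a shared vocabulary, the same query can in principle be run unchanged against a different building, which is the interoperability benefit the abstract points to. In practice a graph library would replace the list comprehension.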

The dataset is designed to support a wide range of applications in building analytics, including energy management, fault detection, and occupancy prediction. To demonstrate its utility, the authors benchmark two tasks: timeseries ontology classification and zero-shot forecasting, which they describe as an essential initial step toward interoperability in building analytics. Researchers can use the data to develop and validate machine learning models intended for real-world building management systems.
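The paper's abstract names zero-shot forecasting as one of its benchmark tasks, meaning a forecaster is applied to a series it was not trained on. The benchmarked models are not described in this summary, so the sketch below uses a deliberately simple stand-in: a seasonal naive forecast, which needs no training and is often used as a reference point. The function names, the season length of 4, and the sample values are assumptions for illustration only.

```python
def seasonal_naive(history, horizon, season):
    """Forecast by repeating the last full season of the history."""
    if len(history) < season:
        raise ValueError("history must cover at least one full season")
    last_season = history[-season:]
    return [last_season[i % season] for i in range(horizon)]


def mae(actual, predicted):
    """Mean absolute error between two equal-length sequences."""
    if len(actual) != len(predicted):
        raise ValueError("sequences must have the same length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical series with a repeating pattern of length 4.
history = [10.0, 12.0, 15.0, 11.0, 10.5, 12.5, 15.5, 11.5]
actual_next = [11.0, 13.0, 16.0, 12.0]

forecast = seasonal_naive(history, horizon=4, season=4)
print(forecast)
print(mae(actual_next, forecast))
```

A learned zero-shot model is only useful if it beats a baseline like this on series it has never seen, which is why such baselines usually appear alongside benchmark results.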

Critical Analysis

The BTS dataset represents a significant contribution to the field of building analytics, providing a valuable resource for researchers and practitioners. However, the paper also acknowledges several limitations and areas for future work.

One potential concern is limited coverage: the dataset draws on only three buildings. While the variety of timeseries is impressive, adding buildings from more regions and climates could further enhance its utility and generalizability.

Additionally, the dataset covers a three-year period. That is enough to observe seasonal variation, but it may limit the study of longer-term trends in building performance. Extending the data collection period or incorporating historical data could address this limitation.

Another area for improvement could be the inclusion of more detailed metadata, such as building construction materials, systems, and operational schedules. This additional contextual information could enable more nuanced analyses and the development of more sophisticated predictive models.

Finally, the dataset and the benchmarking code are available at https://github.com/cruiseresearchgroup/DIEF_BTS, but this summary does not cover the specific data access and usage policies. Ensuring transparent and equitable access to the dataset will be crucial for maximizing its impact and fostering a collaborative research community.

Conclusion

The BTS (Building TimeSeries) dataset represents a significant advancement in the field of building analytics, providing researchers and practitioners with accessible, comprehensive real-world data to drive innovation. By enabling the development of more robust and generalizable machine learning models, it has the potential to improve building energy management, fault detection, and occupancy prediction, and ultimately to contribute to a more sustainable and livable built environment.

The open availability of the BTS dataset will foster a collaborative research community, encouraging the development of new techniques and applications that can be directly applied to real-world building management challenges. As the dataset continues to grow and evolve, it will become an increasingly valuable resource for advancing the state-of-the-art in building analytics and contributing to a more energy-efficient and occupant-centric future.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

BTS: Building Timeseries Dataset: Empowering Large-Scale Building Analytics

Arian Prabowo, Xiachong Lin, Imran Razzak, Hao Xue, Emily W. Yap, Matthew Amos, Flora D. Salim

Buildings play a crucial role in human well-being, influencing occupant comfort, health, and safety. Additionally, they contribute significantly to global energy consumption, accounting for one-third of total energy usage, and carbon emissions. Optimizing building performance presents a vital opportunity to combat climate change and promote human flourishing. However, research in building analytics has been hampered by the lack of accessible, available, and comprehensive real-world datasets on multiple building operations. In this paper, we introduce the Building TimeSeries (BTS) dataset. Our dataset covers three buildings over a three-year period, comprising more than ten thousand timeseries data points with hundreds of unique ontologies. Moreover, the metadata is standardized using the Brick schema. To demonstrate the utility of this dataset, we performed benchmarks on two tasks: timeseries ontology classification and zero-shot forecasting. These tasks represent an essential initial step in addressing challenges related to interoperability in building analytics. Access to the dataset and the code used for benchmarking are available here: https://github.com/cruiseresearchgroup/DIEF_BTS .

6/19/2024

Automated Real-World Sustainability Data Generation from Images of Buildings

Peter J Bentley, Soo Ling Lim, Rajat Mathur, Sid Narang

When data on building features is unavailable, the task of determining how to improve that building in terms of carbon emissions becomes infeasible. We show that from only a set of images, a Large Language Model with appropriate prompt engineering and domain knowledge can successfully estimate a range of building features relevant for sustainability calculations. We compare our novel image-to-data method with a ground truth comprising real building data for 47 apartments and achieve accuracy better than a human performing the same task. We also demonstrate that the method can generate tailored recommendations to the owner on how best to improve their properties and discuss methods to scale the approach.

8/29/2024

A Gap in Time: The Challenge of Processing Heterogeneous IoT Point Data in Buildings

Xiachong Lin, Arian Prabowo, Imran Razzak, Hao Xue, Matthew Amos, Sam Behrens, Stephen White, Flora D. Salim

The growing need for sustainable energy solutions has driven the integration of digitalized buildings into the power grid, utilizing Internet-of-Things technology to optimize building performance and energy efficiency. However, incorporating IoT point data within deep-learning frameworks for energy management presents a complex challenge, predominantly due to the inherent data heterogeneity. This paper comprehensively analyzes the multifaceted heterogeneity present in real-world building IoT data streams. We meticulously dissect the heterogeneity across multiple dimensions, encompassing ontology, etiology, temporal irregularity, spatial diversity, and their combined effects on the IoT point data distribution. In addition, experiments using state-of-the-art forecasting models are conducted to evaluate their impacts on the performance of deep-learning models for forecasting tasks. By charting the diversity along these dimensions, we illustrate the challenges and delineate pathways for future research to leverage this heterogeneity as a resource rather than a roadblock. This exploration sets the stage for advancing the predictive abilities of deep-learning algorithms and catalyzing the evolution of intelligent energy-efficient buildings.

5/24/2024

Predicting building types and functions at transnational scale

Jonas Fill, Michael Eichelbeck, Michael Ebner

Building-specific knowledge such as building type and function information is important for numerous energy applications. However, comprehensive datasets containing this information for individual households are missing in many regions of Europe. For the first time, we investigate whether it is feasible to predict building types and functional classes at a European scale based on only open GIS datasets available across countries. We train a graph neural network (GNN) classifier on a large-scale graph dataset consisting of OpenStreetMap (OSM) buildings across the EU, Norway, Switzerland, and the UK. To efficiently perform training using the large-scale graph, we utilize localized subgraphs. A graph transformer model achieves a high Cohen's kappa coefficient of 0.754 when classifying buildings into 9 classes, and a very high Cohen's kappa coefficient of 0.844 when classifying buildings into the residential and non-residential classes. The experimental results imply three core novel contributions to literature. Firstly, we show that building classification across multiple countries is possible using a multi-source dataset consisting of information about 2D building shape, land use, degree of urbanization, and countries as input, and OSM tags as ground truth. Secondly, our results indicate that GNN models that consider contextual information about building neighborhoods improve predictive performance compared to models that only consider individual buildings and ignore the neighborhood. Thirdly, we show that training with GNNs on localized subgraphs instead of standard GNNs improves performance for the task of building classification.

9/17/2024