Using Images as Covariates: Measuring Curb Appeal with Deep Learning

Read original: arXiv:2403.19915 - Published 4/1/2024 by Ardyn Nordstrom, Morgan Nordstrom, Matthew D. Webb

Introduction

The paper discusses using deep learning methods to capture information from unstructured data like images, which can help predict economic outcomes more accurately. Specifically, the authors aim to improve housing price predictions by using convolutional neural networks to extract visual features from photos of home exteriors. These features can capture unobservable attributes that influence a property's perceived worth but are not included in structured real estate data.

The authors compare three modeling approaches: ordinary least squares regression, neural networks, and a combined model using neural network predictions as inputs to a linear regression. They use multiple pre-trained neural network architectures (encoders) to identify visual features from the home photos, rather than relying on a single encoder as previous studies did.

By incorporating visual information extracted through deep learning, the authors demonstrate improvements of around 3% in out-of-sample housing price prediction accuracy compared to traditional hedonic pricing models based solely on structured data. Their work builds on prior applications of deep learning in economics that used satellite imagery or other unstructured data sources.

The paper contributes to the growing literature on utilizing novel data types and machine learning techniques to capture previously unobservable factors influencing economic outcomes. Similar approaches have been used to analyze vocal tones, facial expressions, and text data from Federal Reserve communications to gauge emotions and their impacts on financial markets.

Data

The property dataset used for this research consists of detached homes sold in Toronto between December 2018 and February 2020. It includes details such as the number of bedrooms and bathrooms, the HVAC system, basement and garage finishings, list price, sale price, and dates. Sale prices ranged widely, from under $400,000 to over $13 million, with an average of around $1.48 million (adjusted for inflation to January 2020).

The dataset covers a diverse range of properties, with variations in sale prices based on amenities present. The average home had 3.34 bedrooms and 3.06 bathrooms, both positively correlated with sale price. Two-story homes were the most common (45%), followed by bungalows (33%), suggesting moderately sized, family-oriented living spaces dominated the sample.

Features like finished or walk-out basements, attached garages, central air conditioning, or basement apartments were all positively correlated with sale price, as expected from the hedonic literature. These control variables were included in some analyses.

Each property was geocoded to a neighborhood and postal code. Historical listings for each property were accessed, and the primary image showing the front of the house was saved, resulting in a dataset of 6,887 addresses with both image and property characteristic data.

Methods

The methodology involves using multiple neural network encoders to identify features within images of houses. These encoded features, along with listing attributes from the Multiple Listing Service (MLS), are then used to predict housing prices using three different modeling approaches:

Ordinary Least Squares (OLS) regression with LASSO penalization to handle many potential predictors.
Neural networks with three intermediate layers to predict prices directly from the encoded image features and listing attributes.
A "convoluted" approach that first uses a neural network to predict prices from encoded features, then takes that predicted price along with listing attributes and runs an OLS regression to obtain the final price prediction.
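As a rough illustration of the third approach, the sketch below chains a small neural network with three hidden layers into an OLS regression. Everything in it is an assumption made for illustration: the data are synthetic, and the layer widths, learning rate, and function names (`train_mlp`, `ols_fit`, `two_stage_predict`) are hypothetical. The paper's actual architecture and training procedure may differ.

```python
import numpy as np

def forward(params, X):
    """Return the activations of every layer (ReLU hidden layers, linear output)."""
    acts = [X]
    for i, (W, b) in enumerate(params):
        z = acts[-1] @ W + b
        acts.append(z if i == len(params) - 1 else np.maximum(z, 0.0))
    return acts

def train_mlp(X, y, hidden=(16, 8, 4), lr=0.01, epochs=300, seed=0):
    """Fit a three-hidden-layer network by full-batch gradient descent on MSE."""
    rng = np.random.default_rng(seed)
    sizes = [X.shape[1], *hidden, 1]
    params = [(rng.normal(0.0, np.sqrt(2.0 / m), (m, n)), np.zeros(n))
              for m, n in zip(sizes[:-1], sizes[1:])]
    for _ in range(epochs):
        acts = forward(params, X)
        grad = 2.0 * (acts[-1] - y[:, None]) / len(y)  # d(MSE)/d(output)
        for i in reversed(range(len(params))):
            W, b = params[i]
            grad_W, grad_b = acts[i].T @ grad, grad.sum(axis=0)
            grad = grad @ W.T                # gradient w.r.t. this layer's input
            if i > 0:
                grad = grad * (acts[i] > 0)  # ReLU derivative of the layer below
            params[i] = (W - lr * grad_W, b - lr * grad_b)
    return params

def mlp_predict(params, X):
    return forward(params, X)[-1].ravel()

def ols_fit(X, y):
    """OLS with an intercept, solved by least squares."""
    design = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return beta

def ols_predict(beta, X):
    return np.column_stack([np.ones(len(X)), X]) @ beta

def two_stage_predict(encoded, attrs, y):
    """Stage 1: network on encoded image features. Stage 2: OLS on [stage-1 prediction, attributes]."""
    stage1 = mlp_predict(train_mlp(encoded, y), encoded)
    stage2_X = np.column_stack([stage1, attrs])
    return ols_predict(ols_fit(stage2_X, y), stage2_X)

# Synthetic stand-ins for encoded image features, listing attributes, and log prices.
rng = np.random.default_rng(42)
n = 300
encoded = rng.normal(size=(n, 10))
attrs = rng.normal(size=(n, 3))
log_price = (attrs @ np.array([0.5, 0.3, 0.2])
             + 0.4 * np.tanh(encoded[:, 0])
             + 0.1 * rng.normal(size=n))

fitted = two_stage_predict(encoded, attrs, log_price)
print("in-sample MSE:", np.mean((log_price - fitted) ** 2))
```

This sketch fits and predicts on the same observations for brevity. An out-of-sample comparison would fit both stages on training data only and predict on held-out homes.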

Different encoders, such as ResNet50 and VGG16, capture distinct aspects of the images, so the authors explore combining their encodings in various ways to test whether this improves predictive performance over a single encoder.
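One simple way to combine encodings is to standardize each encoder's feature block and concatenate the blocks column-wise, as sketched here. This is an assumption about how such a combination could be built, not the authors' documented procedure. The arrays are random stand-ins for real encoder outputs, and the feature widths are the global-average-pooled output sizes of the standard Keras versions of each network, which may not match the paper's setup. The names `ENCODER_DIMS` and `combine_encodings` are hypothetical.

```python
import numpy as np

# Assumed feature widths (global-average-pooled outputs of the standard
# Keras models); the paper's configuration may differ.
ENCODER_DIMS = {"resnet50": 2048, "vgg16": 512, "mobilenet": 1024, "inception_v3": 2048}

def standardize(block):
    """Scale each column to mean 0 and unit variance (constant columns become 0)."""
    std = block.std(axis=0)
    std[std == 0] = 1.0
    return (block - block.mean(axis=0)) / std

def combine_encodings(encodings, names):
    """Concatenate the standardized feature blocks of the chosen encoders."""
    return np.hstack([standardize(encodings[name]) for name in names])

# Random stand-ins for encoder outputs on 8 hypothetical house photos.
rng = np.random.default_rng(0)
encodings = {name: rng.normal(size=(8, dim)) for name, dim in ENCODER_DIMS.items()}

single = combine_encodings(encodings, ["resnet50"])
combined = combine_encodings(encodings, list(ENCODER_DIMS))
print(single.shape, combined.shape)  # (8, 2048) (8, 5632)
```

Standardizing per block keeps an encoder with large raw activations from dominating a downstream penalized regression.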

To assess the models, the actual sales prices are regressed on the predicted prices from the models. Mean squared errors are also calculated using cross-validation to evaluate out-of-sample forecast accuracy with and without the image data as predictors.
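Both checks described above can be sketched in a few lines. The example regresses actual on predicted values and computes a k-fold cross-validated mean squared error with and without an image-derived predictor. The data are synthetic, plain OLS stands in for whichever model is being evaluated, and the function names (`actual_on_predicted`, `kfold_mse`) are hypothetical.

```python
import numpy as np

def ols_fit(X, y):
    """OLS with an intercept, solved by least squares."""
    design = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return beta

def ols_predict(beta, X):
    return np.column_stack([np.ones(len(X)), X]) @ beta

def actual_on_predicted(actual, predicted):
    """Regress actual prices on predicted prices; return (intercept, slope)."""
    intercept, slope = ols_fit(predicted.reshape(-1, 1), actual)
    return intercept, slope

def kfold_mse(X, y, k=5, seed=0):
    """Average out-of-fold mean squared error of OLS over k random folds."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    errors = []
    for test_idx in folds:
        train = np.ones(len(y), dtype=bool)
        train[test_idx] = False
        beta = ols_fit(X[train], y[train])
        errors.append(np.mean((y[test_idx] - ols_predict(beta, X[test_idx])) ** 2))
    return float(np.mean(errors))

# Synthetic data: two listing attributes plus one image-derived signal.
rng = np.random.default_rng(1)
n = 200
attrs = rng.normal(size=(n, 2))
image_signal = rng.normal(size=(n, 1))
price = attrs @ np.array([1.0, 0.5]) + 2.0 * image_signal[:, 0] + 0.1 * rng.normal(size=n)

mse_without = kfold_mse(attrs, price)
mse_with = kfold_mse(np.hstack([attrs, image_signal]), price)
print("CV MSE without image signal:", mse_without)
print("CV MSE with image signal:   ", mse_with)

in_sample = ols_predict(ols_fit(attrs, price), attrs)
print("intercept, slope:", actual_on_predicted(price, in_sample))
```

A slope near one and an intercept near zero indicate that predictions are neither systematically too high nor too low. For in-sample OLS fitted values this holds by construction, so the regression is informative mainly for out-of-sample predictions.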

The key innovation is leveraging neural networks to extract information from images to augment traditional housing attributes in predicting sales prices.

Results with an Application to Housing Price Predictions

The paper discusses the results of using images of houses to predict their sale prices. It explores whether image data contains useful information and whether it can improve price predictions when combined with structured data like the number of bedrooms and bathrooms.

The analysis shows that image data alone can explain some variation in sale prices: predictions based on the CNN image encoders are statistically significant even without any other housing attributes. Combining multiple encoders leads to better predictions than using a single encoder.

When housing attributes from the MLS listings are added along with image data, the predictive performance improves substantially. The best out-of-sample predictions come from models that utilize both image encoders and structured MLS data together.

The paper tests three modeling approaches: penalized OLS regressions, neural networks, and a "convoluted" model combining neural networks and OLS. Across these, incorporating both image and non-image data generally minimizes prediction errors compared to using either data source alone.
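For readers unfamiliar with the penalized regression, a LASSO fit can be sketched with cyclic coordinate descent and soft-thresholding. This is a generic textbook formulation on synthetic data, not the authors' code. The penalty value and the names `soft_threshold` and `lasso_cd` are hypothetical, and the predictors are assumed to be standardized.

```python
import numpy as np

def soft_threshold(rho, lam):
    """Shrink rho toward zero by lam, returning exactly zero inside [-lam, lam]."""
    return np.sign(rho) * max(abs(rho) - lam, 0.0)

def lasso_cd(X, y, lam, n_iter=200):
    """Minimise (1/2n) * ||y - b0 - X b||^2 + lam * ||b||_1 by coordinate descent.

    Assumes every column of X has mean 0 and is not constant.
    """
    n, p = X.shape
    beta = np.zeros(p)
    intercept = y.mean()
    resid = y - intercept
    col_sq = (X ** 2).sum(axis=0) / n
    for _ in range(n_iter):
        for j in range(p):
            resid += X[:, j] * beta[j]   # remove feature j's current contribution
            rho = X[:, j] @ resid / n    # its correlation with the partial residual
            beta[j] = soft_threshold(rho, lam) / col_sq[j]
            resid -= X[:, j] * beta[j]   # add the updated contribution back
    return intercept, beta

# Synthetic data: 20 standardized predictors, only the first two matter.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 20))
X = (X - X.mean(axis=0)) / X.std(axis=0)
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 0.1 * rng.normal(size=200)

intercept, beta = lasso_cd(X, y, lam=0.1)
print("non-zero coefficients:", np.flatnonzero(beta))
```

The penalty sets the coefficients of uninformative predictors exactly to zero, which is why it suits settings with thousands of encoded image features and only a few thousand homes.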

The authors hypothesize that combining multiple image encoders works well because each encoder captures distinct, non-overlapping features of the images, rather than producing highly correlated representations. Overall, the results suggest images can provide complementary predictive information beyond traditional housing attributes for estimating home values.
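One way to probe this hypothesis is to compare the price predictions obtained from each encoder separately and inspect their pairwise correlations. The sketch below is only an assumed diagnostic on synthetic predictions, and the paper's own analysis may differ. The function name `prediction_correlations` and the data construction are hypothetical.

```python
import numpy as np

def prediction_correlations(preds_by_encoder):
    """Return encoder names and the correlation matrix of their predictions."""
    names = list(preds_by_encoder)
    matrix = np.corrcoef(np.vstack([preds_by_encoder[name] for name in names]))
    return names, matrix

# Synthetic predictions: a shared component plus an encoder-specific one.
rng = np.random.default_rng(3)
shared = rng.normal(size=500)
preds = {
    "resnet50": shared + rng.normal(size=500),
    "vgg16": shared + rng.normal(size=500),
    "mobilenet": shared + rng.normal(size=500),
}

names, corr = prediction_correlations(preds)
print(names)
print(np.round(corr, 2))
```

Off-diagonal correlations well below one would be consistent with the encoders contributing partly distinct information.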

Conclusions

The paper demonstrates how deep learning can extract valuable information from images that is typically unobservable in traditional data sources. Specifically, it shows that using deep learning on images of house exteriors can improve predictions of house prices, as visual details influence buyers' perceptions but are not captured in typical housing data.

The study focuses on single-family homes, so the findings may not directly apply to other types of real estate. The main contribution is showing that deep learning, particularly using multiple image encoders, can make image data useful in econometric analyses.

Unlike previous work that relied on one encoder, the paper shows that several deep learning models together capture more information from images. Different encoders like Inception Network, ResNet50, and MobileNet have unique architectures and training datasets, allowing them to detect different visual features such as scale, complexity, texture, and detail.

Combining multiple encoders in the analysis led to significant improvements in model fit and an increase of around 3% in out-of-sample prediction accuracy for house prices.

The paper does not attempt to establish causal relationships between image content and housing prices, but rather uses images as covariates. It contributes to the growing use of deep learning in economics to capture previously unobservable information from unstructured data sources like images.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Using Images as Covariates: Measuring Curb Appeal with Deep Learning

Ardyn Nordstrom, Morgan Nordstrom, Matthew D. Webb

This paper details an innovative methodology to integrate image data into traditional econometric models. Motivated by forecasting sales prices for residential real estate, we harness the power of deep learning to add information contained in images as covariates. Specifically, images of homes were categorized and encoded using an ensemble of image classifiers (ResNet-50, VGG16, MobileNet, and Inception V3). Unique features presented within each image were further encoded through panoptic segmentation. Forecasts from a neural network trained on the encoded data result in improved out-of-sample predictive power. We also combine these image-based forecasts with standard hedonic real estate property and location characteristics, resulting in a unified dataset. We show that image-based forecasts increase the accuracy of hedonic forecasts when encoded features are regarded as additional covariates. We also attempt to explain which covariates the image-based forecasts are most highly correlated with. The study exemplifies the benefits of interdisciplinary methodologies, merging machine learning and econometrics to harness untapped data sources for more accurate forecasting.

4/1/2024

A Multi-Modal Deep Learning Based Approach for House Price Prediction

Md Hasebul Hasan, Md Abid Jahan, Mohammed Eunus Ali, Yuan-Fang Li, Timos Sellis

Accurate prediction of house price, a vital aspect of the residential real estate sector, is of substantial interest for a wide range of stakeholders. However, predicting house prices is a complex task due to the significant variability influenced by factors such as house features, location, neighborhood, and many others. Despite numerous attempts utilizing a wide array of algorithms, including recent deep learning techniques, to predict house prices accurately, existing approaches have fallen short of considering a wide range of factors such as textual and visual features. This paper addresses this gap by comprehensively incorporating attributes, such as features, textual descriptions, geo-spatial neighborhood, and house images, typically showcased in real estate listings in a house price prediction system. Specifically, we propose a multi-modal deep learning approach that leverages different types of data to learn more accurate representation of the house. In particular, we learn a joint embedding of raw house attributes, geo-spatial neighborhood, and most importantly from textual description and images representing the house; and finally use a downstream regression model to predict the house price from this jointly learned embedding vector. Our experimental results with a real-world dataset show that the text embedding of the house advertisement description and image embedding of the house pictures in addition to raw attributes and geo-spatial embedding, can significantly improve the house price prediction accuracy. The relevant source code and dataset are publicly accessible at the following URL: https://github.com/4P0N/mhpp

9/10/2024

Scalable Property Valuation Models via Graph-based Deep Learning

Enrique Riveros, Carla Vairetti, Christian Wegmann, Santiago Truffa, Sebastián Maldonado

This paper aims to enrich the capabilities of existing deep learning-based automated valuation models through an efficient graph representation of peer dependencies, thus capturing intricate spatial relationships. In particular, we develop two novel graph neural network models that effectively identify sequences of neighboring houses with similar features, employing different message passing algorithms. The first strategy considers standard spatial graph convolutions, while the second one utilizes transformer graph convolutions. This approach confers scalability to the modeling process. The experimental evaluation is conducted using a proprietary dataset comprising approximately 200,000 houses located in Santiago, Chile. We show that employing tailored graph neural networks significantly improves the accuracy of house price prediction, especially when utilizing transformer convolutional message passing layers.

5/13/2024

Deep Learning for Economists

Melissa Dell

Deep learning provides powerful methods to impute structured information from large-scale, unstructured text and image datasets. For example, economists might wish to detect the presence of economic activity in satellite images, or to measure the topics or entities mentioned in social media, the congressional record, or firm filings. This review introduces deep neural networks, covering methods such as classifiers, regression models, generative AI, and embedding models. Applications include classification, document digitization, record linkage, and methods for data exploration in massive scale text and image corpora. When suitable methods are used, deep learning models can be cheap to tune and can scale affordably to problems involving millions or billions of data points. The review is accompanied by a companion website, EconDL, with user-friendly demo notebooks, software resources, and a knowledge base that provides technical details and additional applications.

9/18/2024