From Counting Stations to City-Wide Estimates: Data-Driven Bicycle Volume Extrapolation

2406.18454

Published 6/27/2024 by Silke K. Kaiser, Nadja Klein, Lynn H. Kaack

Abstract

Shifting to cycling in urban areas reduces greenhouse gas emissions and improves public health. Street-level bicycle volume information would aid cities in planning targeted infrastructure improvements to encourage cycling and provide civil society with evidence to advocate for cyclists' needs. Yet, the data currently available to cities and citizens often only comes from sparsely located counting stations. This paper extrapolates bicycle volume beyond these few locations to estimate bicycle volume for the entire city of Berlin. We predict daily and average annual daily street-level bicycle volumes using machine-learning techniques and various public data sources. These include app-based crowdsourced data, infrastructure, bike-sharing, motorized traffic, socioeconomic indicators, weather, and holiday data. Our analysis reveals that the best-performing model is XGBoost, and crowdsourced cycling and infrastructure data are most important for the prediction. We further simulate how collecting short-term counts at predicted locations improves performance. By providing ten days of such sample counts for each predicted location to the model, we are able to halve the error and greatly reduce the variability in performance among predicted locations.

Overview

The paper focuses on estimating bicycle volume across an entire city using machine learning techniques and various data sources.
Accurately predicting street-level bicycle volume can help cities plan targeted infrastructure improvements to encourage cycling and provide evidence for cycling advocacy.
The study trains machine learning models on crowdsourced cycling, infrastructure, bike-sharing, motorized traffic, socioeconomic, weather, and holiday data, and simulates how adding short-term counts at predicted locations improves performance.

Plain English Explanation

Cycling in urban areas is a great way to reduce greenhouse gas emissions and improve public health. To encourage more people to cycle, cities need to build the right infrastructure, like bike lanes and bike-sharing stations. But to know where to build this infrastructure, cities need data on how many people are cycling on each street.

Unfortunately, most cities only have bicycle counters at a few locations, so they don't have a complete picture of cycling across the whole city. This paper shows how machine learning can be used to estimate bicycle volume on every street, even where there are no counters.

The researchers used data from a variety of sources, including crowdsourced cycling apps, infrastructure maps, bike-sharing usage, car traffic, weather, and even holidays. They trained machine learning models to predict how many bikes are on each street based on all this data.

The best-performing model was XGBoost, and the most important inputs for accurate predictions were the crowdsourced cycling data and the infrastructure data. The researchers also simulated collecting short-term counts and found that providing just ten days of actual bicycle counts for each predicted location halved the error and greatly reduced how much accuracy varied from one location to another.

By having a detailed map of bicycle volume across the whole city, cities can make better decisions about where to build bike lanes and other cycling infrastructure. This can encourage more people to cycle, reducing emissions and improving public health. Cycling advocates can also use this data to make the case for investing in cycling infrastructure.

Technical Explanation

The paper presents a method for predicting street-level bicycle volume across an entire city using machine learning techniques and a variety of public data sources.

The researchers collected data from several sources, including:

Crowdsourced cycling data from mobile apps
Infrastructure data like bike lane locations
Bike-sharing system usage
Motorized traffic volumes
Socioeconomic indicators
Weather data
Holiday calendars
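
As a rough illustration of how sources like these could be combined, the sketch below joins per-location features (such as infrastructure) and per-day features (such as weather and holidays) onto each observed daily count. The function name, the field names, and all values are hypothetical and are not taken from the paper; the paper's actual feature engineering is not described in this summary.

```python
from datetime import date


def build_feature_rows(counts, location_features, daily_features):
    """Join per-location and per-day features onto each observed daily count.

    counts: {(location_id, day): volume} from the counting stations.
    location_features: {location_id: {...}} for static data (hypothetical fields).
    daily_features: {day: {...}} for data that varies by date (hypothetical fields).
    """
    rows = []
    for (location_id, day), volume in sorted(counts.items()):
        row = {"location_id": location_id, "date": day, "volume": volume}
        row.update(location_features[location_id])  # e.g. infrastructure, app trips
        row.update(daily_features[day])             # e.g. weather, holidays
        rows.append(row)
    return rows


# Made-up toy data for two counting stations on one day.
counts = {("A", date(2023, 5, 1)): 1200, ("B", date(2023, 5, 1)): 300}
location_features = {
    "A": {"has_bike_lane": 1, "app_trips": 85},
    "B": {"has_bike_lane": 0, "app_trips": 12},
}
daily_features = {date(2023, 5, 1): {"rain_mm": 0.0, "is_holiday": 1}}

rows = build_feature_rows(counts, location_features, daily_features)
```

Each resulting row holds one location-day observation with its target (`volume`) and its features, which is the shape most tabular regression models expect.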

They then trained several machine learning models, including linear regression, random forest, and XGBoost, to predict daily and average annual daily bicycle volumes on each street in Berlin. The XGBoost model demonstrated the best performance.
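
The two prediction targets and the idea of predicting at unseen locations can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the average annual daily volume is taken here as the plain mean of whatever daily counts are available for a location, and the leave-one-location-out split is an assumed evaluation scheme that mimics extrapolating to a street without a counter. All names and numbers are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def average_annual_daily_volume(daily_rows):
    """Average the daily counts per location (assumes the rows cover the year)."""
    per_location = defaultdict(list)
    for row in daily_rows:
        per_location[row["location_id"]].append(row["volume"])
    return {loc: mean(volumes) for loc, volumes in per_location.items()}


def leave_one_location_out(daily_rows):
    """Yield (held_out, train, test) so the model never sees the test location."""
    locations = sorted({row["location_id"] for row in daily_rows})
    for held_out in locations:
        train = [r for r in daily_rows if r["location_id"] != held_out]
        test = [r for r in daily_rows if r["location_id"] == held_out]
        yield held_out, train, test


# Made-up daily counts for two locations.
daily_rows = [
    {"location_id": "A", "volume": 1000},
    {"location_id": "A", "volume": 1400},
    {"location_id": "B", "volume": 300},
    {"location_id": "B", "volume": 500},
]

aadb = average_annual_daily_volume(daily_rows)
splits = list(leave_one_location_out(daily_rows))
```

Holding out whole locations, rather than random rows, keeps the evaluation honest for the extrapolation task: a random split would let the model see other days from the same counter it is tested on.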

The analysis revealed that the crowdsourced cycling data and infrastructure data were the most important predictors for the model. The researchers also simulated the impact of collecting short-term bicycle counts at the predicted locations. By providing ten days of such sample counts for each predicted location to the model, they were able to halve the prediction error and greatly reduce the variability in performance among predicted locations.
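
One simple way to picture feeding sample counts to a model is to summarize them as an extra feature for every row of the same location. The sketch below uses the mean of the first ten sample counts; this encoding, the function name, and the field names are assumptions for illustration, since this summary does not say how the paper represents the sample counts.

```python
from statistics import mean


def add_sample_count_feature(rows, sample_counts, n_days=10):
    """Attach the mean of up to n_days short-term counts to each feature row.

    rows: feature rows with a "location_id" key (hypothetical schema).
    sample_counts: {location_id: [daily counts from a short counting campaign]}.
    """
    enriched = []
    for row in rows:
        samples = sample_counts[row["location_id"]][:n_days]
        # Copy the row so the original feature rows stay unchanged.
        enriched.append({**row, "sample_mean_volume": mean(samples)})
    return enriched


# Made-up example: one location without a permanent counter, ten sample days.
rows = [{"location_id": "C", "app_trips": 40}]
sample_counts = {"C": [510, 480, 530, 495, 505, 520, 470, 500, 490, 500]}

enriched = add_sample_count_feature(rows, sample_counts)
```

In a real evaluation the days used as samples would need to be kept out of the test set, so that the model is not scored on counts it was given as input.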

Critical Analysis

The paper presents a promising approach for estimating bicycle volume at a citywide scale, which could significantly aid urban planning and cycling advocacy efforts. However, the researchers acknowledge several limitations:

The model was trained and evaluated only on data from Berlin, so its performance may not generalize to other cities with different cycling cultures and infrastructure.
The crowdsourced cycling data used likely under-represents certain demographics, such as older or less tech-savvy cyclists.
The short-term count simulation assumes the ability to collect data at specific locations, which may be logistically challenging for many cities.

Additionally, the paper does not address potential privacy concerns around the use of crowdsourced location data or the security of the predictive models. Further research could explore these issues, as well as the scalability of the approach to larger metropolitan areas.

Overall, this research demonstrates the value of leveraging diverse data sources and advanced analytics to better understand and support cycling in urban environments. By making this information more accessible, cities and civil society can work together to create safer, more sustainable, and more equitable transportation systems.

Conclusion

This paper presents a novel method for estimating bicycle volume across an entire city using machine learning and a variety of data sources. The insights gained from this approach can help cities make more informed decisions about cycling infrastructure investments and provide evidence to support cycling advocacy efforts.

While the study is limited to Berlin, the general framework could be applied in other urban contexts to support the shift towards more sustainable and healthy transportation modes. As cities around the world work to reduce greenhouse gas emissions and improve public health, tools like this one can play a crucial role in guiding the way forward.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Modeling Large-Scale Walking and Cycling Networks: A Machine Learning Approach Using Mobile Phone and Crowdsourced Data

Meead Saberi, Tanapon Lilasathapornkit

Walking and cycling are known to bring substantial health, environmental, and economic advantages. However, the development of evidence-based active transportation planning and policies has been impeded by significant data limitations, such as biases in crowdsourced data and representativeness issues of mobile phone data. In this study, we develop and apply a machine learning based modeling approach for estimating daily walking and cycling volumes across a large-scale regional network in New South Wales, Australia that includes 188,999 walking links and 114,885 cycling links. The modeling methodology leverages crowdsourced and mobile phone data as well as a range of other datasets on population, land use, topography, climate, etc. The study discusses the unique challenges and limitations related to all three aspects of model training, testing, and inference given the large geographical extent of the modeled networks and relative scarcity of observed walking and cycling count data. The study also proposes a new technique to identify model estimate outliers and to mitigate their impact. Overall, the study provides a valuable resource for transportation modelers, policymakers and urban planners seeking to enhance active transportation infrastructure planning and policies with advanced emerging data-driven modeling methodologies.

4/4/2024

cs.LG cs.NA

Cycling into the workshop: predictive maintenance for Barcelona's bike-sharing system

Jordi Grau-Escolano, Aleix Bassolas, Julian Vicens

Bike-sharing systems have emerged as a significant element of urban mobility, providing an environmentally friendly transportation alternative. With the increasing integration of electric bikes alongside mechanical bikes, it is crucial to illuminate distinct usage patterns and their impact on maintenance. Accordingly, this research aims to develop a comprehensive understanding of mobility dynamics, distinguishing between different mobility modes, and introducing a novel predictive maintenance system tailored for bikes. By utilising a combination of trip information and maintenance data from Barcelona's bike-sharing system, Bicing, this study conducts an extensive analysis of mobility patterns and their relationship to failures of bike components. To accurately predict maintenance needs for essential bike parts, this research delves into various mobility metrics and applies statistical and machine learning survival models, including deep learning models. Due to their complexity, and with the objective of bolstering confidence in the system's predictions, interpretability techniques explain the main predictors of maintenance needs. The analysis reveals marked differences in the usage patterns of mechanical bikes and electric bikes, with a growing user preference for the latter despite their extra costs. These differences in mobility were found to have a considerable impact on the maintenance needs within the bike-sharing system. Moreover, the predictive maintenance models proved effective in forecasting these maintenance needs, capable of operating across an entire bike fleet. Despite challenges such as approximated bike usage metrics and data imbalances, the study successfully showcases the feasibility of an accurate predictive maintenance system capable of improving operational costs, bike availability, and security.

4/29/2024

cs.CY cs.LG

Predicting Traffic Congestion at Urban Intersections Using Data-Driven Modeling

Tara Kelly, Jessica Gupta

Traffic congestion at intersections is a significant issue in urban areas, leading to increased commute times, safety hazards, and operational inefficiencies. This study aims to develop a predictive model for congestion at intersections in major U.S. cities, utilizing a dataset of trip-logging metrics from commercial vehicles across 4,800 intersections. The dataset encompasses 27 features, including intersection coordinates, street names, time of day, and traffic metrics (Kashyap et al., 2019). Additional features, such as rainfall/snowfall percentage, distance from downtown and outskirts, and road types, were incorporated to enhance the model's predictive power. The methodology involves data exploration, feature transformation, and handling missing values through low-rank models and label encoding. The proposed model has the potential to assist city planners and governments in anticipating traffic hot spots, optimizing operations, and identifying infrastructure challenges.

4/24/2024

cs.LG

Improving Demand Forecasting in Open Systems with Cartogram-Enhanced Deep Learning

Sangjoon Park, Yongsung Kwon, Hyungjoon Soh, Mi Jin Lee, Seung-Woo Son

Predicting temporal patterns across various domains poses significant challenges due to their nuanced and often nonlinear trajectories. To address this challenge, prediction frameworks have been continuously refined, employing data-driven statistical methods, mathematical models, and machine learning. Recently, as one of the challenging systems, shared transport systems such as public bicycles have gained prominence due to urban constraints and environmental concerns. Predicting rental and return patterns at bicycle stations remains a formidable task due to the system's openness and imbalanced usage patterns across stations. In this study, we propose a deep learning framework to predict rental and return patterns by leveraging cartogram approaches. The cartogram approach facilitates the prediction of demand for newly installed stations with no training data as well as long-period prediction, which has not been achieved before. We apply this method to public bicycle rental-and-return data in Seoul, South Korea, employing a spatial-temporal convolutional graph attention network. Our improved architecture incorporates batch attention and modified node feature updates for better prediction accuracy across different time scales. We demonstrate the effectiveness of our framework in predicting temporal patterns and its potential applications.

5/28/2024

cs.LG