Generating geographically and economically realistic large-scale synthetic contact networks: A general method using publicly available data

Read original: arXiv:2406.14698 - Published 6/24/2024 by Alexander Y. Tulchinsky, Fardad Haghpanah, Alisa Hamilton, Nodar Kipshidze, Eili Y. Klein

Overview

The paper describes a method to generate synthetic contact networks for any region in the United States based on publicly available data.
The synthetic contact networks can be used to model the spread of epidemics and social transmission.
The method involves generating a synthetic population of individuals within households, assigning them to workplaces and schools, and then connecting them into a realistic contact network using graph generation algorithms.
The authors test the method on two census regions and show that the synthetic populations accurately reflect the source data, and that the contact networks have distinct properties compared to networks generated without a synthetic population.

Plain English Explanation

The researchers have developed a way to create artificial social networks that mimic the real-world connections between people in a given region. These synthetic contact networks can be used to study how diseases or information might spread through a population.

The key challenge is that detailed data on actual social contacts is hard to come by. To get around this, the researchers used publicly available data like census information, commute patterns, and school enrollment to generate a synthetic population of people living in households, working in workplaces, and attending schools.

Then, they used algorithms to connect these individuals into a realistic social network, reflecting things like people being more likely to interact with those who live nearby or work together.

The researchers found that these synthetic contact networks differ in important ways from networks generated without starting from a realistic population. In particular, the differences changed how quickly a disease spread in their epidemiological simulation.

By providing open-source software, the researchers are making it easier for others to generate synthetic contact networks for their own regions, which can aid in studying the spread of diseases, information, or other phenomena that depend on social interactions.

Technical Explanation

The paper presents a method to generate synthetic contact networks for any region of the United States based on publicly available data. These networks can be used to model the spread of epidemics and social transmission phenomena.

The key steps of the method are:

Generating a synthetic population: The researchers use combinatorial optimization to generate a synthetic population of individuals within households, based on US census data.
Assigning individuals to workplaces and schools: The researchers assign the individuals to workplaces and schools using commute data, employment statistics, and school enrollment data.
Connecting the population into a contact network: The researchers then use graph generation algorithms to connect the individuals into a realistic contact network, reflecting assortative connections at the geographic and economic levels.
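The three steps above can be sketched in a few lines of Python. This is a toy illustration and not the authors' software: the hill-climbing search stands in for whatever combinatorial optimization the paper uses, and the household sample, age-group targets, tract names, commute flows, and the `p_work` contact probability are all invented for the example.

```python
import random
from collections import Counter
from itertools import combinations

AGE_GROUPS = ("child", "adult", "senior")

def fit_households(sample, targets, n_households, steps=5000, seed=0):
    """Step 1 (assumed approach): hill-climb a household selection toward target counts."""
    rng = random.Random(seed)
    chosen = [rng.choice(sample) for _ in range(n_households)]
    totals = Counter(age for hh in chosen for age in hh)

    def error(t):
        return sum(abs(t[g] - targets[g]) for g in AGE_GROUPS)

    for _ in range(steps):
        i = rng.randrange(n_households)
        candidate = rng.choice(sample)
        trial = totals.copy()
        trial.subtract(chosen[i])   # remove the household being swapped out
        trial.update(candidate)     # add the household being swapped in
        if error(trial) <= error(totals):  # keep swaps that do not increase the error
            chosen[i], totals = candidate, trial
    return chosen, error(totals)

def assign_workplaces(people, commute_flows, rng):
    """Step 2: draw each adult's workplace in proportion to flows from the home tract."""
    for p in people:
        if p["age"] == "adult":
            flows = commute_flows[p["tract"]]
            p["work"] = rng.choices(list(flows), weights=list(flows.values()))[0]

def build_network(people, rng, p_work=0.3):
    """Step 3: fully connect households; connect coworkers with probability p_work."""
    groups = {}
    for idx, p in enumerate(people):
        groups.setdefault(("home", p["household"]), []).append(idx)
        if p["work"] is not None:
            groups.setdefault(("work", p["work"]), []).append(idx)
    edges = set()
    for (kind, _), members in groups.items():
        for a, b in combinations(members, 2):
            if kind == "home" or rng.random() < p_work:
                edges.add((a, b))
    return edges

# Hypothetical inputs standing in for census, commute, and employment data.
rng = random.Random(42)
sample = [("adult",), ("adult", "adult"), ("adult", "adult", "child"),
          ("adult", "child", "child"), ("senior",), ("senior", "senior")]
targets = {"child": 40, "adult": 120, "senior": 30}
commute_flows = {"T1": {"W_a": 70, "W_b": 30}, "T2": {"W_b": 50, "W_c": 50}}

households, err = fit_households(sample, targets, n_households=80)
people = []
for hh_id, hh in enumerate(households):
    tract = rng.choice(["T1", "T2"])
    for age in hh:
        people.append({"household": hh_id, "tract": tract, "age": age, "work": None})
assign_workplaces(people, commute_flows, rng)
edges = build_network(people, rng)
print(f"{len(people)} people, marginal error {err}, {len(edges)} contacts")
```

Schools would be handled like workplaces in this sketch, with children grouped by an enrollment table instead of commute flows.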

The researchers tested their method on two census regions and found that the synthetic populations accurately reflected the source data. Importantly, they also showed that the resulting contact networks had distinct properties compared to networks generated without a synthetic population, and that these differences affected the rate of disease transmission in epidemiological simulations.
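To make the comparison concrete, here is a minimal sketch of the kind of experiment described: run the same epidemic model on a network with household-like clusters and on a random network with the same number of edges, then compare the infection curves. The discrete-time SIR model, the parameter values (`beta`, `gamma`), and both network generators are assumptions for illustration; the paper's actual simulation setup may differ.

```python
import random

def grouped_network(n, group_size, extra_edges, rng):
    """Household-like cliques plus a number of random long-range contacts."""
    edges = set()
    for start in range(0, n, group_size):
        members = range(start, min(start + group_size, n))
        edges.update((a, b) for a in members for b in members if a < b)
    while extra_edges > 0:
        a, b = sorted(rng.sample(range(n), 2))
        if (a, b) not in edges:
            edges.add((a, b))
            extra_edges -= 1
    return edges

def random_network(n, n_edges, rng):
    """Baseline with no population structure: uniformly random edges."""
    edges = set()
    while len(edges) < n_edges:
        a, b = sorted(rng.sample(range(n), 2))
        edges.add((a, b))
    return edges

def sir_curve(n, edges, beta, gamma, steps, rng):
    """Discrete-time SIR from one seed case; returns cumulative infections per step."""
    neighbors = [[] for _ in range(n)]
    for a, b in edges:
        neighbors[a].append(b)
        neighbors[b].append(a)
    state = ["S"] * n
    state[0] = "I"
    ever_infected, curve = 1, []
    for _ in range(steps):
        infected = [i for i in range(n) if state[i] == "I"]
        newly = set()
        for i in infected:
            for j in neighbors[i]:
                if state[j] == "S" and rng.random() < beta:
                    newly.add(j)
        for i in infected:
            if rng.random() < gamma:  # recovery with probability gamma per step
                state[i] = "R"
        for j in newly:
            state[j] = "I"
        ever_infected += len(newly)
        curve.append(ever_infected)
    return curve

n = 400
structured = grouped_network(n, group_size=4, extra_edges=400, rng=random.Random(1))
baseline = random_network(n, len(structured), random.Random(2))
curve_structured = sir_curve(n, structured, 0.1, 0.2, 30, random.Random(3))
curve_baseline = sir_curve(n, baseline, 0.1, 0.2, 30, random.Random(3))
print("structured:", curve_structured[-1], "baseline:", curve_baseline[-1])
```

A single run like this is noisy, so any real comparison would average over many seeds and starting cases before drawing conclusions about transmission rates.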

The researchers provide open-source software to generate synthetic populations and contact networks for any area within the US, which can aid in studying the spread of diseases, information, and other social phenomena.

Critical Analysis

The paper presents a robust and well-designed method for generating synthetic contact networks that can be used to model real-world social phenomena. The use of publicly available data to create a realistic synthetic population is a key strength, as is the incorporation of geographic and economic factors into the network generation process.

However, the paper does acknowledge some limitations. The authors note that the method relies on aggregated data sources, which may not capture all the nuances of individual-level social connections. There is also the potential for biases or inaccuracies in the underlying data to be reflected in the synthetic networks.

Additionally, the authors do not explore the potential ethical implications of using synthetic contact networks, such as privacy concerns or the risk of misuse. As these tools become more widely adopted, it will be important to consider these issues and develop appropriate safeguards.

Overall, the research presented in this paper is a useful contribution to network modeling and epidemiological simulation. The open-source software provided by the authors is likely to be a valuable resource for researchers and policymakers studying the spread of diseases and other socially transmitted phenomena.

Conclusion

This paper describes a novel method for generating synthetic contact networks that can be used to model the spread of epidemics and social transmission in any region of the United States. By leveraging publicly available data to create realistic synthetic populations and their connections, the researchers have developed a powerful tool for studying a wide range of social and epidemiological phenomena.

The ability to generate customized synthetic contact networks for specific regions is a significant advancement, as it allows researchers and policymakers to better understand the unique social dynamics that may influence the spread of diseases or the diffusion of information within a particular community. As the world continues to grapple with emerging public health challenges and the complexity of social interactions, tools like those presented in this paper will be increasingly valuable for informing decision-making and guiding interventions.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Generating geographically and economically realistic large-scale synthetic contact networks: A general method using publicly available data

Alexander Y. Tulchinsky, Fardad Haghpanah, Alisa Hamilton, Nodar Kipshidze, Eili Y. Klein

Synthetic contact networks are useful for modeling epidemic spread and social transmission, but data to infer realistic contact patterns that take account of assortative connections at the geographic and economic levels is limited. We developed a method to generate synthetic contact networks for any region of the United States based on publicly available data. First, we generate a synthetic population of individuals within households from US census data using combinatorial optimization. Then, individuals are assigned to workplaces and schools using commute data, employment statistics, and school enrollment data. The resulting population is then connected into a realistic contact network using graph generation algorithms. We test the method on two census regions and show that the synthetic populations accurately reflect the source data. We further show that the contact networks have distinct properties compared to networks generated without a synthetic population, and that those differences affect the rate of disease transmission in an epidemiological simulation. We provide open-source software to generate a synthetic population and contact network for any area within the US.

6/24/2024

Generating Synthetic Population

Bhavesh Neekhra, Kshitij Kapoor, Debayan Gupta

In this paper, we provide a method to generate synthetic population at various administrative levels for a country like India. This synthetic population is created using machine learning and statistical methods applied to survey data such as Census of India 2011, IHDS-II, NSS-68th round, GPW etc. The synthetic population defines individuals in the population with characteristics such as age, gender, height, weight, home and work location, household structure, preexisting health conditions, socio-economical status, and employment. We used the proposed method to generate the synthetic population for various districts of India. We also compare this synthetic population with source data using various metrics. The experiment results show that the synthetic data can realistically simulate the population for various districts of India.

5/17/2024

SynthPop++: A Hybrid Framework for Generating A Country-scale Synthetic Population

Bhavesh Neekhra, Kshitij Kapoor, Debayan Gupta

Population censuses are vital to public policy decision-making. They provide insight into human resources, demography, culture, and economic structure at local, regional, and national levels. However, such surveys are very expensive (especially for low and middle-income countries with high populations, such as India), time-consuming, and may also raise privacy concerns, depending upon the kinds of data collected. In light of these issues, we introduce SynthPop++, a novel hybrid framework, which can combine data from multiple real-world surveys (with different, partially overlapping sets of attributes) to produce a real-scale synthetic population of humans. Critically, our population maintains family structures comprising individuals with demographic, socioeconomic, health, and geolocation attributes: this means that our "fake" people live in realistic locations, have realistic families, etc. Such data can be used for a variety of purposes: we explore one such use case, agent-based modelling of infectious disease in India. To gauge the quality of our synthetic population, we use both machine learning and statistical metrics. Our experimental results show that synthetic population can realistically simulate the population for various administrative units of India, producing real-scale, detailed data at the desired level of zoom -- from cities, to districts, to states, eventually combining to form a country-scale synthetic population.

5/17/2024

Effectiveness of probabilistic contact tracing in epidemic containment: the role of super-spreaders and transmission path reconstruction

A. P. Muntoni, F. Mazza, A. Braunstein, G. Catania, L. Dall'Asta

The recent COVID-19 pandemic underscores the significance of early-stage non-pharmacological intervention strategies. The widespread use of masks and the systematic implementation of contact tracing strategies provide a potentially equally effective and socially less impactful alternative to more conventional approaches, such as large-scale mobility restrictions. However, manual contact tracing faces strong limitations in accessing the network of contacts, and the scalability of currently implemented protocols for smartphone-based digital contact tracing becomes impractical during the rapid expansion phases of the outbreaks, due to the surge in exposure notifications and associated tests. A substantial improvement in digital contact tracing can be obtained through the integration of probabilistic techniques for risk assessment that can more effectively guide the allocation of new diagnostic tests. In this study, we first quantitatively analyze the diagnostic and social costs associated with these containment measures based on contact tracing, employing three state-of-the-art models of SARS-CoV-2 spreading. Our results suggest that probabilistic techniques allow for more effective mitigation at a lower cost. Secondly, our findings reveal a remarkable efficacy of probabilistic contact-tracing techniques in performing backward and multi-step tracing and capturing super-spreading events.

9/2/2024