Quantifying Privacy Risks of Public Statistics to Residents of Subsidized Housing

Read original: arXiv:2407.04776 - Published 7/9/2024 by Ryan Steed, Diana Qing, Zhiwei Steven Wu

Overview

  • The U.S. Census Bureau is implementing a controversial new disclosure avoidance system, and researchers and policymakers debate whether new privacy protections for public statistics are necessary.
  • The researchers explored whether published statistics could allow identification of subsidized households violating occupancy guidelines in 2010.
  • Experiments on both published statistics and synthetic data were conducted to assess the privacy risks.
  • The findings provide an example for policymakers seeking an accurate and trustworthy census.

Plain English Explanation

The U.S. Census Bureau is rolling out a new system to protect people's privacy when publishing census data, and there is debate over whether the new protections are needed. To inform that debate, the researchers explored whether published census statistics could be used to identify households in subsidized housing that have more people than allowed.

By combining public data from the Census and the Department of Housing and Urban Development, the researchers found a simple way to potentially figure out which subsidized households were breaking the rules about how many people can live there. This is a concern because people in subsidized housing may be afraid to report all the people actually living there, in case they get in trouble and lose their housing.

The researchers also tested this attack on synthetic data, that is, data artificially created to resemble real census records. They found that a random swapping method similar to the Census Bureau's 2010 privacy protections did not significantly reduce the precision of the attack, while a differentially private method similar to the 2020 system did.

Overall, this research provides an important example for policymakers who are trying to make the census both accurate and protect people's privacy.

Technical Explanation

The researchers conducted experiments on both published census statistics and synthetic data to explore a specific privacy concern related to the Census Bureau's new disclosure avoidance system.

They focused on the risk that respondents in subsidized housing may deliberately underreport household members, such as unauthorized children, due to fears of eviction. By combining public data from the Decennial Census and the Department of Housing and Urban Development, the researchers demonstrated a simple, low-cost reconstruction attack that could potentially identify subsidized households violating occupancy guidelines in 2010.
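To make the idea of combining two public tables concrete, here is a minimal toy sketch in Python. It is not the procedure from the paper, which this summary does not describe in detail. Every name in it (`BlockTables`, `flag_overcrowded`, `PERSONS_PER_BEDROOM`) is hypothetical, the two-persons-per-bedroom limit is an assumed illustrative guideline, and the matching rule (only blocks where every household is subsidized) is a simplifying assumption.

```python
from dataclasses import dataclass
from typing import Dict

# Assumed illustrative occupancy guideline, not taken from the paper.
PERSONS_PER_BEDROOM = 2


@dataclass(frozen=True)
class BlockTables:
    """Hypothetical per-block view of two public tables."""
    block_id: str
    households_by_size: Dict[int, int]  # census-style: household size -> count
    subsidized_units: int               # housing-agency-style: subsidized units in block
    max_bedrooms: int                   # housing-agency-style: largest subsidized unit


def flag_overcrowded(block: BlockTables) -> int:
    """Count households that must exceed the assumed occupancy limit.

    The inference is only made when every household in the block is
    subsidized, so each household can be attributed to a subsidized unit.
    """
    total = sum(block.households_by_size.values())
    if total == 0 or block.subsidized_units != total:
        return 0  # ambiguous block: no inference in this toy model
    limit = PERSONS_PER_BEDROOM * block.max_bedrooms
    return sum(n for size, n in block.households_by_size.items() if size > limit)


if __name__ == "__main__":
    block = BlockTables("A", {2: 3, 5: 1}, subsidized_units=4, max_bedrooms=2)
    print(flag_overcrowded(block))  # one 5-person household exceeds the limit of 4
```

The sketch shows why small geographic units matter: when a block contains only subsidized units, exact household-size counts leave little doubt about which households exceed the limit.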

Experiments on synthetic data suggested that a random swapping mechanism similar to the Census Bureau's 2010 disclosure avoidance measures did not significantly reduce the precision of this attack. In contrast, a differentially private mechanism similar to the 2020 disclosure avoidance system did reduce it.
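One possible intuition for this contrast can be sketched with a toy model in Python. The sketch assumes that swaps exchange the block labels of households of the same size, and it uses plain Laplace noise with scale 1/epsilon on each count; both are illustrative assumptions, not descriptions of the Census Bureau's systems or of the paper's experiments. The function names are hypothetical.

```python
import random
from collections import Counter


def size_table(households):
    """households: list of (block, size) pairs -> counts per (block, size)."""
    return Counter(households)


def swap_same_size(households, rate, rng):
    """Swap block labels between pairs of same-size households (toy swapping)."""
    out = list(households)
    by_size = {}
    for i, (_, size) in enumerate(out):
        by_size.setdefault(size, []).append(i)
    for idxs in by_size.values():
        rng.shuffle(idxs)
        for a, b in zip(idxs[::2], idxs[1::2]):
            if rng.random() < rate:
                (block_a, s), (block_b, _) = out[a], out[b]
                out[a], out[b] = (block_b, s), (block_a, s)
    return out


def laplace_noisy_table(table, epsilon, rng):
    """Add Laplace(1/epsilon) noise to each listed count.

    The difference of two exponentials with rate epsilon is Laplace
    distributed. A real mechanism would also perturb empty cells.
    """
    return {k: v + rng.expovariate(epsilon) - rng.expovariate(epsilon)
            for k, v in table.items()}


if __name__ == "__main__":
    rng = random.Random(0)
    households = [("A", 2), ("A", 5), ("B", 2), ("B", 5), ("B", 3)]
    exact = size_table(households)
    swapped = size_table(swap_same_size(households, rate=1.0, rng=rng))
    noisy = laplace_noisy_table(exact, epsilon=1.0, rng=rng)
    print(swapped == exact)  # same-size swaps leave the size table unchanged
    print(noisy)
```

In this toy model, swapping changes which household sits in which block but not the household-size counts an attacker reads, whereas noise alters the counts themselves.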

Critical Analysis

The paper provides a valuable example of how even well-intentioned privacy protection measures can still leave vulnerabilities that could be exploited. The researchers acknowledge that their attack relies on certain assumptions, such as the availability of public data sources, that may not always hold true.

Additionally, the paper does not address the broader societal implications of this privacy risk, such as how it could further marginalize low-income populations or discourage participation in the census. Policymakers should consider the potential unintended consequences of disclosure avoidance systems and strive for a balanced approach that prioritizes both accuracy and privacy.

Further research could explore alternative privacy-preserving mechanisms or investigate the prevalence and impact of respondents' fears of eviction or other negative consequences from fully participating in the census.

Conclusion

This research highlights the need for robust, multifaceted privacy protections in the census process. The findings provide a valuable case study for policymakers as they work to ensure the U.S. Census Bureau can collect accurate data while also safeguarding the privacy of respondents, particularly those in vulnerable situations. Ongoing collaboration between researchers, policymakers, and the public will be crucial to developing a trustworthy and inclusive census system.



This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Quantifying Privacy Risks of Public Statistics to Residents of Subsidized Housing

Ryan Steed, Diana Qing, Zhiwei Steven Wu

As the U.S. Census Bureau implements its controversial new disclosure avoidance system, researchers and policymakers debate the necessity of new privacy protections for public statistics. With experiments on both published statistics and synthetic data, we explore a particular privacy concern: respondents in subsidized housing may deliberately not mention unauthorized children and other household members for fear of being evicted. By combining public statistics from the Decennial Census and the Department of Housing and Urban Development, we demonstrate a simple, inexpensive reconstruction attack that could identify subsidized households living in violation of occupancy guidelines in 2010. Experiments on synthetic data suggest that a random swapping mechanism similar to the Census Bureau's 2010 disclosure avoidance measures does not significantly reduce the precision of this attack, while a differentially private mechanism similar to the 2020 disclosure avoidance system does. Our results provide a valuable example for policymakers seeking a trustworthy, accurate census.

7/9/2024

An Examination of the Alleged Privacy Threats of Confidence-Ranked Reconstruction of Census Microdata

David Sánchez, Najeeb Jebreel, Krishnamurty Muralidhar, Josep Domingo-Ferrer, Alberto Blanco-Justicia

The threat of reconstruction attacks has led the U.S. Census Bureau (USCB) to replace in the Decennial Census 2020 the traditional statistical disclosure limitation based on rank swapping with one based on differential privacy (DP), leading to substantial accuracy loss of released statistics. Yet, it has been argued that, if many different reconstructions are compatible with the released statistics, most of them do not correspond to actual original data, which protects against respondent reidentification. Recently, a new attack has been proposed, which incorporates the confidence that a reconstructed record was in the original data. The alleged risk of disclosure entailed by such confidence-ranked reconstruction has renewed the interest of the USCB to use DP-based solutions. To forestall a potential accuracy loss in future releases, we show that the proposed reconstruction is neither effective as a reconstruction method nor conducive to disclosure as claimed by its authors. Specifically, we report empirical results showing the proposed ranking cannot guide reidentification or attribute disclosure attacks, and hence fails to warrant the utility sacrifice entailed by the use of DP to release census statistical data.

9/18/2024

Synthetic Census Data Generation via Multidimensional Multiset Sum

Cynthia Dwork, Kristjan Greenewald, Manish Raghavan

The US Decennial Census provides valuable data for both research and policy purposes. Census data are subject to a variety of disclosure avoidance techniques prior to release in order to preserve respondent confidentiality. While many are interested in studying the impacts of disclosure avoidance methods on downstream analyses, particularly with the introduction of differential privacy in the 2020 Decennial Census, these efforts are limited by a critical lack of data: The underlying microdata, which serve as necessary input to disclosure avoidance methods, are kept confidential. In this work, we aim to address this limitation by providing tools to generate synthetic microdata solely from published Census statistics, which can then be used as input to any number of disclosure avoidance algorithms for the sake of evaluation and carrying out comparisons. We define a principled distribution over microdata given published Census statistics and design algorithms to sample from this distribution. We formulate synthetic data generation in this context as a knapsack-style combinatorial optimization problem and develop novel algorithms for this setting. While the problem we study is provably hard, we show empirically that our methods work well in practice, and we offer theoretical arguments to explain our performance. Finally, we verify that the data we produce are close to the desired ground truth.

4/17/2024

Understanding and Mitigating the Impacts of Differentially Private Census Data on State Level Redistricting

Christian Cianfarani, Aloni Cohen

Data from the Decennial Census is published only after applying a disclosure avoidance system (DAS). Data users were shaken by the adoption of differential privacy in the 2020 DAS, a radical departure from past methods. The change raises the question of whether redistricting law permits, forbids, or requires taking account of the effect of disclosure avoidance. Such uncertainty creates legal risks for redistricters, as Alabama argued in a lawsuit seeking to prevent the 2020 DAS's deployment. We consider two redistricting settings in which a data user might be concerned about the impacts of privacy preserving noise: drawing equal population districts and litigating voting rights cases. What discrepancies arise if the user does nothing to account for disclosure avoidance? How might the user adapt her analyses to mitigate those discrepancies? We study these questions by comparing the official 2010 Redistricting Data to the 2010 Demonstration Data (created using the 2020 DAS) in an analysis of millions of algorithmically generated state legislative redistricting plans. In both settings, we observe that an analyst may come to incorrect conclusions if they do not account for noise. With minor adaptations, though, the underlying policy goals remain achievable: tweaking selection criteria enables a redistricter to draw balanced plans, and illustrative plans can still be used as evidence of the maximum number of majority-minority districts that are possible in a geography. At least for state legislatures, Alabama's claim that differential privacy "inhibits a State's right to draw fair lines" appears unfounded.

9/12/2024