Insights from an experiment crowdsourcing data from thousands of US Amazon users: The importance of transparency, money, and data use

2404.13172

Published 5/15/2024 by Alex Berke, Robert Mahari, Sandy Pentland, Kent Larson, D. Calacci

Abstract

Data generated by users on digital platforms are a crucial resource for advocates and researchers interested in uncovering digital inequities, auditing algorithms, and understanding human behavior. Yet data access is often restricted. How can researchers both effectively and ethically collect user data? This paper shares an innovative approach to crowdsourcing user data to collect otherwise inaccessible Amazon purchase histories, spanning 5 years, from more than 5000 US users. We developed a data collection tool that prioritizes participant consent and includes an experimental study design. The design allows us to study multiple aspects of privacy perception and data sharing behavior. Experiment results (N=6325) reveal both monetary incentives and transparency can significantly increase data sharing. Age, race, education, and gender also played a role, where female and less-educated participants were more likely to share. Our study design enables a unique empirical evaluation of the privacy paradox, where users claim to value their privacy more than they do in practice. We set up both real and hypothetical data sharing scenarios and find measurable similarities and differences in share rates across these contexts. For example, increasing monetary incentives had a 6 times higher impact on share rates in real scenarios. In addition, we study participants' opinions on how data should be used by various third parties, again finding demographics have a significant impact. Notably, the majority of participants disapproved of government agencies using purchase data yet the majority approved of use by researchers. Overall, our findings highlight the critical role that transparency, incentive design, and user demographics play in ethical data collection practices, and provide guidance for future researchers seeking to crowdsource user generated data.

Overview

This paper describes an experiment that crowdsourced Amazon purchase histories, spanning 5 years, from more than 5000 US users.
The experiment (N=6325) examined how participants responded to different levels of transparency and monetary incentives when asked to share their data.
The findings offer guidance for researchers and companies on how to engage users in data-driven initiatives while addressing privacy concerns.

Plain English Explanation

This research paper explores an experiment that gathered data from a large number of Amazon users in the United States. The goal was to understand what factors influence people's willingness to share their personal information, such as transparency around how the data will be used, the amount of money offered in exchange, and how the data will be protected and utilized.

The researchers designed the experiment to test different approaches to incentivizing people to share their data and being upfront about how that data would be used. For example, some participants were offered more money in exchange for their data, while others were given detailed information about how their data would be handled and protected.

By analyzing how the thousands of Amazon users responded to these different scenarios, the researchers were able to identify the key factors that influence people's willingness to share their personal information. The findings suggest that being transparent about data usage, providing fair monetary compensation, and addressing concerns around data authenticity and consent are all important for encouraging people to participate in data-driven initiatives.

These insights are valuable for researchers, companies, and anyone working with personal data who want to balance the need for data with the need to respect people's privacy and autonomy.

Technical Explanation

The researchers conducted a large-scale experiment with thousands of US Amazon users to investigate the factors that influence people's willingness to share their personal data. The experiment involved presenting participants with different scenarios that varied in terms of the transparency around data usage, the monetary compensation offered, and the assurances provided about data protection and usage.
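An experiment of this kind needs some way to decide which participant sees which scenario. The following is a minimal sketch of balanced random assignment, not the authors' procedure: the arm names in `ARMS`, the function `assign_arms`, and the seed are all hypothetical and chosen only for illustration.

```python
import random
from collections import Counter

# Hypothetical arm names for illustration; not the paper's actual conditions.
ARMS = ["control", "transparency", "bonus_low", "bonus_high"]


def assign_arms(participant_ids, arms=ARMS, seed=0):
    """Randomly assign each participant to one arm, keeping arm sizes balanced."""
    rng = random.Random(seed)  # fixed seed so the assignment is reproducible
    ids = list(participant_ids)
    rng.shuffle(ids)
    # Deal the shuffled participants round-robin, so arm sizes differ by at most one.
    return {pid: arms[i % len(arms)] for i, pid in enumerate(ids)}


assignment = assign_arms(range(10))
arm_sizes = Counter(assignment.values())
```

Shuffling first and then dealing round-robin keeps the arms close to equal in size, which simple independent coin flips would not guarantee for small samples.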

By analyzing the participants' responses (N=6325), the researchers identified several key insights. First, transparency significantly increased data sharing. Second, monetary incentives also significantly increased sharing, and increasing the incentive had a 6 times higher impact on share rates in real scenarios than in hypothetical ones. Third, demographics mattered: age, race, education, and gender all played a role, with female and less-educated participants more likely to share.
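To make the idea of comparing share rates between two experimental arms concrete, here is a minimal sketch of a pooled two-proportion z-test using only the standard library. It is an assumption that a test of this form is appropriate; the summary does not say which statistical method the authors used. The function name `two_proportion_z_test` and the counts in the example are made up for illustration and are not results from the paper.

```python
import math


def two_proportion_z_test(shared_a, n_a, shared_b, n_b):
    """Return (rate difference b - a, z statistic, two-sided p-value)."""
    p_a = shared_a / n_a
    p_b = shared_b / n_b
    # Pooled share rate under the null hypothesis that both arms share equally.
    pooled = (shared_a + shared_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal CDF, via the error function.
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return p_b - p_a, z, p_value


# Made-up counts for illustration only; these are not results from the paper.
diff, z, p = two_proportion_z_test(shared_a=400, n_a=1000, shared_b=460, n_b=1000)
```

A small p-value here would suggest the difference in share rates between the two arms is unlikely to be due to chance alone. An analysis that also accounts for demographics would more plausibly use a regression model rather than pairwise tests.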

These findings have important implications for researchers, companies, and others who collect personal data. They suggest that a combination of transparency, fair compensation, and robust data governance practices are essential for building trust and encouraging people to participate in data-driven initiatives.

Critical Analysis

The researchers acknowledge several limitations to their study. First, the experiment was conducted solely with US Amazon users, so the findings may not generalize to other populations or cultural contexts. Additionally, although the design included both real and hypothetical data-sharing scenarios, share rates differed measurably between them, so conclusions drawn from stated willingness alone should be treated with caution.

Another potential issue is that the study did not explore the long-term effects of the different approaches tested. It's possible that factors like transparency and compensation may have different impacts over time as people's perceptions and trust evolve.

Furthermore, although the study asked participants how purchase data should be used by various third parties, it covers only one type of data: Amazon purchase histories. How willingness to share varies across more and less sensitive types of data is an important consideration that warrants further investigation.

Despite these limitations, the study provides valuable insights that can inform the design of more ethical and effective data-driven initiatives. By prioritizing transparency, fair compensation, and responsible data practices, researchers and companies can build trust and encourage more meaningful participation from the people whose data they seek to collect and use.

Conclusion

This large-scale experiment with thousands of US Amazon users offers important lessons for researchers, companies, and others engaged in data-driven initiatives. The findings highlight the critical importance of transparency, fair monetary compensation, and robust data governance practices for encouraging people to share their personal information.

By addressing these key factors, data collectors and users can build trust, respect people's privacy and autonomy, and foster more meaningful and ethical participation in the data economy. The insights from this study can help guide the development of data-driven initiatives that better align with the needs and concerns of the individuals whose data is being collected and used.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Human participants in AI research: Ethics and transparency in practice

Kevin R. McKee

In recent years, research involving human participants has been critical to advances in artificial intelligence (AI) and machine learning (ML), particularly in the areas of conversational, human-compatible, and cooperative AI. For example, around 12% and 6% of publications at recent AAAI and NeurIPS conferences indicate the collection of original human data, respectively. Yet AI and ML researchers lack guidelines for ethical, transparent research practices with human participants. Fewer than one out of every four of these AAAI and NeurIPS papers provide details of ethical review, the collection of informed consent, or participant compensation. This paper aims to bridge this gap by exploring normative similarities and differences between AI research and related fields that involve human participants. Though psychology, human-computer interaction, and other adjacent fields offer historic lessons and helpful insights, AI research raises several specific concerns (namely, participatory design, crowdsourced dataset development, and an expansive role of corporations) that necessitate a contextual ethics framework. To address these concerns, this paper outlines a set of guidelines for ethical and transparent practice with human participants in AI and ML research. These guidelines can be found in Section 4 on pp. 4–7.

4/23/2024

cs.CY

Algorithmic Transparency and Participation through the Handoff Lens: Lessons Learned from the U.S. Census Bureau's Adoption of Differential Privacy

Amina A. Abdu, Lauren M. Chambers, Deirdre K. Mulligan, Abigail Z. Jacobs

Emerging discussions on the responsible government use of algorithmic technologies propose transparency and public participation as key mechanisms for preserving accountability and trust. But in practice, the adoption and use of any technology shifts the social, organizational, and political context in which it is embedded. Therefore translating transparency and participation efforts into meaningful, effective accountability must take into account these shifts. We adopt two theoretical frames, Mulligan and Nissenbaum's handoff model and Star and Griesemer's boundary objects, to reveal such shifts during the U.S. Census Bureau's adoption of differential privacy (DP) in its updated disclosure avoidance system (DAS) for the 2020 census. This update preserved (and arguably strengthened) the confidentiality protections that the Bureau is mandated to uphold, and the Bureau engaged in a range of activities to facilitate public understanding of and participation in the system design process. Using publicly available documents concerning the Census' implementation of DP, this case study seeks to expand our understanding of how technical shifts implicate values, how such shifts can afford (or fail to afford) greater transparency and participation in system design, and the importance of localized expertise throughout. We present three lessons from this case study toward grounding understandings of algorithmic transparency and participation: (1) efforts towards transparency and participation in algorithmic governance must center values and policy decisions, not just technical design decisions; (2) the handoff model is a useful tool for revealing how such values may be cloaked beneath technical decisions; and (3) boundary objects alone cannot bridge distant communities without trusted experts traveling alongside to broker their adoption.

5/30/2024

cs.CY

Position: Insights from Survey Methodology can Improve Training Data

Stephanie Eckman, Barbara Plank, Frauke Kreuter

Whether future AI models are fair, trustworthy, and aligned with the public's interests rests in part on our ability to collect accurate data about what we want the models to do. However, collecting high-quality data is difficult, and few AI/ML researchers are trained in data collection methods. Recent research in data-centric AI has shown that higher quality training data leads to better performing models, making this the right moment to introduce AI/ML researchers to the field of survey methodology, the science of data collection. We summarize insights from the survey methodology literature and discuss how they can improve the quality of training and feedback data. We also suggest collaborative research ideas into how biases in data collection can be mitigated, making models more accurate and human-centric.

6/11/2024

cs.HC

Explainability for Transparent Conversational Information-Seeking

Weronika Łajewska, Damiano Spina, Johanne Trippas, Krisztian Balog

The increasing reliance on digital information necessitates advancements in conversational search systems, particularly in terms of information transparency. While prior research in conversational information-seeking has concentrated on improving retrieval techniques, the challenge remains in generating responses useful from a user perspective. This study explores different methods of explaining the responses, hypothesizing that transparency about the source of the information, system confidence, and limitations can enhance users' ability to objectively assess the response. By exploring transparency across explanation type, quality, and presentation mode, this research aims to bridge the gap between system-generated responses and responses verifiable by the user. We design a user study to answer questions concerning the impact of (1) the quality of explanations enhancing the response on its usefulness and (2) ways of presenting explanations to users. The analysis of the collected data reveals lower user ratings for noisy explanations, although these scores seem insensitive to the quality of the response. Inconclusive results on the explanations presentation format suggest that it may not be a critical factor in this setting.

5/7/2024

cs.IR cs.HC