Me want cookie! Towards automated and transparent data governance on the Web

Read original: arXiv:2408.09071 - Published 8/20/2024 by Jesse Wright, Beatriz Esteves, Rui Zhao
Total Score

0

📊

Sign in to get full access

or

If you already have an account, we'll log you in

Overview

  • Explores the challenge of automated and transparent data governance on the web
  • Proposes a framework for decentralized, user-controlled data management
  • Aims to address issues like consent, privacy, and data control

Plain English Explanation

The paper discusses the challenges of managing and governing personal data on the web. As more of our lives move online, the control and ownership of our digital information has become a major concern. The authors propose a new framework for decentralized data governance that puts users in charge of their own data.

The key idea is to give individuals more transparency and control over how their personal information is collected, used, and shared online. Instead of handing over data to large platforms and tech companies, the framework would allow people to manage their digital footprint through a decentralized system.

This could help address issues like the consent crisis in AI, where users often don't understand how their data is being used. By automating data governance, the system aims to make the process more transparent and give people more say over their digital privacy.

Technical Explanation

The paper outlines a conceptual framework for automated and decentralized data governance on the web. At the core is a user-centric model where individuals have granular control over their personal data through smart contracts and blockchain technology.

The system would allow users to set preferences and policies for how their information can be accessed and used. These rules would be encoded into self-executing smart contracts that automatically enforce data usage agreements. This would create a more transparent system of web content control compared to the current model dominated by large tech platforms.

Key components of the framework include:

  • User profiles: Decentralized, user-owned digital profiles that store personal data and usage preferences
  • Smart contracts: Self-executing agreements that codify data access rules and automate enforcement
  • Blockchain: A distributed ledger that records data transactions and ensures transparency

By leveraging these technologies, the goal is to shift power away from centralized data brokers and give individuals more agency over their digital identity and information.

Critical Analysis

The paper presents a compelling vision for a more user-centric and transparent approach to data governance on the web. Empowering individuals to control their own information is an important step towards addressing systemic privacy and consent issues online.

However, the proposal also raises some practical challenges. Implementing a truly decentralized system at scale would require significant technological and infrastructure changes to the current web ecosystem. There are also open questions around how to handle complex data usage agreements, dispute resolution, and potential misuse of the system.

Additionally, the reliance on emerging technologies like blockchain and smart contracts means the framework may face adoption hurdles until these tools mature and become more accessible. Ensuring the security and reliability of the underlying infrastructure would also be critical.

Overall, the paper offers a thoughtful starting point for rethinking data governance, but significant work remains to translate the conceptual model into a deployable, real-world system.

Conclusion

This paper outlines an innovative framework for automating and decentralizing data governance on the web. By giving users more transparency and control over their personal information, the proposed system aims to address longstanding issues around privacy, consent, and the power imbalance between individuals and large tech platforms.

While the vision faces various technical and practical challenges, the core ideas presented could have significant implications for the future of the internet and how we manage our digital lives. As concerns around data rights and online privacy continue to grow, solutions like this may become increasingly important for empowering individuals and restoring trust in the web.



This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Follow @aimodelsfyi on 𝕏 →

Related Papers

📊

Total Score

0

Me want cookie! Towards automated and transparent data governance on the Web

Jesse Wright, Beatriz Esteves, Rui Zhao

This paper presents a sociotechnical vision for managing personal data, including cookies, within Web browsers. We first present our vision for a future of semi-automated data governance on the Web, using policy languages to describe data terms of use, and having browsers act on behalf of users to enact policy-based controls. Then, we present an overview of the technical research required to {prove} that existing policy languages express a sufficient range of concepts for describing cookie policies on the Web today. We view this work as a stepping stone towards a future of semi-automated data governance at Web-scale, which in the long term will also be used by next-generation Web technologies such as Web agents and Solid.

Read more

8/20/2024

Google Topics as a way out of the cookie dilemma?
Total Score

0

Google Topics as a way out of the cookie dilemma?

Marius Koppel (n'e Stroscher), Jan-Philipp Muttach (n'e Stroscher), Gerrit Hornung

The paper discusses the legal requirements and implications of the processing of information and personal data for advertising purposes, particularly in the light of the Planet49 decision of the European Court of Justice (ECJ) and the Cookie Consent II decision by the German Federal Court (Bundesgerichtshof, BGH). It emphasises that obtaining explicit consent of individuals is necessary for setting cookies. The introduction of the German Telecommunication Telemedia Data Protection Act (Telekommunikation-Telemedien-Datenschutzgesetz, TTDSG) has replaced the relevant section of the German Telemedia Act (Telemediengesetz, TMG) and transpose the concept of informed consent for storing and accessing information on terminal equipment, aligning with Article 5(3) ePrivacy Directive. To meet these requirements, companies exploring alternatives to obtaining consent are developing technical mechanisms that rely on a legal basis. Google tested initially Federated Learning of Cohorts (FLoC) as part of their Privacy Sandbox strategy. This technology was significantly criticized, Google introduced a new project called Google Topics, which aims to personalize advertising by categorizing users into interest groups, called topics. Implementation of this technology began in July 2023.

Read more

7/8/2024

🤖

Total Score

0

Consent in Crisis: The Rapid Decline of the AI Data Commons

Shayne Longpre, Robert Mahari, Ariel Lee, Campbell Lund, Hamidah Oderinwale, William Brannon, Nayan Saxena, Naana Obeng-Marnu, Tobin South, Cole Hunter, Kevin Klyman, Christopher Klamm, Hailey Schoelkopf, Nikhil Singh, Manuel Cherep, Ahmad Anis, An Dinh, Caroline Chitongo, Da Yin, Damien Sileo, Deividas Mataciunas, Diganta Misra, Emad Alghamdi, Enrico Shippole, Jianguo Zhang, Joanna Materzynska, Kun Qian, Kush Tiwary, Lester Miranda, Manan Dey, Minnie Liang, Mohammed Hamdy, Niklas Muennighoff, Seonghyeon Ye, Seungone Kim, Shrestha Mohanty, Vipul Gupta, Vivek Sharma, Vu Minh Chien, Xuhui Zhou, Yizhi Li, Caiming Xiong, Luis Villa, Stella Biderman, Hanlin Li, Daphne Ippolito, Sara Hooker, Jad Kabbara, Sandy Pentland

General-purpose artificial intelligence (AI) systems are built on massive swathes of public web data, assembled into corpora such as C4, RefinedWeb, and Dolma. To our knowledge, we conduct the first, large-scale, longitudinal audit of the consent protocols for the web domains underlying AI training corpora. Our audit of 14,000 web domains provides an expansive view of crawlable web data and how codified data use preferences are changing over time. We observe a proliferation of AI-specific clauses to limit use, acute differences in restrictions on AI developers, as well as general inconsistencies between websites' expressed intentions in their Terms of Service and their robots.txt. We diagnose these as symptoms of ineffective web protocols, not designed to cope with the widespread re-purposing of the internet for AI. Our longitudinal analyses show that in a single year (2023-2024) there has been a rapid crescendo of data restrictions from web sources, rendering ~5%+ of all tokens in C4, or 28%+ of the most actively maintained, critical sources in C4, fully restricted from use. For Terms of Service crawling restrictions, a full 45% of C4 is now restricted. If respected or enforced, these restrictions are rapidly biasing the diversity, freshness, and scaling laws for general-purpose AI systems. We hope to illustrate the emerging crises in data consent, for both developers and creators. The foreclosure of much of the open web will impact not only commercial AI, but also non-commercial AI and academic research.

Read more

7/25/2024

Here's Charlie! Realising the Semantic Web vision of Agents in the age of LLMs
Total Score

0

Here's Charlie! Realising the Semantic Web vision of Agents in the age of LLMs

Jesse Wright

This paper presents our research towards a near-term future in which legal entities, such as individuals and organisations can entrust semi-autonomous AI-driven agents to carry out online interactions on their behalf. The author's research concerns the development of semi-autonomous Web agents, which consult users if and only if the system does not have sufficient context or confidence to proceed working autonomously. This creates a user-agent dialogue that allows the user to teach the agent about the information sources they trust, their data-sharing preferences, and their decision-making preferences. Ultimately, this enables the user to maximise control over their data and decisions while retaining the convenience of using agents, including those driven by LLMs. In view of developing near-term solutions, the research seeks to answer the question: How do we build a trustworthy and reliable network of semi-autonomous agents which represent individuals and organisations on the Web?. After identifying key requirements, the paper presents a demo for a sample use case of a generic personal assistant. This is implemented using (Notation3) rules to enforce safety guarantees around belief, data sharing and data usage and LLMs to allow natural language interaction with users and serendipitous dialogues between software agents.

Read more

9/10/2024