Google Topics as a way out of the cookie dilemma?

Read original: arXiv:2407.03846 - Published 7/8/2024 by Marius Koppel (né Stroscher), Jan-Philipp Muttach (né Stroscher), Gerrit Hornung

Overview

  • Google Topics is a proposed alternative to third-party cookies for targeted advertising.
  • It aims to address privacy concerns with previous approaches like Federated Learning of Cohorts (FLoC).
  • The system assigns users to broad interest "topics" to enable ad targeting without revealing detailed browsing history.

Plain English Explanation

Google Topics is a system that Google has proposed for delivering targeted advertising without the detailed user tracking that third-party cookies enable.

Under the Google Topics system, your browser would assign you to a set of broad interest "topics" based on your browsing history. Advertisers could then show you ads based on those general topics, without having access to the specifics of which websites you've visited.

This is intended to address the privacy concerns that have arisen around the use of third-party cookies, which allow detailed tracking of individual users' browsing behavior. By only sharing broad topic information, Google Topics aims to enable targeted advertising while better protecting user privacy.

Technical Explanation

The Google Topics system works by having your web browser analyze your browsing history and assign you to a set of interest "topics." There are a total of around 350 possible topics, covering broad areas like "Fitness," "Travel," or "Technology."

Your browser would select a small number of these topics (e.g. 3-5) that best represent your interests based on your recent web activity. This topic information would then be shared with websites and advertisers when you visit them, allowing for ad targeting without the need for detailed user profiles.
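The selection step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `SITE_TOPICS` table, the `.example` hostnames, and the `top_topics` function are hypothetical, and a real browser would rely on its own classifier and the official taxonomy rather than a hand-written mapping.

```python
from collections import Counter

# Hypothetical hostname-to-topic table (an assumption for illustration only).
SITE_TOPICS = {
    "runnersworld.example": ["Fitness"],
    "gymplans.example": ["Fitness"],
    "cheapflights.example": ["Travel"],
    "hotelfinder.example": ["Travel"],
    "gadgetnews.example": ["Technology"],
}


def top_topics(visited_hosts, k=3):
    """Return the k most frequent topics across the visited hostnames."""
    counts = Counter()
    for host in visited_hosts:
        for topic in SITE_TOPICS.get(host, []):  # unknown sites add nothing
            counts[topic] += 1
    # Sort by descending count, then alphabetically so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [topic for topic, _ in ranked[:k]]


history = [
    "runnersworld.example",
    "gymplans.example",
    "runnersworld.example",
    "cheapflights.example",
    "gadgetnews.example",
    "gadgetnews.example",
    "unknown.example",
]
print(top_topics(history, k=2))  # prints ['Fitness', 'Technology']
```

Only the short list of topic labels is produced as output; the list of visited hostnames never leaves the function, which mirrors the idea that websites see topics rather than browsing history.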

A central design feature of Google Topics is that topic assignment happens entirely within the user's browser, without sending browsing history to a central server, and that only the resulting coarse topics are shared. This is meant to address the criticism of earlier approaches such as Federated Learning of Cohorts (FLoC), which was faulted for potentially revealing too much information about individual users.
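What leaves the browser can be illustrated with a rough Python sketch of the sharing step. The three-period window and the 5% chance of substituting a random topic follow Google's public description of the Topics API rather than this summary, so they are treated here as assumptions; `topics_to_share` and its parameters are hypothetical names.

```python
import random


def topics_to_share(per_epoch_top_topics, taxonomy, noise_rate=0.05, rng=None):
    """Pick one topic per recent period, occasionally swapping in a random one."""
    rng = rng or random.Random()
    shared = []
    # Assumption: only the three most recent periods ("epochs") are considered.
    for epoch_topics in per_epoch_top_topics[-3:]:
        if not epoch_topics:
            continue  # nothing was recorded for this period
        if rng.random() < noise_rate:
            topic = rng.choice(taxonomy)      # random topic as noise
        else:
            topic = rng.choice(epoch_topics)  # one of the user's top topics
        if topic not in shared:
            shared.append(topic)
    return shared


# Illustrative data only: each inner list is one period's top topics.
epochs = [["Fitness", "Travel"], ["Technology"], ["Travel"], ["Fitness"]]
taxonomy = ["Fitness", "Travel", "Technology", "Cooking"]
print(topics_to_share(epochs, taxonomy))
```

The caller receives at most three labels, some of which may be random, and no hostnames or timestamps. That is the sense in which the assignment stays on the device.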

Critical Analysis

While Google Topics represents an improvement over third-party cookies in terms of privacy, some potential concerns remain. For example, the system could still allow for a degree of user re-identification if advertisers are able to infer too much about an individual's interests from their topic profile.
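A back-of-the-envelope calculation gives a feel for the re-identification concern. The Python sketch below counts how many distinct sets of topics are possible, using the figures mentioned in this summary (about 350 topics, 3 shared at a time); `topic_profile_bits` is a hypothetical helper, and the result is an upper bound that assumes every combination is equally likely, which real browsing behaviour does not satisfy.

```python
import math


def topic_profile_bits(taxonomy_size, shared_topics):
    """Count possible topic sets and the bits needed to tell them apart."""
    combos = math.comb(taxonomy_size, shared_topics)
    return combos, math.log2(combos)


combos, bits = topic_profile_bits(350, 3)
print(combos)         # prints 7084700
print(f"{bits:.1f}")  # prints 22.8
```

Roughly 7 million possible combinations is far fewer than the number of web users, so a single topic set does not single anyone out by itself. The concern is that topic sets observed across several sites or over time, or combined with other signals, could narrow the candidates considerably.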

Additionally, the use of broad interest categories may be less effective for ad targeting than the more granular data provided by cookies. This could potentially reduce the value of the system for advertisers and websites that rely on highly targeted ads for revenue.

Further research and real-world testing would be needed to fully assess the efficacy and privacy implications of the Google Topics approach compared to alternative solutions. Ongoing feedback and scrutiny from privacy advocates, policymakers, and other stakeholders will also be important as this technology continues to evolve.

Conclusion

Google Topics represents an attempt to find a middle ground between the targeted advertising capabilities of third-party cookies and the heightened privacy protections that users and regulators are demanding. By limiting the sharing of user data to broad interest categories, the system aims to enable personalized ads while better safeguarding individual privacy.

While not a perfect solution, Google Topics appears to be a step in the right direction and could serve as a useful framework for further developing privacy-preserving advertising technologies. As the digital advertising landscape continues to evolve, approaches like this that balance user privacy and business needs will likely become increasingly important.



This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Google Topics as a way out of the cookie dilemma?

Marius Koppel (né Stroscher), Jan-Philipp Muttach (né Stroscher), Gerrit Hornung

The paper discusses the legal requirements and implications of processing information and personal data for advertising purposes, particularly in light of the Planet49 decision of the European Court of Justice (ECJ) and the Cookie Consent II decision of the German Federal Court (Bundesgerichtshof, BGH). It emphasises that obtaining the explicit consent of individuals is necessary for setting cookies. The German Telecommunication Telemedia Data Protection Act (Telekommunikation-Telemedien-Datenschutzgesetz, TTDSG) has replaced the relevant section of the German Telemedia Act (Telemediengesetz, TMG) and transposes the concept of informed consent for storing and accessing information on terminal equipment, in line with Article 5(3) of the ePrivacy Directive. To meet these requirements, companies exploring alternatives to obtaining consent are developing technical mechanisms that rely on a legal basis. Google initially tested Federated Learning of Cohorts (FLoC) as part of its Privacy Sandbox strategy. After this technology was heavily criticized, Google introduced a new project called Google Topics, which aims to personalize advertising by categorizing users into interest groups, called topics. Implementation of this technology began in July 2023.

7/8/2024

Me want cookie! Towards automated and transparent data governance on the Web

Jesse Wright, Beatriz Esteves, Rui Zhao

This paper presents a sociotechnical vision for managing personal data, including cookies, within Web browsers. We first present our vision for a future of semi-automated data governance on the Web, using policy languages to describe data terms of use, and having browsers act on behalf of users to enact policy-based controls. Then, we present an overview of the technical research required to prove that existing policy languages express a sufficient range of concepts for describing cookie policies on the Web today. We view this work as a stepping stone towards a future of semi-automated data governance at Web-scale, which in the long term will also be used by next-generation Web technologies such as Web agents and Solid.

8/20/2024

Nudging Consent and the New Opt Out System to the Processing of Health Data in England

Janos Meszaros, Chih-hsing Ho, Marcelo Corrales Compagnucci

This chapter examines the challenges of the revised opt out system and the secondary use of health data in England. The analysis of this data could be very valuable for science and medical treatment as well as for the discovery of new drugs. For this reason, the UK government established the care.data program in 2013. The aim of the project was to build a central nationwide database for research and policy planning. However, the processing of personal data was planned without proper public engagement. Research has suggested that IT companies, such as in the Google DeepMind deal case, had access to other kinds of sensitive data and failed to comply with data protection law. Since May 2018, the government has launched the national data opt out system with the hope of regaining public trust. Nevertheless, there is no evidence of significant changes in the ND opt out compared to the previous opt out system, neither in the use of secondary data nor in the choices that patients can make. The only notable difference seems to be in the way that these options are communicated and framed to the patients. Most importantly, according to the new ND opt out, the type 1 opt out option, which is the only choice that truly stops data from being shared outside direct care, will be removed in 2020. According to the Behavioral Law and Economics literature (Nudge Theory), default rules, such as the revised opt out system in England, are very powerful, because people tend to stick to the default choices made readily available to them. The crucial question analyzed in this chapter is whether it is desirable for the UK government to stop promoting the type 1 opt outs, and whether this could be seen as a kind of hard paternalism.

7/30/2024

Consent in Crisis: The Rapid Decline of the AI Data Commons

Shayne Longpre, Robert Mahari, Ariel Lee, Campbell Lund, Hamidah Oderinwale, William Brannon, Nayan Saxena, Naana Obeng-Marnu, Tobin South, Cole Hunter, Kevin Klyman, Christopher Klamm, Hailey Schoelkopf, Nikhil Singh, Manuel Cherep, Ahmad Anis, An Dinh, Caroline Chitongo, Da Yin, Damien Sileo, Deividas Mataciunas, Diganta Misra, Emad Alghamdi, Enrico Shippole, Jianguo Zhang, Joanna Materzynska, Kun Qian, Kush Tiwary, Lester Miranda, Manan Dey, Minnie Liang, Mohammed Hamdy, Niklas Muennighoff, Seonghyeon Ye, Seungone Kim, Shrestha Mohanty, Vipul Gupta, Vivek Sharma, Vu Minh Chien, Xuhui Zhou, Yizhi Li, Caiming Xiong, Luis Villa, Stella Biderman, Hanlin Li, Daphne Ippolito, Sara Hooker, Jad Kabbara, Sandy Pentland

General-purpose artificial intelligence (AI) systems are built on massive swathes of public web data, assembled into corpora such as C4, RefinedWeb, and Dolma. To our knowledge, we conduct the first, large-scale, longitudinal audit of the consent protocols for the web domains underlying AI training corpora. Our audit of 14,000 web domains provides an expansive view of crawlable web data and how codified data use preferences are changing over time. We observe a proliferation of AI-specific clauses to limit use, acute differences in restrictions on AI developers, as well as general inconsistencies between websites' expressed intentions in their Terms of Service and their robots.txt. We diagnose these as symptoms of ineffective web protocols, not designed to cope with the widespread re-purposing of the internet for AI. Our longitudinal analyses show that in a single year (2023-2024) there has been a rapid crescendo of data restrictions from web sources, rendering ~5%+ of all tokens in C4, or 28%+ of the most actively maintained, critical sources in C4, fully restricted from use. For Terms of Service crawling restrictions, a full 45% of C4 is now restricted. If respected or enforced, these restrictions are rapidly biasing the diversity, freshness, and scaling laws for general-purpose AI systems. We hope to illustrate the emerging crises in data consent, for both developers and creators. The foreclosure of much of the open web will impact not only commercial AI, but also non-commercial AI and academic research.

7/25/2024