Data Privacy Vocabulary (DPV) -- Version 2






Published 4/23/2024 by Harshvardhan J. Pandit, Beatriz Esteves, Georg P. Krog, Paul Ryan, Delaram Golpayegani, Julian Flake

The Data Privacy Vocabulary (DPV), developed by the W3C Data Privacy Vocabularies and Controls Community Group (DPVCG), enables the creation of machine-readable, interoperable, and standards-based representations for describing the processing of personal data. The group has also published extensions to the DPV to describe specific applications to support legislative requirements such as the EU's GDPR. The DPV fills a crucial niche in the state of the art by providing a vocabulary that can be embedded and used alongside other existing standards such as W3C ODRL, and which can be customised and extended for adapting to specifics of use-cases or domains. This article describes the version 2 iteration of the DPV in terms of its contents, methodology, current adoptions and uses, and future potential. It also describes the relevance and role of DPV in acting as a common vocabulary to support various regulatory (e.g. EU's DGA and AI Act) and community initiatives (e.g. Solid) emerging across the globe.



Overview

  • The paper "Data Privacy Vocabulary (DPV) - Version 2" presents a comprehensive vocabulary for data privacy concepts, designed to facilitate legal and regulatory compliance.
  • The vocabulary aims to standardize terminology and enable interoperability across different privacy frameworks and regulations.
  • The paper outlines the key requirements for developing a legal vocabulary and describes the design and structure of the DPV.

Plain English Explanation

The paper introduces the Data Privacy Vocabulary (DPV) - Version 2, which is a collection of standardized terms and definitions related to data privacy. This vocabulary is intended to help organizations and individuals better understand and comply with various data privacy laws and regulations, such as the General Data Protection Regulation (GDPR) and the California Consumer Privacy Act (CCPA).

The key idea is to create a common language and framework for discussing data privacy, which can facilitate communication and collaboration between different stakeholders, such as policymakers, legal experts, and technology developers. By using the DPV, organizations can more easily identify, classify, and manage their data privacy-related assets and activities, ultimately helping them to achieve compliance with applicable laws and regulations.

The paper outlines the specific requirements for developing a legal vocabulary, such as the need for precise definitions, hierarchical organization, and support for different languages and jurisdictions. It then describes the structure and content of the DPV, including the various classes, properties, and relationships that make up the vocabulary.

Technical Explanation

The Data Privacy Vocabulary (DPV) - Version 2 is an ontology-based vocabulary that provides a standardized way to represent and communicate data privacy concepts. The vocabulary is designed to be comprehensive, covering a wide range of topics related to data privacy, such as personal data, processing activities, lawful bases, and individual rights.

The DPV is structured as a hierarchical taxonomy, with classes and subclasses representing different concepts and entities within the data privacy domain. Each class and property is defined using clear, unambiguous language, with references to relevant legal frameworks and regulations. The vocabulary also supports multilingual representation, allowing for translations of the terms and definitions into different languages.
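To make the idea of a hierarchical, multilingual taxonomy concrete, here is a minimal Python sketch. It is not the DPV's own data model: the `Concept` class and the `label` and `children` helpers are hypothetical, the hierarchy is a simplified subset chosen for illustration, and the German labels are assumed translations rather than text taken from the vocabulary.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Concept:
    """One vocabulary term: an identifier, an optional parent, and labels by language."""
    iri: str
    parent: Optional[str] = None
    labels: dict[str, str] = field(default_factory=dict)


# Illustrative subset only; the parent links and German labels are assumptions.
CONCEPTS = {
    c.iri: c
    for c in [
        Concept("dpv:LegalBasis", None, {"en": "Legal Basis", "de": "Rechtsgrundlage"}),
        Concept("dpv:Consent", "dpv:LegalBasis", {"en": "Consent", "de": "Einwilligung"}),
        Concept("dpv:Processing", None, {"en": "Processing"}),
        Concept("dpv:Collect", "dpv:Processing", {"en": "Collect"}),
        Concept("dpv:Store", "dpv:Processing", {"en": "Store"}),
    ]
}


def label(iri: str, lang: str, fallback: str = "en") -> str:
    """Return the label in the requested language, falling back to English."""
    labels = CONCEPTS[iri].labels
    return labels.get(lang, labels[fallback])


def children(iri: str) -> list[str]:
    """Return the direct sub-concepts of a concept, sorted by identifier."""
    return sorted(c.iri for c in CONCEPTS.values() if c.parent == iri)


print(label("dpv:Consent", "de"))    # a German label is present
print(label("dpv:Collect", "de"))    # no German label, so the English one is used
print(children("dpv:Processing"))
```

Keeping labels separate from identifiers is what allows translations to be added without changing the terms that systems exchange.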

The development of the DPV follows a set of well-defined requirements for legal vocabularies, such as:

  1. Information and Knowledge Modelling: The vocabulary must accurately represent the concepts and relationships within the data privacy domain, and support reasoning and inference capabilities.
  2. Compliance and Interoperability: The vocabulary must align with existing legal frameworks and enable interoperability between different privacy-related systems and applications.
  3. Usability and Extensibility: The vocabulary must be easy to understand and use, and provide a flexible and extensible structure to accommodate future developments in data privacy regulations and best practices.
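The reasoning and extensibility requirements above can be illustrated with a short Python sketch. It assumes a flattened purpose hierarchy and a hypothetical extension term, `ex:AppointmentReminder`, that is not part of the published vocabulary; the `ancestors` and `is_covered_by` helpers are likewise invented for this example.

```python
# Child -> parent links. Core terms use the "dpv:" prefix; the "ex:" term stands
# for a hypothetical domain extension. The hierarchy is simplified for illustration.
BROADER = {
    "dpv:ServiceProvision": "dpv:Purpose",
    "dpv:Marketing": "dpv:Purpose",
    "ex:AppointmentReminder": "dpv:ServiceProvision",
}


def ancestors(term: str) -> list[str]:
    """Walk parent links upward and return every broader term, nearest first."""
    chain = []
    while term in BROADER:
        term = BROADER[term]
        chain.append(term)
    return chain


def is_covered_by(requested: str, permitted: str) -> bool:
    """A requested purpose is covered if it equals the permitted one or is narrower."""
    return requested == permitted or permitted in ancestors(requested)


# The extension term inherits the meaning of its parent without changing the core.
print(is_covered_by("ex:AppointmentReminder", "dpv:ServiceProvision"))
print(is_covered_by("ex:AppointmentReminder", "dpv:Marketing"))
```

Because the extension term is attached beneath an existing concept, a system that only knows the core vocabulary could still interpret it through its parent, which is one way a shared hierarchy can support interoperability.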

The DPV is implemented using Semantic Web technologies, such as the Resource Description Framework (RDF) and the Web Ontology Language (OWL). This allows for the creation of machine-readable representations of the vocabulary, which can be easily integrated into various software systems and applications.
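As a rough illustration of what a machine-readable representation looks like, the following Python sketch writes a handful of RDF statements in N-Triples syntax using only the standard library. The namespace IRI and the term names (`PersonalDataHandling`, `hasPurpose`, `hasProcessing`, `hasLegalBasis`, `Marketing`, `Collect`, `Consent`) are assumptions that should be checked against the published vocabulary, and the `example.org` resource is a placeholder. A real application would more likely use an RDF library than hand-built strings.

```python
DPV = "https://w3id.org/dpv#"   # assumed namespace IRI for the core vocabulary
EX = "https://example.org/"     # placeholder namespace for local data
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Each triple is (subject, predicate, object), all written as full IRIs.
# Term names are assumptions; verify them against the published vocabulary.
triples = [
    (EX + "newsletter", RDF_TYPE, DPV + "PersonalDataHandling"),
    (EX + "newsletter", DPV + "hasPurpose", DPV + "Marketing"),
    (EX + "newsletter", DPV + "hasProcessing", DPV + "Collect"),
    (EX + "newsletter", DPV + "hasLegalBasis", DPV + "Consent"),
]


def to_ntriples(rows):
    """Serialise IRI-only triples in N-Triples syntax, one statement per line."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in rows)


print(to_ntriples(triples))
```

Each output line is a self-contained statement, so the description of the processing activity can be merged with data from other sources that reuse the same identifiers.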

Critical Analysis

The Data Privacy Vocabulary (DPV) - Version 2 represents a significant step towards standardizing data privacy terminology and facilitating compliance with relevant laws and regulations. By providing a common language and framework for discussing data privacy concepts, the DPV can help organizations better understand and manage their data privacy-related assets and activities.

However, the paper also acknowledges several limitations and areas for further research. For instance, the authors note that the DPV is not intended to be a complete solution to every data privacy challenge, and that it may need to be extended or adapted to meet particular regulatory or industry requirements.

Additionally, the paper does not explore the practical challenges of adopting and implementing the DPV, such as securing buy-in from different stakeholders, integrating it with existing systems and processes, and maintaining the vocabulary so that it stays up to date with evolving privacy regulations and best practices.


Conclusion

The Data Privacy Vocabulary (DPV) - Version 2 represents an important contribution to the field of data privacy, providing a standardized and comprehensive vocabulary that can help organizations and individuals better understand and comply with relevant laws and regulations. By enabling more effective communication and interoperability across different privacy frameworks, the DPV has the potential to significantly improve data privacy practices and contribute to a more transparent and trustworthy digital landscape.

However, the successful adoption and implementation of the DPV will likely require ongoing collaboration and coordination between various stakeholders, as well as a commitment to maintaining and evolving the vocabulary over time to keep pace with the rapidly changing data privacy landscape.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Privacy Requirements and Realities of Digital Public Goods

Geetika Gopi, Aadyaa Maddi, Omkhar Arasaratnam, Giulia Fanti





In the international development community, the term digital public goods is used to describe open-source digital products (e.g., software, datasets) that aim to address the United Nations (UN) Sustainable Development Goals. DPGs are increasingly being used to deliver government services around the world (e.g., ID management, healthcare registration). Because DPGs may handle sensitive data, the UN has established user privacy as a first-order requirement for DPGs. The privacy risks of DPGs are currently managed in part by the DPG standard, which includes a prerequisite questionnaire with questions designed to evaluate a DPG's privacy posture. This study examines the effectiveness of the current DPG standard for ensuring adequate privacy protections. We present a systematic assessment of responses from DPGs regarding their protections of users' privacy. We also present in-depth case studies from three widely-used DPGs to identify privacy threats and compare this to their responses to the DPG standard. Our findings reveal limitations in the current DPG standard's evaluation approach. We conclude by presenting preliminary recommendations and suggestions for strengthening the DPG standard as it relates to privacy. Additionally, we hope this study encourages more usable privacy research on communicating privacy, not only to end users but also third-party adopters of user-facing technologies.


ATTAXONOMY: Unpacking Differential Privacy Guarantees Against Practical Adversaries

Rachel Cummings, Shlomi Hod, Jayshree Sarathy, Marika Swanberg





Differential Privacy (DP) is a mathematical framework that is increasingly deployed to mitigate privacy risks associated with machine learning and statistical analyses. Despite the growing adoption of DP, its technical privacy parameters do not lend themselves to an intelligible description of the real-world privacy risks associated with that deployment: the guarantee that most naturally follows from the DP definition is protection against membership inference by an adversary who knows all but one data record and has unlimited auxiliary knowledge. In many settings, this adversary is far too strong to inform how to set real-world privacy parameters. One approach for contextualizing privacy parameters is via defining and measuring the success of technical attacks, but doing so requires a systematic categorization of the relevant attack space. In this work, we offer a detailed taxonomy of attacks, showing the various dimensions of attacks and highlighting that many real-world settings have been understudied. Our taxonomy provides a roadmap for analyzing real-world deployments and developing theoretical bounds for more informative privacy attacks. We operationalize our taxonomy by using it to analyze a real-world case study, the Israeli Ministry of Health's recent release of a birth dataset using DP, showing how the taxonomy enables fine-grained threat modeling and provides insight towards making informed privacy parameter choices. Finally, we leverage the taxonomy towards defining a more realistic attack than previously considered in the literature, namely a distributional reconstruction attack: we generalize Balle et al.'s notion of reconstruction robustness to a less-informed adversary with distributional uncertainty, and extend the worst-case guarantees of DP to this average-case setting.


Automating the Identification of High-Value Datasets in Open Government Data Portals

Alfonso Quarati, Anastasija Nikiforova





Recognized for fostering innovation and transparency, driving economic growth, enhancing public services, supporting research, empowering citizens, and promoting environmental sustainability, High-Value Datasets (HVD) play a crucial role in the broader Open Government Data (OGD) movement. However, identifying HVD presents a resource-intensive and complex challenge due to the nuanced nature of data value. Our proposal aims to automate the identification of HVDs on OGD portals using a quantitative approach based on a detailed analysis of user interest derived from data usage statistics, thereby minimizing the need for human intervention. The proposed method involves extracting download data, analyzing metrics to identify high-value categories, and comparing HVD datasets across different portals. This automated process provides valuable insights into trends in dataset usage, reflecting citizens' needs and preferences. The effectiveness of our approach is demonstrated through its application to a sample of US OGD city portals. The practical implications of this study include contributing to the understanding of HVD at both local and national levels. By providing a systematic and efficient means of identifying HVD, our approach aims to inform open governance initiatives and practices, aiding OGD portal managers and public authorities in their efforts to optimize data dissemination and utilization.



PrivComp-KG : Leveraging Knowledge Graph and Large Language Models for Privacy Policy Compliance Verification

Leon Garza, Lavanya Elluri, Anantaa Kotal, Aritran Piplai, Deepti Gupta, Anupam Joshi





Data protection and privacy is becoming increasingly crucial in the digital era. Numerous companies depend on third-party vendors and service providers to carry out critical functions within their operations, encompassing tasks such as data handling and storage. However, this reliance introduces potential vulnerabilities, as these vendors' security measures and practices may not always align with the standards expected by regulatory bodies. Businesses are required, often under the penalty of law, to ensure compliance with the evolving regulatory rules. Interpreting and implementing these regulations pose challenges due to their complexity. Regulatory documents are extensive, demanding significant effort for interpretation, while vendor-drafted privacy policies often lack the detail required for full legal compliance, leading to ambiguity. To ensure a concise interpretation of the regulatory requirements and compliance of organizational privacy policy with said regulations, we propose a Large Language Model (LLM) and Semantic Web based approach for privacy compliance. In this paper, we develop the novel Privacy Policy Compliance Verification Knowledge Graph, PrivComp-KG. It is designed to efficiently store and retrieve comprehensive information concerning privacy policies, regulatory frameworks, and domain-specific knowledge pertaining to the legal landscape of privacy. Using Retrieval Augmented Generation, we identify the relevant sections in a privacy policy with corresponding regulatory rules. This information about individual privacy policies is populated into the PrivComp-KG. Combining this with the domain context and rules, the PrivComp-KG can be queried to check for compliance with privacy policies by each vendor against relevant policy regulations. We demonstrate the relevance of the PrivComp-KG, by verifying compliance of privacy policy documents for various organizations.

