Agreeing and Disagreeing in Collaborative Knowledge Graph Construction: An Analysis of Wikidata

Read original: arXiv:2306.11766 - Published 7/24/2024 by Elisavet Koutsiana, Tushita Yadav, Nitisha Jain, Albert Meroño-Peñuela, Elena Simperl

Overview

Examines how users of the collaborative knowledge graph Wikidata agree and disagree during the construction process
Analyzes discussion pages to understand patterns of consensus and controversy
Provides insights into the dynamics of knowledge collaboration and the challenges of building comprehensive knowledge bases

Plain English Explanation

This paper looks at how people work together to build Wikidata, a collaborative knowledge graph that serves as the data backend of Wikipedia. The researchers analyzed the discussion pages where Wikidata users debate content, to understand when contributors reach agreement and when they disagree.

The goal is to understand how this kind of collaborative knowledge construction works, and what makes it hard to build a complete and accurate knowledge base when many people contribute. By studying patterns of agreement and disagreement, the researchers hope to help Wikidata support community decision-making and improve its discussion tools and practices.

Technical Explanation

The paper analyzes discussions around Wikidata, a collaborative knowledge graph whose users add, edit, and debate its content. Using a mix of quantitative and qualitative analyses, the researchers examined these textual discussions to identify patterns of consensus and controversy as users construct the knowledge base.

The analysis focused on several aspects:

Edit activities: Examining how users make edits and revisions to the knowledge graph content.
Discussions: Analyzing the textual exchanges where users agree, disagree, and negotiate the content.
User roles: Identifying different types of users based on their level of activity and influence.

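Among the findings: decisions to create Wikidata properties are made much faster than decisions to delete them, more than half of controversial discussions do not lead to consensus, and conflict and vandalism are rare in discussions. These results shed light on the social and technical challenges of collaborative knowledge graph construction and can inform the design of tools and processes that better support distributed, community-driven knowledge building.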

Critical Analysis

The paper provides a valuable empirical analysis of the collaborative dynamics within Wikidata. However, a few limitations and areas for further research are worth noting:

Scope: The analysis is focused solely on Wikidata, so the findings may not fully generalize to other collaborative knowledge graph projects with different community structures and norms.
Bias: The researchers acknowledge that their analysis of discussion pages may be biased towards more active and vocal users, potentially missing perspectives from less engaged contributors.
Causality: While the paper identifies patterns of agreement and disagreement, it does not fully explain the underlying reasons and motivations driving these dynamics.

Further research could explore comparative studies across diverse collaborative knowledge graph initiatives, as well as deeper qualitative investigations into the individual and community factors shaping the knowledge construction process.

Conclusion

This paper provides valuable insights into the collaborative nature of knowledge graph building, using Wikidata as a case study. The analysis of user discussions reveals the complex social dynamics involved as people work together to create a comprehensive and accurate knowledge base.

The findings highlight the challenges of achieving consensus in a distributed, community-driven effort, as well as the importance of understanding different user roles and motivations. These insights can inform the design of tools and processes to better support collaborative knowledge graph construction, ultimately enabling the creation of more robust and reliable knowledge resources.

This summary was produced with help from an AI and may contain inaccuracies. Read the original source documents to confirm details.

Related Papers

Agreeing and Disagreeing in Collaborative Knowledge Graph Construction: An Analysis of Wikidata

Elisavet Koutsiana, Tushita Yadav, Nitisha Jain, Albert Meroño-Peñuela, Elena Simperl

In this work, we study disagreement in discussions around Wikidata, an online knowledge community that builds the data backend of Wikipedia. Discussions are important in collaborative work as they can increase contributor performance and encourage the emergence of shared norms and practices. While disagreements can play a productive role in discussions, they can also lead to conflicts and controversies, which impact contributor well-being and their motivation to engage. We want to understand if and when such phenomena arise in Wikidata, using a mix of quantitative and qualitative analyses to identify the types of topics people disagree about, the most common patterns of interaction, and roles people play when arguing for or against an issue. We find that decisions to create Wikidata properties are much faster than those to delete properties and that more than half of controversial discussions do not lead to consensus. Our analysis suggests that Wikidata is an inclusive community, considering different opinions when making decisions, and that conflict and vandalism are rare in discussions. At the same time, while one-fourth of the editors participating in controversial discussions contribute with legit and insightful opinions about Wikidata's emerging issues, they do not remain engaged in the discussions. We hope our findings will help Wikidata support community decision making, and improve discussion tools and practices.

7/24/2024

Talking Wikidata: Communication patterns and their impact on community engagement in collaborative knowledge graphs

Elisavet Koutsiana, Ioannis Reklos, Kholoud Saad Alghamdi, Nitisha Jain, Albert Meroño-Peñuela, Elena Simperl

We study collaboration patterns of Wikidata, one of the world's largest collaborative knowledge graph communities. Wikidata lacks long-term engagement: a small group of priceless members, 0.8%, is responsible for 80% of contributions. Therefore, it is essential to investigate their behavioural patterns and find ways to enhance their contributions and participation. Previous studies have highlighted the importance of discussions among contributors in understanding these patterns. To investigate this, we analyzed all the discussions on Wikidata and used a mixed methods approach, including statistical tests, network analysis, and text and graph embedding representations. Our research showed that the interactions between Wikidata editors form a small world network where the content of a post influences the continuity of conversations. We also found that the account age of Wikidata members and their conversations are significant factors in their long-term engagement with the project. Our findings can benefit the Wikidata community by helping them improve their practices to increase contributions and enhance long-term participation.

7/29/2024

Conceptual Mapping of Controversies

Claude Draude, Dominik Dürrschnabel, Johannes Hirth, Viktoria Horn, Jonathan Kropf, Jörn Lamla, Gerd Stumme, Markus Uhlmann

With our work, we contribute towards a qualitative analysis of the discourse on controversies in online news media. For this, we employ Formal Concept Analysis and the economics of conventions to derive conceptual controversy maps. In our experiments, we analyze two maps from different news journals with methods from ordinal data science. We show how these methods can be used to assess the diversity, complexity and potential bias of controversies. In addition to that, we discuss how the diagrams of concept lattices can be used to navigate between news articles.

5/1/2024

WikiContradict: A Benchmark for Evaluating LLMs on Real-World Knowledge Conflicts from Wikipedia

Yufang Hou, Alessandra Pascale, Javier Carnerero-Cano, Tigran Tchrakian, Radu Marinescu, Elizabeth Daly, Inkit Padhi, Prasanna Sattigeri

Retrieval-augmented generation (RAG) has emerged as a promising solution to mitigate the limitations of large language models (LLMs), such as hallucinations and outdated information. However, it remains unclear how LLMs handle knowledge conflicts arising from different augmented retrieved passages, especially when these passages originate from the same source and have equal trustworthiness. In this work, we conduct a comprehensive evaluation of LLM-generated answers to questions that have varying answers based on contradictory passages from Wikipedia, a dataset widely regarded as a high-quality pre-training resource for most LLMs. Specifically, we introduce WikiContradict, a benchmark consisting of 253 high-quality, human-annotated instances designed to assess LLM performance when augmented with retrieved passages containing real-world knowledge conflicts. We benchmark a diverse range of both closed and open-source LLMs under different QA scenarios, including RAG with a single passage, and RAG with 2 contradictory passages. Through rigorous human evaluations on a subset of WikiContradict instances involving 5 LLMs and over 3,500 judgements, we shed light on the behaviour and limitations of these models. For instance, when provided with two passages containing contradictory facts, all models struggle to generate answers that accurately reflect the conflicting nature of the context, especially for implicit conflicts requiring reasoning. Since human evaluation is costly, we also introduce an automated model that estimates LLM performance using a strong open-source language model, achieving an F-score of 0.8. Using this automated metric, we evaluate more than 1,500 answers from seven LLMs across all WikiContradict instances. To facilitate future work, we release WikiContradict on: https://ibm.biz/wikicontradict.

6/21/2024