The Perspectivist Paradigm Shift: Assumptions and Challenges of Capturing Human Labels

2405.05860

Published 5/10/2024 by Eve Fleisig, Su Lin Blodgett, Dan Klein, Zeerak Talat


Abstract

Longstanding data labeling practices in machine learning involve collecting and aggregating labels from multiple annotators. But what should we do when annotators disagree? Though annotator disagreement has long been seen as a problem to minimize, new perspectivist approaches challenge this assumption by treating disagreement as a valuable source of information. In this position paper, we examine practices and assumptions surrounding the causes of disagreement--some challenged by perspectivist approaches, and some that remain to be addressed--as well as practical and normative challenges for work operating under these assumptions. We conclude with recommendations for the data labeling pipeline and avenues for future research engaging with subjectivity and disagreement.


Overview

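The paper examines the assumptions and challenges of capturing human labels, exploring the shift from the longstanding practice of aggregating annotator labels, which treats disagreement as a problem to minimize, to a perspectivist approach that treats disagreement as a source of information.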
It highlights issues with the traditional approach, such as the reliance on a single ground truth and the inability to account for individual differences in perception and interpretation.
The paper proposes a move towards a perspectivist paradigm that embraces the diversity of human perspectives and explores ways to model and leverage this diversity.

Plain English Explanation

The research paper discusses a shift in how we think about and work with human-provided labels or annotations. Traditionally, the assumption has been that there is a single, objective "ground truth" that can be captured by having multiple people label the same data. However, this approach fails to account for the fact that people often have different perspectives, experiences, and biases that shape how they interpret and label the world around them.

The paper argues for a move towards a "perspectivist" paradigm, which acknowledges that there may not be a single correct answer, but rather a diversity of valid viewpoints. This shift requires rethinking how we design labeling tasks, collect and model human labels, and leverage the insights that can be gained from understanding the different ways people perceive and categorize information.

By embracing the complexity of human perception and interpretation, the researchers believe we can develop more nuanced and accurate machine learning models that better reflect the richness of human experience. This could have important implications for a wide range of applications, from computer vision to natural language processing, where human-provided labels are often a critical component.

Technical Explanation

The paper begins by outlining the "longstanding paradigm" in which human labels are treated as a means of capturing a single, objective "ground truth" about the world. This approach relies on the assumption that with enough data and diligent annotation efforts, we can converge on a consensus view that accurately reflects reality.
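As a rough illustration of the aggregation step this paradigm relies on, the sketch below collapses each item's labels into one "ground truth" label by majority vote. Majority voting is a common aggregation choice used here for illustration, not a method taken from the paper; the item names, label names, and the `majority_vote` helper are all hypothetical.

```python
from collections import Counter


def majority_vote(labels):
    """Return the most frequent label; ties go to the label seen first."""
    counts = Counter(labels)
    return counts.most_common(1)[0][0]


# Hypothetical toy data: item id -> labels from three annotators.
annotations = {
    "post_1": ["offensive", "offensive", "offensive"],
    "post_2": ["offensive", "not_offensive", "not_offensive"],
}

# One label per item survives; the dissenting vote on post_2 is discarded.
aggregated = {item: majority_vote(labels) for item, labels in annotations.items()}
```

After aggregation, `post_2` looks exactly as settled as `post_1`, even though one of its three annotators disagreed. That lost information is what the perspectivist critique is concerned with.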

However, the authors argue that this paradigm fails to account for the inherent subjectivity and diversity of human perception and interpretation. They highlight research on individual alignment, annotator biases, and scaling challenges that undermines the idea of a single ground truth.

In contrast, the "perspectivist paradigm" accepts that people's labels may reflect their personal experiences and opinions. It also has to contend with complications such as conformity or impersonation. The goal is to model and use this diversity rather than filter it out.
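One simple way to keep disagreement in the data, rather than filter it out, is to store each item's full label distribution and a per-item disagreement score. The sketch below is a minimal illustration under that assumption, not the paper's method; the helper names `label_distribution` and `entropy_bits` and the toy labels are hypothetical.

```python
import math
from collections import Counter


def label_distribution(labels):
    """Map each label to the fraction of annotators who chose it."""
    counts = Counter(labels)
    total = len(labels)
    return {label: n / total for label, n in counts.items()}


def entropy_bits(distribution):
    """Shannon entropy in bits: 0 when unanimous, higher with more disagreement."""
    return sum(p * math.log2(1 / p) for p in distribution.values() if p > 0)


# Hypothetical toy data: four annotators per item.
unanimous = label_distribution(["offensive"] * 4)
split = label_distribution(["offensive", "offensive", "not_offensive", "not_offensive"])

unanimous_score = entropy_bits(unanimous)  # no disagreement
split_score = entropy_bits(split)          # maximal disagreement for two labels
```

A model can then be trained against the distribution itself (a "soft label"), and the entropy score can flag items where annotators diverge and a closer look at who disagreed, and why, may be worthwhile.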

The paper outlines several key challenges in transitioning to this new paradigm, including the need for more sophisticated annotation frameworks, better understanding of the factors shaping human labels, and techniques for extracting meaningful insights from the resulting diversity of perspectives.

Critical Analysis

The paper makes a compelling case for the limitations of the longstanding paradigm and the need to embrace a more perspectivist approach. However, it also acknowledges the significant technical and conceptual challenges in making this shift.

One potential concern is the risk of relativism, where the diversity of perspectives is taken to the extreme and no single view is considered more valid than another. The authors do not fully address how to balance the recognition of multiple valid perspectives with the need to make practical decisions and draw actionable conclusions from the data.

Additionally, the paper focuses primarily on the challenges of capturing and modeling human labels, but does not delve deeply into the downstream implications for building effective machine learning systems. Further research may be needed to understand how a perspectivist approach can be effectively integrated into the broader ML pipeline.

Despite these potential limitations, the paper raises important questions about the assumptions underlying current practices in data annotation and model development. Encouraging researchers and practitioners to think more critically about the role of human perspective in this process is a valuable contribution that could have far-reaching impacts on the field.

Conclusion

The research paper presents a compelling case for a paradigm shift in how we think about and work with human-provided labels or annotations. By moving away from the longstanding assumption of a single, objective "ground truth" and towards a more perspectivist approach, the authors argue that we can better capture the diversity of human perception and interpretation.

This shift has the potential to lead to more nuanced and accurate machine learning models, as well as a deeper understanding of the factors that shape human decision-making and categorization. While the challenges involved in implementing this new paradigm are significant, the paper's insights can serve as a valuable starting point for further research and innovation in this critical area of machine learning and AI development.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers


Noise Correction on Subjective Datasets

Uthman Jinadu, Yi Ding

Incorporating every annotator's perspective is crucial for unbiased data modeling. Annotator fatigue and changing opinions over time can distort dataset annotations. To combat this, we propose to learn a more accurate representation of diverse opinions by utilizing multitask learning in conjunction with loss-based label correction. We show that using our novel formulation, we can cleanly separate agreeing and disagreeing annotations. Furthermore, this method provides a controllable way to encourage or discourage disagreement. We demonstrate that this modification can improve prediction performance in a single or multi-annotator setting. Lastly, we show that this method remains robust to additional label noise that is applied to subjective data.

6/5/2024

cs.LG cs.AI cs.HC

Designing NLP Systems That Adapt to Diverse Worldviews

Claudiu Creanga, Liviu P. Dinu

Natural Language Inference (NLI) is foundational for evaluating language understanding in AI. However, progress has plateaued, with models failing on ambiguous examples and exhibiting poor generalization. We argue that this stems from disregarding the subjective nature of meaning, which is intrinsically tied to an individual's weltanschauung (which roughly translates to worldview). Existing NLP datasets often obscure this by aggregating labels or filtering out disagreement. We propose a perspectivist approach: building datasets that capture annotator demographics, values, and justifications for their labels. Such datasets would explicitly model diverse worldviews. Our initial experiments with a subset of the SBIC dataset demonstrate that even limited annotator metadata can improve model performance.

5/21/2024

cs.CL


GRASP: A Disagreement Analysis Framework to Assess Group Associations in Perspectives

Vinodkumar Prabhakaran, Christopher Homan, Lora Aroyo, Aida Mostafazadeh Davani, Alicia Parrish, Alex Taylor, Mark Díaz, Ding Wang, Gregory Serapio-García

Human annotation plays a core role in machine learning -- annotations for supervised models, safety guardrails for generative models, and human feedback for reinforcement learning, to cite a few avenues. However, the fact that many of these human annotations are inherently subjective is often overlooked. Recent work has demonstrated that ignoring rater subjectivity (typically resulting in rater disagreement) is problematic within specific tasks and for specific subgroups. Generalizable methods to harness rater disagreement and thus understand the socio-cultural leanings of subjective tasks remain elusive. In this paper, we propose GRASP, a comprehensive disagreement analysis framework to measure group association in perspectives among different rater sub-groups, and demonstrate its utility in assessing the extent of systematic disagreements in two datasets: (1) safety annotations of human-chatbot conversations, and (2) offensiveness annotations of social media posts, both annotated by diverse rater pools across different socio-demographic axes. Our framework (based on disagreement metrics) reveals specific rater groups that have significantly different perspectives than others on certain tasks, and helps identify demographic axes that are crucial to consider in specific task contexts.

6/17/2024

cs.CL cs.AI

POV Learning: Individual Alignment of Multimodal Models using Human Perception

Simon Werner, Katharina Christ, Laura Bernardy, Marion G. Muller, Achim Rettinger

Aligning machine learning systems with human expectations is mostly attempted by training with manually vetted human behavioral samples, typically explicit feedback. This is done on a population level since the context that is capturing the subjective Point-Of-View (POV) of a concrete person in a specific situational context is not retained in the data. However, we argue that alignment on an individual level can boost the subjective predictive performance for the individual user interacting with the system considerably. Since perception differs for each person, the same situation is observed differently. Consequently, the basis for decision making and the subsequent reasoning processes and observable reactions differ. We hypothesize that individual perception patterns can be used for improving the alignment on an individual level. We test this by integrating perception information into machine learning systems and measuring their predictive performance with respect to individual subjective assessments. For our empirical study, we collect a novel data set of multimodal stimuli and corresponding eye tracking sequences for the novel task of Perception-Guided Crossmodal Entailment and tackle it with our Perception-Guided Multimodal Transformer. Our findings suggest that exploiting individual perception signals for the machine learning of subjective human assessments provides a valuable cue for individual alignment. It does not only improve the overall predictive performance from the point-of-view of the individual user but might also contribute to steering AI systems towards every person's individual expectations and values.

5/8/2024

cs.AI