Learning to Defer in Content Moderation: The Human-AI Interplay

Read original: arXiv:2402.12237 - Published 6/4/2024 by Thodoris Lykouris, Wentao Weng

Overview

Discusses a model for human-AI collaboration in content moderation on online platforms
Focuses on balancing classification loss, idiosyncratic loss of non-reviewed posts, and delay loss from congestion in the human review system
Proposes a near-optimal learning algorithm to address these challenges

Plain English Explanation

Online platforms often use a combination of AI and human moderators to review content and decide what to keep or remove. Typically, the AI makes an initial assessment of how harmful a post is and uses fixed thresholds to determine whether it should be removed or sent for human review. However, this approach doesn't account for the uncertainty in the AI's predictions, the changing capacity of the human review system, or the fact that humans only review posts that the AI has already filtered.
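To make the baseline concrete, here is a minimal sketch of the fixed-threshold heuristic described above. The function name and both threshold values are hypothetical, chosen only for illustration; the source does not give specific numbers.

```python
def fixed_threshold_policy(harm_score, remove_at=0.9, review_at=0.5):
    """Return (remove, send_to_review) for one post.

    harm_score: the AI's estimated probability that the post is harmful.
    remove_at, review_at: hypothetical fixed thresholds (assumptions).
    """
    remove = harm_score >= remove_at
    send_to_review = harm_score >= review_at
    return remove, send_to_review


# The rule looks only at the point estimate. It ignores how uncertain the
# estimate is and how long the human review queue currently is.
print(fixed_threshold_policy(0.95))
print(fixed_threshold_policy(0.60))
print(fixed_threshold_policy(0.10))
```

Because neither threshold reacts to the state of the review queue, a rule like this keeps sending posts to reviewers at the same rate even when they are overloaded.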

This paper introduces a new model to capture the interaction between the AI and human moderators. The algorithm observes information about each incoming post, decides whether to classify it as harmful or not, and whether to send it for human review. Only the posts that are admitted get reviewed by humans, and those reviews then help improve the AI's learning. But there can be delays in the human review process due to congestion.

The researchers propose a learning algorithm that balances three factors: the classification loss that arises because the AI learns only from the selectively sampled posts that humans actually review, the loss from posts that never get reviewed, and the delay cost of congestion in the human review system. According to the authors, this is the first result for online learning in this type of contextual queueing system, so the analytical framework could be useful for other applications.

Technical Explanation

The paper builds on the learning-to-defer framework, in which the algorithm can defer the classification of a post to a human for a fixed cost and receives feedback immediately. It extends this framework by adding congestion in the human review system, so feedback arrives with a delay. Unlike prior work on online learning with delayed feedback, where the delay is exogenous, the delay in this model is endogenous to the algorithm's admission and scheduling decisions.
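The fixed-cost deferral option can be illustrated with a textbook-style rule: defer whenever the expected error of the best automatic label exceeds the deferral cost. This is a sketch under a 0-1 loss assumption, not the paper's algorithm; `defer_or_classify` and `defer_cost=0.2` are hypothetical.

```python
def defer_or_classify(p_harmful, defer_cost=0.2):
    """Pick 'keep', 'remove', or 'defer' for one post under 0-1 loss.

    p_harmful: estimated probability that the post is harmful.
    defer_cost: hypothetical fixed cost of asking a human (assumption).
    """
    keep_risk = p_harmful          # expected loss if the post is kept
    remove_risk = 1.0 - p_harmful  # expected loss if the post is removed
    auto_risk = min(keep_risk, remove_risk)
    if auto_risk > defer_cost:
        return "defer"
    return "remove" if remove_risk < keep_risk else "keep"


for p in (0.05, 0.5, 0.95):
    print(p, defer_or_classify(p))
```

The rule treats every deferral as equally cheap. It has no notion of a review queue, which is the gap the paper's congestion model is meant to address.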

The algorithm observes contextual information about each incoming post, makes a classification decision, decides whether to admit the post for human review, and schedules admitted posts for review. Only admitted posts receive human reviews, which are then used to improve the machine-learning models. Because the human review system is congested, these reviews arrive with a delay.
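A toy simulation can show why the feedback delay depends on the admission rule. Everything here is an illustrative assumption rather than the paper's model: one post arrives per step, a post is admitted when its score is close to 0.5, and a single reviewer finishes the oldest queued post with probability `review_prob` per step.

```python
from collections import deque
import random


def simulate(admit_band, steps=2000, review_prob=0.3, seed=0):
    """Average feedback delay (in steps) under a toy admission rule.

    admit_band: admit a post when |score - 0.5| < admit_band (hypothetical).
    review_prob: chance per step that the reviewer finishes one post.
    """
    rng = random.Random(seed)
    queue = deque()  # arrival times of admitted posts, served first-in first-out
    delays = []
    for t in range(steps):
        score = rng.random()                         # stand-in for the AI's harm score
        reviewer_done = rng.random() < review_prob
        if abs(score - 0.5) < admit_band:
            queue.append(t)                          # admit for human review
        if queue and reviewer_done:
            delays.append(t - queue.popleft())       # feedback arrives now
    return sum(delays) / len(delays) if delays else 0.0


narrow = simulate(admit_band=0.05)  # admits roughly 10% of posts
wide = simulate(admit_band=0.20)    # admits roughly 40% of posts
print(narrow, wide)
```

With the wider band, posts are admitted faster than the reviewer can process them, so the queue and the average delay keep growing. In this sense the delay is a consequence of the algorithm's own decisions.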

The researchers propose a near-optimal learning algorithm that carefully balances three key objectives:

Minimizing the classification loss from the selectively sampled dataset of reviewed posts
Minimizing the idiosyncratic loss for non-reviewed posts
Minimizing the delay loss from congestion in the human review system
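
One schematic way to write this three-way trade-off is as a sum of the three terms for a policy \pi. This is a hypothetical shorthand for intuition only: the symbols below are not the paper's notation, and the source says only that the algorithm balances the three losses, not that it adds them in this form.

```latex
\[
\mathcal{L}(\pi)
  = \underbrace{\mathcal{L}_{\mathrm{class}}(\pi)}_{\text{errors from learning on selectively sampled reviews}}
  + \underbrace{\mathcal{L}_{\mathrm{idio}}(\pi)}_{\text{loss on posts never reviewed}}
  + \underbrace{\mathcal{L}_{\mathrm{delay}}(\pi)}_{\text{cost of congestion in the review queue}}
\]
```

Admitting more posts for review tends to shrink the first two terms and grow the third, which is why the terms have to be balanced rather than minimized one at a time.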

To the best of the authors' knowledge, this is the first result for online learning in contextual queueing systems, where the delay in feedback is endogenous to the algorithm's decisions. This analytical framework could be applicable to other settings involving human-AI collaboration and content moderation.

Critical Analysis

The paper presents a novel and thoughtful approach to modeling the human-AI interplay in content moderation, taking into account key practical considerations like variable human review capacity and delayed feedback. The proposed learning algorithm appears to be a significant advancement over simpler heuristic approaches.

However, like any stylized model, it rests on simplifying assumptions about how posts arrive and how human reviews are processed, and real post arrival rates and review times may be more dynamic and unpredictable than the model allows. The summary also does not identify any treatment of strategic behavior, such as users crafting posts to influence whether they are admitted for review.

Additionally, the paper focuses on the algorithmic aspects and does not delve deeply into the societal implications and ethical considerations of content moderation systems, such as potential biases in the training data or the tradeoffs between user privacy, free speech, and content removal.

Further research is needed to understand how this type of model would perform in real-world, large-scale deployments and to explore additional factors that may impact the design of effective human-AI collaboration for content moderation.

Conclusion

This paper presents an innovative approach to modeling the human-AI collaboration in online content moderation, introducing a new learning framework that accounts for the challenges of selective sampling, variable human review capacity, and delayed feedback. The proposed algorithm aims to balance classification accuracy, unreviewed post costs, and review delays, representing an important step forward in this domain.

While the model makes some simplifying assumptions, the overall analytical framework could be valuable for other applications involving human-AI collaboration and contextual queueing systems. Further research is needed to understand the real-world implications and ethical considerations of deploying such content moderation systems at scale.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Learning to Defer in Content Moderation: The Human-AI Interplay

Thodoris Lykouris, Wentao Weng

Successful content moderation in online platforms relies on a human-AI collaboration approach. A typical heuristic estimates the expected harmfulness of a post and uses fixed thresholds to decide whether to remove it and whether to send it for human review. This disregards the prediction uncertainty, the time-varying element of human review capacity and post arrivals, and the selective sampling in the dataset (humans only review posts filtered by the admission algorithm). In this paper, we introduce a model to capture the human-AI interplay in content moderation. The algorithm observes contextual information for incoming posts, makes classification and admission decisions, and schedules posts for human review. Only admitted posts receive human reviews on their harmfulness. These reviews help educate the machine-learning algorithms but are delayed due to congestion in the human review system. The classical learning-theoretic way to capture this human-AI interplay is via the framework of learning to defer, where the algorithm has the option to defer a classification task to humans for a fixed cost and immediately receive feedback. Our model contributes to this literature by introducing congestion in the human review system. Moreover, unlike work on online learning with delayed feedback where the delay in the feedback is exogenous to the algorithm's decisions, the delay in our model is endogenous to both the admission and the scheduling decisions. We propose a near-optimal learning algorithm that carefully balances the classification loss from a selectively sampled dataset, the idiosyncratic loss of non-reviewed posts, and the delay loss of having congestion in the human review system. To the best of our knowledge, this is the first result for online learning in contextual queueing systems and hence our analytical framework may be of independent interest.

6/4/2024

A Unifying Post-Processing Framework for Multi-Objective Learn-to-Defer Problems

Mohammad-Amin Charusaie, Samira Samadi

Learn-to-Defer is a paradigm that enables learning algorithms to work not in isolation but as a team with human experts. In this paradigm, we permit the system to defer a subset of its tasks to the expert. Although there are currently systems that follow this paradigm and are designed to optimize the accuracy of the final human-AI team, the general methodology for developing such systems under a set of constraints (e.g., algorithmic fairness, expert intervention budget, defer of anomaly, etc.) remains largely unexplored. In this paper, using a $d$-dimensional generalization to the fundamental lemma of Neyman and Pearson (d-GNP), we obtain the Bayes optimal solution for learn-to-defer systems under various constraints. Furthermore, we design a generalizable algorithm to estimate that solution and apply this algorithm to the COMPAS and ACSIncome datasets. Our algorithm shows improvements in terms of constraint violation over a set of baselines.

7/18/2024

Learning to Complement and to Defer to Multiple Users

Zheng Zhang, Wenjie Ai, Kevin Wells, David Rosewarne, Thanh-Toan Do, Gustavo Carneiro

With the development of Human-AI Collaboration in Classification (HAI-CC), integrating users and AI predictions becomes challenging due to the complex decision-making process. This process has three options: 1) AI autonomously classifies, 2) learning to complement, where AI collaborates with users, and 3) learning to defer, where AI defers to users. Despite their interconnected nature, these options have been studied in isolation rather than as components of a unified system. In this paper, we address this weakness with the novel HAI-CC methodology, called Learning to Complement and to Defer to Multiple Users (LECODU). LECODU not only combines learning to complement and learning to defer strategies, but it also incorporates an estimation of the optimal number of users to engage in the decision process. The training of LECODU maximises classification accuracy and minimises collaboration costs associated with user involvement. Comprehensive evaluations across real-world and synthesized datasets demonstrate LECODU's superior performance compared to state-of-the-art HAI-CC methods. Remarkably, even when relying on unreliable users with high rates of label noise, LECODU exhibits significant improvement over both human decision-makers alone and AI alone.

7/10/2024

Cost-Sensitive Learning to Defer to Multiple Experts with Workload Constraints

Jean V. Alves, Diogo Leitão, Sérgio Jesus, Marco O. P. Sampaio, Javier Liébana, Pedro Saleiro, Mário A. T. Figueiredo, Pedro Bizarro

Learning to defer (L2D) aims to improve human-AI collaboration systems by learning how to defer decisions to humans when they are more likely to be correct than an ML classifier. Existing research in L2D overlooks key real-world aspects that impede its practical adoption, namely: i) neglecting cost-sensitive scenarios, where type I and type II errors have different costs; ii) requiring concurrent human predictions for every instance of the training dataset; and iii) not dealing with human work-capacity constraints. To address these issues, we propose the deferral under cost and capacity constraints framework (DeCCaF). DeCCaF is a novel L2D approach, employing supervised learning to model the probability of human error under less restrictive data requirements (only one expert prediction per instance) and using constraint programming to globally minimize the error cost, subject to workload limitations. We test DeCCaF in a series of cost-sensitive fraud detection scenarios with different teams of 9 synthetic fraud analysts, with individual work-capacity constraints. The results demonstrate that our approach performs significantly better than the baselines in a wide array of scenarios, achieving an average 8.4% reduction in the misclassification cost. The code used for the experiments is available at https://github.com/feedzai/deccaf

8/21/2024