Learning from Crowds with Crowd-Kit

Read original: arXiv:2109.08584 - Published 4/9/2024 by Dmitry Ustalov, Nikita Pavlichenko, Boris Tseitlin

Overview

This paper introduces Crowd-Kit, a toolkit for computational quality control in crowdsourcing.
Crowd-Kit provides efficient implementations of popular quality control algorithms in Python, including methods for truth inference, deep learning from crowds, and data quality estimation.
The toolkit supports multiple modalities of answers and provides dataset loaders and example notebooks for faster prototyping.
The authors extensively evaluated their toolkit on several datasets, enabling benchmarking of computational quality control methods in a uniform, systematic, and reproducible way.
The code and data are released under the Apache License 2.0 on GitHub.

Plain English Explanation

Crowd-Kit is a tool that helps researchers and companies manage the quality of data collected through crowdsourcing. Crowdsourcing is the practice of splitting work into small tasks that are completed by a large number of people, often online. This can be a cost-effective way to gather data, but ensuring the quality of the collected information can be challenging.

Crowd-Kit provides a set of algorithms and methods that can automatically assess the reliability of crowdsourced data. For example, it can infer the true answers to questions based on the responses from many different people, or it can identify individuals who are providing low-quality or unreliable data. This can help researchers and companies trust the data they collect through crowdsourcing and use it more effectively in their work.
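To make the idea concrete, here is a minimal sketch in plain Python of the simplest form of this reasoning: take the most common answer per question as the consensus, then check how often each person agrees with it. This is not Crowd-Kit's own code. The function names, the (task, worker, label) data layout, and the toy responses are all assumptions made for illustration.

```python
from collections import Counter, defaultdict

# Toy responses as (task, worker, label) triples; every value here is made up.
responses = [
    ("t1", "w1", "cat"), ("t1", "w2", "cat"), ("t1", "w3", "dog"),
    ("t2", "w1", "dog"), ("t2", "w2", "dog"), ("t2", "w3", "cat"),
    ("t3", "w1", "cat"), ("t3", "w2", "cat"), ("t3", "w3", "cat"),
]


def majority_vote(responses):
    """Return the most frequent label per task (ties go to the smaller label)."""
    votes = defaultdict(Counter)
    for task, _worker, label in responses:
        votes[task][label] += 1
    return {
        task: min(counts, key=lambda lab: (-counts[lab], lab))
        for task, counts in votes.items()
    }


def worker_agreement(responses, consensus):
    """Fraction of each worker's answers that match the consensus label."""
    hits, totals = Counter(), Counter()
    for task, worker, label in responses:
        totals[worker] += 1
        hits[worker] += int(label == consensus[task])
    return {worker: hits[worker] / totals[worker] for worker in totals}


consensus = majority_vote(responses)
agreement = worker_agreement(responses, consensus)
print(consensus)
print(agreement)
```

In this toy data, the third worker agrees with the consensus on only one of three tasks, which is the kind of signal that could be used to flag unreliable contributors. Agreement with the majority is only a rough proxy for reliability, since the majority itself can be wrong.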

The toolkit is designed to be easy to use, with pre-built tools and sample datasets that allow users to get started quickly. By providing a standard set of quality control methods, Crowd-Kit also makes it easier to compare the effectiveness of different approaches across different datasets and applications.

Technical Explanation

Crowd-Kit is a Python-based toolkit that provides efficient implementations of several popular computational quality control algorithms for crowdsourcing. These include:

Truth inference: Methods for determining the correct answers to tasks or questions based on the responses of multiple crowdsourced workers. This can help identify reliable information even when individual workers may make mistakes.
Deep learning from crowds: Techniques that use deep neural networks to learn from crowdsourced data, leveraging the collective knowledge of many workers to improve model performance.
Data quality estimation: Algorithms that assess the quality and reliability of crowdsourced data, allowing researchers to identify and filter out low-quality or unreliable responses.
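As an illustration of how truth inference and worker quality estimation can interact, the following sketch implements a simplified "one-coin" expectation-maximization scheme in plain Python. Each worker is assumed to have a single accuracy value, and the method alternates between estimating worker accuracies and re-estimating the label posteriors. It is written for this summary and is not the toolkit's implementation. The function name `one_coin_em`, the uniform-error assumption, and the toy data are all hypothetical.

```python
import math
from collections import defaultdict

# Toy (task, worker, label) triples; values are made up for illustration.
# w1 and w2 always agree, w3 always disagrees, and t5 is a 1-vs-1 tie.
responses = [
    ("t1", "w1", "a"), ("t1", "w2", "a"), ("t1", "w3", "b"),
    ("t2", "w1", "b"), ("t2", "w2", "b"), ("t2", "w3", "a"),
    ("t3", "w1", "a"), ("t3", "w2", "a"), ("t3", "w3", "b"),
    ("t4", "w1", "b"), ("t4", "w2", "b"), ("t4", "w3", "a"),
    ("t5", "w1", "b"), ("t5", "w3", "a"),
]


def one_coin_em(responses, labels, n_iter=20, eps=1e-6):
    """Simplified one-coin EM: one accuracy per worker, uniform errors.

    Assumes at least two possible labels and a uniform prior over them.
    """
    by_task = defaultdict(list)
    for task, worker, label in responses:
        by_task[task].append((worker, label))
    k = len(labels)

    # Initialise each task's posterior with the share of votes per label.
    posterior = {
        task: {
            lab: sum(1 for _, a in answers if a == lab) / len(answers)
            for lab in labels
        }
        for task, answers in by_task.items()
    }

    skill = {}
    for _ in range(n_iter):
        # M-step: a worker's skill is their expected fraction of correct answers.
        correct, total = defaultdict(float), defaultdict(int)
        for task, answers in by_task.items():
            for worker, label in answers:
                correct[worker] += posterior[task][label]
                total[worker] += 1
        skill = {
            w: min(max(correct[w] / total[w], eps), 1 - eps) for w in total
        }

        # E-step: recompute label posteriors given the current skills.
        for task, answers in by_task.items():
            log_score = {}
            for lab in labels:
                score = 0.0
                for worker, label in answers:
                    p = skill[worker]
                    score += math.log(p if label == lab else (1 - p) / (k - 1))
                log_score[lab] = score
            top = max(log_score.values())
            weights = {lab: math.exp(s - top) for lab, s in log_score.items()}
            norm = sum(weights.values())
            posterior[task] = {lab: w / norm for lab, w in weights.items()}

    truth = {
        task: max(labels, key=lambda lab: post[lab])
        for task, post in posterior.items()
    }
    return truth, skill


truth, skill = one_coin_em(responses, labels=["a", "b"])
print(truth)
print(skill)
```

On task `t5` a plain vote count is tied, but the sketch resolves it in favor of `w1`, whose answers are consistent with the other reliable worker on the remaining tasks. Full truth inference models typically estimate richer per-worker error patterns than the single accuracy value assumed here.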

The toolkit supports multiple modalities of crowdsourced answers, including categorical labels, pairwise comparisons, textual responses, and image segmentations. It also includes dataset loaders and example notebooks to help users quickly apply the quality control methods to their own crowdsourcing projects.

The authors extensively evaluated Crowd-Kit on a variety of datasets, demonstrating its ability to improve the quality and reliability of crowdsourced data in a consistent and reproducible way. This allows for better benchmarking and comparison of different computational quality control approaches.
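The idea of uniform benchmarking can be sketched as a small harness that runs every aggregation method on every dataset through the same interface and scores each against ground truth. The sketch below is a plain-Python illustration under stated assumptions, not the authors' evaluation code. The datasets, the `first_answer` baseline, and all function names are hypothetical.

```python
from collections import Counter, defaultdict


def majority_vote(responses):
    """Most frequent label per task (ties go to the smaller label)."""
    votes = defaultdict(Counter)
    for task, _worker, label in responses:
        votes[task][label] += 1
    return {
        task: min(counts, key=lambda lab: (-counts[lab], lab))
        for task, counts in votes.items()
    }


def first_answer(responses):
    """Naive baseline: keep the first response seen for each task."""
    picked = {}
    for task, _worker, label in responses:
        picked.setdefault(task, label)
    return picked


def accuracy(predicted, gold):
    """Share of tasks whose predicted label matches the ground truth."""
    return sum(predicted[task] == label for task, label in gold.items()) / len(gold)


# Made-up datasets: each is (responses, ground truth) in the same layout.
datasets = {
    "toy-a": (
        [("t1", "w1", "x"), ("t1", "w2", "y"), ("t1", "w3", "y"),
         ("t2", "w1", "x"), ("t2", "w2", "x"), ("t2", "w3", "y")],
        {"t1": "y", "t2": "x"},
    ),
    "toy-b": (
        [("t1", "w1", "y"), ("t1", "w2", "y")],
        {"t1": "y"},
    ),
}
methods = {"majority_vote": majority_vote, "first_answer": first_answer}


def benchmark(datasets, methods):
    """Score every method on every dataset with the same code path."""
    return {
        (ds_name, m_name): accuracy(method(responses), gold)
        for ds_name, (responses, gold) in datasets.items()
        for m_name, method in methods.items()
    }


results = benchmark(datasets, methods)
for key, value in sorted(results.items()):
    print(key, value)
```

Because every method shares the same input layout and the same scoring function, differences in the results can be attributed to the methods themselves rather than to differences in preprocessing or evaluation.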

Critical Analysis

The Crowd-Kit paper provides a valuable contribution to the field of crowdsourcing by offering a comprehensive toolkit for computational quality control. By implementing several established methods in a well-documented and easy-to-use package, the authors make it more accessible for researchers and practitioners to incorporate these techniques into their crowdsourcing workflows.

One potential limitation of the paper is that it neither delves deeply into the theoretical underpinnings of the quality control methods included in Crowd-Kit nor proposes novel algorithms. The focus is on the practical aspects of the toolkit, such as its ease of use and extensibility. While this is a reasonable approach for a tool-oriented paper, readers interested in the latest advancements in crowdsourcing quality control may also want to explore related research, such as "Dollar Crowd: Multi-Hypothesis Crowd Density Estimation Using", "Introducing ChatsQC: Enhancing Statistical Quality Control", "Automatic Gradient Estimation for Calibrating Crowd Models", and "Deep Feature Statistics for Mapping Generalized Screen Content".

Additionally, while the paper discusses the evaluation of Crowd-Kit on various datasets, it does not provide a comprehensive comparison of the toolkit against other existing quality control solutions. A more in-depth comparison would help readers understand the relative strengths and limitations of Crowd-Kit.

Conclusion

Crowd-Kit is a valuable tool that can help researchers and companies manage the quality of data collected through crowdsourcing. By providing efficient implementations of popular quality control algorithms, the toolkit makes it easier to assess the reliability of crowdsourced information and use it more effectively in downstream data analysis and processing tasks. The open-source release of Crowd-Kit under the Apache License 2.0 allows for wider adoption and further development of the toolkit by the research community.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Learning from Crowds with Crowd-Kit

Dmitry Ustalov, Nikita Pavlichenko, Boris Tseitlin

This paper presents Crowd-Kit, a general-purpose computational quality control toolkit for crowdsourcing. Crowd-Kit provides efficient and convenient implementations of popular quality control algorithms in Python, including methods for truth inference, deep learning from crowds, and data quality estimation. Our toolkit supports multiple modalities of answers and provides dataset loaders and example notebooks for faster prototyping. We extensively evaluated our toolkit on several datasets of different natures, enabling benchmarking computational quality control methods in a uniform, systematic, and reproducible way using the same codebase. We release our code and data under the Apache License 2.0 at https://github.com/Toloka/crowd-kit.

4/9/2024

No Need to Sacrifice Data Quality for Quantity: Crowd-Informed Machine Annotation for Cost-Effective Understanding of Visual Data

Christopher Klugmann, Rafid Mahmood, Guruprasad Hegde, Amit Kale, Daniel Kondermann

Labeling visual data is expensive and time-consuming. Crowdsourcing systems promise to enable highly parallelizable annotations through the participation of monetarily or otherwise motivated workers, but even this approach has its limits. The solution: replace manual work with machine work. But how reliable are machine annotators? Sacrificing data quality for high throughput cannot be acceptable, especially in safety-critical applications such as autonomous driving. In this paper, we present a framework that enables quality checking of visual data at large scales without sacrificing the reliability of the results. We ask annotators simple questions with discrete answers, which can be highly automated using a convolutional neural network trained to predict crowd responses. Unlike the methods of previous work, which aim to directly predict soft labels to address human uncertainty, we use per-task posterior distributions over soft labels as our training objective, leveraging a Dirichlet prior for analytical accessibility. We demonstrate our approach on two challenging real-world automotive datasets, showing that our model can fully automate a significant portion of tasks, saving costs in the high double-digit percentage range. Our model reliably predicts human uncertainty, allowing for more accurate inspection and filtering of difficult examples. Additionally, we show that the posterior distributions over soft labels predicted by our model can be used as priors in further inference processes, reducing the need for numerous human labelers to approximate true soft labels accurately. This results in further cost reductions and more efficient use of human resources in the annotation process.

9/4/2024

Data Quality in Crowdsourcing and Spamming Behavior Detection

Yang Ba, Michelle V. Mancenido, Erin K. Chiou, Rong Pan

As crowdsourcing emerges as an efficient and cost-effective method for obtaining labels for machine learning datasets, it is important to assess the quality of crowd-provided data, so as to improve analysis performance and reduce biases in subsequent machine learning tasks. Given the lack of ground truth in most cases of crowdsourcing, we refer to data quality as annotators' consistency and credibility. Unlike the simple scenarios where Kappa coefficient and intraclass correlation coefficient usually can apply, online crowdsourcing requires dealing with more complex situations. We introduce a systematic method for evaluating data quality and detecting spamming threats via variance decomposition, and we classify spammers into three categories based on their different behavioral patterns. A spammer index is proposed to assess entire data consistency and two metrics are developed to measure crowd worker's credibility by utilizing the Markov chain and generalized random effects models. Furthermore, we showcase the practicality of our techniques and their advantages by applying them on a face verification task with both simulation and real-world data collected from two crowdsourcing platforms.

4/30/2024

Crowdsourcing with Enhanced Data Quality Assurance: An Efficient Approach to Mitigate Resource Scarcity Challenges in Training Large Language Models for Healthcare

P. Barai, G. Leroy, P. Bisht, J. M. Rothman, S. Lee, J. Andrews, S. A. Rice, A. Ahmed

Large Language Models (LLMs) have demonstrated immense potential in artificial intelligence across various domains, including healthcare. However, their efficacy is hindered by the need for high-quality labeled data, which is often expensive and time-consuming to create, particularly in low-resource domains like healthcare. To address these challenges, we propose a crowdsourcing (CS) framework enriched with quality control measures at the pre-, real-time-, and post-data gathering stages. Our study evaluated the effectiveness of enhancing data quality through its impact on LLMs (Bio-BERT) for predicting autism-related symptoms. The results show that real-time quality control improves data quality by 19 percent compared to pre-quality control. Fine-tuning Bio-BERT using crowdsourced data generally increased recall compared to the Bio-BERT baseline but lowered precision. Our findings highlighted the potential of crowdsourcing and quality control in resource-constrained environments and offered insights into optimizing healthcare LLMs for informed decision-making and improved patient care.

5/24/2024