Modeling offensive content detection for TikTok

Read original: arXiv:2408.16857 - Published 9/2/2024 by Kasper Cools, Gideon Mailette de Buy Wenniger, Clara Maathuis

Overview

This paper presents a model for detecting offensive content on the TikTok platform.
The model uses BERT, a popular language model, to classify text as offensive or non-offensive.
The research aims to help TikTok moderate user-generated content and maintain a safe environment.

Plain English Explanation

TikTok and Offensive Content

TikTok is a popular social media platform where users can create and share short videos. Like many social media sites, TikTok can sometimes host offensive or inappropriate content. Detecting and removing this type of content is an important challenge for the platform.

Using BERT for Detection

The researchers in this paper developed a model that uses BERT, a pretrained language model, to identify offensive content on TikTok. BERT represents each word in light of the words around it, which matters for this task because the same word can be offensive in one comment and harmless in another.

Improving Content Moderation

By deploying this offensive content detection model, TikTok can more effectively moderate user-generated content and keep the platform safe and welcoming for all users. This type of AI-powered moderation system can be an important tool for social media companies to proactively address harmful or inappropriate posts.

Technical Explanation

Data and Preprocessing

The researchers used a dataset of TikTok comments labeled as offensive or non-offensive; the abstract reports 120,423 collected comments and a balanced, binary classification setup. They preprocessed the text by converting it to lowercase, removing URLs, and tokenizing each comment into a sequence of tokens that BERT can process.
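The cleaning steps described above can be sketched with the Python standard library. This is a minimal illustration, not the authors' pipeline: the URL pattern, the `preprocess` name, and the sample comment are assumptions, and whitespace splitting stands in for BERT's own subword tokenizer, which would normally be applied afterwards.

```python
import re

# Assumed URL pattern: matches http(s) links and bare "www." links.
URL_PATTERN = re.compile(r"https?://\S+|www\.\S+")


def preprocess(comment: str) -> list[str]:
    """Lowercase a comment, strip URLs, and split it on whitespace.

    Whitespace splitting is only a stand-in for a real BERT tokenizer.
    """
    text = comment.lower()
    text = URL_PATTERN.sub(" ", text)  # replace each URL with a space
    return text.split()


# Hypothetical sample comment, not taken from the paper's dataset.
sample = "Check THIS out https://example.com/video you clown"
print(preprocess(sample))
```

Lowercasing before URL removal keeps the pattern simple, since links written as `WWW.` or `HTTP://` are normalized first.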

BERT-based Classification

The core of the model is a BERT-based classifier. BERT takes the preprocessed text as input and outputs a prediction of whether the content is offensive or not. The model was fine-tuned on the TikTok comment dataset to specialize it for this task. Note that the abstract describes "a series of machine learning and deep learning models", which suggests BERT was not the only model built.
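The last step of such a classifier can be illustrated without any deep learning library. A sequence classifier typically ends in a layer that emits one score (logit) per class, which softmax turns into probabilities. The sketch below assumes two logits ordered as (non-offensive, offensive) and a 0.5 decision threshold; the logit values and function names are hypothetical and do not come from the paper.

```python
import math

LABELS = ("non-offensive", "offensive")


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]  # subtract max for stability
    total = sum(exps)
    return [e / total for e in exps]


def classify(logits, threshold=0.5):
    """Return (label, probability of the offensive class)."""
    p_offensive = softmax(logits)[1]
    label = LABELS[1] if p_offensive >= threshold else LABELS[0]
    return label, p_offensive


# Hypothetical logits for one comment.
label, p = classify([-1.2, 2.3])
print(label, round(p, 3))
```

Raising `threshold` trades recall for precision, which is one way a moderation team could tune how aggressively comments are flagged.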

Evaluation

The researchers evaluated their BERT-based offensive content detection model on held-out test data, measuring accuracy, precision, recall, and F1 score. The abstract reports an F1 score of 0.863 for the balanced, binary classification setting.
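For reference, the four metrics named above follow directly from the counts of true and false positives and negatives. The sketch below uses made-up labels (1 = offensive, 0 = non-offensive), not the paper's data; `binary_metrics` is a hypothetical helper name.

```python
def binary_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall, and F1 for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Made-up example: 3 true positives, 1 false negative,
# 3 true negatives, 1 false positive.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]
print(binary_metrics(y_true, y_pred))
```

On a balanced test set such as the one the abstract describes, accuracy and F1 tend to move together; on imbalanced data they can diverge sharply, which is why F1 is usually reported alongside accuracy.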

Critical Analysis

The paper provides a well-designed and effective solution for detecting offensive content on TikTok. However, a few potential limitations and areas for further research are worth noting:

The dataset used for training and evaluation may not fully capture the breadth of offensive language and content that exists on the platform. Expanding the dataset could further improve the model's robustness.
The paper does not address potential biases or fairness issues that could arise from the model's predictions. Ensuring the system is equitable and does not disproportionately flag certain groups is an important consideration.
While the BERT-based approach demonstrates strong performance, exploring complementary techniques, such as multimodal analysis of text and images/videos, could further enhance the model's capabilities for the TikTok use case.
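The fairness concern raised above can be made concrete with a simple check: compare how often harmless comments are wrongly flagged across groups of authors. The sketch below is an assumption about how such an audit might look; the paper does not describe one, and the group names and records are invented for illustration.

```python
from collections import defaultdict


def false_positive_rate_by_group(records):
    """Return the share of non-offensive comments flagged, per group.

    records: iterable of (group, true_label, predicted_label),
    where 1 = offensive and 0 = non-offensive.
    """
    flagged = defaultdict(int)
    negatives = defaultdict(int)
    for group, y_true, y_pred in records:
        if y_true == 0:  # only harmless comments can be false positives
            negatives[group] += 1
            if y_pred == 1:
                flagged[group] += 1
    return {g: flagged[g] / n for g, n in negatives.items()}


# Invented records for two hypothetical groups.
records = [
    ("group_a", 0, 0), ("group_a", 0, 0), ("group_a", 0, 1), ("group_a", 0, 0),
    ("group_b", 0, 1), ("group_b", 0, 1), ("group_b", 0, 0), ("group_b", 0, 0),
    ("group_a", 1, 1), ("group_b", 1, 1),
]
print(false_positive_rate_by_group(records))
```

A large gap between groups would indicate that one group's harmless comments are removed more often, which is the kind of disparity the critique asks future work to measure.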

Conclusion

This research contributes to the challenge of moderating user-generated content on social media platforms. The proposed BERT-based model identifies offensive content in TikTok comments, a step towards maintaining a safe and inclusive online environment. The insights from this work could also inform content moderation systems for other social media platforms.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Modeling offensive content detection for TikTok

Kasper Cools, Gideon Mailette de Buy Wenniger, Clara Maathuis

The advent of social media transformed interpersonal communication and information consumption processes. This digital landscape accommodates user intentions, also resulting in an increase of offensive language and harmful behavior. Concurrently, social media platforms collect vast datasets comprising user-generated content and behavioral information. These datasets are instrumental for platforms deploying machine learning and data-driven strategies, facilitating customer insights and countermeasures against social manipulation mechanisms like disinformation and offensive content. Nevertheless, the availability of such datasets, along with the application of various machine learning techniques, to researchers and practitioners, for specific social media platforms regarding particular events, is limited. In particular for TikTok, which offers unique tools for personalized content creation and sharing, the existing body of knowledge would benefit from having diverse comprehensive datasets and associated data analytics solutions on offensive content. While efforts from social media platforms, research, and practitioner communities are seen on this behalf, such content continues to proliferate. This translates to an essential need to make datasets publicly available and build corresponding intelligent solutions. On this behalf, this research undertakes the collection and analysis of TikTok data containing offensive content, building a series of machine learning and deep learning models for offensive content detection. This is done aiming at answering the following research question: How to develop a series of computational models to detect offensive content on TikTok? To this end, a Data Science methodological approach is considered, 120,423 TikTok comments are collected, and on a balanced, binary classification approach, an F1 score of 0.863 is obtained.

9/2/2024

Towards Generalized Offensive Language Identification

Alphaeus Dmonte, Tejas Arya, Tharindu Ranasinghe, Marcos Zampieri

The prevalence of offensive content on the internet, encompassing hate speech and cyberbullying, is a pervasive issue worldwide. Consequently, it has garnered significant attention from the machine learning (ML) and natural language processing (NLP) communities. As a result, numerous systems have been developed to automatically identify potentially harmful content and mitigate its impact. These systems can follow two approaches; (1) Use publicly available models and application endpoints, including prompting large language models (LLMs) (2) Annotate datasets and train ML models on them. However, both approaches lack an understanding of how generalizable they are. Furthermore, the applicability of these systems is often questioned in off-domain and practical environments. This paper empirically evaluates the generalizability of offensive language detection models and datasets across a novel generalized benchmark. We answer three research questions on generalizability. Our findings will be useful in creating robust real-world offensive language detection systems.

7/29/2024

A Picture is Worth 500 Labels: A Case Study of Demographic Disparities in Local Machine Learning Models for Instagram and TikTok

Jack West, Lea Thiemt, Shimaa Ahmed, Maggie Bartig, Kassem Fawaz, Suman Banerjee

Mobile apps have embraced user privacy by moving their data processing to the user's smartphone. Advanced machine learning (ML) models, such as vision models, can now locally analyze user images to extract insights that drive several functionalities. Capitalizing on this new processing model of locally analyzing user images, we analyze two popular social media apps, TikTok and Instagram, to reveal (1) what insights vision models in both apps infer about users from their image and video data and (2) whether these models exhibit performance disparities with respect to demographics. As vision models provide signals for sensitive technologies like age verification and facial recognition, understanding potential biases in these models is crucial for ensuring that users receive equitable and accurate services. We develop a novel method for capturing and evaluating ML tasks in mobile apps, overcoming challenges like code obfuscation, native code execution, and scalability. Our method comprises ML task detection, ML pipeline reconstruction, and ML performance assessment, specifically focusing on demographic disparities. We apply our methodology to TikTok and Instagram, revealing significant insights. For TikTok, we find issues in age and gender prediction accuracy, particularly for minors and Black individuals. In Instagram, our analysis uncovers demographic disparities in the extraction of over 500 visual concepts from images, with evidence of spurious correlations between demographic features and certain concepts.

4/1/2024

ToxiCloakCN: Evaluating Robustness of Offensive Language Detection in Chinese with Cloaking Perturbations

Yunze Xiao, Yujia Hu, Kenny Tsu Wei Choo, Roy Ka-wei Lee

Detecting hate speech and offensive language is essential for maintaining a safe and respectful digital environment. This study examines the limitations of state-of-the-art large language models (LLMs) in identifying offensive content within systematically perturbed data, with a focus on Chinese, a language particularly susceptible to such perturbations. We introduce ToxiCloakCN, an enhanced dataset derived from ToxiCN, augmented with homophonic substitutions and emoji transformations, to test the robustness of LLMs against these cloaking perturbations. Our findings reveal that existing models significantly underperform in detecting offensive content when these perturbations are applied. We provide an in-depth analysis of how different types of offensive content are affected by these perturbations and explore the alignment between human and model explanations of offensiveness. Our work highlights the urgent need for more advanced techniques in offensive language detection to combat the evolving tactics used to evade detection mechanisms.

6/19/2024