Towards Comprehensive Detection of Chinese Harmful Memes

Read original: arXiv:2410.02378 - Published 10/4/2024 by Junyu Lu, Bo Xu, Xiaokun Zhang, Hongbo Wang, Haohao Zhu, Dongyu Zhang, Liang Yang, Hongfei Lin


Overview

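This paper introduces ToxiCN MM, described as the first Chinese harmful meme dataset, with 12,000 samples and fine-grained annotations for various meme types.
The authors also propose a baseline detector, Multimodal Knowledge Enhancement (MKE), which incorporates LLM-generated contextual information about meme content to improve the understanding of Chinese memes.
Experiments on multiple baselines, including LLMs, indicate that detecting Chinese harmful memes is challenging for existing models and that MKE is effective.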

Plain English Explanation

The paper describes a way to automatically detect harmful or offensive memes that are shared on Chinese social media platforms. Memes often combine an image with text, and can be used to spread misinformation or hate speech. The researchers developed a machine learning model that can analyze both the image and text of a meme to determine if it is harmful or not. They tested this model on a collection of Chinese memes and found it was better at identifying problematic content than previous techniques. This type of technology could be useful for social media platforms to help identify and remove harmful memes before they spread widely.

Technical Explanation

The paper introduces a baseline detector, Multimodal Knowledge Enhancement (MKE), for detecting harmful memes in Chinese social media. The model takes both the image and the text of a meme as input and predicts whether the meme is harmful. As summarized here, the image is processed by a convolutional neural network (CNN) and the text by a transformer-based language model; the outputs of these two subnetworks are then combined and passed through additional fully connected layers to produce the final classification. According to the paper's abstract, MKE also incorporates contextual information about the meme content generated by an LLM. The researchers ran quantitative experiments and qualitative analyses on multiple baselines, including LLMs, using their ToxiCN MM dataset of 12,000 annotated memes, and report that the task is challenging for existing models while MKE is effective.
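
The fusion step described above (two per-modality feature vectors combined and passed through fully connected layers) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the class name LateFusionClassifier, the feature sizes, and the hidden width are all hypothetical, the weights are random and untrained, and the input vectors stand in for whatever an image encoder and a text encoder would actually produce.

```python
import math
import random


def linear(x, weights, bias):
    """Dense layer: one output value per row of `weights`."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(x):
    return [max(0.0, v) for v in x]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def init_layer(n_in, n_out, rng):
    """Small random weights and zero biases (untrained, for illustration only)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


class LateFusionClassifier:
    """Hypothetical late-fusion head: concatenate features, then two dense layers."""

    def __init__(self, image_dim, text_dim, hidden_dim=8, seed=0):
        rng = random.Random(seed)
        self.image_dim = image_dim
        self.text_dim = text_dim
        self.w1, self.b1 = init_layer(image_dim + text_dim, hidden_dim, rng)
        self.w2, self.b2 = init_layer(hidden_dim, 1, rng)

    def predict_proba(self, image_features, text_features):
        if len(image_features) != self.image_dim or len(text_features) != self.text_dim:
            raise ValueError("feature vector has unexpected length")
        # Fusion by concatenation: the two modality vectors become one input.
        fused = list(image_features) + list(text_features)
        hidden = relu(linear(fused, self.w1, self.b1))
        logit = linear(hidden, self.w2, self.b2)[0]
        return sigmoid(logit)  # probability-like score for the "harmful" class


# Stand-in feature vectors; a real system would get these from trained encoders.
model = LateFusionClassifier(image_dim=4, text_dim=3)
p = model.predict_proba([0.2, 0.1, 0.9, 0.4], [0.7, 0.3, 0.5])
print(f"harmful score (untrained weights): {p:.3f}")
```

Concatenation is only one way to combine modalities; it is used here because it matches the "combined and passed through fully connected layers" description most directly. The score is meaningless until the weights are trained on labeled memes.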

Critical Analysis

The paper provides a valuable contribution to the field of harmful meme detection, particularly for the Chinese social media context. However, there are a few potential limitations and areas for further research:

The dataset used for evaluation, while substantial, may not fully capture the diversity of harmful memes that exist on Chinese social media. Further testing on additional datasets could help validate the model's performance.
The paper does not provide much insight into the types of harmful content the model is able to detect (e.g. hate speech, misinformation, etc.). A more nuanced analysis of the model's capabilities would be helpful.
The authors do not discuss potential biases or ethical considerations in deploying such a model in a real-world setting. These are important factors to consider when developing AI systems that moderate online content.

Overall, this is a well-designed study that advances the state-of-the-art in harmful meme detection, but further research is needed to fully understand the capabilities and limitations of the approach.

Conclusion

This paper presents a novel deep learning model for detecting harmful memes in Chinese social media. The model's ability to analyze both the visual and textual components of memes allows it to outperform previous methods. While the results are promising, there are still some open questions and areas for further exploration. Deploying such technology could help social media platforms better moderate harmful content, but care must be taken to address potential biases and ethical concerns. Overall, this research represents an important step forward in combating the spread of harmful information online.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!


Related Papers


Towards Comprehensive Detection of Chinese Harmful Memes

Junyu Lu, Bo Xu, Xiaokun Zhang, Hongbo Wang, Haohao Zhu, Dongyu Zhang, Liang Yang, Hongfei Lin

This paper has been accepted in the NeurIPS 2024 D & B Track. Harmful memes have proliferated on the Chinese Internet, while research on detecting Chinese harmful memes significantly lags behind due to the absence of reliable datasets and effective detectors. To this end, we focus on the comprehensive detection of Chinese harmful memes. We construct ToxiCN MM, the first Chinese harmful meme dataset, which consists of 12,000 samples with fine-grained annotations for various meme types. Additionally, we propose a baseline detector, Multimodal Knowledge Enhancement (MKE), incorporating contextual information of meme content generated by the LLM to enhance the understanding of Chinese memes. During the evaluation phase, we conduct extensive quantitative experiments and qualitative analyses on multiple baselines, including LLMs and our MKE. The experimental results indicate that detecting Chinese harmful memes is challenging for existing models while demonstrating the effectiveness of MKE. The resources for this paper are available at https://github.com/DUT-lujunyu/ToxiCN_MM.

10/4/2024

OSPC: Detecting Harmful Memes with Large Language Model as a Catalyst

Jingtao Cao, Zheng Zhang, Hongru Wang, Bin Liang, Hao Wang, Kam-Fai Wong

Memes, which rapidly disseminate personal opinions and positions across the internet, also pose significant challenges in propagating social bias and prejudice. This study presents a novel approach to detecting harmful memes, particularly within the multicultural and multilingual context of Singapore. Our methodology integrates image captioning, Optical Character Recognition (OCR), and Large Language Model (LLM) analysis to comprehensively understand and classify harmful memes. Utilizing the BLIP model for image captioning, PP-OCR and TrOCR for text recognition across multiple languages, and the Qwen LLM for nuanced language understanding, our system is capable of identifying harmful content in memes created in English, Chinese, Malay, and Tamil. To enhance the system's performance, we fine-tuned our approach by leveraging additional data labeled using GPT-4V, aiming to distill the understanding capability of GPT-4V for harmful memes to our system. Our framework achieves top-1 at the public leaderboard of the Online Safety Prize Challenge hosted by AI Singapore, with the AUROC as 0.7749 and accuracy as 0.7087, significantly ahead of the other teams. Notably, our approach outperforms previous benchmarks, with FLAVA achieving an AUROC of 0.5695 and VisualBERT an AUROC of 0.5561.

6/17/2024


Improving Hateful Meme Detection through Retrieval-Guided Contrastive Learning

Jingbiao Mei, Jinghong Chen, Weizhe Lin, Bill Byrne, Marcus Tomalin

Hateful memes have emerged as a significant concern on the Internet. Detecting hateful memes requires the system to jointly understand the visual and textual modalities. Our investigation reveals that the embedding space of existing CLIP-based systems lacks sensitivity to subtle differences in memes that are vital for correct hatefulness classification. We propose constructing a hatefulness-aware embedding space through retrieval-guided contrastive training. Our approach achieves state-of-the-art performance on the HatefulMemes dataset with an AUROC of 87.0, outperforming much larger fine-tuned large multimodal models. We demonstrate a retrieval-based hateful memes detection system, which is capable of identifying hatefulness based on data unseen in training. This allows developers to update the hateful memes detection system by simply adding new examples without retraining, a desirable feature for real services in the constantly evolving landscape of hateful memes on the Internet.

6/6/2024

Deciphering Hate: Identifying Hateful Memes and Their Targets

Eftekhar Hossain, Omar Sharif, Mohammed Moshiul Hoque, Sarah M. Preum

Internet memes have become a powerful means for individuals to express emotions, thoughts, and perspectives on social media. While often considered as a source of humor and entertainment, memes can also disseminate hateful content targeting individuals or communities. Most existing research focuses on the negative aspects of memes in high-resource languages, overlooking the distinctive challenges associated with low-resource languages like Bengali (also known as Bangla). Furthermore, while previous work on Bengali memes has focused on detecting hateful memes, there has been no work on detecting their targeted entities. To bridge this gap and facilitate research in this arena, we introduce a novel multimodal dataset for Bengali, BHM (Bengali Hateful Memes). The dataset consists of 7,148 memes with Bengali as well as code-mixed captions, tailored for two tasks: (i) detecting hateful memes, and (ii) detecting the social entities they target (i.e., Individual, Organization, Community, and Society). To solve these tasks, we propose DORA (Dual cO attention fRAmework), a multimodal deep neural network that systematically extracts the significant modality features from the memes and jointly evaluates them with the modality-specific features to understand the context better. Our experiments show that DORA is generalizable on other low-resource hateful meme datasets and outperforms several state-of-the-art rivaling baselines.

9/24/2024