IITK at SemEval-2024 Task 4: Hierarchical Embeddings for Detection of Persuasion Techniques in Memes

2404.04520

Published 4/9/2024 by Shreenaga Chikoti, Shrey Mehta, Ashutosh Modi


Abstract

Memes are one of the most popular types of content used in online disinformation campaigns. They are primarily effective on social media platforms since they can easily reach many users. Memes in a disinformation campaign achieve their goal of influencing users through several rhetorical and psychological techniques, such as causal oversimplification, name-calling, and smear. SemEval-2024 Task 4, Multilingual Detection of Persuasion Techniques in Memes, addresses the identification of such techniques and is divided into three sub-tasks: (1) hierarchical multi-label classification using only the textual content of the meme, (2) hierarchical multi-label classification using both the textual and visual content of the meme, and (3) binary classification of whether the meme contains a persuasion technique, using its textual and visual content. This paper proposes an ensemble of Class Definition Prediction (CDP) and hyperbolic embeddings-based approaches for this task. We enhance meme classification accuracy and comprehensiveness by integrating HypEmo's hierarchical label embeddings (Chen et al., 2023) and a multi-task learning framework for emotion prediction. We achieve hierarchical F1-scores of 0.60, 0.67, and 0.48 on the respective sub-tasks.
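
The abstract mentions a multi-task learning framework for emotion prediction but does not spell out the training objective. As an illustrative assumption only, one common way to write such an objective is a multi-label binary cross-entropy over the K persuasion techniques plus a weighted auxiliary emotion loss:

```latex
\mathcal{L} \;=\; -\frac{1}{K}\sum_{k=1}^{K}\Big[\,y_k \log \hat{y}_k + (1-y_k)\log\big(1-\hat{y}_k\big)\Big] \;-\; \lambda \sum_{e=1}^{E} z_e \log \hat{z}_e
```

Here y_k ∈ {0, 1} marks whether technique k is present in the meme, ŷ_k is the predicted probability for technique k, z is a one-hot target over E emotion classes with predicted distribution ẑ, and λ ≥ 0 is a hypothetical weighting hyperparameter. The paper's actual loss, the source of its emotion targets, and its weighting may all differ.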


Overview

This paper presents a system for detecting persuasion techniques in memes, submitted to SemEval-2024 Task 4, that ensembles Class Definition Prediction (CDP) with hyperbolic, hierarchy-aware label embeddings.
The hierarchical embeddings, adapted from HypEmo, encode the taxonomy of persuasion techniques, and a multi-task framework adds emotion prediction as an auxiliary objective; depending on the sub-task, the model uses the meme's text alone or both its text and image.
The authors evaluate the ensemble on the task's three sub-tasks and report hierarchical F1-scores of 0.60, 0.67, and 0.48.

Plain English Explanation

The paper describes a new way to detect persuasion techniques in memes, which are internet images that often combine text and visuals. The researchers developed a system that analyzes a meme's text, and in some settings its image as well, to identify persuasion techniques such as appealing to emotions or using logical fallacies.

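The key idea is to use hierarchical label embeddings. The persuasion techniques in this task are organized in a hierarchy, in which broad categories contain more specific techniques; for instance, name-calling and smears are both ways of attacking a person rather than an argument. Instead of treating every technique as an unrelated label, the model places the labels in a geometric space that reflects this tree, so closely related techniques sit near each other. This lets the model exploit the relationships between techniques, and the researchers combine it with a second approach, Class Definition Prediction, in an ensemble.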

The researchers evaluated their system on the three SemEval sub-tasks, reaching hierarchical F1-scores of 0.60, 0.67, and 0.48. Tools of this kind can help in understanding the spread of misinformation and manipulative content online.

Technical Explanation

The paper presents an approach to detecting persuasion techniques in memes for the SemEval-2024 Task 4 challenge. The system is an ensemble of two components: a Class Definition Prediction (CDP) model and a model built on hyperbolic, hierarchy-aware label embeddings. The three sub-tasks differ in their inputs: sub-task 1 uses only the meme's text, while sub-tasks 2 and 3 use both its text and its image.
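
The abstract describes the system as an ensemble of CDP and hyperbolic-embedding models but does not say how their outputs are combined. One simple possibility, shown here only as a hypothetical sketch, is late fusion: a weighted average of each model's per-technique probabilities followed by a threshold. The function name `ensemble_predict`, the equal weights, the 0.5 threshold, and the example probabilities are all illustrative assumptions, not values from the paper.

```python
from typing import Dict, List, Sequence


def ensemble_predict(
    member_probs: Sequence[Dict[str, float]],
    weights: Sequence[float],
    threshold: float = 0.5,
) -> List[str]:
    """Return the techniques whose weighted-average probability reaches `threshold`.

    Each dict maps a technique name to one member model's sigmoid probability;
    a label missing from a member's dict is treated as probability 0.
    """
    if not member_probs or len(member_probs) != len(weights):
        raise ValueError("need at least one model and exactly one weight per model")
    total = float(sum(weights))
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    # Union of all technique names scored by any member model.
    labels = set().union(*member_probs)
    averaged = {
        label: sum(w * probs.get(label, 0.0) for probs, w in zip(member_probs, weights)) / total
        for label in labels
    }
    # Multi-label decision: keep every technique at or above the threshold.
    return sorted(label for label, p in averaged.items() if p >= threshold)


# Illustrative outputs from two hypothetical members (CDP-style and hyperbolic-style).
cdp_probs = {"Name calling/Labeling": 0.8, "Smears": 0.3}
hyp_probs = {"Name calling/Labeling": 0.6, "Smears": 0.5}
print(ensemble_predict([cdp_probs, hyp_probs], weights=[0.5, 0.5]))
```

Averaging probabilities keeps the ensemble independent of each member's internal architecture; per-label thresholds tuned on validation data would be one possible refinement.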

For the hierarchical component, the authors integrate the label embeddings of HypEmo (Chen et al., 2023), which represent labels in hyperbolic space so that the parent-child structure of the persuasion-technique taxonomy is reflected in the distances between embeddings. They further add a multi-task learning framework in which emotion prediction is trained alongside persuasion-technique classification, which the authors report improves the accuracy and comprehensiveness of meme classification. The predictions of the CDP and hyperbolic models are then combined in an ensemble.
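
The abstract attributes the hierarchical component to HypEmo's hyperbolic label embeddings. To show why hyperbolic space suits label hierarchies, here is a minimal sketch of the distance in the Poincaré ball, a standard model of hyperbolic space. It assumes curvature -1 and points strictly inside the unit ball; it is a generic textbook formula, not code taken from HypEmo or from the IITK system, and the 2-D label positions are hypothetical.

```python
import math
from typing import Sequence


def poincare_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """Geodesic distance between two points of the Poincaré ball (curvature -1)."""
    def sq_norm(x: Sequence[float]) -> float:
        return sum(t * t for t in x)

    uu, vv = sq_norm(u), sq_norm(v)
    if uu >= 1.0 or vv >= 1.0:
        raise ValueError("points must lie strictly inside the unit ball")
    diff = sq_norm([a - b for a, b in zip(u, v)])
    return math.acosh(1.0 + 2.0 * diff / ((1.0 - uu) * (1.0 - vv)))


# Hypothetical 2-D layout: a parent label at the origin, two sibling labels near the boundary.
parent = [0.0, 0.0]
child_a, child_b = [0.9, 0.0], [0.0, 0.9]
via_parent = poincare_distance(child_a, parent) + poincare_distance(parent, child_b)
direct = poincare_distance(child_a, child_b)
print(f"direct={direct:.2f}, via parent={via_parent:.2f}")
```

With the siblings near the boundary, their direct distance (about 5.20) is close to the path through the parent (about 5.89), much as in a tree, whereas in Euclidean space the same layout gives a ratio of only about 0.71. This tree-like behavior is the usual motivation for embedding label hierarchies hyperbolically.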

The authors evaluate their method on the SemEval-2024 Task 4 data, which contains memes annotated with persuasion techniques drawn from a hierarchical taxonomy. Measured by hierarchical F1, the system scores 0.60 on sub-task 1, 0.67 on sub-task 2, and 0.48 on sub-task 3. Other systems submitted to the same task, such as BCAmirs (listed under Related Papers below), provide useful points of comparison.
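
The reported scores are hierarchical F1. A common definition, the hierarchical precision and recall of Kiritchenko et al., expands both the gold and the predicted label sets with all of their ancestors before computing micro-averaged F1, so a near-miss inside the right branch still earns partial credit. The sketch below follows that definition under stated assumptions: the `PARENT` tree is a hypothetical toy rather than the official task hierarchy (which may allow multiple parents), and the official scorer may differ in details.

```python
from typing import Dict, Iterable, List, Optional, Set

# Toy child -> parent map. Hypothetical and simplified to a tree;
# it is NOT the official SemEval-2024 Task 4 label hierarchy.
PARENT: Dict[str, str] = {
    "Name calling/Labeling": "Ad Hominem",
    "Smears": "Ad Hominem",
    "Ad Hominem": "Ethos",
    "Causal Oversimplification": "Simplification",
    "Simplification": "Logos",
}


def with_ancestors(labels: Iterable[str], parent: Dict[str, str]) -> Set[str]:
    """Expand a label set with every ancestor reachable through `parent`."""
    out: Set[str] = set()
    for label in labels:
        node: Optional[str] = label
        # Stop at the top of the tree or once this chain is already included.
        while node is not None and node not in out:
            out.add(node)
            node = parent.get(node)
    return out


def hierarchical_f1(
    gold: List[Iterable[str]], pred: List[Iterable[str]], parent: Dict[str, str]
) -> float:
    """Micro-averaged hierarchical F1 over ancestor-augmented label sets."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same number of memes")
    overlap = n_pred = n_gold = 0
    for g, p in zip(gold, pred):
        g_aug, p_aug = with_ancestors(g, parent), with_ancestors(p, parent)
        overlap += len(g_aug & p_aug)
        n_pred += len(p_aug)
        n_gold += len(g_aug)
    h_precision = overlap / n_pred if n_pred else 0.0
    h_recall = overlap / n_gold if n_gold else 0.0
    if h_precision + h_recall == 0:
        return 0.0
    return 2 * h_precision * h_recall / (h_precision + h_recall)


# A wrong but related prediction earns partial credit (flat F1 would be 0 here).
print(hierarchical_f1([["Smears"]], [["Name calling/Labeling"]], PARENT))
```

In the example, predicting "Name calling/Labeling" for a meme labeled "Smears" still matches the shared ancestors "Ad Hominem" and "Ethos", giving a score of 2/3 instead of 0.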

Critical Analysis

The paper presents a well-designed approach to persuasion technique detection in memes. Its use of hierarchical label embeddings is a strength, as it lets the model exploit the taxonomy of persuasion techniques instead of treating each technique as an unrelated label. The multimodal sub-tasks add a second challenge, since persuasion techniques often rely on the combination of text and images to achieve their desired effect.

However, the paper does not address some potential limitations of the proposed approach. For example, the model may struggle to generalize to memes that use more subtle or complex persuasion techniques, or to memes that deviate significantly from the training data. Additionally, the paper does not discuss the interpretability of the model's predictions, which could matter in applications where transparency is crucial, such as interpretable detection of out-of-context misinformation.

Further research could explore ways to improve the model's robustness and generalization, as well as its interpretability. Ideas from the IITK team's system for SemEval-2024 Task 10, which targets emotion recognition in conversations, could also be worth exploring, given that this system already uses emotion prediction as an auxiliary task.

Conclusion

The paper presents a hierarchical embedding-based approach for detecting persuasion techniques in memes, reaching hierarchical F1-scores of 0.60, 0.67, and 0.48 on the three sub-tasks of SemEval-2024 Task 4. The key strength of the proposed system is that it models the hierarchical structure of persuasion techniques through hyperbolic label embeddings and pairs them with Class Definition Prediction in an ensemble.

This research contributes to the growing field of automated content analysis, with potential applications in areas such as misinformation detection and online safety. By developing more sophisticated models of the persuasive mechanisms underlying memes and other online content, researchers can help build tools to combat the spread of manipulative and misleading information on the internet.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

BCAmirs at SemEval-2024 Task 4: Beyond Words: A Multimodal and Multilingual Exploration of Persuasion in Memes

Amirhossein Abaskohi, Amirhossein Dabiriaghdam, Lele Wang, Giuseppe Carenini

Memes, combining text and images, frequently use metaphors to convey persuasive messages, shaping public opinion. Motivated by this, our team engaged in SemEval-2024 Task 4, a hierarchical multi-label classification task designed to identify rhetorical and psychological persuasion techniques embedded within memes. To tackle this problem, we introduced a caption generation step to assess the modality gap and the impact of additional semantic information from images, which improved our result. Our best model utilizes GPT-4 generated captions alongside meme text to fine-tune RoBERTa as the text encoder and CLIP as the image encoder. It outperforms the baseline by a large margin in all 12 subtasks. In particular, it ranked in top-3 across all languages in Subtask 2a, and top-4 in Subtask 2b, demonstrating quantitatively strong performance. The improvement achieved by the introduced intermediate step is likely attributable to the metaphorical essence of images that challenges visual encoders. This highlights the potential for improving abstract visual semantics encoding.

6/13/2024

cs.CL cs.CV cs.IT cs.LG


Improving Hateful Meme Detection through Retrieval-Guided Contrastive Learning

Jingbiao Mei, Jinghong Chen, Weizhe Lin, Bill Byrne, Marcus Tomalin

Hateful memes have emerged as a significant concern on the Internet. Detecting hateful memes requires the system to jointly understand the visual and textual modalities. Our investigation reveals that the embedding space of existing CLIP-based systems lacks sensitivity to subtle differences in memes that are vital for correct hatefulness classification. We propose constructing a hatefulness-aware embedding space through retrieval-guided contrastive training. Our approach achieves state-of-the-art performance on the HatefulMemes dataset with an AUROC of 87.0, outperforming much larger fine-tuned large multimodal models. We demonstrate a retrieval-based hateful memes detection system, which is capable of identifying hatefulness based on data unseen in training. This allows developers to update the hateful memes detection system by simply adding new examples without retraining, a desirable feature for real services in the constantly evolving landscape of hateful memes on the Internet.

6/6/2024

cs.CL cs.CV

ArMeme: Propagandistic Content in Arabic Memes

Firoj Alam, Abul Hasnat, Fatema Ahmed, Md Arid Hasan, Maram Hasanain

With the rise of digital communication, memes have become a significant medium for cultural and political expression that is often used to mislead audiences. Identifying such misleading and persuasive multimodal content has become more important for various stakeholders, including social media platforms, policymakers, and the broader society, as such content often causes harm to individuals, organizations, and/or society. While there has been effort to develop AI-based automatic systems for resource-rich languages (e.g., English), there has been relatively little to none for medium- to low-resource languages. In this study, we focused on developing an Arabic memes dataset with manual annotations of propagandistic content. We annotated ~6K Arabic memes collected from various social media platforms, the first such resource for Arabic multimodal research. We provide a comprehensive analysis aimed at developing computational tools for their detection. We will make them publicly available for the community.

6/7/2024

cs.CL cs.AI cs.CV

OSPC: Detecting Harmful Memes with Large Language Model as a Catalyst

Jingtao Cao, Zheng Zhang, Hongru Wang, Bin Liang, Hao Wang, Kam-Fai Wong

Memes, which rapidly disseminate personal opinions and positions across the internet, also pose significant challenges in propagating social bias and prejudice. This study presents a novel approach to detecting harmful memes, particularly within the multicultural and multilingual context of Singapore. Our methodology integrates image captioning, Optical Character Recognition (OCR), and Large Language Model (LLM) analysis to comprehensively understand and classify harmful memes. Utilizing the BLIP model for image captioning, PP-OCR and TrOCR for text recognition across multiple languages, and the Qwen LLM for nuanced language understanding, our system is capable of identifying harmful content in memes created in English, Chinese, Malay, and Tamil. To enhance the system's performance, we fine-tuned our approach by leveraging additional data labeled using GPT-4V, aiming to distill the understanding capability of GPT-4V for harmful memes to our system. Our framework ranks first on the public leaderboard of the Online Safety Prize Challenge hosted by AI Singapore, with an AUROC of 0.7749 and an accuracy of 0.7087, significantly ahead of the other teams. Notably, our approach outperforms previous benchmarks, with FLAVA achieving an AUROC of 0.5695 and VisualBERT an AUROC of 0.5561.

6/17/2024

cs.AI cs.CL cs.CV