Sina at FigNews 2024: Multilingual Datasets Annotated with Bias and Propaganda

Read original: arXiv:2407.09327 - Published 7/15/2024 by Lina Duaibes, Areej Jaber, Mustafa Jarrar, Ahmad Qadi, Mais Qandeel

Overview

• This paper presents a multilingual corpus of 12,000 Facebook posts annotated for bias and propaganda, built by the Sina team for the FigNews 2024 Shared Task on News Media Narratives.

Plain English Explanation

• The researchers collected Facebook posts in five languages (Arabic, Hebrew, English, French, and Hindi), with 2,400 posts per language, covering events of the war on Gaza from October 7, 2023 to January 31, 2024. Every post is annotated for bias and propaganda.

• Bias refers to presenting information in a way that favors a particular perspective or agenda, often subtly influencing the reader's opinion. Propaganda is the deliberate use of biased or misleading information to promote a specific political, social, or ideological cause.

• By developing these annotated datasets, the researchers aim to support the development of AI systems that can detect and analyze bias and propaganda in online content. This could be useful for a variety of applications, such as media analysis, fact-checking, and improving the quality of information consumed by the public.

Technical Explanation

• According to the paper's abstract, the posts were annotated by 10 graduate students specializing in Law. Annotation quality was evaluated with Inter-Annotator Agreement (IAA), which averaged 80.8% for bias and 70.15% for propaganda. The team ranked among the best-performing teams in both the Bias and Propaganda subtasks.

• The annotations include labels for different types of bias (e.g., political, ideological, emotional) and propaganda techniques (e.g., loaded language, appeal to fear, bandwagon). The datasets also include metadata about the source, topic, and other contextual information to support further analysis.

• The corpus is open-source and available at https://sina.birzeit.edu/fada, which supports research on the automatic detection and analysis of bias and propaganda in online content. This can help advance natural language processing and content moderation technologies.
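
The abstract reports Inter-Annotator Agreement (IAA) as a percentage, but this summary does not say which agreement measure was used. As a minimal sketch, assuming two annotators and a hypothetical two-way label set ("biased" / "unbiased"), the Python below computes two common measures: raw percent agreement and Cohen's kappa, which corrects for agreement expected by chance. The function names and sample labels are illustrative and are not taken from the paper.

```python
from collections import Counter


def percent_agreement(labels_a, labels_b):
    """Fraction of items on which two annotators chose the same label."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(labels_a, labels_b))
    return matches / len(labels_a)


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(labels_a)
    observed = percent_agreement(labels_a, labels_b)
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement: probability both annotators pick the same label
    # if each labels independently according to their own label frequencies.
    expected = sum(
        counts_a[label] * counts_b[label]
        for label in set(counts_a) | set(counts_b)
    ) / (n * n)
    if expected == 1.0:
        # Both annotators used a single identical label everywhere.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical annotations of five posts by two annotators.
annotator_1 = ["biased", "unbiased", "biased", "unbiased", "biased"]
annotator_2 = ["biased", "unbiased", "unbiased", "unbiased", "biased"]

agreement = percent_agreement(annotator_1, annotator_2)
kappa = cohen_kappa(annotator_1, annotator_2)
print(f"percent agreement: {agreement:.2%}")
print(f"Cohen's kappa: {kappa:.3f}")
```

With more than two annotators, as in the 10-annotator setup described in the abstract, one option is to average a pairwise measure like this over all annotator pairs. Whether the authors did so is not stated here.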

Critical Analysis

• The researchers acknowledge the inherent challenges in defining and identifying bias and propaganda, which can often be subjective and context-dependent. They discuss the need for further refinement and validation of the annotation guidelines and processes to ensure reliability and consistency.

• Additionally, the datasets may not be fully representative of the entire spectrum of online content, as the selection and sampling of sources could introduce biases. Expanding the diversity of sources and languages covered could strengthen the datasets and their applicability.

• The researchers also note the potential for these datasets to be misused, such as by training systems to amplify or perpetuate specific biases. Careful consideration of the ethical implications and responsible use of these resources is essential.

Conclusion

• The multilingual datasets annotated for bias and propaganda developed by the researchers at Sina for the FigNews 2024 competition represent a valuable contribution to the field of content analysis and moderation. These resources can support the development of AI systems that can better detect and understand the presence of biased and propagandistic content online, potentially improving the overall quality and reliability of information consumed by the public.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Sina at FigNews 2024: Multilingual Datasets Annotated with Bias and Propaganda

Lina Duaibes, Areej Jaber, Mustafa Jarrar, Ahmad Qadi, Mais Qandeel

The proliferation of bias and propaganda on social media is an increasingly significant concern, leading to the development of techniques for automatic detection. This article presents a multilingual corpus of 12,000 Facebook posts fully annotated for bias and propaganda. The corpus was created as part of the FigNews 2024 Shared Task on News Media Narratives for framing the Israeli War on Gaza. It covers various events during the War from October 7, 2023 to January 31, 2024. The corpus comprises 12,000 posts in five languages (Arabic, Hebrew, English, French, and Hindi), with 2,400 posts for each language. The annotation process involved 10 graduate students specializing in Law. The Inter-Annotator Agreement (IAA) was used to evaluate the annotations of the corpus, with an average IAA of 80.8% for bias and 70.15% for propaganda annotations. Our team was ranked among the best-performing teams in both Bias and Propaganda subtasks. The corpus is open-source and available at https://sina.birzeit.edu/fada

7/15/2024

The FIGNEWS Shared Task on News Media Narratives

Wajdi Zaghouani (Northwestern University in Qatar), Mustafa Jarrar (Birzeit University), Nizar Habash (New York University Abu Dhabi), Houda Bouamor (Carnegie Mellon University Qatar), Imed Zitouni (Google), Mona Diab (Carnegie Mellon University), Samhaa R. El-Beltagy (Newgiza University), Muhammed AbuOdeh (New York University Abu Dhabi)

We present an overview of the FIGNEWS shared task, organized as part of the ArabicNLP 2024 conference co-located with ACL 2024. The shared task addresses bias and propaganda annotation in multilingual news posts. We focus on the early days of the Israel War on Gaza as a case study. The task aims to foster collaboration in developing annotation guidelines for subjective tasks by creating frameworks for analyzing diverse narratives highlighting potential bias and propaganda. In a spirit of fostering and encouraging diversity, we address the problem from a multilingual perspective, namely within five languages: English, French, Arabic, Hebrew, and Hindi. A total of 17 teams participated in two annotation subtasks: bias (16 teams) and propaganda (6 teams). The teams competed in four evaluation tracks: guidelines development, annotation quality, annotation quantity, and consistency. Collectively, the teams produced 129,800 data points. Key findings and implications for the field are discussed.

7/26/2024

ArMeme: Propagandistic Content in Arabic Memes

Firoj Alam, Abul Hasnat, Fatema Ahmed, Md Arid Hasan, Maram Hasanain

With the rise of digital communication, memes have become a significant medium for cultural and political expression that is often used to mislead audiences. Identification of such misleading and persuasive multimodal content has become more important among various stakeholders, including social media platforms, policymakers, and the broader society as they often cause harm to individuals, organizations, and/or society. While there has been effort to develop AI-based automatic systems for resource-rich languages (e.g., English), it is relatively little to none for medium to low resource languages. In this study, we focused on developing an Arabic memes dataset with manual annotations of propagandistic content. We annotated ~6K Arabic memes collected from various social media platforms, which is a first resource for Arabic multimodal research. We provide a comprehensive analysis aiming to develop computational tools for their detection. We will make them publicly available for the community.

6/7/2024

ArAIEval Shared Task: Propagandistic Techniques Detection in Unimodal and Multimodal Arabic Content

Maram Hasanain, Md. Arid Hasan, Fatema Ahmed, Reem Suwaileh, Md. Rafiul Biswas, Wajdi Zaghouani, Firoj Alam

We present an overview of the second edition of the ArAIEval shared task, organized as part of the ArabicNLP 2024 conference co-located with ACL 2024. In this edition, ArAIEval offers two tasks: (i) detection of propagandistic textual spans with persuasion techniques identification in tweets and news articles, and (ii) distinguishing between propagandistic and non-propagandistic memes. A total of 14 teams participated in the final evaluation phase, with 6 and 9 teams participating in Tasks 1 and 2, respectively. Finally, 11 teams submitted system description papers. Across both tasks, we observed that fine-tuning transformer models such as AraBERT was at the core of the majority of the participating systems. We provide a description of the task setup, including a description of the dataset construction and the evaluation setup. We further provide a brief overview of the participating systems. All datasets and evaluation scripts are released to the research community (https://araieval.gitlab.io/). We hope this will enable further research on these important tasks in Arabic.

7/8/2024