ThatiAR: Subjectivity Detection in Arabic News Sentences

Read original: arXiv:2406.05559 - Published 6/11/2024 by Reem Suwaileh, Maram Hasanain, Fatema Hubail, Wajdi Zaghouani, Firoj Alam

Overview

  • This paper examines the challenges of maintaining objectivity in annotations of subjectivity in Arabic news articles.
  • The researchers conducted a preliminary assessment to understand the levels of subjectivity and disagreement among annotators when classifying the subjectivity of news articles.
  • The findings reveal significant challenges in achieving consistent and objective annotations, highlighting the need for further research and discussion on this important topic.

Plain English Explanation

When analyzing the tone and perspective of news articles, researchers often rely on human annotators to classify the level of subjectivity in the text. However, determining what is "objective" or "subjective" can be quite difficult, as people's perceptions and biases can influence their judgments.

The researchers of this paper wanted to explore these challenges more deeply, focusing on news articles written in Arabic. They recruited a group of annotators and asked them to classify a set of news articles as either "subjective" or "objective." The results showed a significant amount of disagreement among the annotators, with many articles receiving conflicting subjective/objective labels.

This suggests that achieving truly objective annotations of subjectivity is an elusive goal, at least with the current approaches. The researchers highlight the need for further investigation into the factors that contribute to these subjective judgments, as well as the development of more robust and consistent annotation methodologies.

By understanding the challenges inherent in this type of analysis, the research community can work towards improving the reliability and validity of subjectivity assessments, particularly in related work on human-centric assessment of subjectivity in social media and on persuasion techniques in Arabic text. This has implications for a range of applications, from sentence-level subjectivity detection in English news to the analysis of Arabic memes and propagandistic content.

Technical Explanation

The researchers conducted a preliminary study to investigate the challenges of achieving objective annotations of subjectivity in Arabic news articles. They recruited a group of 10 annotators and asked them to classify a set of 200 news articles as either "subjective" or "objective."

The results revealed a significant level of disagreement among the annotators, with only about half of the articles receiving consistent subjective/objective labels. The researchers calculated various inter-rater reliability metrics, such as Fleiss' kappa and Krippendorff's alpha, which all indicated low levels of agreement.
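To make the agreement metric concrete, here is a minimal sketch of how Fleiss' kappa can be computed for a two-label task like this one. The function name, the label strings, and the toy ratings are assumptions made for illustration; they are not the paper's data or code.

```python
import math
from collections import Counter


def fleiss_kappa(ratings, categories=("subjective", "objective")):
    """Fleiss' kappa for a list of items, each rated by the same number of raters.

    `ratings` is a list of lists of labels (hypothetical input format).
    """
    n_items = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(item) != n_raters for item in ratings):
        raise ValueError("every item needs the same number (>= 2) of ratings")

    # counts[i][j] = number of raters who gave item i the label categories[j]
    counts = []
    for item in ratings:
        tally = Counter(item)
        if set(tally) - set(categories):
            raise ValueError("unknown label in ratings")
        counts.append([tally[c] for c in categories])

    # Observed agreement: average proportion of agreeing rater pairs per item
    per_item = [
        (sum(n * n for n in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_observed = sum(per_item) / n_items

    # Chance agreement: sum of squared overall label proportions
    total = n_items * n_raters
    proportions = [sum(row[j] for row in counts) / total
                   for j in range(len(categories))]
    p_expected = sum(p * p for p in proportions)

    if math.isclose(p_expected, 1.0):
        return math.nan  # undefined when only one label is ever used
    return (p_observed - p_expected) / (1.0 - p_expected)


# Toy example: 4 items, 3 hypothetical annotators each
toy = [
    ["subjective", "subjective", "objective"],
    ["subjective", "objective", "objective"],
    ["subjective", "subjective", "objective"],
    ["subjective", "objective", "objective"],
]
print(fleiss_kappa(toy))
```

A value near 1 indicates strong agreement, a value near 0 indicates agreement no better than chance, and negative values indicate less agreement than chance would produce.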

Further analysis showed that the annotators' judgments were influenced by various factors, including their personal biases, cultural backgrounds, and perceptions of what constitutes "objectivity." The researchers also noted that the linguistic complexity and nuances of the Arabic language may have contributed to the challenges in achieving consistent annotations.

The findings of this study highlight the inherent difficulties in relying on human annotations for assessing subjectivity, particularly in the context of large-scale Arabic language datasets. The researchers suggest that more robust and systematic approaches, potentially involving machine learning techniques, may be necessary to overcome these challenges and improve the reliability and validity of subjectivity assessments.

Critical Analysis

The researchers acknowledge the preliminary nature of this study and the need for further investigation to better understand the factors that contribute to the lack of annotator objectivity. They also note that the study was limited to a relatively small set of news articles and a small group of annotators, which may have influenced the results.

One potential limitation of the study is that the annotators were not given a clear definition of "objectivity" and "subjectivity." Without a shared understanding of these concepts, it is not surprising that the annotators struggled to reach consistent judgments.

Additionally, the researchers did not explore the potential impact of individual differences among the annotators, such as their level of expertise, political leanings, or personal experiences. These factors may have played a significant role in shaping their perceptions of subjectivity and objectivity.

Despite these limitations, the study raises important questions about the reliability and validity of subjectivity annotations, particularly in the context of sentence-level subjectivity detection in English news and exploring subjectivity in social media. The findings suggest that more rigorous and systematic approaches to subjectivity assessment may be necessary to ensure the trustworthiness and reproducibility of research in this area.

Conclusion

This preliminary study highlights the significant challenges in maintaining objectivity when annotating the subjectivity of news articles in Arabic. The researchers found that even among a group of trained annotators, there was a substantial level of disagreement in their judgments, suggesting that subjective perceptions and biases can heavily influence the assessment of objectivity.

The findings have important implications for a wide range of applications, from sentence-level subjectivity detection in English news to the analysis of Arabic memes and propagandistic content. By highlighting the inherent challenges in this type of analysis, the researchers encourage the research community to explore more robust and systematic approaches to subjectivity assessment, potentially involving machine learning techniques and more rigorous definitions of objectivity and subjectivity.

Ultimately, this study serves as an important reminder that achieving truly objective annotations is an elusive goal, and that researchers must be mindful of the limitations and potential biases inherent in this type of analysis. By acknowledging these challenges, the research community can work towards developing more reliable and valid methods for assessing subjectivity.



This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!


Related Papers

ThatiAR: Subjectivity Detection in Arabic News Sentences

Reem Suwaileh, Maram Hasanain, Fatema Hubail, Wajdi Zaghouani, Firoj Alam

Detecting subjectivity in news sentences is crucial for identifying media bias, enhancing credibility, and combating misinformation by flagging opinion-based content. It provides insights into public sentiment, empowers readers to make informed decisions, and encourages critical thinking. While research has developed methods and systems for this purpose, most efforts have focused on English and other high-resourced languages. In this study, we present the first large dataset for subjectivity detection in Arabic, consisting of ~3.6K manually annotated sentences, and GPT-4o based explanation. In addition, we included instructions (both in English and Arabic) to facilitate LLM based fine-tuning. We provide an in-depth analysis of the dataset, annotation process, and extensive benchmark results, including PLMs and LLMs. Our analysis of the annotation process highlights that annotators were strongly influenced by their political, cultural, and religious backgrounds, especially at the beginning of the annotation process. The experimental results suggest that LLMs with in-context learning provide better performance. We aim to release the dataset and resources for the community.

6/11/2024

A Corpus for Sentence-level Subjectivity Detection on English News Articles

Francesco Antici, Andrea Galassi, Federico Ruggeri, Katerina Korre, Arianna Muti, Alessandra Bardi, Alice Fedotova, Alberto Barrón-Cedeño

We develop novel annotation guidelines for sentence-level subjectivity detection, which are not limited to language-specific cues. We use our guidelines to collect NewsSD-ENG, a corpus of 638 objective and 411 subjective sentences extracted from English news articles on controversial topics. Our corpus paves the way for subjectivity detection in English and across other languages without relying on language-specific tools, such as lexicons or machine translation. We evaluate state-of-the-art multilingual transformer-based models on the task in mono-, multi-, and cross-language settings. For this purpose, we re-annotate an existing Italian corpus. We observe that models trained in the multilingual setting achieve the best performance on the task.

5/27/2024

Nullpointer at CheckThat! 2024: Identifying Subjectivity from Multilingual Text Sequence

Md. Rafiul Biswas, Abrar Tasneem Abir, Wajdi Zaghouani

This study addresses a binary classification task to determine whether a text sequence, either a sentence or paragraph, is subjective or objective. The task spans five languages: Arabic, Bulgarian, English, German, and Italian, along with a multilingual category. Our approach involved several key techniques. Initially, we preprocessed the data through parts of speech (POS) tagging, identification of question marks, and application of attention masks. We fine-tuned the sentiment-based Transformer model 'MarieAngeA13/Sentiment-Analysis-BERT' on our dataset. Given the imbalance with more objective data, we implemented a custom classifier that assigned greater weight to objective data. Additionally, we translated non-English data into English to maintain consistency across the dataset. Our model achieved notable results, scoring top marks for the multilingual dataset (Macro F1=0.7121) and German (Macro F1=0.7908). It ranked second for Arabic (Macro F1=0.4908) and Bulgarian (Macro F1=0.7169), third for Italian (Macro F1=0.7430), and ninth for English (Macro F1=0.6893).

7/16/2024

The FIGNEWS Shared Task on News Media Narratives

Wajdi Zaghouani (Northwestern University in Qatar), Mustafa Jarrar (Birzeit University), Nizar Habash (New York University Abu Dhabi), Houda Bouamor (Carnegie Mellon University Qatar), Imed Zitouni (Google), Mona Diab (Carnegie Mellon University), Samhaa R. El-Beltagy (Newgiza University), Muhammed AbuOdeh (New York University Abu Dhabi)

We present an overview of the FIGNEWS shared task, organized as part of the ArabicNLP 2024 conference co-located with ACL 2024. The shared task addresses bias and propaganda annotation in multilingual news posts. We focus on the early days of the Israel War on Gaza as a case study. The task aims to foster collaboration in developing annotation guidelines for subjective tasks by creating frameworks for analyzing diverse narratives highlighting potential bias and propaganda. In a spirit of fostering and encouraging diversity, we address the problem from a multilingual perspective, namely within five languages: English, French, Arabic, Hebrew, and Hindi. A total of 17 teams participated in two annotation subtasks: bias (16 teams) and propaganda (6 teams). The teams competed in four evaluation tracks: guidelines development, annotation quality, annotation quantity, and consistency. Collectively, the teams produced 129,800 data points. Key findings and implications for the field are discussed.

7/26/2024