Predicting Sentence-Level Factuality of News and Bias of Media Outlets

Read original: arXiv:2301.11850 - Published 9/16/2024 by Francielle Vargas, Kokil Jaidka, Thiago A. S. Pardo, Fabrício Benevenuto

Overview

  • This paper introduces a new dataset called FactNews, which contains 6,191 sentences annotated for factuality and media bias.
  • The researchers use FactNews to build text classification models for predicting the factuality of news reporting and the bias of media outlets.
  • The paper focuses on the challenges of automated news credibility and fact-checking, specifically in the context of Brazil and the Portuguese language.

Plain English Explanation

The paper tackles the problem of accurately predicting news factuality and media bias. To address it, the researchers created a new dataset called FactNews, which contains 6,191 sentences annotated for factuality and media bias according to definitions proposed by AllSides.

Using this dataset, the researchers developed machine learning models that can automatically determine whether a news sentence is factual or biased. They found that biased sentences tend to have more words and contain more emotional language compared to factual sentences.

This type of analysis can help identify unreliable news sources and detect political polarization and fake news, both of which are significant problems in Brazil. The researchers provide the FactNews dataset and baseline models as a starting point for further research on Portuguese.

Technical Explanation

The researchers created the FactNews dataset by annotating 6,191 sentences from news articles according to definitions of factuality and media bias proposed by the AllSides organization. They then used this dataset to train two text classification models:

  1. A model for predicting the factuality of news sentences, which classifies sentences as either factual or biased.
  2. A model for predicting the overall bias of a news source, which looks at the distribution of factual and biased sentences to assess the reliability of the outlet.
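To make the second step more concrete, here is a minimal sketch of how sentence-level predictions could be rolled up into a per-outlet label distribution. It is an illustration under stated assumptions, not the authors' method: the function names `label_shares` and `outlet_profiles`, the outlet names, and the toy predictions are all hypothetical, and the two labels simply follow the factual/biased split described above.

```python
from collections import Counter


def label_shares(sentence_labels):
    """Return the fraction of sentences carrying each label."""
    counts = Counter(sentence_labels)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: n / total for label, n in counts.items()}


def outlet_profiles(predictions):
    """Group (outlet, label) pairs by outlet and compute label shares."""
    by_outlet = {}
    for outlet, label in predictions:
        by_outlet.setdefault(outlet, []).append(label)
    return {outlet: label_shares(labels) for outlet, labels in by_outlet.items()}


# Hypothetical sentence-level predictions; invented for illustration only.
predictions = [
    ("outlet_a", "factual"), ("outlet_a", "factual"),
    ("outlet_a", "biased"), ("outlet_a", "factual"),
    ("outlet_b", "biased"), ("outlet_b", "biased"),
    ("outlet_b", "factual"), ("outlet_b", "biased"),
]

profiles = outlet_profiles(predictions)
for outlet, shares in sorted(profiles.items()):
    print(outlet, shares)
```

A profile like this could then serve as a feature for an outlet-level classifier, or simply be compared across outlets; which of these the paper does is not stated in this summary.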

Through their experiments, the researchers found that biased sentences tend to be longer and contain more emotional language compared to factual sentences. This suggests that fine-grained analysis of subjectivity and impartiality in news articles can be a promising approach for predicting media reliability.
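The two signals mentioned here, sentence length and emotional language, can be measured with very simple code. The sketch below assumes whitespace tokenization and a tiny hypothetical emotion lexicon (`EMOTION_WORDS`); the toy English sentences are invented to show the computation and are not drawn from FactNews, so the resulting numbers say nothing about the paper's actual findings.

```python
from statistics import mean

# Hypothetical toy lexicon; a real study would use an established resource.
EMOTION_WORDS = {"outrageous", "shameful", "disaster", "terrible"}


def word_count(sentence):
    """Count whitespace-separated tokens."""
    return len(sentence.split())


def emotion_hits(sentence):
    """Count tokens that appear in the toy emotion lexicon."""
    tokens = [t.strip(".,!?;:").lower() for t in sentence.split()]
    return sum(t in EMOTION_WORDS for t in tokens)


def summarize(samples):
    """Average word count and emotion hits per label."""
    stats = {}
    for label in sorted({lab for _, lab in samples}):
        sentences = [s for s, lab in samples if lab == label]
        stats[label] = {
            "mean_words": mean(word_count(s) for s in sentences),
            "mean_emotion_hits": mean(emotion_hits(s) for s in sentences),
        }
    return stats


# Invented examples, labeled by hand for illustration only.
samples = [
    ("The senate approved the bill on Tuesday.", "factual"),
    ("The ministry published the report.", "factual"),
    ("This shameful bill is an outrageous disaster for every honest worker.", "biased"),
    ("The terrible report proves once again how badly the ministry has failed.", "biased"),
]

stats = summarize(samples)
for label, values in stats.items():
    print(label, values)
```

Features like these are cheap to compute and easy to inspect, which makes them a reasonable first check before training a heavier classifier.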

Critical Analysis

The researchers acknowledge several limitations in their work. First, the FactNews dataset is relatively small, and further research is needed to scale up the models to handle a larger volume of content. Additionally, the definitions of factuality and media bias used in the dataset may not capture all the nuances of how these concepts are understood in different contexts.

Another potential issue is the inherent subjectivity in annotating news content for bias, as reasonable people may disagree on what constitutes a biased perspective. The researchers note that their approach focuses on sentence-level analysis, which may miss broader patterns of bias that emerge across an article or publication.

Finally, the paper is focused on the specific challenges of news credibility and fact-checking in Brazil, but the techniques and insights could potentially be applied to other countries and languages. Further research is needed to understand how well the models generalize to different cultural and political contexts.

Conclusion

This paper presents a promising approach to automated news credibility assessment and fact-checking: a new dataset of annotated news sentences, plus machine learning models that detect factuality and media bias. The linguistic differences it reports between factual and biased sentences could inform more robust and scalable systems for identifying unreliable news sources and detecting political polarization and fake news.

The researchers' focus on the Portuguese language and the Brazilian context is particularly significant, as this is an understudied area in the field of automated fact-checking. The FactNews dataset and baseline models provided in this paper serve as a valuable resource for further research and development in this important domain.



This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Predicting Sentence-Level Factuality of News and Bias of Media Outlets

Francielle Vargas, Kokil Jaidka, Thiago A. S. Pardo, Fabrício Benevenuto

Automated news credibility and fact-checking at scale require accurately predicting news factuality and media bias. This paper introduces a large sentence-level dataset, titled FactNews, composed of 6,191 sentences expertly annotated according to factuality and media bias definitions proposed by AllSides. We use FactNews to assess the overall reliability of news sources, by formulating two text classification problems for predicting sentence-level factuality of news reporting and bias of media outlets. Our experiments demonstrate that biased sentences present a higher number of words compared to factual sentences, besides having a predominance of emotions. Hence, the fine-grained analysis of subjectivity and impartiality of news articles provided promising results for predicting the reliability of media outlets. Finally, due to the severity of fake news and political polarization in Brazil, and the lack of research for Portuguese, both dataset and baseline were proposed for Brazilian Portuguese.

9/16/2024

Exploring Factual Entailment with NLI: A News Media Study

Guy Mor-Lan, Effi Levi

We explore the relationship between factuality and Natural Language Inference (NLI) by introducing FactRel -- a novel annotation scheme that models factual rather than textual entailment, and use it to annotate a dataset of naturally occurring sentences from news articles. Our analysis shows that 84% of factually supporting pairs and 63% of factually undermining pairs do not amount to NLI entailment or contradiction, respectively, suggesting that factual relationships are more apt for analyzing media discourse. We experiment with models for pairwise classification on the new dataset, and find that in some cases, generating synthetic data with GPT-4 on the basis of the annotated dataset can improve performance. Surprisingly, few-shot learning with GPT-4 yields strong results on par with medium LMs (DeBERTa) trained on the labelled dataset. We hypothesize that these results indicate the fundamental dependence of this task on both world knowledge and advanced reasoning abilities.

6/26/2024

Experiments in News Bias Detection with Pre-Trained Neural Transformers

Tim Menzner, Jochen L. Leidner

The World Wide Web provides unrivalled access to information globally, including factual news reporting and commentary. However, state actors and commercial players increasingly spread biased (distorted) or fake (non-factual) information to promote their agendas. We compare several large, pre-trained language models on the task of sentence-level news bias detection and sub-type classification, providing quantitative and qualitative results.

6/17/2024

A Corpus for Sentence-level Subjectivity Detection on English News Articles

Francesco Antici, Andrea Galassi, Federico Ruggeri, Katerina Korre, Arianna Muti, Alessandra Bardi, Alice Fedotova, Alberto Barrón-Cedeño

We develop novel annotation guidelines for sentence-level subjectivity detection, which are not limited to language-specific cues. We use our guidelines to collect NewsSD-ENG, a corpus of 638 objective and 411 subjective sentences extracted from English news articles on controversial topics. Our corpus paves the way for subjectivity detection in English and across other languages without relying on language-specific tools, such as lexicons or machine translation. We evaluate state-of-the-art multilingual transformer-based models on the task in mono-, multi-, and cross-language settings. For this purpose, we re-annotate an existing Italian corpus. We observe that models trained in the multilingual setting achieve the best performance on the task.

5/27/2024