Sampled Datasets Risk Substantial Bias in the Identification of Political Polarization on Social Media

Read original: arXiv:2406.19867 - Published 7/1/2024 by Gabriele Di Bona, Emma Fraxanet, Bjorn Komander, Andrea Lo Sasso, Virginia Morini, Antoine Vendeville, Max Falkenberg, Alessandro Galeazzi

Introduction

This paper examines how relying on sampled datasets can introduce substantial bias when measuring political polarization on social media. Using the Polish political debate on Twitter over a 24-hour period as a case study, the researchers argue that the way data is collected and sampled can significantly change the conclusions drawn about political division on the platform.

Related Work

The paper situates its analysis within a broader body of research on using social media data to study political trends and behaviors. It cites several related studies on topics such as analyzing and estimating support for U.S. presidential candidates on Twitter, the prevalence and biases of election polls on social media, and inferring political leaning through plurinational scenarios. The paper also references work on how verified authors shape discursive communities on Twitter and on mapping users' sharing habits to understand the persistence of contrarianism on the platform.

Plain English Explanation

The key idea of this paper is that the way researchers collect and sample data from social media can significantly affect the conclusions they draw about political polarization. Because access to platform data has become more restricted, many studies must rely on partial samples, and the authors argue that such samples can give a distorted picture of how divided political discourse is online.

For example, if a researcher only collects data from a small subset of politically active Twitter users, they may observe high levels of partisan conflict and conclude that the overall platform is highly polarized. However, this narrow sample may not be representative of the broader user base, which could include many people who are not as actively engaged in political discussions.

Similarly, if a study focuses only on users who frequently express strong political views, it may miss the more moderate majority who are less vocal about their beliefs. By expanding the data collection to include a wider range of users and perspectives, the researchers suggest that a more nuanced and accurate picture of political discourse on social media can emerge.
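
The selection effect described above can be made concrete with a toy simulation. The sketch below is purely illustrative and is not taken from the paper: the population, the `leaning` scale from -1 to 1, and the assumption that more extreme users are more likely to be vocal are all hypothetical choices made for the example.

```python
import random
import statistics

def make_users(n=10_000, seed=0):
    """Build a hypothetical population of (leaning, is_vocal) pairs."""
    rng = random.Random(seed)
    users = []
    for _ in range(n):
        leaning = rng.uniform(-1.0, 1.0)  # -1 and 1 are the two political extremes
        # Assumption: the chance of being vocal grows with how extreme the user is.
        is_vocal = rng.random() < abs(leaning)
        users.append((leaning, is_vocal))
    return users

def mean_extremity(leanings):
    """Average distance from the political center (0 = moderate, 1 = extreme)."""
    return statistics.fmean(abs(x) for x in leanings)

users = make_users()
everyone = [leaning for leaning, _ in users]
vocal_only = [leaning for leaning, is_vocal in users if is_vocal]

print("all users:  ", round(mean_extremity(everyone), 3))
print("vocal users:", round(mean_extremity(vocal_only), 3))
```

Under these assumptions the full population averages about 0.5 in expectation, while the vocal subset averages about 0.67, so a study that only observes vocal users would see a more extreme user base than actually exists. Real platforms need not follow this activity model; the point is only that a non-random selection rule shifts the measured quantity.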

Technical Explanation

The paper presents an empirical analysis that compares polarization measures computed on different samples of Twitter data. As the abstract states, the authors study the structural polarization of the Polish political debate on Twitter over a 24-hour period and report three findings:

  1. The political discussion is only a small subset of the wider Twitter discussion
  2. Large samples can be representative of the whole political discussion, but small samples consistently fail to reflect the true structure of polarization
  3. Keyword-based samples can be representative when keywords are selected with great care, but poorly selected keywords can introduce substantial political bias into the sampled data

By comparing polarization measured on these samples with the structure of the full discussion, the authors show that the choice of sampling approach can change the assessment of how polarized the platform is. They conclude that polarization cannot be measured reliably with small, sampled datasets.
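
The sensitivity to sample size can be illustrated with a small experiment on synthetic data. The sketch below is an assumption-laden toy, not the paper's method: it builds a hypothetical two-group interaction list, uses the E-I index (external minus internal ties, divided by all ties) as a stand-in polarization score, and draws uniform random edge samples. The function names and parameters are invented for illustration.

```python
import random
import statistics

def make_polarized_edges(n_per_group=200, n_edges=4000, p_within=0.9, seed=0):
    """Synthetic interactions between users labelled (group, id), mostly within-group."""
    rng = random.Random(seed)
    edges = []
    for _ in range(n_edges):
        group = rng.randint(0, 1)
        u = (group, rng.randrange(n_per_group))
        # With probability p_within the partner comes from the same group.
        other = group if rng.random() < p_within else 1 - group
        v = (other, rng.randrange(n_per_group))
        edges.append((u, v))
    return edges

def ei_index(edges):
    """E-I index in [-1, 1]; values near -1 mean almost all ties stay within a group."""
    external = sum(1 for u, v in edges if u[0] != v[0])
    internal = len(edges) - external
    return (external - internal) / len(edges)

def sampled_scores(edges, fraction, n_repeats=200, seed=1):
    """E-I index on repeated uniform random edge samples of the given fraction."""
    rng = random.Random(seed)
    k = max(1, int(len(edges) * fraction))
    return [ei_index(rng.sample(edges, k)) for _ in range(n_repeats)]

edges = make_polarized_edges()
print("full data:", round(ei_index(edges), 3))
for fraction in (0.5, 0.01):
    scores = sampled_scores(edges, fraction)
    print(f"fraction={fraction}: spread={statistics.pstdev(scores):.3f}")
```

In this toy setup the score from a 1% sample varies far more from draw to draw than the score from a 50% sample, so a single small sample can land well away from the full-data value. The example understates the real problem in one respect: here group labels are known in advance, whereas in practice communities usually have to be inferred from the sampled network itself, which can add error beyond simple sampling noise.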

Critical Analysis

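The paper acknowledges several limitations of the study, including the challenges of defining and measuring political polarization on social media. The authors note that their analysis is constrained by the data available through the Twitter API, which may not capture the full breadth of online political discourse. The case study also covers a single national debate (the Polish political debate) over a 24-hour period, so results for other countries or longer time spans may differ.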

Additionally, the paper does not address potential biases in the users who choose to engage on Twitter or the ways in which platform design and moderation policies may shape political interactions. Further research would be needed to fully understand how these factors contribute to the perception of polarization.

Overall, the study raises important questions about the reliability of social media data for studying complex political phenomena. It encourages researchers to be cautious in their interpretations and to carefully consider the limitations of their sampling approaches when drawing conclusions about the state of political discourse online.

Conclusion

This paper highlights the substantial risk of bias that can arise from the use of sampled datasets when studying political polarization on social media. The authors show that different data collection strategies can lead to markedly different assessments of polarization on platforms like Twitter, and that small samples in particular fail to reflect its true structure.

The findings have significant implications for scholars, policymakers, and members of the public who rely on social media data to understand political trends and dynamics. The paper underscores the need for larger and more carefully constructed datasets, and it connects this need to the European Union's Digital Services Act, which aims to improve researchers' access to social media data.



This summary was produced with help from an AI and may contain inaccuracies. Consult the original paper (arXiv:2406.19867) for details.

Related Papers

Sampled Datasets Risk Substantial Bias in the Identification of Political Polarization on Social Media

Gabriele Di Bona, Emma Fraxanet, Bjorn Komander, Andrea Lo Sasso, Virginia Morini, Antoine Vendeville, Max Falkenberg, Alessandro Galeazzi

Following recent policy changes by X (Twitter) and other social media platforms, user interaction data has become increasingly difficult to access. These restrictions are impeding robust research pertaining to social and political phenomena online, which is critical due to the profound impact social media platforms may have on our societies. Here, we investigate the reliability of polarization measures obtained from different samples of social media data by studying the structural polarization of the Polish political debate on Twitter over a 24-hour period. First, we show that the political discussion on Twitter is only a small subset of the wider Twitter discussion. Second, we find that large samples can be representative of the whole political discussion on a platform, but small samples consistently fail to accurately reflect the true structure of polarization online. Finally, we demonstrate that keyword-based samples can be representative if keywords are selected with great care, but that poorly selected keywords can result in substantial political bias in the sampled data. Our findings demonstrate that it is not possible to measure polarization in a reliable way with small, sampled datasets, highlighting why the current lack of research data is so problematic, and providing insight into the practical implementation of the European Union's Digital Service Act which aims to improve researchers' access to social media data.

7/1/2024

Dynamics of Ideological Biases of Social Media Users

Mohammed Shahid Modi, James Flamino, Boleslaw K. Szymanski

Humanity for centuries has perfected skills of interpersonal interactions and evolved patterns that enable people to detect lies and deceiving behavior of others in face-to-face settings. Unprecedented growth of people's access to mobile phones and social media raises an important question: How does this new technology influence people's interactions and support the use of traditional patterns? In this article, we answer this question for homophily-driven patterns in social media. In our previous studies, we found that, on a university campus, changes in student opinions were driven by the desire to hold popular opinions. Here, we demonstrate that the evolution of online platform-wide opinion groups is driven by the same desire. We focus on two social media: Twitter and Parler, on which we tracked the political biases of their users. On Parler, an initially stable group of Right-biased users evolved into a permanent Right-leaning echo chamber dominating weaker, transient groups of members with opposing political biases. In contrast, on Twitter, the initial presence of two large opposing bias groups led to the evolution of a bimodal bias distribution, with a high degree of polarization. We capture the movement of users from the initial to final bias groups during the tracking period. We also show that user choices are influenced by side-effects of homophily. Users entering the platform attempt to find a sufficiently large group whose members hold political biases within the range sufficiently close to their own. If successful, they stabilize their biases and become permanent members of the group. Otherwise, they leave the platform. We believe that the dynamics of users' behavior uncovered in this article create a foundation for technical solutions supporting social groups on social media and socially aware networks.

7/23/2024

Analyzing and Estimating Support for U.S. Presidential Candidates in Twitter Polls

Stephen Scarano, Vijayalakshmi Vasudevan, Chhandak Bagchi, Mattia Samory, JungHwan Yang, Przemyslaw A. Grabowicz

Polls posted on social media have emerged in recent years as an important tool for estimating public opinion, e.g., to gauge public support for business decisions and political candidates in national elections. Here, we examine nearly two thousand Twitter polls gauging support for U.S. presidential candidates during the 2016 and 2020 election campaigns. First, we describe the rapidly emerging prevalence of social polls. Second, we characterize social polls in terms of their heterogeneity and response options. Third, leveraging machine learning models for user attribute inference, we describe the demographics, political leanings, and other characteristics of the users who author and interact with social polls. Finally, we study the relationship between social poll results, their attributes, and the characteristics of users interacting with them. Our findings reveal that Twitter polls are biased in various ways, starting from the position of the presidential candidates among the poll options to biases in demographic attributes and poll results. The 2016 and 2020 polls were predominantly crafted by older males and manifested a pronounced bias favoring candidate Donald Trump, in contrast to traditional surveys, which favored Democratic candidates. We further identify and explore the potential reasons for such biases in social polling and discuss their potential repercussions. Finally, we show that biases in social media polls can be corrected via regression and poststratification. The errors of the resulting election estimates can be as low as 1%-2%, suggesting that social media polls can become a promising source of information about public opinion.

6/6/2024

Election Polls on Social Media: Prevalence, Biases, and Voter Fraud Beliefs

Stephen Scarano, Vijayalakshmi Vasudevan, Mattia Samory, Kai-Cheng Yang, JungHwan Yang, Przemyslaw A. Grabowicz

Social media platforms allow users to create polls to gather public opinion on diverse topics. However, we know little about what such polls are used for and how reliable they are, especially in significant contexts like elections. Focusing on the 2020 presidential elections in the U.S., this study shows that outcomes of election polls on Twitter deviate from election results despite their prevalence. Leveraging demographic inference and statistical analysis, we find that Twitter polls are disproportionately authored by older males and exhibit a large bias towards candidate Donald Trump relative to representative mainstream polls. We investigate potential sources of biased outcomes from the point of view of inauthentic, automated, and counter-normative behavior. Using social media experiments and interviews with poll authors, we identify inconsistencies between public vote counts and those privately visible to poll authors, with the gap potentially attributable to purchased votes. We also find that Twitter accounts participating in election polls are more likely to be bots, and election poll outcomes tend to be more biased, before the election day than after. Finally, we identify instances of polls spreading voter fraud conspiracy theories and estimate that a couple thousand of such polls were posted in 2020. The study discusses the implications of biased election polls in the context of transparency and accountability of social media platforms.

6/4/2024