Temporal Analysis of Drifting Hashtags in Textual Data Streams: A Graph-Based Application

Read original: arXiv:2402.10230 - Published 8/20/2024 by Cristiano M. Garcia, Alceu de Souza Britto Jr, Jean Paul Barddal

Overview

Examines how the topics associated with a hashtag drift over time in textual data streams
Proposes a graph-based approach that detects communities of hashtags and tracks how they change between time periods
Applies the approach to a hashtag stream around #mybodymychoice, using annual snapshots between 2018 and 2022

Plain English Explanation

The paper studies how the topics associated with a hashtag in textual data streams, such as social media posts, change over time. This is referred to as "hashtag drift." The researchers build graphs of hashtags and track how those graphs change from year to year. Using a stream of posts around #mybodymychoice from 2018 to 2022, they show that the hashtag, initially tied to women's rights, abortion, and bodily autonomy, drifted toward other topics, most notably vaccination and Covid-19 in 2021.

Technical Explanation

The paper first provides background on the importance of understanding temporal trends in textual data streams, particularly the evolution of hashtag usage. It then introduces a graph-based approach to model the temporal dynamics of hashtags. This involves constructing a graph where nodes represent hashtags and edges indicate co-occurrence within the same text. Communities of hashtags are detected with the Girvan-Newman method in annual snapshots between 2018 and 2022. By comparing the detected communities and their sizes from one snapshot to the next, the researchers were able to identify and visualize patterns of hashtag drift.
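The graph construction described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code: the function name `cooccurrence_snapshots`, the hashtag regular expression, the lowercasing step, and the sample posts are all hypothetical. It groups posts by year and counts how often each pair of hashtags appears in the same text, giving one weighted edge list per year.

```python
import re
from collections import Counter, defaultdict
from itertools import combinations

# Assumption: a hashtag is "#" followed by word characters.
HASHTAG_RE = re.compile(r"#(\w+)")


def cooccurrence_snapshots(posts):
    """Build one co-occurrence edge list per year.

    posts: iterable of (year, text) pairs (hypothetical input format).
    Returns {year: Counter({(tag_a, tag_b): count})}, where each key is an
    alphabetically ordered pair of hashtags seen in the same text.
    """
    snapshots = defaultdict(Counter)
    for year, text in posts:
        # Deduplicate and normalize case so "#Vaccine" and "#vaccine" match.
        tags = sorted({tag.lower() for tag in HASHTAG_RE.findall(text)})
        for tag_a, tag_b in combinations(tags, 2):
            snapshots[year][(tag_a, tag_b)] += 1
    return dict(snapshots)


# Illustrative posts only; these are not data from the paper.
posts = [
    (2018, "#MyBodyMyChoice #abortion #womensrights"),
    (2018, "#mybodymychoice #abortion"),
    (2021, "#mybodymychoice #vaccine #covid19"),
]
snapshots = cooccurrence_snapshots(posts)
for year, edges in sorted(snapshots.items()):
    print(year, dict(edges))
```

Keeping one edge list per year makes it straightforward to compare snapshots later, for example by checking which hashtags are connected to the focal hashtag in one year but not in another.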

Critical Analysis

The paper presents a novel and potentially valuable approach to analyzing the temporal evolution of hashtags in textual data streams. However, the method is limited to identifying changes in hashtag co-occurrence patterns and does not directly address the underlying reasons for these changes. The study also follows a single hashtag, #mybodymychoice, so how well the findings carry over to other hashtags remains an open question. Further research could explore incorporating additional contextual information, such as user sentiment or external events, to provide a more comprehensive understanding of the factors driving hashtag drift.

Conclusion

This paper introduces a graph-based approach for temporal analysis of drifting hashtags in textual data streams. The proposed method captures and visualizes how the topics surrounding a hashtag shift over time, which has potential applications in areas such as social media monitoring and trend analysis. While the current approach has limitations, the research suggests promising directions for further exploration in understanding the dynamic nature of hashtag usage in digital communication.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Temporal Analysis of Drifting Hashtags in Textual Data Streams: A Graph-Based Application

Cristiano M. Garcia, Alceu de Souza Britto Jr, Jean Paul Barddal

Initially supported by Twitter, hashtags are now used on several social media platforms. Hashtags are helpful for tagging, tracking, and grouping posts on similar topics. In this paper, based on a hashtag stream regarding the hashtag #mybodymychoice, we analyze hashtag drifts over time using concepts from graph analysis and textual data streams using the Girvan-Newman method to uncover hashtag communities in annual snapshots between 2018 and 2022. In addition, we offer insights about some correlated hashtags found in the study. Our approach can be useful for monitoring changes over time in opinions and sentiment patterns about an entity on social media. Even though the hashtag #mybodymychoice was initially coupled with women's rights, abortion, and bodily autonomy, we observe that it suffered drifts during the studied period across topics such as drug legalization, vaccination, political protests, war, and civil rights. The year 2021 was the most significant drifting year, in which the communities detected and their respective sizes suggest that #mybodymychoice had a significant drift to vaccination and Covid-19-related topics.

8/20/2024
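The abstract above names the Girvan-Newman method for uncovering hashtag communities. The following is a simplified, standard-library-only sketch of that idea, offered as an illustration rather than the authors' implementation: it treats the graph as unweighted, stops after the first split, and uses hypothetical function names and toy hashtags that are not data from the paper. The method repeatedly removes the edge with the highest betweenness (the edge that the most shortest paths pass through) until the graph breaks into more connected components.

```python
from collections import deque


def components(adj):
    """Return the connected components of an adjacency dict as a list of sets."""
    seen, comps = set(), []
    for start in adj:
        if start in seen:
            continue
        comp, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node not in comp:
                comp.add(node)
                stack.extend(adj[node] - comp)
        seen |= comp
        comps.append(comp)
    return comps


def edge_betweenness(adj):
    """Unweighted edge betweenness via breadth-first search from every node."""
    score = {frozenset((u, v)): 0.0 for u in adj for v in adj[u]}
    for source in adj:
        dist, sigma, preds = {source: 0}, {source: 1.0}, {source: []}
        order, queue = [], deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:  # first time w is reached
                    dist[w], sigma[w], preds[w] = dist[v] + 1, 0.0, []
                    queue.append(w)
                if dist[w] == dist[v] + 1:  # v lies on a shortest path to w
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(order, 0.0)
        for w in reversed(order):  # accumulate path counts back toward source
            for v in preds[w]:
                share = sigma[v] / sigma[w] * (1.0 + delta[w])
                score[frozenset((v, w))] += share
                delta[v] += share
    return score


def girvan_newman_split(edges):
    """Remove highest-betweenness edges until the component count increases."""
    adj = {}
    for u, v in edges:
        if u != v:  # assumption: self-loops are ignored
            adj.setdefault(u, set()).add(v)
            adj.setdefault(v, set()).add(u)
    start = len(components(adj))
    while len(components(adj)) == start and any(adj.values()):
        score = edge_betweenness(adj)
        u, v = max(score, key=score.get)
        adj[u].discard(v)
        adj[v].discard(u)
    return components(adj)


# Toy snapshot: two hashtag triangles joined by a single bridging edge.
edges = [
    ("abortion", "womensrights"), ("abortion", "mybodymychoice"),
    ("womensrights", "mybodymychoice"), ("mybodymychoice", "vaccine"),
    ("vaccine", "covid19"), ("vaccine", "mandate"), ("covid19", "mandate"),
]
communities = girvan_newman_split(edges)
print([sorted(c) for c in communities])
```

In this toy graph the bridge between "mybodymychoice" and "vaccine" carries every shortest path between the two triangles, so it is removed first and the graph separates into two communities. For real data, the `girvan_newman` function in NetworkX's `networkx.algorithms.community` module is a more practical choice than a hand-written version.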

Two-Stage Stance Labeling: User-Hashtag Heuristics with Graph Neural Networks

Joshua Melton, Shannon Reid, Gabriel Terejanu, Siddharth Krishnan

The high volume and rapid evolution of content on social media present major challenges for studying the stance of social media users. In this work, we develop a two stage stance labeling method that utilizes the user-hashtag bipartite graph and the user-user interaction graph. In the first stage, a simple and efficient heuristic for stance labeling uses the user-hashtag bipartite graph to iteratively update the stance association of user and hashtag nodes via a label propagation mechanism. This set of soft labels is then integrated with the user-user interaction graph to train a graph neural network (GNN) model using semi-supervised learning. We evaluate this method on two large-scale datasets containing tweets related to climate change from June 2021 to June 2022 and gun control from January 2022 to January 2023. Our experiments demonstrate that enriching text-based embeddings of users with network information from the user interaction graph using our semi-supervised GNN method outperforms both classifiers trained on user textual embeddings and zero-shot classification using LLMs such as GPT4. We discuss the need for integrating nuanced understanding from social science with the scalability of computational methods to better understand how polarization on social media occurs for divisive issues such as climate change and gun control.

5/20/2024

A Temporal Psycholinguistics Approach to Identity Resolution of Social Media Users

Md Touhidul Islam

In this thesis, we propose an approach to identity resolution across social media platforms using the topics, sentiments, and timings of the posts on the platforms. After collecting the public posts of around 5000 profiles from Disqus and Twitter, we analyze their posts to match their profiles across the two platforms. We pursue both temporal and non-temporal methods in our analysis. While neither approach proves definitively superior, the temporal approach generally performs better. We found that the temporal window size influences results more than the shifting amount. On the other hand, our sentiment analysis shows that the inclusion of sentiment makes little difference, probably due to flawed data extraction methods. We also experimented with a distance-based reward-and-punishment-focused scoring model, which achieved an accuracy of 24.198% and an average rank of 158.217 out of 2525 in our collected corpus. Future work includes refining sentiment analysis by evaluating sentiments per topic, extending temporal analysis with additional phases, and improving the scoring model through weight adjustments and modified rewards.

7/30/2024

A Systematic Analysis on the Temporal Generalization of Language Models in Social Media

Asahi Ushio, Jose Camacho-Collados

In machine learning, temporal shifts occur when there are differences between training and test splits in terms of time. For streaming data such as news or social media, models are commonly trained on a fixed corpus from a certain period of time, and they can become obsolete due to the dynamism and evolving nature of online content. This paper focuses on temporal shifts in social media and, in particular, Twitter. We propose a unified evaluation scheme to assess the performance of language models (LMs) under temporal shift on standard social media tasks. LMs are tested on five diverse social media NLP tasks under different temporal settings, which revealed two important findings: (i) the decrease in performance under temporal shift is consistent across different models for entity-focused tasks such as named entity recognition or disambiguation, and hate speech detection, but not significant in the other tasks analysed (i.e., topic and sentiment classification); and (ii) continuous pre-training on the test period does not improve the temporal adaptability of LMs.

5/24/2024