Social Media and Artificial Intelligence for Sustainable Cities and Societies: A Water Quality Analysis Use-case

Read original: arXiv:2404.14977 - Published 4/24/2024 by Muhammad Asif Auyb, Muhammad Tayyab Zamir, Imran Khan, Hannia Naseem, Nasir Ahmad, Kashif Ahmad

Overview

This paper proposes a natural language processing (NLP) framework to automatically collect and analyze water-related social media posts for data-driven decision-making.
The framework includes two key components: text classification and topic modeling.
The researchers developed a merit-fusion-based text classification approach that combines multiple large language models (LLMs) and optimizes their weights.
They also used the BERTopic library for topic modeling to uncover hidden patterns in water-related tweets.
The paper also describes the creation of a large-scale dataset of annotated water-related social media posts to support future research.

Plain English Explanation

The paper focuses on a critical societal challenge: ensuring the quality of water, which is essential for economic and social development. Traditional methods for monitoring water networks, such as offline and online surveys, reach a limited number of participants and are run infrequently because they are labor-intensive.

To address this, the researchers developed an NLP framework to automatically analyze water-related posts on social media. This could provide more frequent and comprehensive data to support better decision-making about water quality and management.

The framework has two main parts. First, it uses a text classification approach that combines multiple large language models (LLMs) and optimizes their weights to accurately categorize water-related posts. This helps identify relevant content from the vast amount of social media data.

Second, the framework employs topic modeling using the BERTopic library to uncover the hidden themes and concerns expressed in water-related tweets. This can provide insights into global, regional, and country-specific water issues.

The researchers also created a large annotated dataset of water-related social media posts, which can support future studies in this area, similar to work on modeling Wikipedia article quality and predicting question quality on Stack Overflow.

Overall, this framework offers a promising approach to leverage social media data for better monitoring and understanding of critical water infrastructure, similar to how disaster monitoring can be improved using NLP techniques.

Technical Explanation

The proposed NLP framework consists of two main components: text classification and topic modeling.

For text classification, the researchers developed a merit-fusion-based approach that combines multiple large language models (LLMs). They experimented with different weight selection and optimization methods to assign appropriate weights to the various LLMs, aiming to maximize the classification performance.
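The weighting idea can be illustrated with a minimal sketch. It assumes that fusion means a weighted average of each model's positive-class probability and that weights are chosen by a plain grid search on a validation set; this summary does not specify the paper's actual fusion formula or optimizers, so both are stand-ins. The function names and the probability values below are hypothetical.

```python
from itertools import product


def fuse(prob_lists, weights):
    """Weighted sum of per-model positive-class probabilities, one score per sample."""
    return [
        sum(w * p for w, p in zip(weights, probs))
        for probs in zip(*prob_lists)
    ]


def accuracy(scores, labels, threshold=0.5):
    """Fraction of samples whose thresholded score matches the label."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def grid_search_weights(prob_lists, labels, steps=10):
    """Try every weight vector on a grid that sums to 1; keep the most accurate."""
    n_models = len(prob_lists)
    best_weights, best_acc = None, -1.0
    for combo in product(range(steps + 1), repeat=n_models):
        if sum(combo) != steps:  # keep only weight vectors that sum to 1
            continue
        weights = [c / steps for c in combo]
        acc = accuracy(fuse(prob_lists, weights), labels)
        if acc > best_acc:
            best_weights, best_acc = weights, acc
    return best_weights, best_acc


# Hypothetical validation-set probabilities from three fine-tuned models.
model_a = [0.9, 0.4, 0.2, 0.7]
model_b = [0.6, 0.6, 0.1, 0.4]
model_c = [0.8, 0.3, 0.6, 0.9]
labels = [1, 0, 0, 1]  # 1 = water-related, 0 = not relevant

weights, acc = grid_search_weights([model_a, model_b, model_c], labels)
print(weights, acc)
```

Grid search is only practical for a handful of models, since the number of candidate weight vectors grows quickly with each model added. That is one plausible reason to use dedicated optimization methods instead.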

In the topic modeling component, the researchers employed the BERTopic library to discover the hidden topic patterns within the water-related tweets. BERTopic embeds each document with a transformer-based language model, clusters the embeddings, and describes each cluster with its most characteristic words using a class-based TF-IDF weighting.
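A rough sketch of how such a topic modeling step might look with BERTopic follows. The tweet-cleaning step, the helper names, and the `min_topic_size` value are assumptions for illustration and are not taken from the paper; the BERTopic import is placed inside the function so that the cleaning helper can be used without the library installed.

```python
import re


def clean_tweet(text):
    """Strip URLs and @mentions, drop '#' from hashtags, collapse whitespace."""
    text = re.sub(r"https?://\S+", " ", text)
    text = re.sub(r"@\w+", " ", text)
    text = text.replace("#", "")
    return re.sub(r"\s+", " ", text).strip()


def fit_topics(tweets, min_topic_size=10):
    """Fit a BERTopic model on cleaned tweets (requires `pip install bertopic`)."""
    from bertopic import BERTopic  # lazy import: only needed when fitting

    docs = [clean_tweet(t) for t in tweets]
    topic_model = BERTopic(min_topic_size=min_topic_size)
    # fit_transform returns one topic id per document; -1 marks outliers.
    topics, _probs = topic_model.fit_transform(docs)
    return topic_model, topics


# Hypothetical example tweet, for the cleaning step only.
print(clean_tweet("@city_water Brown tap water again! #WaterQuality https://t.co/abc"))
```

After fitting on a sufficiently large corpus, `topic_model.get_topic_info()` returns a table of the discovered topics with their sizes and representative words.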

The researchers also analyzed the water-related tweets originating from different regions and countries to explore global, regional, and country-specific issues and concerns regarding water quality and management.
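One simple way to approach a per-country view, sketched under the assumption that each relevant tweet already carries a country tag and a topic label, is to count topic labels per country. The record format, the function names, and the placeholder values are hypothetical and do not reflect the paper's data or findings.

```python
from collections import Counter, defaultdict


def topics_by_country(records):
    """Count how often each topic label appears for each country."""
    counts = defaultdict(Counter)
    for country, topic in records:
        counts[country][topic] += 1
    return counts


def top_topics(counts, country, k=3):
    """Return the k most frequent (topic, count) pairs for one country."""
    return counts[country].most_common(k)


# Placeholder (country, topic) pairs, not real results.
records = [
    ("country_a", "contamination"),
    ("country_a", "contamination"),
    ("country_a", "shortage"),
    ("country_b", "shortage"),
    ("country_b", "shortage"),
    ("country_b", "shortage"),
    ("country_b", "pricing"),
]

counts = topics_by_country(records)
print(top_topics(counts, "country_a", k=1))
print(top_topics(counts, "country_b", k=2))
```

Regional or global views follow from the same counts by mapping countries to regions before aggregating.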

Additionally, the paper describes the creation of a large-scale dataset of manually annotated water-related social media posts. The dataset is expected to facilitate future research in this domain, much as annotated datasets have supported work on assessing Wikipedia article quality and predicting question quality on Stack Overflow.

Critical Analysis

The paper presents a promising approach to leveraging social media data for water quality monitoring and analysis. However, the researchers acknowledge several limitations and areas for further research.

One potential limitation is the reliance on the accuracy and completeness of the text classification and topic modeling components. While the researchers employed advanced techniques, the performance of the framework may be affected by the quality and biases inherent in social media data.

Additionally, the paper does not provide a comprehensive evaluation of the framework's real-world applicability and impact on water management decision-making. Further research is needed to assess the practical utility and scalability of the proposed approach, as well as its integration with existing water monitoring systems.

The creation of the annotated dataset is a valuable contribution, but the researchers could explore ways to expand the dataset's diversity and representation, ensuring it captures a broad range of water-related issues and perspectives from different regions and stakeholder groups.

Overall, the research presented in this paper offers a compelling approach to leveraging social media data for water quality analysis, and the proposed framework and dataset can serve as a foundation for further advancements in critical infrastructure monitoring and data-driven decision-making.

Conclusion

This paper proposes an NLP framework to automatically collect and analyze water-related social media posts for data-driven decision-making. The framework combines text classification and topic modeling techniques to identify relevant content and uncover hidden patterns in water-related discussions on social media.

The researchers developed a merit-fusion-based text classification approach that leverages multiple large language models, and they used the BERTopic library for topic modeling. The creation of a large-scale annotated dataset of water-related social media posts is also a valuable contribution that can support future research in this area.

While the paper highlights the potential of this approach, further research is needed to address limitations and assess the real-world impact of the proposed framework on water quality monitoring and management. Nonetheless, this work represents an important step towards leveraging social media data to better understand and address critical water-related challenges facing communities around the world.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Social Media and Artificial Intelligence for Sustainable Cities and Societies: A Water Quality Analysis Use-case

Muhammad Asif Auyb, Muhammad Tayyab Zamir, Imran Khan, Hannia Naseem, Nasir Ahmad, Kashif Ahmad

This paper focuses on a very important societal challenge of water quality analysis. Being one of the key factors in the economic and social development of society, the provision of water and ensuring its quality has always remained one of the top priorities of public authorities. To ensure the quality of water, different methods for monitoring and assessing the water networks, such as offline and online surveys, are used. However, these surveys have several limitations, such as the limited number of participants and low frequency due to the labor involved in conducting such surveys. In this paper, we propose a Natural Language Processing (NLP) framework to automatically collect and analyze water-related posts from social media for data-driven decisions. The proposed framework is composed of two components, namely (i) text classification, and (ii) topic modeling. For text classification, we propose a merit-fusion-based framework incorporating several Large Language Models (LLMs) where different weight selection and optimization methods are employed to assign weights to the LLMs. In topic modeling, we employed the BERTopic library to discover the hidden topic patterns in the water-related tweets. We also analyzed relevant tweets originating from different regions and countries to explore global, regional, and country-specific issues and water-related concerns. We also collected and manually annotated a large-scale dataset, which is expected to facilitate future research on the topic.

4/24/2024

A Toolbox for Supporting Research on AI in Water Distribution Networks

André Artelt, Marios S. Kyriakou, Stelios G. Vrachimis, Demetrios G. Eliades, Barbara Hammer, Marios M. Polycarpou

Drinking water is a vital resource for humanity, and thus, Water Distribution Networks (WDNs) are considered critical infrastructures in modern societies. The operation of WDNs is subject to diverse challenges such as water leakages and contamination, cyber/physical attacks, high energy consumption during pump operation, etc. With model-based methods reaching their limits due to various uncertainty sources, AI methods offer promising solutions to those challenges. In this work, we introduce a Python toolbox for complex scenario modeling & generation such that AI researchers can easily access challenging problems from the drinking water domain. Besides providing a high-level interface for the easy generation of hydraulic and water quality scenario data, it also provides easy access to popular event detection benchmarks and an environment for developing control algorithms.

6/5/2024

A Named Entity Recognition and Topic Modeling-based Solution for Locating and Better Assessment of Natural Disasters in Social Media

Ayaz Mehmood, Muhammad Tayyab Zamir, Muhammad Asif Ayub, Nasir Ahmad, Kashif Ahmad

Over the last decade, similar to other application domains, social media content has proven very effective in disaster informatics. However, due to the unstructured nature of the data, several challenges are associated with disaster analysis in social media content. To fully explore the potential of social media content in disaster informatics, access to relevant content and the correct geo-location information is very critical. In this paper, we propose a three-step solution to tackling these challenges. Firstly, the proposed solution aims to classify social media posts into relevant and irrelevant posts followed by the automatic extraction of location information from the posts' text through Named Entity Recognition (NER) analysis. Finally, to quickly analyze the topics covered in large volumes of social media posts, we perform topic modeling resulting in a list of top keywords that highlight the issues discussed in the tweets. For the Relevant Classification of Twitter Posts (RCTP), we proposed a merit-based fusion framework combining the capabilities of four different models, namely BERT, RoBERTa, DistilBERT, and ALBERT, obtaining the highest F1-score of 0.933 on a benchmark dataset. For the Location Extraction from Twitter Text (LETT), we evaluated four models, namely BERT, RoBERTa, DistilBERT, and Electra, in an NER framework, obtaining the highest F1-score of 0.960. For topic modeling, we used the BERTopic library to discover the hidden topic patterns in the relevant tweets. The experimental results of all the components of the proposed end-to-end solution are very encouraging and hint at the potential of social media content and NLP in disaster management.

5/3/2024

The Call for Socially Aware Language Technologies

Diyi Yang, Dirk Hovy, David Jurgens, Barbara Plank

Language technologies have made enormous progress, especially with the introduction of large language models (LLMs). On traditional tasks such as machine translation and sentiment analysis, these models perform at near-human level. These advances can, however, exacerbate a variety of issues that models have traditionally struggled with, such as bias, evaluation, and risks. In this position paper, we argue that many of these issues share a common core: a lack of awareness of the factors, context, and implications of the social environment in which NLP operates, which we call social awareness. While NLP is getting better at solving the formal linguistic aspects, limited progress has been made in adding the social awareness required for language applications to work in all situations for all users. Integrating social awareness into NLP models will make applications more natural, helpful, and safe, and will open up new possibilities. Thus we argue that substantial challenges remain for NLP to develop social awareness and that we are just at the beginning of a new era for the field.

5/7/2024