A Named Entity Recognition and Topic Modeling-based Solution for Locating and Better Assessment of Natural Disasters in Social Media

Read original: arXiv:2405.00903 - Published 5/3/2024 by Ayaz Mehmood, Muhammad Tayyab Zamir, Muhammad Asif Ayub, Nasir Ahmad, Kashif Ahmad

Overview

The paper presents a three-step solution for locating and assessing natural disasters in social media: relevance classification, named entity recognition (NER), and topic modeling.
The proposed approach aims to improve disaster response and mitigation by extracting relevant information from social media posts.
The solution filters posts into relevant and irrelevant, uses NER to extract location mentions from the text of relevant posts, and applies topic modeling to surface the top keywords describing the issues being discussed.

Plain English Explanation

The paper outlines a method to more effectively leverage social media data during natural disasters. The key idea is to use advanced language processing techniques to extract useful information from social media posts. This builds on prior work on using artificial intelligence for disaster response and monitoring critical infrastructure during crises.

Once posts unrelated to a disaster have been filtered out, named entity recognition identifies references to specific places within the text of the remaining posts. This allows the system to pinpoint where disaster-related events are occurring based on the locations mentioned in the posts.

Topic modeling then groups the social media posts into coherent themes and summarizes each theme with its most characteristic keywords. The themes might correspond to things like damage reports, rescue requests, or supply needs. Grouping the posts in this way provides a more structured understanding of the evolving situation on the ground during a disaster.

By combining these two techniques - identifying locations and classifying post content - the researchers aim to create a more comprehensive and actionable picture of unfolding natural disasters based on real-time social media data. This builds on prior work on using natural language processing for fine-grained entity extraction in COVID-19 news.

The goal is to equip disaster response agencies and the public with a more detailed, up-to-date understanding of the situation to inform their decisions and actions. This could lead to faster, more effective disaster relief and recovery efforts.

Technical Explanation

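The paper presents a three-step pipeline that combines relevance classification, named entity recognition (NER), and topic modeling to extract useful information from social media posts during natural disasters. For the relevance step, which the authors call Relevant Classification of Twitter Posts (RCTP), a merit-based fusion framework combines four models: BERT, RoBERTa, DistilBERT, and ALBERT.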
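
The paper's abstract describes a merit-based fusion of four classifiers for separating relevant from irrelevant posts, but it does not spell out how the merit weights are computed. As a rough illustration only, the sketch below assumes each model's weight is proportional to its validation F1-score and that the weighted class probabilities are summed (late fusion). The function names, the F1 values, and the probabilities are all hypothetical and are not taken from the paper.

```python
from typing import Dict, List


def merit_weights(val_f1: Dict[str, float]) -> Dict[str, float]:
    """Normalize per-model validation F1-scores so the weights sum to 1."""
    total = sum(val_f1.values())
    return {name: score / total for name, score in val_f1.items()}


def fuse(probs: Dict[str, List[float]], weights: Dict[str, float]) -> List[float]:
    """Weighted sum of each model's class probabilities (late fusion)."""
    n_classes = len(next(iter(probs.values())))
    fused = [0.0] * n_classes
    for name, model_probs in probs.items():
        for i, value in enumerate(model_probs):
            fused[i] += weights[name] * value
    return fused


# Hypothetical validation F1-scores (made up for illustration).
val_f1 = {"bert": 0.91, "roberta": 0.93, "distilbert": 0.90, "albert": 0.89}

# Hypothetical [P(irrelevant), P(relevant)] outputs for a single post.
probs = {
    "bert": [0.40, 0.60],
    "roberta": [0.20, 0.80],
    "distilbert": [0.55, 0.45],
    "albert": [0.30, 0.70],
}

weights = merit_weights(val_f1)
fused = fuse(probs, weights)
label = max(range(len(fused)), key=fused.__getitem__)  # index of the winning class
print(weights, fused, label)
```

Under this assumption a stronger model has slightly more say in the final decision, while a single dissenting model (here the hypothetical DistilBERT output) cannot flip the result on its own.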

For location extraction, which the authors call Location Extraction from Twitter Text (LETT), the researchers evaluate four transformer models (BERT, RoBERTa, DistilBERT, and ELECTRA) in an NER framework. The NER step tags location mentions in the text of each post, which allows the system to map where disaster-related events are unfolding.
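
To make the location-extraction step concrete, the sketch below shows how per-token NER tags can be turned into location strings. It assumes the common BIO tagging scheme with a LOC label, which is an assumption and not something the abstract specifies. The tags are hard-coded here. In a real pipeline they would come from a fine-tuned token-classification model, and `extract_locations` is a hypothetical helper name.

```python
from typing import List


def extract_locations(tokens: List[str], tags: List[str]) -> List[str]:
    """Join contiguous B-LOC/I-LOC tokens into location strings."""
    locations: List[str] = []
    current: List[str] = []
    for token, tag in zip(tokens, tags):
        if tag == "B-LOC":
            # A new location starts; flush any location still being built.
            if current:
                locations.append(" ".join(current))
            current = [token]
        elif tag == "I-LOC" and current:
            # Continuation of the location that is currently open.
            current.append(token)
        else:
            # Any other tag (or a stray I-LOC with no open span) closes the span.
            if current:
                locations.append(" ".join(current))
            current = []
    if current:
        locations.append(" ".join(current))
    return locations


# Hypothetical tokenized post with assumed model output.
tokens = ["Flooding", "reported", "in", "New", "Orleans", "and", "Baton", "Rouge"]
tags = ["O", "O", "O", "B-LOC", "I-LOC", "O", "B-LOC", "I-LOC"]

locations = extract_locations(tokens, tags)
print(locations)  # ['New Orleans', 'Baton Rouge']
```

The extracted strings would still need to be resolved to coordinates by a geocoder before they can be placed on a map. That step is outside this sketch.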

For topic modeling, the researchers use the BERTopic library to discover hidden topic patterns in the relevant tweets. The output is a list of top keywords per topic that highlights the issues being discussed. This lets responders scan large volumes of posts quickly and provides greater context about the evolving situation on the ground.
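
The abstract names the BERTopic library for topic modeling. BERTopic clusters document embeddings and then ranks the keywords of each cluster with a class-based TF-IDF score. The sketch below illustrates only that keyword-ranking idea in plain Python, and it assumes the posts have already been assigned to topics. The example posts, the topic ids, and the helper name `top_keywords` are hypothetical. This is not the library's implementation.

```python
import math
from collections import Counter
from typing import Dict, List


def top_keywords(topic_docs: Dict[int, List[str]], k: int = 3) -> Dict[int, List[str]]:
    """Rank words per topic by tf(word, topic) * log(1 + A / f(word))."""
    # Treat all posts of a topic as a single bag of words.
    tf = {
        topic: Counter(word for doc in docs for word in doc.lower().split())
        for topic, docs in topic_docs.items()
    }
    # f(word): frequency of the word across all topics.
    overall: Counter = Counter()
    for counts in tf.values():
        overall.update(counts)
    # A: average number of words per topic.
    avg_words = sum(overall.values()) / len(tf)

    result: Dict[int, List[str]] = {}
    for topic, counts in tf.items():
        n_words = sum(counts.values())
        scores = {
            word: (count / n_words) * math.log(1 + avg_words / overall[word])
            for word, count in counts.items()
        }
        # Highest score first; ties are broken alphabetically.
        ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
        result[topic] = [word for word, _ in ranked[:k]]
    return result


# Hypothetical posts already grouped into two topics.
topic_docs = {
    0: ["bridge collapsed after flood", "flood damaged the bridge"],
    1: ["need water and food", "send food and water please"],
}

keywords = top_keywords(topic_docs)
print(keywords)
```

No stop-word removal is applied here, so common words such as "and" can still rank highly. A real pipeline would typically filter them out before scoring.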

By integrating these two complementary techniques - location extraction and content categorization - the proposed solution aims to provide a more comprehensive and actionable understanding of natural disasters as they unfold in real-time on social media. This builds on prior work on using AI for intent detection and entity extraction from biomedical literature.

The researchers evaluate each component of the pipeline. According to the abstract, the fused relevance classifier obtains the highest F1-score of 0.933 on a benchmark dataset, and the best location-extraction model obtains an F1-score of 0.960. The authors describe the results of all components as very encouraging.

Critical Analysis

The paper presents a well-designed and thorough approach to leveraging social media data for natural disaster monitoring and response. The combination of named entity recognition and topic modeling is a novel and promising solution that addresses key challenges in this domain.

However, the authors acknowledge some limitations of their work. First, the named entity recognition model is trained on a general-purpose dataset and may not perform as well on the specialized vocabulary and writing styles often found in social media posts. Further fine-tuning or domain adaptation of the model could potentially improve its accuracy.

Additionally, the topic modeling component relies on the subjective interpretation of the resulting topics. While the researchers demonstrate the coherence and distinctiveness of the topics, there may be room for more automated or objective methods of classifying the post content.

Another potential area for improvement is the handling of multimodal social media data, such as images and videos, which could provide valuable additional context about disaster situations. Integrating computer vision techniques into the solution could enhance its overall effectiveness.

Finally, the authors note that their evaluation was conducted on a relatively small dataset of disaster-related social media posts. Scaling up the solution to handle the massive volume of social media data generated during major disasters would require further engineering and optimization efforts.

Overall, the paper presents a compelling and innovative approach to natural disaster monitoring and response using advanced natural language processing techniques. With continued refinement and expansion, this line of research has the potential to significantly improve disaster relief efforts and save lives.

Conclusion

The paper introduces a novel solution for locating and assessing natural disasters in real-time using named entity recognition and topic modeling on social media data. By identifying relevant geographic locations and categorizing post content, the proposed approach aims to provide disaster response agencies and the public with a more comprehensive, up-to-date understanding of evolving disaster situations.

While the paper demonstrates the effectiveness of the solution, it also highlights areas for potential improvement, such as model fine-tuning, more objective topic classification, and integration of multimodal data. Addressing these challenges could further enhance the utility of this technology for disaster response and mitigation.

Overall, this research represents an important step forward in leveraging the wealth of information available on social media to improve our ability to prepare for, respond to, and recover from natural disasters. As the frequency and severity of these events continue to grow, solutions like the one presented in this paper will become increasingly crucial for saving lives and minimizing the impact of catastrophic events.

Related Papers

A Named Entity Recognition and Topic Modeling-based Solution for Locating and Better Assessment of Natural Disasters in Social Media

Ayaz Mehmood, Muhammad Tayyab Zamir, Muhammad Asif Ayub, Nasir Ahmad, Kashif Ahmad

Over the last decade, similar to other application domains, social media content has been proven very effective in disaster informatics. However, due to the unstructured nature of the data, several challenges are associated with disaster analysis in social media content. To fully explore the potential of social media content in disaster informatics, access to relevant content and the correct geo-location information is very critical. In this paper, we propose a three-step solution to tackle these challenges. Firstly, the proposed solution aims to classify social media posts into relevant and irrelevant posts, followed by the automatic extraction of location information from the posts' text through Named Entity Recognition (NER) analysis. Finally, to quickly analyze the topics covered in large volumes of social media posts, we perform topic modeling, resulting in a list of top keywords that highlight the issues discussed in the tweets. For the Relevant Classification of Twitter Posts (RCTP), we proposed a merit-based fusion framework combining the capabilities of four different models, namely BERT, RoBERTa, DistilBERT, and ALBERT, obtaining the highest F1-score of 0.933 on a benchmark dataset. For the Location Extraction from Twitter Text (LETT), we evaluated four models, namely BERT, RoBERTa, DistilBERT, and ELECTRA, in an NER framework, obtaining the highest F1-score of 0.960. For topic modeling, we used the BERTopic library to discover the hidden topic patterns in the relevant tweets. The experimental results of all the components of the proposed end-to-end solution are very encouraging and hint at the potential of social media content and NLP in disaster management.

5/3/2024

Active Learning for Identifying Disaster-Related Tweets: A Comparison with Keyword Filtering and Generic Fine-Tuning

David Hanny, Sebastian Schmidt, Bernd Resch

Information from social media can provide essential information for emergency response during natural disasters in near real-time. However, it is difficult to identify the disaster-related posts among the large amounts of unstructured data available. Previous methods often use keyword filtering, topic modelling or classification-based techniques to identify such posts. Active Learning (AL) presents a promising sub-field of Machine Learning (ML) that has not been used much in the field of text classification of social media content. This study therefore investigates the potential of AL for identifying disaster-related Tweets. We compare a keyword filtering approach, a RoBERTa model fine-tuned with generic data from CrisisLex, a base RoBERTa model trained with AL and a fine-tuned RoBERTa model trained with AL regarding classification performance. For testing, data from CrisisLex and manually labelled data from the 2021 flood in Germany and the 2023 Chile forest fires were considered. The results show that generic fine-tuning combined with 10 rounds of AL outperformed all other approaches. Consequently, a broadly applicable model for the identification of disaster-related Tweets could be trained with very little labelling effort. The model can be applied to use cases beyond this study and provides a useful tool for further research in social media analysis.

8/20/2024

QuakeBERT: Accurate Classification of Social Media Texts for Rapid Earthquake Impact Assessment

Jin Han, Zhe Zheng, Xin-Zheng Lu, Ke-Yin Chen, Jia-Rui Lin

Social media aids disaster response but suffers from noise, hindering accurate impact assessment and decision making for resilient cities, which few studies considered. To address the problem, this study proposes the first domain-specific LLM model and an integrated method for rapid earthquake impact assessment. First, a few categories are introduced to classify and filter microblogs considering their relationship to the physical and social impacts of earthquakes, and a dataset comprising 7282 earthquake-related microblogs from twenty earthquakes in different locations is developed as well. Then, with a systematic analysis of various influential factors, QuakeBERT, a domain-specific large language model (LLM), is developed and fine-tuned for accurate classification and filtering of microblogs. Meanwhile, an integrated method integrating public opinion trend analysis, sentiment analysis, and keyword-based physical impact quantification is introduced to assess both the physical and social impacts of earthquakes based on social media texts. Experiments show that data diversity and data volume dominate the performance of QuakeBERT and increase the macro average F1 score by 27%, while the best classification model QuakeBERT outperforms the CNN- or RNN-based models by improving the macro average F1 score from 60.87% to 84.33%. Finally, the proposed approach is applied to assess two earthquakes with the same magnitude and focal depth. Results show that the proposed approach can effectively enhance the impact assessment process by accurate detection of noisy microblogs, which enables effective post-disaster emergency responses to create more resilient cities.

5/14/2024

CrisisSense-LLM: Instruction Fine-Tuned Large Language Model for Multi-label Social Media Text Classification in Disaster Informatics

Kai Yin, Chengkai Liu, Ali Mostafavi, Xia Hu

In the field of crisis/disaster informatics, social media is increasingly being used for improving situational awareness to inform response and relief efforts. Efficient and accurate text classification tools have been a focal area of investigation in crisis informatics. However, current methods mostly rely on single-label text classification models, which fails to capture different insights embedded in dynamic and multifaceted disaster-related social media data. This study introduces a novel approach to disaster text classification by enhancing a pre-trained Large Language Model (LLM) through instruction fine-tuning targeted for multi-label classification of disaster-related tweets. Our methodology involves creating a comprehensive instruction dataset from disaster-related tweets, which is then used to fine-tune an open-source LLM, thereby embedding it with disaster-specific knowledge. This fine-tuned model can classify multiple aspects of disaster-related information simultaneously, such as the type of event, informativeness, and involvement of human aid, significantly improving the utility of social media data for situational awareness in disasters. The results demonstrate that this approach enhances the categorization of critical information from social media posts, thereby facilitating a more effective deployment for situational awareness during emergencies. This research paves the way for more advanced, adaptable, and robust disaster management tools, leveraging the capabilities of LLMs to improve real-time situational awareness and response strategies in disaster scenarios.

6/26/2024