Automatic detection of relevant information, predictions and forecasts in financial news through topic modelling with Latent Dirichlet Allocation

2404.01338

Published 4/3/2024 by Silvia García-Méndez, Francisco de Arriba-Pérez, Ana Barros-Vila, Francisco J. González-Castaño, Enrique Costa-Montenegro

cs.CL cs.CE cs.IR cs.LG


Abstract

Financial news items are unstructured sources of information that can be mined to extract knowledge for market screening applications. Manual extraction of relevant information from the continuous stream of finance-related news is cumbersome and beyond the skills of many investors, who, at most, can follow a few sources and authors. Accordingly, we focus on the analysis of financial news to identify relevant text and, within that text, forecasts and predictions. We propose a novel Natural Language Processing (NLP) system to assist investors in the detection of relevant financial events in unstructured textual sources by considering both relevance and temporality at the discursive level. Firstly, we segment the text to group together closely related text. Secondly, we apply co-reference resolution to discover internal dependencies within segments. Finally, we perform relevant topic modelling with Latent Dirichlet Allocation (LDA) to separate relevant from less relevant text and then analyse the relevant text using a Machine Learning-oriented temporal approach to identify predictions and speculative statements. We created an experimental data set composed of 2,158 financial news items that were manually labelled by NLP researchers to evaluate our solution. The ROUGE-L values for the identification of relevant text and predictions/forecasts were 0.662 and 0.982, respectively. To our knowledge, this is the first work to jointly consider relevance and temporality at the discursive level. It contributes to the transfer of human associative discourse capabilities to expert systems through the combination of multi-paragraph topic segmentation and co-reference resolution to separate author expression patterns, topic modelling with LDA to detect relevant text, and discursive temporality analysis to identify forecasts and predictions within this text.


Overview

Financial news is an unstructured source of information that can be mined for market insights, but manually extracting relevant information is challenging for many investors.
The researchers propose a novel Natural Language Processing (NLP) system to assist investors in detecting relevant financial events and predictions/forecasts in unstructured news texts.
The system combines text segmentation, co-reference resolution, topic modeling, and temporal analysis to identify relevant content and forecasts/predictions.
The researchers evaluated their system on a dataset of 2,158 manually labeled financial news items, achieving ROUGE-L scores of 0.662 for relevant-text identification and 0.982 for prediction/forecast identification.

Plain English Explanation

Financial news articles contain lots of useful information that investors could use to make better decisions. However, sifting through all the news to find the truly relevant bits and identifying forecasts or predictions is a daunting task, even for experienced investors. It's like trying to find a needle in a haystack - there's just so much information to wade through.

The researchers developed a smart computer system to help with this challenge. The system uses advanced natural language processing techniques to automatically analyze financial news articles. First, it breaks the articles down into smaller, related chunks of text. Then it works out which words within each chunk refer to the same thing, such as a pronoun and the company it stands for. Next, it uses topic modeling to separate the relevant parts of the text from the less relevant ones. Finally, it looks through the relevant text for statements that contain predictions or forecasts.

By automating these steps, the system can quickly parse through a large volume of financial news and surface the key insights that investors would find most useful. It's kind of like having a super-smart assistant that can read through all the news for you and highlight the important bits. This could be a game-changer for investors who want to stay on top of the markets but don't have the time or resources to manually review everything.

Technical Explanation

The researchers' novel NLP system consists of several key components:

Text Segmentation: The system first segments the news articles into topically cohesive units, grouping together closely related sentences and paragraphs.
Co-reference Resolution: Next, it applies co-reference resolution to identify internal dependencies within each text segment, such as how different pronouns and references relate to the same entities.
Relevance Detection: The system then uses Latent Dirichlet Allocation (LDA) topic modeling to separate relevant content from less relevant content within each text segment.
Temporal Analysis: Finally, the system analyzes the relevant text segments using a machine learning-based approach to identify predictions, forecasts, and other speculative statements.
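To make the relevance-detection step more concrete, here is a minimal sketch of LDA fitted with collapsed Gibbs sampling in plain Python. It is not the authors' implementation: the function names (lda_gibbs, dominant_topics), the hyperparameters, and the toy segments are all assumptions for illustration, and the summary does not say how the paper maps topics to "relevant" versus "less relevant" text. The sketch only shows how each segment ends up with a topic distribution that a downstream rule could use.

```python
import random
from collections import Counter


def lda_gibbs(docs, n_topics=2, alpha=0.1, beta=0.01, n_iter=200, seed=0):
    """Collapsed Gibbs sampling for LDA. `docs` is a list of token lists.

    Returns (theta, topic_word): per-document topic distributions and
    per-topic word counts. All parameter values are illustrative assumptions.
    """
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})

    doc_topic = [[0] * n_topics for _ in docs]         # topic counts per document
    topic_word = [Counter() for _ in range(n_topics)]  # word counts per topic
    topic_total = [0] * n_topics                       # token counts per topic
    assignments = []

    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            z = rng.randrange(n_topics)
            z_doc.append(z)
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assignments.append(z_doc)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                z = assignments[d][i]
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                # Resample its topic from the collapsed conditional.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(n_topics)
                ]
                z = rng.choices(range(n_topics), weights=weights)[0]
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1

    theta = [
        [(doc_topic[d][k] + alpha) / (len(doc) + n_topics * alpha)
         for k in range(n_topics)]
        for d, doc in enumerate(docs)
    ]
    return theta, topic_word


def dominant_topics(theta):
    """Index of the most probable topic for each document."""
    return [max(range(len(row)), key=row.__getitem__) for row in theta]


if __name__ == "__main__":
    # Hypothetical pre-tokenized text segments.
    segments = [
        "shares rose after earnings beat forecast".split(),
        "analysts forecast earnings growth and shares rally".split(),
        "the chief executive enjoys sailing and golf".split(),
    ]
    theta, _ = lda_gibbs(segments)
    print(dominant_topics(theta))
```

A real system would fit the model on a large corpus with a maintained library and would need an explicit rule, which is not described in this summary, for deciding which topics count as relevant.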

The researchers evaluated this end-to-end system on a dataset of 2,158 manually labeled financial news articles. The system achieved strong performance, with a ROUGE-L score of 0.662 for relevance detection and 0.982 for prediction/forecast identification.
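For readers unfamiliar with the metric, ROUGE-L scores a candidate text against a reference using their longest common subsequence (LCS) of tokens. The sketch below computes a sentence-level ROUGE-L F-measure in plain Python. The whitespace tokenization, lowercasing, and the default beta = 1 are assumptions; the summary does not state which ROUGE-L variant or tooling the authors used, so values from this sketch are not directly comparable to the reported 0.662 and 0.982.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(reference, candidate, beta=1.0):
    """Sentence-level ROUGE-L F-measure between two strings.

    Tokenization is a simple lowercase whitespace split (an assumption).
    """
    ref = reference.lower().split()
    cand = candidate.lower().split()
    lcs = lcs_length(ref, cand)
    if lcs == 0:
        return 0.0
    recall = lcs / len(ref)
    precision = lcs / len(cand)
    return (1 + beta**2) * precision * recall / (recall + beta**2 * precision)


if __name__ == "__main__":
    gold = "the bank will raise rates next year"
    extracted = "the bank will raise rates"
    print(round(rouge_l(gold, extracted), 3))
```

In this toy case the extracted text covers five of the seven reference tokens in order, so recall is 5/7, precision is 1, and the F-measure is about 0.833.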

Critical Analysis

The researchers acknowledge that their dataset, while substantial, may not be fully representative of the diversity of financial news sources and styles. Expanding the dataset and evaluating the system's performance across a wider range of news outlets could help validate the generalizability of the findings.

Additionally, the temporal analysis component of the system focuses on identifying predictions and forecasts, but does not attempt to assess their accuracy or reliability. Incorporating mechanisms to evaluate the credibility of the identified forecasts could further enhance the system's usefulness for investors.

It would also be interesting to explore how the system's outputs could be integrated into investment decision-making workflows. Understanding the best ways to present the extracted insights to investors in a meaningful and actionable manner is an important area for future research.

Conclusion

This novel NLP system represents an important step forward in automating the extraction of relevant financial insights from unstructured news sources. By combining advanced text processing techniques, the system can efficiently sift through large volumes of financial news and surface the most valuable information for investors, including forecasts and predictions.

While further research is needed to refine and expand the system's capabilities, this work demonstrates the potential for AI-powered tools to augment human decision-making in the financial domain. As the volume and complexity of financial information continues to grow, solutions like this could become increasingly vital for investors seeking to stay ahead of the curve.


Related Papers

Detection of Temporality at Discourse Level on Financial News by Combining Natural Language Processing and Machine Learning

Silvia García-Méndez, Francisco de Arriba-Pérez, Ana Barros-Vila, Francisco J. González-Castaño

Finance-related news such as Bloomberg News, CNN Business and Forbes are valuable sources of real data for market screening systems. In news, an expert shares opinions beyond plain technical analyses that include context such as political, sociological and cultural factors. In the same text, the expert often discusses the performance of different assets. Some key statements are mere descriptions of past events while others are predictions. Therefore, understanding the temporality of the key statements in a text is essential to separate context information from valuable predictions. We propose a novel system to detect the temporality of finance-related news at discourse level that combines Natural Language Processing and Machine Learning techniques, and exploits sophisticated features such as syntactic and semantic dependencies. More specifically, we seek to extract the dominant tenses of the main statements, which may be either explicit or implicit. We have tested our system on a labelled dataset of finance-related news annotated by researchers with knowledge in the field. Experimental results reveal a high detection precision compared to an alternative rule-based baseline approach. Ultimately, this research contributes to the state-of-the-art of market screening by identifying predictive knowledge for financial decision making.

4/3/2024

cs.CL cs.CE cs.IR cs.LG

BERTopic-Driven Stock Market Predictions: Unraveling Sentiment Insights

Enmin Zhu, Jerome Yen

This paper explores the intersection of Natural Language Processing (NLP) and financial analysis, focusing on the impact of sentiment analysis in stock price prediction. We employ BERTopic, an advanced NLP technique, to analyze the sentiment of topics derived from stock market comments. Our methodology integrates this sentiment analysis with various deep learning models, renowned for their effectiveness in time series and stock prediction tasks. Through comprehensive experiments, we demonstrate that incorporating topic sentiment notably enhances the performance of these models. The results indicate that topics in stock market comments provide implicit, valuable insights into stock market volatility and price trends. This study contributes to the field by showcasing the potential of NLP in enriching financial analysis and opens up avenues for further research into real-time sentiment analysis and the exploration of emotional and contextual aspects of market sentiment. The integration of advanced NLP techniques like BERTopic with traditional financial analysis methods marks a step forward in developing more sophisticated tools for understanding and predicting market behaviors.

4/5/2024

cs.CL cs.CE

Enhancing Traffic Prediction with Textual Data Using Large Language Models

Xiannan Huang

Traffic prediction is pivotal for rational transportation supply scheduling and allocation. Existing research into short-term traffic prediction, however, faces challenges in adequately addressing exceptional circumstances and integrating non-numerical contextual information like weather into models. Large language models offer a promising solution due to their inherent world knowledge. However, directly using them for traffic prediction presents drawbacks such as high cost, lack of determinism, and limited mathematical capability. To mitigate these issues, this study proposes a novel approach. Instead of directly employing large models for prediction, it utilizes them to process textual information and obtain embeddings. These embeddings are then combined with historical traffic data and fed into traditional spatiotemporal forecasting models. The study investigates two types of special scenarios: regional-level and node-level. For regional-level scenarios, textual information is represented as a node connected to the entire network. For node-level scenarios, embeddings from the large model represent additional nodes connected only to corresponding nodes. This approach shows a significant improvement in prediction accuracy in our experiment on the New York Bike dataset.

5/14/2024

cs.CL cs.AI

RiskLabs: Predicting Financial Risk Using Large Language Model Based on Multi-Sources Data

Yupeng Cao, Zhi Chen, Qingyun Pei, Fabrizio Dimino, Lorenzo Ausiello, Prashant Kumar, K. P. Subbalakshmi, Papa Momar Ndiaye

The integration of Artificial Intelligence (AI) techniques, particularly large language models (LLMs), in finance has garnered increasing academic attention. Despite progress, existing studies predominantly focus on tasks like financial text summarization, question-answering (Q&A), and stock movement prediction (binary classification), with a notable gap in the application of LLMs for financial risk prediction. Addressing this gap, in this paper, we introduce RiskLabs, a novel framework that leverages LLMs to analyze and predict financial risks. RiskLabs uniquely combines different types of financial data, including textual and vocal information from Earnings Conference Calls (ECCs), market-related time series data, and contextual news data surrounding ECC release dates. Our approach involves a multi-stage process: initially extracting and analyzing ECC data using LLMs, followed by gathering and processing time-series data before the ECC dates to model and understand risk over different timeframes. Using multimodal fusion techniques, RiskLabs amalgamates these varied data features for comprehensive multi-task financial risk prediction. Empirical experiment results demonstrate RiskLabs' effectiveness in forecasting both volatility and variance in financial markets. Through comparative experiments, we demonstrate how different data sources contribute to financial risk assessment and discuss the critical role of LLMs in this context. Our findings not only contribute to the AI in finance application but also open new avenues for applying LLMs in financial risk assessment.

4/12/2024

cs.AI cs.CE cs.LG