CryptoGPT: a 7B model rivaling GPT-4 in the task of analyzing and classifying real-time financial news

Read original: arXiv:2406.14039 - Published 6/21/2024 by Ying Zhang (BH), Matthieu Petit Guillaume (BH), Aurélien Krauth (ON), Manel Labidi

Overview

Presents a method for refining a dedicated large language model (LLM) with limited resources in an industrial setting
The model, CryptoGPT, is designed for real-time financial news analysis in the cryptocurrency market
Focuses on balancing data protection, annotation cost, model size, and analysis quality

Plain English Explanation

This paper describes a way to improve a relatively capable language model for a specific task - analyzing financial news about cryptocurrencies. The researchers wanted to create a model that could do this well, but without having to transfer sensitive data to external servers, spend a lot of time and money annotating data, or end up with an overly large and costly model.

To do this, they took existing language models like Mistral-7B and LLama-7B and refined them using a combination of semi-automatic annotation and strategic fine-tuning. This allowed them to enhance the model's capabilities for the specific task of analyzing cryptocurrency-related financial news, without having to build everything from scratch or transfer data to external servers.

The goal was to find a balance between protecting sensitive data, keeping annotation costs and time under control, managing the model's size for deployment, and still maintaining high-quality analysis. By refining existing models rather than training from the ground up, the researchers were able to meet these different needs more effectively.

Technical Explanation

The researchers took existing large language models (LLMs) of a similar size, such as Mistral-7B and LLama-7B, and refined them using a combination of semi-automatic annotation and strategic fine-tuning. This allowed them to create a dedicated model, called CryptoGPT, designed for real-time financial news analysis in the cryptocurrency market.

The semi-automatic annotation process helped to limit the cost and time required for data labeling, while still ensuring the quality of the training data. The team then fine-tuned the models using this annotated data, focusing on the specific task of classifying and analyzing financial information related to cryptocurrencies.
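The summary does not say how the semi-automatic annotation was implemented, so the following is only a minimal sketch of one common pattern: an automatic labeler proposes a label with a confidence score, and only low-confidence items are routed to a human reviewer. The `keyword_labeler` function, the label names, the keyword lists, and the 0.75 threshold are all hypothetical and stand in for whatever model and review policy a team actually uses.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Annotation:
    text: str
    label: str
    confidence: float
    needs_review: bool  # True when a human should check the proposed label


def keyword_labeler(text: str) -> Tuple[str, float]:
    """Hypothetical automatic labeler: counts a few hand-picked keywords.

    In practice this role would be played by a stronger model; the keyword
    lists and confidence formula here are illustrative assumptions only.
    """
    lowered = text.lower()
    positive = sum(word in lowered for word in ("surge", "rally", "approval"))
    negative = sum(word in lowered for word in ("hack", "ban", "crash"))
    if positive == negative:
        return "neutral", 0.5
    label = "positive" if positive > negative else "negative"
    confidence = min(0.6 + 0.2 * abs(positive - negative), 0.95)
    return label, confidence


def semi_automatic_annotate(
    texts: List[str],
    labeler: Callable[[str], Tuple[str, float]],
    threshold: float = 0.75,  # assumed cut-off, not taken from the paper
) -> List[Annotation]:
    """Label every text automatically and flag uncertain ones for review."""
    annotations = []
    for text in texts:
        label, confidence = labeler(text)
        annotations.append(
            Annotation(text, label, confidence, needs_review=confidence < threshold)
        )
    return annotations


if __name__ == "__main__":
    news = [
        "Bitcoin rally continues after ETF approval",
        "Exchange reports hack",
        "Market update for Tuesday",
    ]
    for item in semi_automatic_annotate(news, keyword_labeler):
        status = "REVIEW" if item.needs_review else "auto"
        print(f"[{status}] {item.label}: {item.text}")
```

Under this kind of scheme, annotation cost scales with the number of flagged items rather than with the size of the whole corpus, which is the trade-off the paragraph above describes.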

By refining existing models rather than training from scratch, the researchers were able to balance several key requirements:

Protecting sensitive data by avoiding the need to transfer it to external servers
Limiting the time and cost of the annotation process
Controlling the final model size to manage deployment costs
Maintaining high-quality analysis capabilities for the target task
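To make the model-size requirement concrete, here is a rough back-of-the-envelope sketch. It estimates the memory needed to store the weights of a 7B-parameter model at different precisions, and the number of extra trainable parameters a low-rank adapter adds to one weight matrix (the original abstract names QLoRA as the fine-tuning method). The 4096 x 4096 layer shape and rank 16 are illustrative assumptions, not values reported in the paper, and the memory figures cover weights only, ignoring activations, quantization constants, and optimizer state.

```python
def weight_memory_gb(n_params: int, bits_per_param: int) -> float:
    """Approximate storage for the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9


def lora_extra_params(d_out: int, d_in: int, rank: int) -> int:
    """Trainable parameters added by a rank-`rank` adapter on a d_out x d_in matrix.

    A low-rank adapter learns two matrices, B (d_out x rank) and A (rank x d_in),
    so it adds rank * (d_out + d_in) parameters.
    """
    return rank * (d_out + d_in)


if __name__ == "__main__":
    n_params = 7_000_000_000  # "7B", as in the paper's title
    print(f"16-bit weights: {weight_memory_gb(n_params, 16):.1f} GB")
    print(f" 4-bit weights: {weight_memory_gb(n_params, 4):.1f} GB")

    # Illustrative layer shape and rank (assumptions for this sketch).
    d_out, d_in, rank = 4096, 4096, 16
    added = lora_extra_params(d_out, d_in, rank)
    full = d_out * d_in
    print(f"adapter params per matrix: {added} ({added / full:.2%} of {full})")
```

The point of the arithmetic is that 4-bit storage cuts weight memory to a quarter of the 16-bit figure, and that the adapter trains well under 1% of the parameters in each adapted matrix, which is why this style of fine-tuning suits a limited-resource setting.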

The performance of the refined CryptoGPT model was compared to various other LLMs, including GPT-3.5 and GPT-4, demonstrating its competitive capabilities for the specific financial news analysis task.
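The summary does not state which metrics were used for this comparison. As a hedged illustration, a classification comparison between models is often scored with accuracy and macro-averaged F1 over the same labelled test set; the sketch below computes both from scratch. The label names and the tiny example data are made up for demonstration and are not results from the paper.

```python
from typing import Sequence


def accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Fraction of predictions that match the reference labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Unweighted mean of per-class F1, so rare classes count as much as common ones."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Made-up reference labels and predictions from two hypothetical models.
    reference = ["positive", "negative", "neutral", "positive"]
    predictions = {
        "model_a": ["positive", "negative", "positive", "positive"],
        "model_b": ["positive", "negative", "neutral", "negative"],
    }
    for name, preds in predictions.items():
        print(f"{name}: acc={accuracy(reference, preds):.2f} "
              f"macro_f1={macro_f1(reference, preds):.2f}")
```

Scoring every candidate model against the same held-out labels is what makes a statement like "competitive with GPT-4 on this task" checkable.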

Critical Analysis

The paper provides a thoughtful approach to refining an LLM for a targeted industrial application while addressing practical constraints. The use of semi-automatic annotation and strategic fine-tuning is a reasonable compromise between manual labeling and fully unsupervised learning.

However, the paper could have provided more details on the specific techniques used for the semi-automatic annotation and fine-tuning processes. Additionally, the researchers could have explored the potential trade-offs or limitations of this approach, such as the impact on model generalization or the risk of overfitting to the target domain.

It would also be interesting to see how the CryptoGPT model performs on a broader range of financial tasks beyond just news analysis, or how it compares to models fine-tuned for enhanced sentiment analysis or advanced data interaction. Further research in these areas could help validate the broader applicability of the refined model.

Conclusion

This paper presents a pragmatic approach to refining a large language model for a specific industrial use case - real-time financial news analysis in the cryptocurrency market. By leveraging semi-automatic annotation and strategic fine-tuning, the researchers were able to create a capable model, CryptoGPT, while addressing practical constraints around data protection, annotation cost, and model size.

The results demonstrate the potential for customizing LLMs to meet the needs of particular applications, even with limited resources. This type of targeted refinement could be valuable in other domains where off-the-shelf models may not fully meet the requirements of a specific task or industry.

Related Papers

CryptoGPT: a 7B model rivaling GPT-4 in the task of analyzing and classifying real-time financial news

Ying Zhang (BH), Matthieu Petit Guillaume (BH), Aurélien Krauth (ON), Manel Labidi

CryptoGPT: a 7B model competing with GPT-4 in a specific task -- The Impact of Automatic Annotation and Strategic Fine-Tuning via QLoRA. In this article, we present a method aimed at refining a dedicated LLM of reasonable quality with limited resources in an industrial setting via CryptoGPT. It is an LLM designed for financial news analysis for the cryptocurrency market in real-time. This project was launched in an industrial context. This model allows not only for the classification of financial information but also for providing comprehensive analysis. We refined different LLMs of the same size such as Mistral-7B and LLama-7B using semi-automatic annotation and compared them with various LLMs such as GPT-3.5 and GPT-4. Our goal is to find a balance among several needs: 1. Protecting data (by avoiding their transfer to external servers), 2. Limiting annotation cost and time, 3. Controlling the model's size (to manage deployment costs), and 4. Maintaining better analysis quality.

6/21/2024

BERT vs GPT for financial engineering

Edward Sharkey, Philip Treleaven

The paper benchmarks several Transformer models [4] to show how these models can judge sentiment from a news event. This signal can then be used for downstream modelling and signal identification for commodity trading. We find that fine-tuned BERT models outperform fine-tuned or vanilla GPT models on this task. Transformer models have revolutionized the field of natural language processing (NLP) in recent years, achieving state-of-the-art results on various tasks such as machine translation, text summarization, question answering, and natural language generation. Among the most prominent transformer models are Bidirectional Encoder Representations from Transformers (BERT) and Generative Pre-trained Transformer (GPT), which differ in their architectures and objectives. A CopBERT model training data and process overview is provided. The CopBERT model outperforms similar domain-specific BERT-trained models such as FinBERT. The below confusion matrices show the performance on CopBERT & CopGPT respectively. We see a ~10 percent increase in f1_score when comparing CopBERT vs GPT4 and a 16 percent increase vs CopGPT. Whilst GPT4 is dominant, it highlights the importance of considering alternatives to GPT models for financial engineering tasks, given risks of hallucinations and challenges with interpretability. We unsurprisingly see the larger LLMs outperform the BERT models in predictive power. In summary, BERT is partially the new XGBoost: what it lacks in predictive power it provides with higher levels of interpretability. We conclude that BERT models might not be the next XGBoost [2], but represent an interesting alternative for financial engineering tasks that require a blend of interpretability and accuracy.

5/24/2024

Generative AI for automatic topic labelling

Diego Kozlowski, Carolina Pradier, Pierre Benz

Topic Modeling has become a prominent tool for the study of scientific fields, as it allows for a large-scale interpretation of research trends. Nevertheless, the output of these models is structured as a list of keywords, which requires manual interpretation for the labelling. This paper proposes to assess the reliability of three LLMs, namely flan, GPT-4o, and GPT-4 mini, for topic labelling. Drawing on previous research leveraging BERTopic, we generate topics from a dataset of all the scientific articles (n=34,797) authored by all biology professors in Switzerland (n=465) between 2008 and 2020, as recorded in the Web of Science database. We assess the output of the three models both quantitatively and qualitatively and find that, first, both GPT models are capable of accurately and precisely labelling topics from the models' output keywords. Second, 3-word labels are preferable to grasp the complexity of research topics.

8/14/2024

Fine-Tuned 'Small' LLMs (Still) Significantly Outperform Zero-Shot Generative AI Models in Text Classification

Martin Juan José Bucher, Marco Martini

Generative AI offers a simple, prompt-based alternative to fine-tuning smaller BERT-style LLMs for text classification tasks. This promises to eliminate the need for manually labeled training data and task-specific model training. However, it remains an open question whether tools like ChatGPT can deliver on this promise. In this paper, we show that smaller, fine-tuned LLMs (still) consistently and significantly outperform larger, zero-shot prompted models in text classification. We compare three major generative AI models (ChatGPT with GPT-3.5/GPT-4 and Claude Opus) with several fine-tuned LLMs across a diverse set of classification tasks (sentiment, approval/disapproval, emotions, party positions) and text categories (news, tweets, speeches). We find that fine-tuning with application-specific training data achieves superior performance in all cases. To make this approach more accessible to a broader audience, we provide an easy-to-use toolkit alongside this paper. Our toolkit, accompanied by non-technical step-by-step guidance, enables users to select and fine-tune BERT-like LLMs for any classification task with minimal technical and computational effort.

8/19/2024