Specialising and Analysing Instruction-Tuned and Byte-Level Language Models for Organic Reaction Prediction

Read original: arXiv:2405.10625 - Published 5/20/2024 by Jiayun Pang, Ivan Vulić

Overview

This paper asks whether FlanT5 and ByT5, encoder-decoder language models pretrained solely on language data, can be specialised for organic reaction prediction through task-specific fine-tuning.
It contrasts this approach with Transformer-based encoder-decoder models that rely on pretraining on tens of millions of unlabelled molecules, which is time-consuming and GPU-intensive.
The authors run a systematic empirical study of tokenisation, the impact of SMILES-oriented pretraining, fine-tuning sample efficiency, and decoding algorithms at inference.
All models achieve comparable Top-1 and Top-5 accuracy, with some variation across models.
Tokenisation and vocabulary trimming slightly affect final performance but can speed up training and inference.
Greedy decoding, the most efficient strategy, is very competitive, and more sophisticated decoding algorithms bring only marginal gains.

Plain English Explanation

Predicting the outcome of an organic reaction can be treated as a text-to-text problem: molecules are written as SMILES strings, a line notation for chemical structures, and a model reads one string and generates another, for example reading the reactants and writing the product. Transformer-based encoder-decoder models do this well, but they usually rely on pretraining on tens of millions of unlabelled molecules, which takes a lot of time and GPU compute.

The researchers asked whether that chemistry-specific pretraining is really necessary. They took two models pretrained only on language data, FlanT5 (an instruction-tuned version of T5) and ByT5 (a T5 variant that reads raw bytes instead of subword tokens), and fine-tuned them directly on reaction prediction.

They then examined how several practical choices affect the outcome: how the input text is split into tokens, whether additional SMILES-oriented pretraining helps, how much fine-tuning data is needed, and which decoding algorithm is used to generate the output.

The main message is that language-only models are a solid starting point. According to the abstract, all models reach comparable Top-1 and Top-5 accuracy, expensive pretraining on unlabelled molecules may be useful but is not essential, and the cheapest decoding strategy (greedy decoding) is very competitive with more sophisticated ones.

Technical Explanation

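The paper studies whether FlanT5 and ByT5, encoder-decoder models pretrained solely on language data, can be effectively specialised for organic reaction prediction through task-specific fine-tuning. The motivation is cost: existing Transformer-based encoder-decoder models for reaction prediction typically rely on pretraining on tens of millions of unlabelled molecules, which is time-consuming and GPU-intensive.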

One axis of the study is tokenisation. FlanT5 uses a subword vocabulary learned from language data, whereas ByT5 operates directly on bytes. The authors report that tokenisation and vocabulary trimming only slightly affect final performance but can speed up training and inference.
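ByT5, one of the two models named in the paper's title and abstract, is byte-level, so it needs no chemistry-specific vocabulary to read a SMILES string. The sketch below illustrates that idea using only the standard library. It assumes the ByT5 convention of reserving ids 0 to 2 for special tokens and shifting each byte value by 3; the atom-level regular expression is a simplified, hypothetical pattern included only to compare sequence lengths, not the tokeniser used in the paper.

```python
import re

# Assumption: ByT5-style layout, where ids 0-2 are reserved for
# pad/eos/unk and every UTF-8 byte value is shifted by 3.
BYTE_OFFSET = 3
EOS_ID = 1

def byte_tokenise(smiles: str) -> list[int]:
    """Map a SMILES string to byte-level ids and append an end-of-sequence id."""
    return [b + BYTE_OFFSET for b in smiles.encode("utf-8")] + [EOS_ID]

def byte_detokenise(ids: list[int]) -> str:
    """Invert byte_tokenise, skipping the reserved special ids."""
    return bytes(i - BYTE_OFFSET for i in ids if i >= BYTE_OFFSET).decode("utf-8")

# Simplified, illustrative atom-level pattern: bracket atoms, the
# two-letter halogens Br and Cl, then any single character.
ATOM_PATTERN = re.compile(r"\[[^\]]+\]|Br|Cl|[A-Za-z]|\d|[^A-Za-z\d\[\]]")

def atom_tokenise(smiles: str) -> list[str]:
    """Split a SMILES string into atom-level tokens (rough approximation)."""
    return ATOM_PATTERN.findall(smiles)

acetyl_chloride = "CC(=O)Cl"
print(byte_tokenise("CCO"))                 # [70, 70, 82, 1]
print(atom_tokenise(acetyl_chloride))       # ['C', 'C', '(', '=', 'O', ')', 'Cl']
print(len(byte_tokenise(acetyl_chloride)))  # 9: eight bytes plus end-of-sequence
```

Byte-level input never produces an unknown token, but it yields longer sequences than atom-level splitting, which is one reason tokenisation choices can affect training and inference speed.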

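The study also covers the impact of SMILES-oriented pretraining and fine-tuning sample efficiency. The key finding is that, despite being pretrained only on language tasks, FlanT5 and ByT5 provide a solid foundation for fine-tuning on reaction prediction and become 'chemistry domain compatible' in the process. This suggests that GPU-intensive pretraining on a large dataset of unlabelled molecules may be useful but is not essential.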

Another axis is the decoding algorithm used at inference. Greedy decoding, the most efficient strategy, is very competitive, and more sophisticated decoding algorithms bring only marginal gains.
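The paper's abstract compares greedy decoding with more sophisticated decoding algorithms at inference. As a rough illustration of how the two families differ, the sketch below implements greedy decoding and a basic beam search over a toy next-token table. The table `TOY_MODEL`, the function names, and the probabilities are all invented for illustration and are not taken from the paper; a real model would supply the next-token distribution.

```python
import math

EOS = "<eos>"

# Hypothetical next-token probabilities keyed by the prefix generated so far.
TOY_MODEL = {
    (): {"C": 0.6, "N": 0.4},
    ("C",): {"C": 0.4, "O": 0.3, EOS: 0.3},
    ("N",): {EOS: 0.9, "C": 0.1},
    ("C", "C"): {EOS: 1.0},
    ("C", "O"): {EOS: 1.0},
    ("N", "C"): {EOS: 1.0},
}

def toy_next_probs(prefix):
    """Return the toy next-token distribution for a prefix (tuple of tokens)."""
    return TOY_MODEL[prefix]

def greedy_decode(next_probs, max_len=10):
    """Pick the single most probable token at every step."""
    seq = []
    for _ in range(max_len):
        probs = next_probs(tuple(seq))
        token = max(probs, key=probs.get)
        if token == EOS:
            break
        seq.append(token)
    return "".join(seq)

def beam_search(next_probs, beam_width=2, max_len=10):
    """Keep the beam_width best hypotheses by cumulative log-probability."""
    beams = [((), 0.0, False)]  # (tokens, log_prob, finished)
    for _ in range(max_len):
        if all(done for _, _, done in beams):
            break
        candidates = []
        for tokens, score, done in beams:
            if done:  # finished hypotheses are carried over unchanged
                candidates.append((tokens, score, True))
                continue
            for tok, p in next_probs(tokens).items():
                new_score = score + math.log(p)
                if tok == EOS:
                    candidates.append((tokens, new_score, True))
                else:
                    candidates.append((tokens + (tok,), new_score, False))
        beams = sorted(candidates, key=lambda b: b[1], reverse=True)[:beam_width]
    return ["".join(tokens) for tokens, _, _ in beams]

print(greedy_decode(toy_next_probs))               # CC
print(beam_search(toy_next_probs, beam_width=2))   # ['N', 'CC']
```

In this toy table greedy decoding commits to "C" early and ends with a sequence of probability 0.24, while beam search finds "N" with probability 0.36. The example only shows the mechanism; the abstract reports that on real reaction data greedy decoding is very competitive and the gains from more elaborate decoding are marginal.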

Performance is reported as Top-1 and Top-5 accuracy. All models achieve comparable scores on both metrics, although some variation across models does exist.
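The abstract reports results as Top-1 and Top-5 accuracy. A minimal sketch of how a Top-k metric of this kind can be computed is shown below. It assumes each reaction has a ranked list of predicted SMILES strings and a single reference string, and it uses exact string matching; in practice predictions are usually canonicalised with a cheminformatics toolkit before comparison, which is omitted here. The function name and the example data are hypothetical.

```python
def top_k_accuracy(ranked_predictions, targets, k):
    """Fraction of examples whose target appears among the first k predictions.

    ranked_predictions: list of lists, best candidate first.
    targets: list of reference strings, one per example.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    if len(ranked_predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    if not targets:
        raise ValueError("at least one example is required")
    hits = sum(
        1 for preds, target in zip(ranked_predictions, targets)
        if target in preds[:k]
    )
    return hits / len(targets)

# Made-up candidates for three reactions, best guess first.
predictions = [
    ["CCO", "CC=O"],   # correct at rank 1
    ["CC", "CCC"],     # correct at rank 2
    ["C", "N"],        # reference not among the candidates
]
references = ["CCO", "CCC", "O"]

print(top_k_accuracy(predictions, references, k=1))
print(top_k_accuracy(predictions, references, k=2))
```

Here one of three references is matched at rank 1 and two of three within the top 2, so Top-k accuracy can only stay the same or increase as k grows.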

Critical Analysis

The central claim, that pretraining on a large dataset of unlabelled molecules "may be useful yet not essential", is deliberately hedged. It rests on a systematic comparison across tokenisation, pretraining, sample efficiency, and decoding rather than on a single headline number.

Several caveats follow from the scope described in the abstract. The study covers two models from the same T5 lineage, FlanT5 and ByT5, so the conclusions may not transfer to other architectures. It also targets organic reaction prediction specifically; the authors present the results as guidance for other chemistry-related tasks in the future rather than as a demonstration on those tasks.

The abstract states that all models achieve comparable Top-1 and Top-5 accuracy with some variation, but it does not quantify that variation. Readers should consult the full paper for the actual figures before deciding how much accuracy, if any, is traded for the savings in pretraining, vocabulary size, and decoding cost.

Conclusion

This paper evaluates FlanT5 and ByT5 across several dimensions, namely tokenisation, SMILES-oriented pretraining, fine-tuning sample efficiency, and decoding algorithms, and benchmarks their impact on organic reaction prediction.

The results indicate that models pretrained only on language can be specialised for chemistry through task-specific fine-tuning, that GPU-intensive pretraining on unlabelled molecules may be useful yet not essential, and that efficient choices such as vocabulary trimming and greedy decoding cost little in accuracy.

The authors suggest that these findings may guide more effective use of state-of-the-art language models for chemistry-related tasks in the future.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Specialising and Analysing Instruction-Tuned and Byte-Level Language Models for Organic Reaction Prediction

Jiayun Pang, Ivan Vulić

Transformer-based encoder-decoder models have demonstrated impressive results in chemical reaction prediction tasks. However, these models typically rely on pretraining using tens of millions of unlabelled molecules, which can be time-consuming and GPU-intensive. One of the central questions we aim to answer in this work is: Can FlanT5 and ByT5, the encoder-decoder models pretrained solely on language data, be effectively specialised for organic reaction prediction through task-specific fine-tuning? We conduct a systematic empirical study on several key issues of the process, including tokenisation, the impact of (SMILES-oriented) pretraining, fine-tuning sample efficiency, and decoding algorithms at inference. Our key findings indicate that although being pretrained only on language tasks, FlanT5 and ByT5 provide a solid foundation to fine-tune for reaction prediction, and thus become 'chemistry domain compatible' in the process. This suggests that GPU-intensive and expensive pretraining on a large dataset of unlabelled molecules may be useful yet not essential to leverage the power of language models for chemistry. All our models achieve comparable Top-1 and Top-5 accuracy although some variation across different models does exist. Notably, tokenisation and vocabulary trimming slightly affect final performance but can speed up training and inference; the most efficient greedy decoding strategy is very competitive while only marginal gains can be achieved from more sophisticated decoding algorithms. In summary, we evaluate FlanT5 and ByT5 across several dimensions and benchmark their impact on organic reaction prediction, which may guide more effective use of these state-of-the-art language models for chemistry-related tasks in the future.

5/20/2024

A Large Encoder-Decoder Family of Foundation Models For Chemical Language

Eduardo Soares, Victor Shirasuna, Emilio Vital Brazil, Renato Cerqueira, Dmitry Zubarev, Kristin Schmidt

Large-scale pre-training methodologies for chemical language models represent a breakthrough in cheminformatics. These methods excel in tasks such as property prediction and molecule generation by learning contextualized representations of input tokens through self-supervised learning on large unlabeled corpora. Typically, this involves pre-training on unlabeled data followed by fine-tuning on specific tasks, reducing dependence on annotated datasets and broadening chemical language representation understanding. This paper introduces large encoder-decoder chemical foundation models pre-trained on a curated dataset of 91 million SMILES samples sourced from PubChem, which is equivalent to 4 billion molecular tokens. The proposed foundation model supports different complex tasks, including quantum property prediction, and offers flexibility with two main variants (289M and $8 \times 289$M). Our experiments across multiple benchmark datasets validate the capacity of the proposed model in providing state-of-the-art results for different tasks. We also provide a preliminary assessment of the compositionality of the embedding space as a prerequisite for the reasoning tasks. We demonstrate that the produced latent space is separable compared to the state-of-the-art with few-shot learning capabilities.

7/31/2024

Large Language Models in Targeted Sentiment Analysis

Nicolay Rusnachenko, Anton Golubev, Natalia Loukachevitch

In this paper we investigate the use of decoder-based generative transformers for extracting sentiment towards the named entities in Russian news articles. We study sentiment analysis capabilities of instruction-tuned large language models (LLMs). We consider the dataset of RuSentNE-2023 in our study. The first group of experiments was aimed at the evaluation of zero-shot capabilities of LLMs with closed and open transparencies. The second covers the fine-tuning of Flan-T5 using the chain-of-thought (CoT) three-hop reasoning framework (THoR). We found that the results of the zero-shot approaches are similar to the results achieved by baseline fine-tuned encoder-based transformers (BERT-base). Reasoning capabilities of the fine-tuned Flan-T5 models with THoR achieve at least 5% increment with the base-size model compared to the results of the zero-shot experiment. The best results of sentiment analysis on RuSentNE-2023 were achieved by fine-tuned Flan-T5-xl, which surpassed the results of previous state-of-the-art transformer-based classifiers. Our CoT application framework is publicly available: https://github.com/nicolay-r/Reasoning-for-Sentiment-Analysis-Framework

4/19/2024

Accelerating the inference of string generation-based chemical reaction models for industrial applications

Mikhail Andronov, Natalia Andronova, Michael Wand, Jürgen Schmidhuber, Djork-Arné Clevert

Template-free SMILES-to-SMILES translation models for reaction prediction and single-step retrosynthesis are of interest for industrial applications in computer-aided synthesis planning systems due to their state-of-the-art accuracy. However, they suffer from slow inference speed. We present a method to accelerate inference in autoregressive SMILES generators through speculative decoding by copying query string subsequences into target strings in the right places. We apply our method to the molecular transformer implemented in PyTorch Lightning and achieve over 3X faster inference in reaction prediction and single-step retrosynthesis, with no loss in accuracy.

7/18/2024