Investigating the translation capabilities of Large Language Models trained on parallel data only

2406.09140

Published 6/14/2024 by Javier García Gilabert, Carlos Escolano, Aleix Sant Savall, Francesca De Luca Fornaciari, Audrey Mash, Xixian Liao, Maite Melero

cs.CL

Abstract

In recent years, Large Language Models (LLMs) have demonstrated exceptional proficiency across a broad spectrum of Natural Language Processing (NLP) tasks, including Machine Translation. However, previous methods predominantly relied on iterative processes such as instruction fine-tuning or continual pre-training, leaving unexplored the challenges of training LLMs solely on parallel data. In this work, we introduce PLUME (Parallel Language Model), a collection of three 2B LLMs featuring varying vocabulary sizes (32k, 128k, and 256k) trained exclusively on Catalan-centric parallel examples. These models perform comparably to previous encoder-decoder architectures on 16 supervised translation directions and 56 zero-shot ones. Utilizing this set of models, we conduct a thorough investigation into the translation capabilities of LLMs, probing their performance, the impact of the different elements of the prompt, and their cross-lingual representation space.

Overview

Large Language Models (LLMs) have excelled at a variety of Natural Language Processing (NLP) tasks, including Machine Translation.
Previous methods relied on iterative processes like instruction fine-tuning or continual pre-training, leaving the challenges of training LLMs solely on parallel data unexplored.
This work introduces PLUME (Parallel Language Model), a collection of three 2B LLMs with varying vocabulary sizes, trained exclusively on Catalan-centric parallel examples.
These models perform comparably to previous encoder-decoder architectures on both supervised and zero-shot translation tasks.
The researchers conduct a thorough investigation into the translation capabilities of LLMs, examining their performance, the impact of prompt elements, and their cross-lingual representation space.

Plain English Explanation

Large language models have become incredibly skilled at a wide range of language-related tasks, including translating between languages. However, previous approaches to training these models for translation often involved complex, iterative processes like fine-tuning or continual pre-training.

In this new research, the authors introduce a set of large language models called PLUME, which were trained solely on parallel data - sentences paired with their translations in another language. The three PLUME models, whose vocabularies contain 32,000, 128,000, and 256,000 tokens, performed comparably to previous specialized translation models on a variety of translation tasks, both for language pairs seen during training and for pairs the models were never explicitly trained on.

By studying these PLUME models in depth, the researchers gained insights into how large language models handle translation across many languages. They looked at factors such as which elements of the prompt matter, and how the models represent text from different languages internally.

Technical Explanation

The researchers introduce PLUME (Parallel Language Model), a collection of three 2-billion-parameter LLMs with vocabulary sizes of 32k, 128k, and 256k. These models were trained exclusively on Catalan-centric parallel data - text paired with its translation, where Catalan is either the source or the target language.
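To make the idea of training a decoder-only model on parallel data more concrete, the following is a minimal sketch of how a sentence pair might be serialized into a single training string. The tag format, the language codes `cat` and `eng`, the `<s>` and `</s>` markers, and the names `ParallelExample`, `serialize`, and `inference_prompt` are all assumptions made for illustration; the summary above does not specify the exact prompt format used for PLUME.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ParallelExample:
    """One sentence pair; field names are hypothetical."""
    src_lang: str
    tgt_lang: str
    src_text: str
    tgt_text: str


def serialize(example: ParallelExample, bos: str = "<s>", eos: str = "</s>") -> str:
    """Build a training string: tagged source, newline, tagged target.

    The bracketed language tags are an assumed convention, not
    necessarily the one used by the PLUME authors.
    """
    return (
        f"{bos} [{example.src_lang}] {example.src_text}\n"
        f"[{example.tgt_lang}] {example.tgt_text} {eos}"
    )


def inference_prompt(src_lang: str, tgt_lang: str, src_text: str, bos: str = "<s>") -> str:
    """Build the prefix the model would be asked to continue at translation time."""
    return f"{bos} [{src_lang}] {src_text}\n[{tgt_lang}]"


example = ParallelExample("cat", "eng", "Bon dia.", "Good morning.")
train_string = serialize(example)
prompt = inference_prompt("cat", "eng", "Bon dia.")

print(train_string)
print(prompt)
```

In this sketch the inference prompt is a strict prefix of the training string, so translating amounts to continuing the text after the target-language tag. Elements such as the source tag or the target tag are the kind of prompt components whose impact an analysis could measure by removing them one at a time.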

Despite this specialized training approach, the PLUME models perform comparably to previous encoder-decoder architectures on 16 supervised translation directions and 56 zero-shot ones. The researchers conduct a thorough investigation into the translation capabilities of these LLMs, probing their performance, the impact of prompt elements, and their cross-lingual representation space.
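The two direction counts follow from simple combinatorics on a pivot-centric dataset. The sketch below assumes Catalan is paired with 8 other languages, which is the value consistent with both reported figures (2 × 8 = 16 supervised directions and 8 × 7 = 56 zero-shot ones); the placeholder names `lang1` to `lang8` and the helper functions are hypothetical and do not identify the actual languages.

```python
from itertools import permutations

PIVOT = "ca"  # Catalan, the pivot language of the training data
# Placeholder names: the actual languages are not listed in this summary.
OTHERS = [f"lang{i}" for i in range(1, 9)]


def supervised_directions(pivot: str, others: list[str]) -> list[tuple[str, str]]:
    """Directions seen in training: pivot -> X and X -> pivot."""
    return [(pivot, o) for o in others] + [(o, pivot) for o in others]


def zero_shot_directions(others: list[str]) -> list[tuple[str, str]]:
    """Directions never seen in training: every ordered pair without the pivot."""
    return list(permutations(others, 2))


supervised = supervised_directions(PIVOT, OTHERS)
zero_shot = zero_shot_directions(OTHERS)

print(len(supervised), len(zero_shot))  # 16 56
```

Under this assumption, zero-shot directions outnumber supervised ones by more than three to one, which is why zero-shot quality matters so much for a model trained around a single pivot language.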

This work departs from the previously dominant approaches to building translation-capable LLMs, which rely on iterative instruction fine-tuning or continual pre-training; here, the models are trained on parallel data alone.

Critical Analysis

The paper provides a thorough exploration of training large language models solely on parallel data for translation tasks. While the results demonstrate the impressive capabilities of this approach, there are a few caveats to consider:

The models were trained on a relatively narrow set of Catalan-centric language pairs, so their performance may not generalize as well to more diverse language combinations.
The researchers note that the models' cross-lingual representation space could be further improved, suggesting opportunities for continued research in this area.
As with many large language models, there are open questions around the interpretability and potential biases present in the PLUME models.

Overall, this work represents an interesting and promising direction for enhancing the translation abilities of large language models. However, further research is needed to fully understand the strengths, limitations, and broader implications of this approach.

Conclusion

This research introduces a novel paradigm for training large language models solely on parallel data for machine translation tasks. The resulting PLUME models demonstrate strong performance on both supervised and zero-shot translation, rivaling previous specialized architectures.

By conducting a detailed investigation into the translation capabilities of these models, the researchers have provided valuable insights into how large language models handle multilingualism and cross-lingual representation. This work opens up new avenues for further research and development in the field of machine translation, potentially leading to more efficient and versatile translation systems in the future.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

How Multilingual Are Large Language Models Fine-Tuned for Translation?

Aquia Richburg, Marine Carpuat

A new paradigm for machine translation has recently emerged: fine-tuning large language models (LLM) on parallel text has been shown to outperform dedicated translation systems trained in a supervised fashion on much larger amounts of parallel data (Xu et al., 2024a; Alves et al., 2024). However, it remains unclear whether this paradigm can enable massively multilingual machine translation or whether it requires fine-tuning dedicated models for a small number of language pairs. How does translation fine-tuning impact the MT capabilities of LLMs for zero-shot languages, zero-shot language pairs, and translation tasks that do not involve English? To address these questions, we conduct an extensive empirical evaluation of the translation quality of the TOWER family of language models (Alves et al., 2024) on 132 translation tasks from the multi-parallel FLORES-200 data. We find that translation fine-tuning improves translation quality even for zero-shot languages on average, but that the impact is uneven depending on the language pairs involved. These results call for further research to effectively enable massively multilingual translation with LLMs.

6/3/2024

cs.CL cs.LG

A Novel Paradigm Boosting Translation Capabilities of Large Language Models

Jiaxin Guo, Hao Yang, Zongyao Li, Daimeng Wei, Hengchao Shang, Xiaoyu Chen

This paper presents a study on strategies to enhance the translation capabilities of large language models (LLMs) in the context of machine translation (MT) tasks. The paper proposes a novel paradigm consisting of three stages: Secondary Pre-training using Extensive Monolingual Data, Continual Pre-training with Interlinear Text Format Documents, and Leveraging Source-Language Consistent Instruction for Supervised Fine-Tuning. Previous research on LLMs focused on various strategies for supervised fine-tuning (SFT), but their effectiveness has been limited. While traditional machine translation approaches rely on vast amounts of parallel bilingual data, our paradigm highlights the importance of using smaller sets of high-quality bilingual data. We argue that the focus should be on augmenting LLMs' cross-lingual alignment abilities during pre-training rather than solely relying on extensive bilingual data during SFT. Experimental results conducted using the Llama2 model, particularly on Chinese-Llama2 after monolingual augmentation, demonstrate the improved translation capabilities of LLMs. A significant contribution of our approach lies in Stage2: Continual Pre-training with Interlinear Text Format Documents, which requires less than 1B training data, making our method highly efficient. Additionally, in Stage3, we observed that setting instructions consistent with the source language benefits the supervised fine-tuning process. Experimental results demonstrate that our approach surpasses previous work and achieves superior performance compared to models such as NLLB-54B and GPT3.5-text-davinci-003, despite having a significantly smaller parameter count of only 7B or 13B. This achievement establishes our method as a pioneering strategy in the field of machine translation.

4/16/2024

cs.CL

Multilingual Machine Translation with Large Language Models: Empirical Results and Analysis

Wenhao Zhu, Hongyi Liu, Qingxiu Dong, Jingjing Xu, Shujian Huang, Lingpeng Kong, Jiajun Chen, Lei Li

Large language models (LLMs) have demonstrated remarkable potential in handling multilingual machine translation (MMT). In this paper, we systematically investigate the advantages and challenges of LLMs for MMT by answering two questions: 1) How well do LLMs perform in translating a massive number of languages? 2) Which factors affect LLMs' performance in translation? We thoroughly evaluate eight popular LLMs, including ChatGPT and GPT-4. Our empirical results show that the translation capabilities of LLMs are continually evolving. GPT-4 has beaten the strong supervised baseline NLLB in 40.91% of translation directions but still faces a large gap compared with commercial translation systems like Google Translate, especially on low-resource languages. Through further analysis, we discover that LLMs exhibit new working patterns when used for MMT. First, LLMs can acquire translation ability in a resource-efficient way and generate moderate translations even for zero-resource languages. Second, instruction semantics can surprisingly be ignored when given in-context exemplars. Third, cross-lingual exemplars can provide better task guidance for low-resource translation than exemplars in the same language pairs. Code will be released at: https://github.com/NJUNLP/MMT-LLM.

6/17/2024

cs.CL

Eliciting the Translation Ability of Large Language Models via Multilingual Finetuning with Translation Instructions

Jiahuan Li, Hao Zhou, Shujian Huang, Shanbo Cheng, Jiajun Chen

Large-scale Pretrained Language Models (LLMs), such as ChatGPT and GPT4, have shown strong abilities in multilingual translations, without being explicitly trained on parallel corpora. It is interesting how the LLMs obtain their ability to carry out translation instructions for different languages. In this paper, we present a detailed analysis by finetuning a multilingual pretrained language model, XGLM-7B, to perform multilingual translation following given instructions. Firstly, we show that multilingual LLMs have stronger translation abilities than previously demonstrated. For a certain language, the performance depends on its similarity to English and the amount of data used in the pretraining phase. Secondly, we find that LLMs' ability to carry out translation instructions relies on the understanding of translation instructions and the alignment among different languages. With multilingual finetuning, LLMs could learn to perform the translation task well even for those language pairs unseen during the instruction tuning phase.

4/16/2024

cs.CL