Machine Translation for Ge'ez Language

2311.14530

Published 4/16/2024 by Aman Kassahun Wassie


Abstract

Machine translation (MT) for low-resource languages such as Ge'ez, an ancient language that is no longer the native language of any community, faces challenges such as out-of-vocabulary words, domain mismatches, and lack of sufficient labeled training data. In this work, we explore various methods to improve Ge'ez MT, including transfer learning from related languages, optimizing shared vocabulary and token segmentation approaches, finetuning large pre-trained models, and using large language models (LLMs) for few-shot translation with fuzzy matches. We develop a multilingual neural machine translation (MNMT) model based on language relatedness, which brings an average performance improvement of about 4 BLEU compared to standard bilingual models. We also attempt to finetune the NLLB-200 model, one of the most advanced translation models available today, but find that it performs poorly with only 4k training samples for Ge'ez. Furthermore, we experiment with using GPT-3.5, a state-of-the-art LLM, for few-shot translation with fuzzy matches, which leverages embedding similarity-based retrieval to find context examples from a parallel corpus. We observe that GPT-3.5 achieves a remarkable BLEU score of 9.2 with no initial knowledge of Ge'ez, although this is still lower than the MNMT baseline of 15.2. Our work provides insights into the potential and limitations of different approaches for low-resource and ancient language MT.


Overview

The paper explores ways to improve machine translation (MT) for low-resource languages like Ge'ez, an ancient language with limited training data.
Techniques investigated include transfer learning from related languages, optimizing vocabulary and token segmentation, fine-tuning large pre-trained models, and using large language models (LLMs) for few-shot translation.
The researchers developed a multilingual neural machine translation (MNMT) model that outperformed standard bilingual models by about 4 BLEU on average, but found that fine-tuning NLLB-200, an advanced pre-trained translation model, performed poorly with only 4k Ge'ez training samples.
Experiments using GPT-3.5, a state-of-the-art LLM, for few-shot translation with fuzzy matches reached a BLEU score of 9.2 without any prior knowledge of Ge'ez, but this was still lower than the MNMT baseline of 15.2.

Plain English Explanation

Machine translation (MT) is the process of automatically translating text from one language to another. For low-resource languages like Ge'ez, an ancient language that is no longer the native language of any community, MT faces significant challenges. These include a lack of training data, words that do not appear in the translation model's vocabulary (out-of-vocabulary words), and mismatches between the domain of the training data and the kind of text the system has to translate in practice.
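One way to make the out-of-vocabulary problem concrete is to measure the share of test-set words that never occur in the training data. The following is a minimal Python sketch, assuming simple whitespace tokenization; the function name `oov_rate` and the toy English sentences are hypothetical illustrations, not code or data from the paper.

```python
def oov_rate(train_sentences, test_sentences):
    """Fraction of test tokens never seen in training (whitespace tokenization assumed)."""
    # Build the training vocabulary as a set of word types.
    vocab = {tok for sent in train_sentences for tok in sent.split()}
    # Flatten the test set into a list of word tokens.
    test_tokens = [tok for sent in test_sentences for tok in sent.split()]
    if not test_tokens:
        return 0.0
    unseen = sum(1 for tok in test_tokens if tok not in vocab)
    return unseen / len(test_tokens)


# Toy, made-up data for illustration only.
train = ["the king went to the city", "the priest read the book"]
test = ["the queen went to the temple"]
print(oov_rate(train, test))
```

Here two of the six test tokens ("queen" and "temple") are unseen, so the rate is one third. Splitting words into smaller subword units is a standard way to reduce this rate, because unseen words can often be built from known pieces.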

To address these challenges, the researchers explored several approaches. They tried transfer learning from related languages, which lets a model reuse what it learns from better-resourced languages that share features with the low-resource one. They also experimented with optimizing the shared vocabulary and the way words are broken into smaller units (tokenized) so that both better fit the low-resource language.
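The summary does not say which segmentation algorithm the authors tuned. Byte-pair encoding (BPE) is one widely used subword method: it starts from characters and repeatedly merges the most frequent adjacent pair of symbols. The sketch below learns BPE merges in pure Python as an illustration of that technique, not as the authors' setup. The function name `learn_bpe` and the `</w>` end-of-word marker are illustrative choices. In practice, one would use an established tool such as SentencePiece or subword-nmt.

```python
from collections import Counter


def learn_bpe(words, num_merges):
    """Learn a list of BPE merge operations from a list of words."""
    # Each word becomes a tuple of characters plus an end-of-word marker.
    vocab = Counter(tuple(word) + ("</w>",) for word in words)
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        merges.append(best)
        # Rewrite every word, fusing each occurrence of the best pair.
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            merged, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    merged.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    merged.append(symbols[i])
                    i += 1
            new_vocab[tuple(merged)] += freq
        vocab = new_vocab
    return merges


print(learn_bpe(["ab", "ab", "abc"], 2))  # [('a', 'b'), ('ab', '</w>')]
```

The number of merges controls the vocabulary size. A smaller vocabulary leaves fewer rare tokens but produces longer sequences, which is the kind of trade-off that vocabulary optimization for a low-resource language has to balance.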

Additionally, the researchers tried fine-tuning NLLB-200, one of the most advanced translation models available today, on the limited Ge'ez training data. However, this approach performed poorly with only about 4,000 Ge'ez training samples.

Finally, the researchers explored using a large language model (LLM), GPT-3.5, for "few-shot" translation. Instead of training the model, few-shot translation places a handful of example translations directly in the prompt, and the LLM infers how to translate from them. To choose those examples, the researchers retrieved "fuzzy matches" from a parallel corpus. These are sentences whose embeddings are most similar to the sentence being translated. The LLM then produced translations guided by those examples.
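To show what "examples in the prompt" means in practice, here is a minimal sketch that assembles a few-shot translation prompt from retrieved sentence pairs. The instruction wording, the "Language: text" layout, and the function name `build_few_shot_prompt` are hypothetical, because the summary does not give the authors' prompt template. Sending the prompt to GPT-3.5 is not shown.

```python
def build_few_shot_prompt(examples, source, src_lang="Ge'ez", tgt_lang="English"):
    """Assemble a few-shot translation prompt from (source, target) example pairs.

    The template is a hypothetical illustration, not the paper's actual prompt.
    """
    lines = [f"Translate the following {src_lang} sentences into {tgt_lang}.", ""]
    for ex_source, ex_target in examples:
        lines.append(f"{src_lang}: {ex_source}")
        lines.append(f"{tgt_lang}: {ex_target}")
        lines.append("")  # blank line between examples
    # The final entry leaves the target side empty for the LLM to complete.
    lines.append(f"{src_lang}: {source}")
    lines.append(f"{tgt_lang}:")
    return "\n".join(lines)


# Placeholder strings stand in for real Ge'ez/English sentence pairs.
examples = [("gez-sentence-1", "English translation 1"),
            ("gez-sentence-2", "English translation 2")]
print(build_few_shot_prompt(examples, "gez-sentence-to-translate"))
```

Ending the prompt with an empty target line invites the model to continue the pattern it has just seen, which is the core idea behind few-shot prompting.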

The LLM-based approach reached a BLEU score of 9.2 even though GPT-3.5 had no prior knowledge of Ge'ez. BLEU measures n-gram overlap between a system's output and reference translations, and higher is better. However, this approach still fell short of the multilingual neural machine translation (MNMT) model developed by the researchers, which achieved a BLEU score of 15.2.
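The BLEU scores quoted here use the usual 0-100 scale. To illustrate what the metric computes, here is a minimal corpus-level BLEU in Python. It makes several simplifying assumptions: one reference per sentence, whitespace tokenization, n-grams up to length 4, and no smoothing. Published scores are normally computed with a standard tool such as sacreBLEU, whose tokenization can yield different numbers, and nothing here reproduces the paper's evaluation.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus-level BLEU (Papineni et al., 2002) on a 0-100 scale.

    Assumes one reference per hypothesis, whitespace tokenization, no smoothing.
    """
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        h, r = hyp.split(), ref.split()
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            h_counts, r_counts = ngrams(h, n), ngrams(r, n)
            # Clipped ("modified") matches: each n-gram counts at most as often as in the reference.
            matches[n - 1] += sum(min(c, r_counts[g]) for g, c in h_counts.items())
            totals[n - 1] += max(len(h) - n + 1, 0)
    if min(matches) == 0:
        return 0.0  # without smoothing, any zero precision makes BLEU zero
    # Geometric mean of the n-gram precisions.
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty discourages output that is shorter than the references.
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * bp * math.exp(log_precision)


print(corpus_bleu(["the cat sat on the mat"], ["the cat sat on the mat"]))  # 100.0
```

Because BLEU rewards exact n-gram overlap, a translation that is adequate but worded differently from the reference can still score low. This is one reason why scores such as 9.2 and 15.2 are best read as relative comparisons on the same test set.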

Technical Explanation

The paper explores various methods to improve machine translation (MT) for low-resource languages, using Ge'ez as a case study. Ge'ez is an ancient language that is no longer the native language of any community, which presents challenges such as out-of-vocabulary words, domain mismatches, and lack of sufficient labeled training data.

The researchers first developed a multilingual neural machine translation (MNMT) model based on the relatedness of languages. This MNMT model brought an average performance improvement of about 4 BLEU points compared to standard bilingual models.
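The summary does not describe how the MNMT model is set up. A common way to train a single model on several language pairs, introduced by Johnson et al. (2017) for Google's multilingual NMT system, is to pool the corpora and prepend a token that names the desired target language. The sketch below assumes that scheme. The `<2xx>` tag format, the function name `build_multilingual_corpus`, and the example language pairs (Ge'ez, English, and Amharic, an Ethio-Semitic relative of Ge'ez) are illustrative choices, not details confirmed for this paper.

```python
def build_multilingual_corpus(corpora):
    """Merge several bilingual corpora into one tagged training set.

    `corpora` maps (src_lang, tgt_lang) to a list of (source, target) pairs.
    A "<2xx>" tag naming the target language is prepended to each source
    sentence, so one shared model knows which language to produce.
    """
    tagged = []
    for (_src_lang, tgt_lang), pairs in corpora.items():
        for source, target in pairs:
            tagged.append((f"<2{tgt_lang}> {source}", target))
    return tagged


# Hypothetical corpora keyed by ISO 639-3 codes: Ge'ez (gez), Amharic (amh), English (eng).
corpora = {
    ("gez", "eng"): [("gez-sentence-1", "English sentence 1")],
    ("eng", "amh"): [("English sentence 2", "amh-sentence-2")],
}
for pair in build_multilingual_corpus(corpora):
    print(pair)
```

Because all directions share one set of parameters (and typically one subword vocabulary), patterns learned from a related, better-resourced language can transfer to Ge'ez. This is the intuition behind choosing training languages by relatedness.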

Next, the researchers attempted to fine-tune the NLLB-200 model, one of the most advanced translation models available today, on the Ge'ez dataset. However, this approach performed poorly with only 4,000 Ge'ez training samples available.

Furthermore, the researchers experimented with using GPT-3.5, a state-of-the-art large language model (LLM), for few-shot translation with fuzzy matches. This approach uses embedding similarity-based retrieval to find context examples in a parallel corpus and includes them in the prompt, so that GPT-3.5 can produce translations based on those examples. With no initial knowledge of Ge'ez, this method achieved a BLEU score of 9.2, which still fell short of the MNMT baseline of 15.2.
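The retrieval step can be sketched as a nearest-neighbor search. First, embed every source sentence in the parallel corpus and the sentence to be translated. Then keep the k pairs with the highest cosine similarity. The summary does not say which embedding model the authors used, so the three-dimensional vectors below are made-up stand-ins for real sentence embeddings. `retrieve_fuzzy_matches` is a hypothetical helper name.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def retrieve_fuzzy_matches(query_vec, corpus, k=2):
    """Return the k (source, target) pairs whose source embeddings are closest to the query.

    `corpus` is a list of (source_embedding, source_sentence, target_sentence).
    """
    ranked = sorted(corpus, key=lambda item: cosine(query_vec, item[0]), reverse=True)
    return [(src, tgt) for _vec, src, tgt in ranked[:k]]


# Toy embeddings; real ones would come from a sentence-embedding model.
corpus = [
    ([1.0, 0.0, 0.0], "src-A", "tgt-A"),
    ([0.0, 1.0, 0.0], "src-B", "tgt-B"),
    ([0.9, 0.1, 0.0], "src-C", "tgt-C"),
]
print(retrieve_fuzzy_matches([1.0, 0.0, 0.0], corpus, k=2))
# [('src-A', 'tgt-A'), ('src-C', 'tgt-C')]
```

The returned pairs would then be placed in the prompt as in-context examples. For a large corpus, a vector index would normally replace this linear scan.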

Critical Analysis

The paper provides valuable insights into the challenges and potential solutions for machine translation of low-resource languages such as Ge'ez. By exploring transfer learning, vocabulary and tokenization optimization, fine-tuning of large pre-trained models, and few-shot prompting of large language models, the researchers offer a broad comparison of current approaches in this domain.

One notable limitation of the study is the small size of the Ge'ez dataset, which likely contributed to the poor performance of the fine-tuned NLLB-200 model. It would be interesting to see how this approach might perform with a larger dataset or if the researchers could find ways to effectively augment the existing data.

Additionally, while the GPT-3.5-based few-shot translation with fuzzy matches showed promise, the researchers did not provide a detailed analysis of the strengths and weaknesses of this method. It would be valuable to understand the specific types of challenges it can address and the scenarios where it might outperform other approaches.

Overall, the paper presents a thorough investigation of various techniques for low-resource language machine translation and serves as a useful reference for researchers and practitioners working in this field. The insights gained from this study can inform future work and contribute to the ongoing efforts to improve MT capabilities for languages with limited resources.

Conclusion

This paper explores several innovative approaches to improve machine translation (MT) for low-resource languages, such as the ancient Ge'ez language. The researchers developed a multilingual neural machine translation (MNMT) model that outperformed standard bilingual models, demonstrating the potential of leveraging language relatedness. However, fine-tuning NLLB-200 on the limited Ge'ez training data yielded poor results.

The researchers' experiments with using a large language model (LLM), GPT-3.5, for few-shot translation with fuzzy matches reached 9.2 BLEU, though this was still lower than the MNMT baseline of 15.2. This work provides valuable insights into the current state of the art in low-resource language MT and highlights the need for continued research and innovation in this field.

As machine translation technologies continue to evolve, the techniques explored in this paper can serve as a foundation for developing more robust and adaptable systems capable of handling a diverse range of languages, including those with limited resources.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!
