Performance of Recent Large Language Models for a Low-Resourced Language

Read original: arXiv:2407.21330 - Published 8/1/2024 by Ravindu Jayakody, Gihan Dias

Overview

Large language models (LLMs) have made significant advancements in the past year.
New versions of GPT and Llama, as well as several other LLMs, have been introduced.
Some of these LLMs are open models available for download and modification.
Multilingual LLMs have been available for some time, but their performance on low-resourced languages like Sinhala has been poor.

Plain English Explanation

Large language models are AI systems that can understand and generate human-like text. Over the past year these models have improved substantially: new versions of GPT and Llama have appeared, along with several other LLMs. Some of the newer models are open, meaning they can be downloaded and modified.

While multilingual LLMs have existed for a while, they have struggled with languages that have little available data, such as Sinhala. The researchers evaluated four recent LLMs on Sinhala, both directly and by translating to and from English. They also looked at how well these models can be fine-tuned (further trained) using a small amount of Sinhala data.

The results show that two of the models, Claude and GPT 4o, perform well even without fine-tuning and do significantly better than previous versions. The other two models, Llama and Mistral, perform poorly out of the box but show some promise of improvement with fine-tuning.

Technical Explanation

The researchers evaluated the performance of four recent large language models (LLMs) - Claude, GPT 4o, Llama, and Mistral - on the Sinhala language. They tested the models' performance both directly in Sinhala and by translating to and from English. They also evaluated the models' ability to be fine-tuned using a small amount of Sinhala data.
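The two evaluation paths described above (querying a model directly in Sinhala, or translating to English and back) can be sketched roughly as follows. This is only an illustration, not the authors' code: the `Model` type, the `to_english` and `to_sinhala` translator callables, and the exact-match `accuracy` metric are all hypothetical stand-ins, since the summary does not say which tools or metrics the paper used.

```python
from typing import Callable, Sequence

# Hypothetical type: any function that maps a prompt string to a response string.
Model = Callable[[str], str]


def answer_directly(model: Model, sinhala_prompt: str) -> str:
    """Direct path: the model receives and answers the prompt in Sinhala."""
    return model(sinhala_prompt)


def answer_via_english(
    model: Model,
    to_english: Callable[[str], str],
    to_sinhala: Callable[[str], str],
    sinhala_prompt: str,
) -> str:
    """Pivot path: translate to English, query the model, translate back."""
    english_prompt = to_english(sinhala_prompt)
    english_answer = model(english_prompt)
    return to_sinhala(english_answer)


def accuracy(answers: Sequence[str], references: Sequence[str]) -> float:
    """Fraction of answers that exactly match the reference.

    Exact match is an assumed stand-in metric, chosen only for simplicity.
    """
    if not references:
        return 0.0
    matches = sum(a.strip() == r.strip() for a, r in zip(answers, references))
    return matches / len(references)
```

Running both paths over the same set of prompts and comparing the two scores would show whether a given model does better working in Sinhala or through English as a pivot.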

The results show that Claude and GPT 4o perform well out-of-the-box, significantly better than previous versions of these models. Llama and Mistral perform poorly initially but show promise for improvement with fine-tuning on a small amount of Sinhala data.

Critical Analysis

The paper provides a useful evaluation of the performance of several recent LLMs on the low-resourced Sinhala language. However, it's important to note that the results may not generalize to all low-resource languages, as the challenges and performance characteristics could vary.

Additionally, the researchers only tested a small set of models and fine-tuning approaches. There may be other LLMs or fine-tuning techniques that could perform even better on Sinhala. Further research exploring a broader range of models and fine-tuning methods would be valuable.

It would also be interesting to see how these LLMs perform on other tasks beyond just language understanding and generation, such as machine translation or question answering. This could provide a more comprehensive evaluation of their capabilities for low-resource languages.

Conclusion

This research highlights the progress that has been made in developing LLMs that can perform well on low-resource languages like Sinhala. The strong out-of-the-box performance of Claude and GPT 4o is particularly promising, as it suggests these models could be useful for a wide range of applications in under-served language communities.

The potential for Llama and Mistral to be fine-tuned on small amounts of data is also an important finding, as it could make it more feasible to adapt LLMs to low-resource settings. Overall, this research represents a valuable step forward in making large language models more accessible and useful for a diverse range of languages and communities.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Performance of Recent Large Language Models for a Low-Resourced Language

Ravindu Jayakody, Gihan Dias

Large Language Models (LLMs) have shown significant advances in the past year. In addition to new versions of GPT and Llama, several other LLMs have been introduced recently. Some of these are open models available for download and modification. Although multilingual large language models have been available for some time, their performance on low-resourced languages such as Sinhala has been poor. We evaluated four recent LLMs on their performance directly in the Sinhala language, and by translation to and from English. We also evaluated their fine-tunability with a small amount of fine-tuning data. Claude and GPT 4o perform well out-of-the-box and do significantly better than previous versions. Llama and Mistral perform poorly but show some promise of improvement with fine tuning.

8/1/2024

How good are Large Language Models on African Languages?

Jessica Ojo, Kelechi Ogueji, Pontus Stenetorp, David Ifeoluwa Adelani

Recent advancements in natural language processing have led to the proliferation of large language models (LLMs). These models have been shown to yield good performance, using in-context learning, even on tasks and languages they are not trained on. However, their performance on African languages is largely understudied relative to high-resource languages. We present an analysis of four popular large language models (mT0, Aya, LLaMa 2, and GPT-4) on six tasks (topic classification, sentiment classification, machine translation, summarization, question answering, and named entity recognition) across 60 African languages, spanning different language families and geographical regions. Our results suggest that all LLMs produce lower performance for African languages, and there is a large gap in performance compared to high-resource languages (such as English) for most tasks. We find that GPT-4 has an average to good performance on classification tasks, yet its performance on generative tasks such as machine translation and summarization is significantly lacking. Surprisingly, we find that mT0 had the best overall performance for cross-lingual QA, better than the state-of-the-art supervised model (i.e. fine-tuned mT5) and GPT-4 on African languages. Similarly, we find the recent Aya model to have comparable results to mT0 in almost all tasks except for topic classification, where it outperforms mT0. Overall, LLaMa 2 showed the worst performance, which we believe is due to its English and code-centric (around 98%) pre-training corpus. Our findings confirm that performance on African languages continues to remain a hurdle for the current LLMs, underscoring the need for additional efforts to close this gap.

5/1/2024

Multilingual Machine Translation with Large Language Models: Empirical Results and Analysis

Wenhao Zhu, Hongyi Liu, Qingxiu Dong, Jingjing Xu, Shujian Huang, Lingpeng Kong, Jiajun Chen, Lei Li

Large language models (LLMs) have demonstrated remarkable potential in handling multilingual machine translation (MMT). In this paper, we systematically investigate the advantages and challenges of LLMs for MMT by answering two questions: 1) How well do LLMs perform in translating a massive number of languages? 2) Which factors affect LLMs' performance in translation? We thoroughly evaluate eight popular LLMs, including ChatGPT and GPT-4. Our empirical results show that the translation capabilities of LLMs are continually evolving. GPT-4 has beaten the strong supervised baseline NLLB in 40.91% of translation directions but still faces a large gap compared with commercial translation systems like Google Translate, especially on low-resource languages. Through further analysis, we discover that LLMs exhibit new working patterns when used for MMT. First, LLMs can acquire translation ability in a resource-efficient way and generate moderate translations even on zero-resource languages. Second, instruction semantics can surprisingly be ignored when given in-context exemplars. Third, cross-lingual exemplars can provide better task guidance for low-resource translation than exemplars in the same language pairs. Code will be released at: https://github.com/NJUNLP/MMT-LLM.

6/17/2024

A Survey of Large Language Models for European Languages

Wazir Ali, Sampo Pyysalo

Large Language Models (LLMs) have gained significant attention due to their high performance on a wide range of natural language tasks since the release of ChatGPT. The LLMs learn to understand and generate language by training billions of model parameters on vast volumes of text data. Despite being a relatively new field, LLM research is rapidly advancing in various directions. In this paper, we present an overview of LLM families, including LLaMA, PaLM, GPT, and MoE, and the methods developed to create and enhance LLMs for official European Union (EU) languages. We provide a comprehensive summary of common monolingual and multilingual datasets used for pretraining large language models.

8/29/2024