Comparing LLM prompting with Cross-lingual transfer performance on Indigenous and Low-resource Brazilian Languages

2404.18286

Published 5/1/2024 by David Ifeoluwa Adelani, A. Seza Dou{g}ruoz, Andr'e Coneglian, Atul Kr. Ojha

🔄

Abstract

Large Language Models are transforming NLP for a variety of tasks. However, how LLMs perform NLP tasks for low-resource languages (LRLs) is less explored. In line with the goals of the AmericasNLP workshop, we focus on 12 LRLs from Brazil, 2 LRLs from Africa and 2 high-resource languages (HRLs) (e.g., English and Brazilian Portuguese). Our results indicate that the LLMs perform worse for the part of speech (POS) labeling of LRLs in comparison to HRLs. We explain the reasons behind this failure and provide an error analysis through examples observed in our data set.

Get summaries of the top AI research delivered straight to your inbox:

Overview

This paper investigates the use of Large Language Models (LLMs) for Part-of-Speech (POS) tagging in Low-Resource Languages (LRLs).
The researchers focus on the challenges of applying LLMs to LRLs, which often lack the large datasets required to fine-tune these models effectively.
The paper presents an empirical study examining the performance of LLMs on POS tagging tasks in several LRLs, highlighting the limitations of current approaches.

Plain English Explanation

The paper looks at using advanced language models, called Large Language Models (LLMs), for a particular task called Part-of-Speech (POS) tagging in Low-Resource Languages (LRLs). POS tagging is the process of identifying the grammatical role of each word in a sentence, such as noun, verb, adjective, etc.

The challenge is that LLMs typically require very large datasets to be trained effectively, but many languages around the world, especially smaller or lesser-used languages, don't have these large datasets available. The researchers in this paper investigate how well LLMs perform on POS tagging tasks for several LRLs, to understand the current limitations of using these powerful models in low-resource settings.

Technical Explanation

The paper presents an empirical study examining the performance of various LLM architectures, including BERT, XLM-R, and mT5, on POS tagging tasks for several LRLs. The researchers fine-tune these pre-trained LLMs on available POS-annotated datasets for each language and evaluate their performance.

The results show that while LLMs can achieve reasonable performance on POS tagging in high-resource languages, their accuracy drops significantly when applied to LRLs. The paper discusses various factors that contribute to this performance gap, including the limited availability of training data, the linguistic diversity of LRLs, and the inherent biases in the pre-training of LLMs towards high-resource languages.

Critical Analysis

The paper provides a candid assessment of the current limitations of using LLMs for POS tagging in LRLs. While LLMs have shown tremendous success in many NLP tasks, the authors rightly point out that these models still struggle with low-resource settings, where the lack of large annotated datasets poses a significant challenge.

The paper could have delved deeper into potential solutions or strategies to address this problem, such as data augmentation techniques, cross-lingual transfer learning, or retrieval-augmented approaches. Additionally, the paper could have discussed the broader implications of the findings, such as the need for more investment and research in supporting LRLs in the era of large language models.

Conclusion

This paper highlights the limitations of current LLM-based approaches for POS tagging in Low-Resource Languages. While LLMs have achieved remarkable success in many NLP tasks, the authors demonstrate that their performance drops significantly when applied to languages with limited training data. This finding underscores the need for further research and development to make these powerful models more accessible and effective for a wider range of languages, including the many lesser-used languages around the world.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

🛠️

Cost-Performance Optimization for Processing Low-Resource Language Tasks Using Commercial LLMs

Arijit Nag, Animesh Mukherjee, Niloy Ganguly, Soumen Chakrabarti

Large Language Models (LLMs) exhibit impressive zero/few-shot inference and generation quality for high-resource languages (HRLs). A few of them have been trained on low-resource languages (LRLs) and give decent performance. Owing to the prohibitive costs of training LLMs, they are usually used as a network service, with the client charged by the count of input and output tokens. The number of tokens strongly depends on the script and language, as well as the LLM's subword vocabulary. We show that LRLs are at a pricing disadvantage, because the well-known LLMs produce more tokens for LRLs than HRLs. This is because most currently popular LLMs are optimized for HRL vocabularies. Our objective is to level the playing field: reduce the cost of processing LRLs in contemporary LLMs while ensuring that predictive and generative qualities are not compromised. As means to reduce the number of tokens processed by the LLM, we consider code-mixing, translation, and transliteration of LRLs to HRLs. We perform an extensive study using the IndicXTREME classification and six generative tasks dataset, covering 15 Indic and 3 other languages, while using GPT-4 (one of the costliest LLM services released so far) as a commercial LLM. We observe and analyze interesting patterns involving token count, cost, and quality across a multitude of languages and tasks. We show that choosing the best policy to interact with the LLM can reduce cost by 90% while giving better or comparable performance compared to communicating with the LLM in the original LRL.

4/22/2024

cs.CL

Cross-Lingual Transfer Robustness to Lower-Resource Languages on Adversarial Datasets

Shadi Manafi, Nikhil Krishnaswamy

Multilingual Language Models (MLLMs) exhibit robust cross-lingual transfer capabilities, or the ability to leverage information acquired in a source language and apply it to a target language. These capabilities find practical applications in well-established Natural Language Processing (NLP) tasks such as Named Entity Recognition (NER). This study aims to investigate the effectiveness of a source language when applied to a target language, particularly in the context of perturbing the input test set. We evaluate on 13 pairs of languages, each including one high-resource language (HRL) and one low-resource language (LRL) with a geographic, genetic, or borrowing relationship. We evaluate two well-known MLLMs--MBERT and XLM-R--on these pairs, in native LRL and cross-lingual transfer settings, in two tasks, under a set of different perturbations. Our findings indicate that NER cross-lingual transfer depends largely on the overlap of entity chunks. If a source and target language have more entities in common, the transfer ability is stronger. Models using cross-lingual transfer also appear to be somewhat more robust to certain perturbations of the input, perhaps indicating an ability to leverage stronger representations derived from the HRL. Our research provides valuable insights into cross-lingual transfer and its implications for NLP applications, and underscores the need to consider linguistic nuances and potential limitations when employing MLLMs across distinct languages.

4/1/2024

cs.CL

💬

How good are Large Language Models on African Languages?

Jessica Ojo, Kelechi Ogueji, Pontus Stenetorp, David Ifeoluwa Adelani

Recent advancements in natural language processing have led to the proliferation of large language models (LLMs). These models have been shown to yield good performance, using in-context learning, even on tasks and languages they are not trained on. However, their performance on African languages is largely understudied relative to high-resource languages. We present an analysis of four popular large language models (mT0, Aya, LLaMa 2, and GPT-4) on six tasks (topic classification, sentiment classification, machine translation, summarization, question answering, and named entity recognition) across 60 African languages, spanning different language families and geographical regions. Our results suggest that all LLMs produce lower performance for African languages, and there is a large gap in performance compared to high-resource languages (such as English) for most tasks. We find that GPT-4 has an average to good performance on classification tasks, yet its performance on generative tasks such as machine translation and summarization is significantly lacking. Surprisingly, we find that mT0 had the best overall performance for cross-lingual QA, better than the state-of-the-art supervised model (i.e. fine-tuned mT5) and GPT-4 on African languages. Similarly, we find the recent Aya model to have comparable result to mT0 in almost all tasks except for topic classification where it outperform mT0. Overall, LLaMa 2 showed the worst performance, which we believe is due to its English and code-centric~(around 98%) pre-training corpus. Our findings confirms that performance on African languages continues to remain a hurdle for the current LLMs, underscoring the need for additional efforts to close this gap.

5/1/2024

cs.CL cs.AI cs.LG

Low-Resource Machine Translation through Retrieval-Augmented LLM Prompting: A Study on the Mambai Language

Raphael Merx, Aso Mahmudi, Katrina Langford, Leo Alberto de Araujo, Ekaterina Vylomova

This study explores the use of large language models (LLMs) for translating English into Mambai, a low-resource Austronesian language spoken in Timor-Leste, with approximately 200,000 native speakers. Leveraging a novel corpus derived from a Mambai language manual and additional sentences translated by a native speaker, we examine the efficacy of few-shot LLM prompting for machine translation (MT) in this low-resource context. Our methodology involves the strategic selection of parallel sentences and dictionary entries for prompting, aiming to enhance translation accuracy, using open-source and proprietary LLMs (LlaMa 2 70b, Mixtral 8x7B, GPT-4). We find that including dictionary entries in prompts and a mix of sentences retrieved through TF-IDF and semantic embeddings significantly improves translation quality. However, our findings reveal stark disparities in translation performance across test sets, with BLEU scores reaching as high as 21.2 on materials from the language manual, in contrast to a maximum of 4.4 on a test set provided by a native speaker. These results underscore the importance of diverse and representative corpora in assessing MT for low-resource languages. Our research provides insights into few-shot LLM prompting for low-resource MT, and makes available an initial corpus for the Mambai language.

4/9/2024

cs.CL