Cost-Performance Optimization for Processing Low-Resource Language Tasks Using Commercial LLMs

2403.05434

Published 4/22/2024 by Arijit Nag, Animesh Mukherjee, Niloy Ganguly, Soumen Chakrabarti


Abstract

Large Language Models (LLMs) exhibit impressive zero/few-shot inference and generation quality for high-resource languages (HRLs). A few of them have been trained on low-resource languages (LRLs) and give decent performance. Owing to the prohibitive costs of training LLMs, they are usually used as a network service, with the client charged by the count of input and output tokens. The number of tokens strongly depends on the script and language, as well as the LLM's subword vocabulary. We show that LRLs are at a pricing disadvantage, because the well-known LLMs produce more tokens for LRLs than HRLs. This is because most currently popular LLMs are optimized for HRL vocabularies. Our objective is to level the playing field: reduce the cost of processing LRLs in contemporary LLMs while ensuring that predictive and generative qualities are not compromised. As means to reduce the number of tokens processed by the LLM, we consider code-mixing, translation, and transliteration of LRLs to HRLs. We perform an extensive study using the IndicXTREME classification and six generative tasks dataset, covering 15 Indic and 3 other languages, while using GPT-4 (one of the costliest LLM services released so far) as a commercial LLM. We observe and analyze interesting patterns involving token count, cost, and quality across a multitude of languages and tasks. We show that choosing the best policy to interact with the LLM can reduce cost by 90% while giving better or comparable performance compared to communicating with the LLM in the original LRL.


Overview

Large language models (LLMs) excel at tasks in high-resource languages (HRLs); only a few have been trained on low-resource languages (LRLs)
Because training LLMs is costly, they are usually offered as a paid service, with clients charged by the number of input and output tokens
LRL text is split into more tokens than comparable HRL text, putting LRL users at a pricing disadvantage
The paper aims to reduce the cost of processing LRLs in LLMs without compromising performance

Plain English Explanation

Large language models are AI systems trained on vast amounts of text data to understand and generate human-like language. These models excel at tasks in common, widely-spoken languages, but struggle more with less common, "low-resource" languages.

Training these large language models is extremely expensive, so they are usually offered as a paid service, with customers charged by the number of "tokens" (subword pieces of text, not whole words) that the model reads and writes. The paper shows that text in low-resource languages is split into more tokens than comparable text in high-resource languages, leading to higher costs for customers.

The researchers wanted to find ways to reduce the number of tokens processed for low-resource languages, while still maintaining the quality of the model's performance. They explored techniques like code-mixing, translation, and transliteration to convert low-resource language inputs into a format that the language model could process more efficiently.

Through extensive testing on a variety of languages and tasks, the researchers found that using the right technique could reduce the costs of using the language model by up to 90%, while still maintaining or even improving the model's performance. This could make these powerful AI systems more accessible and affordable, especially for users working with less common languages.

Technical Explanation

The paper examines the cost of using large language models (LLMs) for low-resource languages (LRLs). LLMs show impressive zero/few-shot performance on high-resource languages (HRLs), and a few have also been trained on LRLs and perform decently on them. Because training an LLM is prohibitively expensive, most users do not train their own.

LLMs are often provided as a commercial service, with clients charged based on the number of input and output tokens processed by the model. The token count depends on the script, the language, and the model's subword vocabulary. Because most popular LLMs have vocabularies optimized for HRLs, the paper demonstrates that LRL text produces more tokens than HRL text, putting LRL users at a pricing disadvantage.
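To make the billing model concrete, the following is a minimal sketch of per-request cost under linear token pricing. The prices and token counts are hypothetical placeholders chosen for illustration; they are not taken from the paper or from any provider's price list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """Hypothetical per-1,000-token prices in USD (placeholders, not real rates)."""
    input_per_1k: float
    output_per_1k: float


def request_cost(input_tokens: int, output_tokens: int, pricing: Pricing) -> float:
    """Cost of one request when billing is linear in input and output tokens."""
    return (
        (input_tokens / 1000) * pricing.input_per_1k
        + (output_tokens / 1000) * pricing.output_per_1k
    )


def relative_saving(baseline_cost: float, new_cost: float) -> float:
    """Fraction of the baseline cost saved by the cheaper request."""
    return 1.0 - new_cost / baseline_cost


# Assumed prices and token counts, for illustration only.
pricing = Pricing(input_per_1k=0.03, output_per_1k=0.06)
native = request_cost(input_tokens=1200, output_tokens=300, pricing=pricing)
converted = request_cost(input_tokens=400, output_tokens=100, pricing=pricing)
print(f"native: ${native:.3f}, converted: ${converted:.3f}")
print(f"saving: {relative_saving(native, converted):.0%}")
```

Because the cost is linear in the token counts, any reduction in tokens carries over proportionally to the bill, which is why the paper treats token count as the quantity to minimize.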

To address this issue, the researchers explore several techniques to reduce the number of tokens processed for LRLs without compromising the model's predictive and generative capabilities. These include code-mixing, translation, and transliteration of LRL inputs to HRL formats.
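One rough way to see why transliteration can shrink the input is to compare UTF-8 byte lengths. This is only a proxy and an assumption of this sketch: the paper measures actual tokenizer output, whereas the code below counts bytes, which for a byte-level subword tokenizer is at most an upper bound on the token count. The example word and the helper name `utf8_bytes` are illustrative and not from the paper.

```python
def utf8_bytes(text: str) -> int:
    """Number of bytes needed to encode the text in UTF-8."""
    return len(text.encode("utf-8"))


# The Hindi greeting in Devanagari script and a common Latin-script romanization.
native = "नमस्ते"
romanized = "namaste"

# Every Devanagari code point (U+0900 to U+097F) takes 3 bytes in UTF-8,
# while ASCII letters take 1 byte each.
print(f"native:    {len(native)} code points, {utf8_bytes(native)} bytes")
print(f"romanized: {len(romanized)} code points, {utf8_bytes(romanized)} bytes")
```

A tokenizer whose merges were learned mostly from Latin-script text will typically compress the romanized form further, while the native-script bytes stay fragmented. The exact counts depend on the tokenizer and should be measured with it directly.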

The team conducted an extensive study using the IndicXTREME classification benchmark and six generative task datasets, covering 15 Indic and 3 other languages, with GPT-4 (one of the costliest LLM services released at the time) as the commercial model. They analyzed how token count, cost, and quality vary across languages and tasks.

The results show that by choosing the optimal policy to interact with the LLM, the cost can be reduced by up to 90% while maintaining or even improving the performance compared to using the LRL directly.
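The idea of choosing a policy can be sketched as a simple selection rule: among the candidate ways of presenting the input, keep those whose quality stays within a tolerance of the original-language baseline, then take the one with the fewest tokens. This is an illustrative rule, not the paper's procedure. The policy names, token counts, quality scores, and the `tolerance` parameter are all hypothetical, and the sketch ignores the cost of the conversion step itself.

```python
from typing import List, NamedTuple


class PolicyResult(NamedTuple):
    name: str
    tokens: int     # total tokens billed for the evaluation set
    quality: float  # task metric, higher is better


def pick_policy(results: List[PolicyResult], baseline_name: str,
                tolerance: float = 0.01) -> PolicyResult:
    """Return the cheapest policy whose quality is within `tolerance` of the baseline."""
    baseline = next(r for r in results if r.name == baseline_name)
    # The baseline always qualifies, so `eligible` is never empty.
    eligible = [r for r in results if r.quality >= baseline.quality - tolerance]
    return min(eligible, key=lambda r: r.tokens)


# Made-up measurements, for illustration only.
results = [
    PolicyResult("native", tokens=1000, quality=0.80),
    PolicyResult("transliterate", tokens=450, quality=0.74),
    PolicyResult("translate", tokens=300, quality=0.81),
    PolicyResult("code-mix", tokens=600, quality=0.795),
]

best = pick_policy(results, baseline_name="native")
print(best.name, best.tokens, best.quality)
```

With these placeholder numbers, the transliteration policy is excluded because its quality falls too far below the baseline, even though it uses fewer tokens than the native input. In practice the best policy is likely to differ by language and task.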

Critical Analysis

The paper provides a valuable analysis of the challenges and potential solutions for using large language models with low-resource languages. The researchers' emphasis on the cost implications of token processing is particularly relevant, as it highlights an important practical concern for users and developers working with these AI systems.

However, the paper does not delve deeply into the potential limitations or risks of the proposed techniques, such as the impact of code-mixing, translation, or transliteration on the semantic accuracy or cultural appropriateness of the model's outputs. Additionally, the study is focused on a specific set of languages, and the findings may not generalize to all low-resource languages or language families.

Further research could explore the long-term implications of these approaches, such as their impact on language preservation, the potential for unintended biases, and the feasibility of enhancing general agent capabilities in low-resource language contexts.

Conclusion

This paper addresses an important challenge in the widespread adoption of large language models: the higher costs associated with processing low-resource languages. By exploring techniques like code-mixing, translation, and transliteration, the researchers demonstrate that the token count, and thus the processing costs, can be significantly reduced without compromising the model's performance.

The findings of this study have the potential to make powerful AI language tools more accessible and affordable, especially for users working with less common languages. As the field of natural language processing continues to evolve, addressing the needs of diverse linguistic communities will be crucial for ensuring the equitable and inclusive development of these transformative technologies.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers


Comparing LLM prompting with Cross-lingual transfer performance on Indigenous and Low-resource Brazilian Languages

David Ifeoluwa Adelani, A. Seza Doğruöz, André Coneglian, Atul Kr. Ojha

Large Language Models are transforming NLP for a variety of tasks. However, how LLMs perform NLP tasks for low-resource languages (LRLs) is less explored. In line with the goals of the AmericasNLP workshop, we focus on 12 LRLs from Brazil, 2 LRLs from Africa and 2 high-resource languages (HRLs) (e.g., English and Brazilian Portuguese). Our results indicate that the LLMs perform worse for the part of speech (POS) labeling of LRLs in comparison to HRLs. We explain the reasons behind this failure and provide an error analysis through examples observed in our data set.

5/1/2024

cs.CL


LlamaTurk: Adapting Open-Source Generative Large Language Models for Low-Resource Language

Cagri Toraman

Despite advancements in English-dominant generative large language models, further development is needed for low-resource languages to enhance global accessibility. The primary methods for representing these languages are monolingual and multilingual pretraining. Monolingual pretraining is expensive due to hardware requirements, and multilingual models often have uneven performance across languages. This study explores an alternative solution by adapting large language models, primarily trained on English, to low-resource languages. We assess various strategies, including continual training, instruction fine-tuning, task-specific fine-tuning, and vocabulary extension. The results show that continual training improves language comprehension, as reflected in perplexity scores, and task-specific tuning generally enhances performance of downstream tasks. However, extending the vocabulary shows no substantial benefits. Additionally, while larger models improve task performance with few-shot tuning, multilingual models perform worse than their monolingual counterparts when adapted.

5/14/2024

cs.CL cs.AI


How good are Large Language Models on African Languages?

Jessica Ojo, Kelechi Ogueji, Pontus Stenetorp, David Ifeoluwa Adelani

Recent advancements in natural language processing have led to the proliferation of large language models (LLMs). These models have been shown to yield good performance, using in-context learning, even on tasks and languages they are not trained on. However, their performance on African languages is largely understudied relative to high-resource languages. We present an analysis of four popular large language models (mT0, Aya, LLaMa 2, and GPT-4) on six tasks (topic classification, sentiment classification, machine translation, summarization, question answering, and named entity recognition) across 60 African languages, spanning different language families and geographical regions. Our results suggest that all LLMs produce lower performance for African languages, and there is a large gap in performance compared to high-resource languages (such as English) for most tasks. We find that GPT-4 has an average to good performance on classification tasks, yet its performance on generative tasks such as machine translation and summarization is significantly lacking. Surprisingly, we find that mT0 had the best overall performance for cross-lingual QA, better than the state-of-the-art supervised model (i.e. fine-tuned mT5) and GPT-4 on African languages. Similarly, we find the recent Aya model to have comparable results to mT0 in almost all tasks except for topic classification, where it outperforms mT0. Overall, LLaMa 2 showed the worst performance, which we believe is due to its English and code-centric (around 98%) pre-training corpus. Our findings confirm that performance on African languages continues to remain a hurdle for the current LLMs, underscoring the need for additional efforts to close this gap.

5/1/2024

cs.CL cs.AI cs.LG


Planning with Language Models Through The Lens of Efficiency

Michael Katz, Harsha Kokel, Kavitha Srinivas, Shirin Sohrabi

We analyse the cost of using LLMs for planning and highlight that recent trends are profoundly uneconomical. We propose a significantly more efficient approach and argue for a responsible use of compute resources, urging the research community to investigate LLM-based approaches that uphold efficiency.

4/19/2024

cs.AI