EthioLLM: Multilingual Large Language Models for Ethiopian Languages with Task Evaluation

Read original: arXiv:2403.13737 - Published 6/26/2024 by Atnafu Lambebo Tonja, Israel Abebe Azime, Tadesse Destaw Belay, Mesay Gemeda Yigezu, Moges Ahmed Mehamed, Abinew Ali Ayele, Ebrahim Chekol Jibril, Michael Melese Woldeyohannis, Olga Kolesnikova, Philipp Slusallek, Dietrich Klakow, Shengwu Xiong, Seid Muhie Yimam

Overview

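This paper introduces EthioLLM, a set of multilingual large language models (LLMs) for five Ethiopian languages (Amharic, Ge'ez, Afan Oromo, Somali, and Tigrinya) and English, together with Ethiobenchmark, a new benchmark dataset for downstream NLP tasks.
The models are evaluated on five downstream NLP tasks to assess their performance and capabilities.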
The authors aim to advance the state of the art in language technology for under-resourced languages like those spoken in Ethiopia.

Plain English Explanation

The researchers have developed a series of large language models, called EthioLLM, that can understand and generate text in multiple Ethiopian languages. Large language models are powerful AI systems that are trained on vast amounts of text data to learn the patterns and structure of language.

By creating these models specifically for Ethiopian languages, the researchers hope to make language technology more accessible and useful for people in Ethiopia and the surrounding region. Many languages spoken in this part of the world have historically been underserved by mainstream language technology, so this work aims to help bridge that gap.

The paper evaluates how well the EthioLLM models perform on five downstream NLP tasks. This allows the researchers to assess the strengths and limitations of their models and identify areas for future improvement.

Overall, this work represents an important step forward in developing language AI systems that are tailored to the needs and contexts of under-resourced regions like Ethiopia. By making progress in this area, the researchers hope to unlock the potential of language technology to benefit more of the world's diverse linguistic communities.

Technical Explanation

The EthioLLM paper describes the development and evaluation of a set of large language models for five Ethiopian languages (Amharic, Ge'ez, Afan Oromo, Somali, and Tigrinya) and English. The models were trained on a large corpus of text data in these languages, aggregated from various online sources.

The models are evaluated on five downstream NLP tasks, and the authors also release task-specific fine-tuned versions of the models. Evaluating across several tasks gives a broader picture of how well the models handle real language use than any single task would.

The models were evaluated on a range of benchmarks, including Ethiobenchmark, the new benchmark dataset the authors built for Ethiopian languages. The results demonstrate the strong performance of EthioLLM, outperforming previous state-of-the-art models on many tasks.
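Per-language scoring on a classification-style benchmark can be sketched as follows. This is a minimal illustration, not the paper's evaluation code: the function names, the language codes, and the toy labels are all assumptions made up for this example, and macro-F1 is used only because it is a common metric for classification tasks. The summary does not state which metrics the authors used.

```python
from collections import defaultdict


def macro_f1(gold, pred):
    """Unweighted mean of per-class F1 over every class seen in gold or pred."""
    labels = sorted(set(gold) | set(pred))
    scores = []
    for label in labels:
        tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
        fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
        fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); define it as 0 when the class never appears
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores) if scores else 0.0


def per_language_macro_f1(examples):
    """Group (language, gold_label, predicted_label) triples by language and score each group."""
    by_lang = defaultdict(lambda: ([], []))
    for lang, gold, pred in examples:
        by_lang[lang][0].append(gold)
        by_lang[lang][1].append(pred)
    return {lang: macro_f1(g, p) for lang, (g, p) in by_lang.items()}


# Hypothetical toy predictions, not real benchmark data.
examples = [
    ("amh", "pos", "pos"),
    ("amh", "neg", "pos"),
    ("amh", "neg", "neg"),
    ("orm", "pos", "pos"),
    ("orm", "neg", "neg"),
]
scores = per_language_macro_f1(examples)
```

Reporting one score per language, rather than a single pooled number, keeps a weak result on one language from being hidden by strong results on another.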

Critical Analysis

The EthioLLM paper makes a valuable contribution to the field of language technology for under-resourced languages. By developing high-performing LLMs tailored to Ethiopian languages, the researchers have taken an important step towards making language AI more inclusive and accessible.

However, the paper also acknowledges several limitations and areas for further research. For instance, the training data used to create the models may not fully capture the linguistic diversity and regional variations present across Ethiopia. Additionally, the models' performance on more specialized or domain-specific tasks was not extensively evaluated.

Future work could explore ways to further enhance the models' robustness and generalization capabilities, such as through the integration of additional contextual or cultural knowledge. Expanding the range of evaluation tasks and datasets would also help to more comprehensively assess the models' real-world applicability.

Conclusion

The EthioLLM paper represents a significant advancement in the development of language technology for under-resourced languages. By creating high-performing LLMs tailored to multiple Ethiopian languages, the researchers have laid the foundation for more inclusive and accessible language AI systems.

This work could open up new language-based applications and services in Ethiopia and the surrounding region. More broadly, it contributes to the goal of making technology responsive to the needs of diverse linguistic and cultural communities worldwide.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

EthioLLM: Multilingual Large Language Models for Ethiopian Languages with Task Evaluation

Atnafu Lambebo Tonja, Israel Abebe Azime, Tadesse Destaw Belay, Mesay Gemeda Yigezu, Moges Ahmed Mehamed, Abinew Ali Ayele, Ebrahim Chekol Jibril, Michael Melese Woldeyohannis, Olga Kolesnikova, Philipp Slusallek, Dietrich Klakow, Shengwu Xiong, Seid Muhie Yimam

Large language models (LLMs) have gained popularity recently due to their outstanding performance in various downstream Natural Language Processing (NLP) tasks. However, low-resource languages are still lagging behind current state-of-the-art (SOTA) developments in the field of NLP due to insufficient resources to train LLMs. Ethiopian languages exhibit remarkable linguistic diversity, encompassing a wide array of scripts, and are imbued with profound religious and cultural significance. This paper introduces EthioLLM -- multilingual large language models for five Ethiopian languages (Amharic, Ge'ez, Afan Oromo, Somali, and Tigrinya) and English, and Ethiobenchmark -- a new benchmark dataset for various downstream NLP tasks. We evaluate the performance of these models across five downstream NLP tasks. We open-source our multilingual language models, new benchmark datasets for various downstream tasks, and task-specific fine-tuned language models and discuss the performance of the models. Our dataset and models are available at the https://huggingface.co/EthioNLP repository.

6/26/2024

Walia-LLM: Enhancing Amharic-LLaMA by Integrating Task-Specific and Generative Datasets

Israel Abebe Azime, Atnafu Lambebo Tonja, Tadesse Destaw Belay, Mitiku Yohannes Fuge, Aman Kassahun Wassie, Eyasu Shiferaw Jada, Yonas Chanie, Walelign Tewabe Sewunetie, Seid Muhie Yimam

Large language models (LLMs) have received a lot of attention in natural language processing (NLP) research because of their exceptional performance in understanding and generating human languages. However, low-resource languages are left behind due to the unavailability of resources. In this work, we focus on enhancing the LLaMA-2-Amharic model by integrating task-specific and generative datasets to improve language model performance for Amharic. We compile an Amharic instruction fine-tuning dataset and fine-tune the LLaMA-2-Amharic model. The fine-tuned model shows promising results in different NLP tasks. We open-source our dataset creation pipeline, instruction datasets, trained models, and evaluation outputs to promote language-specific studies on these models.

4/30/2024

How good are Large Language Models on African Languages?

Jessica Ojo, Kelechi Ogueji, Pontus Stenetorp, David Ifeoluwa Adelani

Recent advancements in natural language processing have led to the proliferation of large language models (LLMs). These models have been shown to yield good performance, using in-context learning, even on tasks and languages they are not trained on. However, their performance on African languages is largely understudied relative to high-resource languages. We present an analysis of four popular large language models (mT0, Aya, LLaMa 2, and GPT-4) on six tasks (topic classification, sentiment classification, machine translation, summarization, question answering, and named entity recognition) across 60 African languages, spanning different language families and geographical regions. Our results suggest that all LLMs produce lower performance for African languages, and there is a large gap in performance compared to high-resource languages (such as English) for most tasks. We find that GPT-4 has an average to good performance on classification tasks, yet its performance on generative tasks such as machine translation and summarization is significantly lacking. Surprisingly, we find that mT0 had the best overall performance for cross-lingual QA, better than the state-of-the-art supervised model (i.e. fine-tuned mT5) and GPT-4 on African languages. Similarly, we find the recent Aya model to have comparable results to mT0 in almost all tasks except for topic classification, where it outperforms mT0. Overall, LLaMa 2 showed the worst performance, which we believe is due to its English and code-centric (around 98%) pre-training corpus. Our findings confirm that performance on African languages continues to remain a hurdle for the current LLMs, underscoring the need for additional efforts to close this gap.

5/1/2024

MEGAVERSE: Benchmarking Large Language Models Across Languages, Modalities, Models and Tasks

Sanchit Ahuja, Divyanshu Aggarwal, Varun Gumma, Ishaan Watts, Ashutosh Sathe, Millicent Ochieng, Rishav Hada, Prachi Jain, Maxamed Axmed, Kalika Bali, Sunayana Sitaram

There has been a surge in LLM evaluation research to understand LLM capabilities and limitations. However, much of this research has been confined to English, leaving LLM building and evaluation for non-English languages relatively unexplored. Several new LLMs have been introduced recently, necessitating their evaluation on non-English languages. This study aims to perform a thorough evaluation of the non-English capabilities of SoTA LLMs (GPT-3.5-Turbo, GPT-4, PaLM2, Gemini-Pro, Mistral, Llama2, and Gemma) by comparing them on the same set of multilingual datasets. Our benchmark comprises 22 datasets covering 83 languages, including low-resource African languages. We also include two multimodal datasets in the benchmark and compare the performance of LLaVA models, GPT-4-Vision and Gemini-Pro-Vision. Our experiments show that larger models such as GPT-4, Gemini-Pro and PaLM2 outperform smaller models on various tasks, notably on low-resource languages, with GPT-4 outperforming PaLM2 and Gemini-Pro on more datasets. We also perform a study on data contamination and find that several models are likely to be contaminated with multilingual evaluation benchmarks, necessitating approaches to detect and handle contamination while assessing the multilingual performance of LLMs.

4/4/2024