GEB-1.3B: Open Lightweight Large Language Model

2406.09900

Published 6/17/2024 by Jie Wu, Yufeng Zhu, Lei Shen, Xuqing Lu

GEB-1.3B: Open Lightweight Large Language Model

Abstract

Recently developed large language models (LLMs) such as ChatGPT, Claude, and Llama have demonstrated impressive abilities, and even surpass human-level performance in several tasks. Despite their success, the resource-intensive demands of these models, requiring significant computational power for both training and inference, limit their deployment to high-performance servers. Additionally, the extensive calculation requirements of the models often lead to increased latency in response times. With the increasing need for LLMs to operate efficiently on CPUs, research about lightweight models that are optimized for CPU inference has emerged. In this work, we introduce GEB-1.3B, a lightweight LLM trained on 550 billion tokens in both Chinese and English languages. We employ novel training techniques, including ROPE, Group-Query-Attention, and FlashAttention-2, to accelerate training while maintaining model performance. Additionally, we fine-tune the model using 10 million samples of instruction data to enhance alignment. GEB-1.3B exhibits outstanding performance on general benchmarks such as MMLU, C-Eval, and CMMLU, outperforming comparative models such as MindLLM-1.3B and TinyLLaMA-1.1B. Notably, the FP32 version of GEB-1.3B achieves commendable inference times on CPUs, with ongoing efforts to further enhance speed through advanced quantization techniques. The release of GEB-1.3B as an open-source model marks a significant contribution to the development of lightweight LLMs, promising to foster further research and innovation in the field.

Create account to get full access

Overview

This paper introduces GEB-1.3B, an open and lightweight large language model (LLM) that aims to be highly capable and transparent.
The model was pre-trained on a diverse corpus of text data and fine-tuned on a range of downstream tasks.
The researchers highlight GEB-1.3B's strong performance across multiple benchmarks while maintaining a relatively small model size compared to other prominent LLMs.

Plain English Explanation

The researchers have developed a new large language model called GEB-1.3B. Large language models are powerful AI systems that can understand and generate human-like text. GEB-1.3B is designed to be open, meaning its code and training data are publicly available, and lightweight, meaning it has a relatively small file size compared to other large language models.

Despite its smaller size, the researchers claim that GEB-1.3B performs well on a variety of tasks, from answering questions to generating coherent text. They trained the model on a diverse set of text data from the internet, which allows it to understand and communicate on a wide range of topics. The researchers then fine-tuned the model, which means they further trained it on specific tasks to improve its performance.

The key goals of the GEB-1.3B project are to create a highly capable language model that is also transparent and accessible to the public. By making the model open-source, the researchers hope that other researchers and developers can build upon their work and further advance the field of natural language processing.

Technical Explanation

The researchers pre-trained the GEB-1.3B model on a large corpus of web-crawled text data, which included a diverse range of content such as books, articles, and websites. They used a transformer-based architecture, a common approach in modern language models, which allows the model to capture complex relationships between words and phrases.

To improve the model's performance, the researchers fine-tuned GEB-1.3B on a variety of downstream tasks, such as question answering, text summarization, and natural language inference. They evaluated the model's capabilities on several benchmark datasets and found that it achieved strong results, often outperforming larger and more complex language models.

One of the key innovations of the GEB-1.3B model is its relatively small size compared to other prominent large language models. The researchers were able to achieve this by carefully selecting the model architecture and hyperparameters during the pre-training and fine-tuning stages. This makes GEB-1.3B more accessible and easier to deploy in resource-constrained environments.

Critical Analysis

The researchers have provided a detailed and transparent account of the GEB-1.3B model development process, which is commendable. However, the paper does not fully address the potential limitations or ethical considerations of large language models.

For example, the paper does not discuss the model's robustness to adversarial attacks or its ability to generate biased or harmful content. Additionally, the researchers do not explore the potential privacy and security implications of using an open-source language model, such as the risk of misuse or exploitation.

Further research is needed to investigate these aspects and ensure that GEB-1.3B and similar models are developed and deployed responsibly. The researchers could also consider incorporating more rigorous testing and evaluation procedures to better understand the model's limitations and potential pitfalls.

Conclusion

The GEB-1.3B model represents an important step forward in the development of open and accessible large language models. By achieving strong performance on a range of tasks while maintaining a relatively small model size, the researchers have demonstrated the potential for highly capable yet lightweight language models.

However, the paper also highlights the need for continued research and development in this area, particularly around issues of safety, ethics, and transparency. As large language models become more prevalent, it is crucial that the research community addresses these concerns to ensure that the technology is developed and deployed in a responsible and beneficial manner.

Overall, the GEB-1.3B project is a promising contribution to the field of natural language processing, and the researchers' commitment to open-source development is a valuable approach that may inspire further advancements in this rapidly evolving area of AI.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

MAP-Neo: Highly Capable and Transparent Bilingual Large Language Model Series

Ge Zhang, Scott Qu, Jiaheng Liu, Chenchen Zhang, Chenghua Lin, Chou Leuang Yu, Danny Pan, Esther Cheng, Jie Liu, Qunshu Lin, Raven Yuan, Tuney Zheng, Wei Pang, Xinrun Du, Yiming Liang, Yinghao Ma, Yizhi Li, Ziyang Ma, Bill Lin, Emmanouil Benetos, Huan Yang, Junting Zhou, Kaijing Ma, Minghao Liu, Morry Niu, Noah Wang, Quehry Que, Ruibo Liu, Sine Liu, Shawn Guo, Soren Gao, Wangchunshu Zhou, Xinyue Zhang, Yizhi Zhou, Yubo Wang, Yuelin Bai, Yuhan Zhang, Yuxiang Zhang, Zenith Wang, Zhenzhu Yang, Zijian Zhao, Jiajun Zhang, Wanli Ouyang, Wenhao Huang, Wenhu Chen

Large Language Models (LLMs) have made great strides in recent years to achieve unprecedented performance across different tasks. However, due to commercial interest, the most competitive models like GPT, Gemini, and Claude have been gated behind proprietary interfaces without disclosing the training details. Recently, many institutions have open-sourced several strong LLMs like LLaMA-3, comparable to existing closed-source LLMs. However, only the model's weights are provided with most details (e.g., intermediate checkpoints, pre-training corpus, and training code, etc.) being undisclosed. To improve the transparency of LLMs, the research community has formed to open-source truly open LLMs (e.g., Pythia, Amber, OLMo), where more details (e.g., pre-training corpus and training code) are being provided. These models have greatly advanced the scientific study of these large models including their strengths, weaknesses, biases and risks. However, we observe that the existing truly open LLMs on reasoning, knowledge, and coding tasks are still inferior to existing state-of-the-art LLMs with similar model sizes. To this end, we open-source MAP-Neo, a highly capable and transparent bilingual language model with 7B parameters trained from scratch on 4.5T high-quality tokens. Our MAP-Neo is the first fully open-sourced bilingual LLM with comparable performance compared to existing state-of-the-art LLMs. Moreover, we open-source all details to reproduce our MAP-Neo, where the cleaned pre-training corpus, data cleaning pipeline, checkpoints, and well-optimized training/evaluation framework are provided. Finally, we hope our MAP-Neo will enhance and strengthen the open research community and inspire more innovations and creativities to facilitate the further improvements of LLMs.

6/4/2024

cs.CL cs.AI cs.LG

JetMoE: Reaching Llama2 Performance with 0.1M Dollars

Yikang Shen, Zhen Guo, Tianle Cai, Zengyi Qin

Large Language Models (LLMs) have achieved remarkable results, but their increasing resource demand has become a major obstacle to the development of powerful and accessible super-human intelligence. This report introduces JetMoE-8B, a new LLM trained with less than $0.1 million, using 1.25T tokens from carefully mixed open-source corpora and 30,000 H100 GPU hours. Despite its low cost, the JetMoE-8B demonstrates impressive performance, with JetMoE-8B outperforming the Llama2-7B model and JetMoE-8B-Chat surpassing the Llama2-13B-Chat model. These results suggest that LLM training can be much more cost-effective than generally thought. JetMoE-8B is based on an efficient Sparsely-gated Mixture-of-Experts (SMoE) architecture, composed of attention and feedforward experts. Both layers are sparsely activated, allowing JetMoE-8B to have 8B parameters while only activating 2B for each input token, reducing inference computation by about 70% compared to Llama2-7B. Moreover, JetMoE-8B is highly open and academia-friendly, using only public datasets and training code. All training parameters and data mixtures have been detailed in this report to facilitate future efforts in the development of open foundation models. This transparency aims to encourage collaboration and further advancements in the field of accessible and efficient LLMs. The model weights are publicly available at https://github.com/myshell-ai/JetMoE.

4/12/2024

cs.CL cs.AI

Xmodel-LM Technical Report

Yichuan Wang, Yang Liu, Yu Yan, Qun Wang, Xucheng Huang, Ling Jiang

We introduce Xmodel-LM, a compact and efficient 1.1B language model pre-trained on around 2 trillion tokens. Trained on our self-built dataset (Xdata), which balances Chinese and English corpora based on downstream task optimization, Xmodel-LM exhibits remarkable performance despite its smaller size. It notably surpasses existing open-source language models of similar scale. Our model checkpoints and code are publicly accessible on GitHub at https://github.com/XiaoduoAILab/XmodelLM.

6/27/2024

cs.CL cs.AI

💬

Multilingual Machine Translation with Large Language Models: Empirical Results and Analysis

Wenhao Zhu, Hongyi Liu, Qingxiu Dong, Jingjing Xu, Shujian Huang, Lingpeng Kong, Jiajun Chen, Lei Li

Large language models (LLMs) have demonstrated remarkable potential in handling multilingual machine translation (MMT). In this paper, we systematically investigate the advantages and challenges of LLMs for MMT by answering two questions: 1) How well do LLMs perform in translating massive languages? 2) Which factors affect LLMs' performance in translation? We thoroughly evaluate eight popular LLMs, including ChatGPT and GPT-4. Our empirical results show that translation capabilities of LLMs are continually involving. GPT-4 has beat the strong supervised baseline NLLB in 40.91% of translation directions but still faces a large gap towards the commercial translation system like Google Translate, especially on low-resource languages. Through further analysis, we discover that LLMs exhibit new working patterns when used for MMT. First, LLM can acquire translation ability in a resource-efficient way and generate moderate translation even on zero-resource languages. Second, instruction semantics can surprisingly be ignored when given in-context exemplars. Third, cross-lingual exemplars can provide better task guidance for low-resource translation than exemplars in the same language pairs. Code will be released at: https://github.com/NJUNLP/MMT-LLM.

6/17/2024

cs.CL