Qibo: A Large Language Model for Traditional Chinese Medicine

2403.16056

Published 6/26/2024 by Heyi Zhang, Xin Wang, Zhaopeng Meng, Zhe Chen, Pengwei Zhuang, Yongzhe Jia, Dawei Xu, Wenbin Guo

$Qibo: A Large Language Model for Traditional Chinese Medicine$

Abstract

Large Language Models (LLMs) has made significant progress in a number of professional fields, including medicine, law, and finance. However, in traditional Chinese medicine (TCM), there are challenges such as the essential differences between theory and modern medicine, the lack of specialized corpus resources, and the fact that relying only on supervised fine-tuning may lead to overconfident predictions. To address these challenges, we propose a two-stage training approach that combines continuous pre-training and supervised fine-tuning. A notable contribution of our study is the processing of a 2GB corpus dedicated to TCM, constructing pre-training and instruction fine-tuning datasets for TCM, respectively. In addition, we have developed Qibo-Benchmark, a tool that evaluates the performance of LLM in the TCM on multiple dimensions, including subjective, objective, and three TCM NLP tasks. The medical LLM trained with our pipeline, named $textbf{Qibo}$, exhibits significant performance boosts. Compared to the baselines, the average subjective win rate is 63%, the average objective accuracy improved by 23% to 58%, and the Rouge-L scores for the three TCM NLP tasks are 0.72, 0.61, and 0.55. Finally, we propose a pipline to apply Qibo to TCM consultation and demonstrate the model performance through the case study.

Create account to get full access

Overview

This paper presents Qibo, a large language model trained on a large dataset of traditional Chinese medicine (TCM) texts.
The model is designed to assist with tasks related to TCM, such as TCMBench, TCMD, and CMB.
The paper evaluates the performance of Qibo on various TCM-related tasks and compares it to other large language models.

Plain English Explanation

The researchers have developed a powerful artificial intelligence (AI) system called Qibo that is specifically trained to understand and work with information about traditional Chinese medicine (TCM). TCM is a centuries-old system of healthcare that uses natural remedies, acupuncture, and other holistic approaches. Qibo is a "large language model," which means it has been trained on a huge amount of text data to become very good at understanding and generating human-like language.

By training Qibo on a large dataset of TCM texts, the researchers have created an AI tool that can assist with a variety of TCM-related tasks. For example, it can help answer questions about TCM, summarize TCM concepts, or even generate new TCM-related content. The paper evaluates how well Qibo performs on these kinds of tasks compared to other AI language models.

The goal is for Qibo to be a valuable resource for TCM practitioners, researchers, and anyone interested in this traditional healthcare approach. By leveraging the power of AI, the researchers hope to make TCM knowledge more accessible and useful in the modern world.

Technical Explanation

The paper introduces Qibo, a large language model trained on a large corpus of traditional Chinese medicine (TCM) texts. The model is designed to assist with a variety of TCM-related tasks, including the TCMBench benchmark, the TCMD dataset, and the CMB benchmark.

The authors describe the model architecture and training process, including the use of a custom TCM vocabulary and pretraining on a large corpus of TCM literature. They evaluate the performance of Qibo on a range of TCM-related tasks, such as question answering, text generation, and classification, and compare it to other state-of-the-art language models like Chinese Tiny LLM and models covered in the survey of large language models.

The results demonstrate that Qibo outperforms these other models on TCM-specific tasks, showcasing the value of specialized language models for domain-specific applications. The paper discusses the potential implications of Qibo for TCM research, education, and clinical practice, as well as the limitations and future research directions.

Critical Analysis

The paper presents a well-designed and carefully executed study on the development and evaluation of Qibo, a specialized large language model for traditional Chinese medicine. The authors have thoughtfully addressed key challenges in this domain, such as the need for a customized TCM vocabulary and the importance of pretraining on a large corpus of TCM-related texts.

However, the paper does not fully explore the potential biases or limitations of the Qibo model. For example, it is unclear how the model's performance may be affected by the diversity and representativeness of the training data, or how it might handle rare or emerging TCM concepts and practices. Additionally, the paper does not address the ethical implications of deploying such a powerful AI system in the context of traditional medicine, where cultural sensitivity and patient trust are paramount.

Further research is needed to thoroughly assess the robustness, fairness, and interpretability of Qibo, as well as its long-term impact on TCM education, research, and clinical practice. Nonetheless, this work represents an important step forward in the development of specialized language models for domain-specific applications, and the insights gained from this study may be valuable for researchers working on similar challenges in other fields.

Conclusion

The Qibo large language model represents a significant advancement in the field of traditional Chinese medicine (TCM) research and applications. By leveraging the power of AI to understand and work with TCM-related knowledge, the researchers have created a tool that has the potential to greatly benefit TCM practitioners, researchers, and the broader public.

The strong performance of Qibo on TCM-specific tasks, as demonstrated in the paper, suggests that specialized language models can be highly effective in domain-specific contexts. This work paves the way for further development and deployment of similar AI systems in other specialized domains, with the ultimate goal of making complex knowledge more accessible and useful for those who need it most.

While there are still important challenges and limitations to address, the Qibo project represents an exciting step forward in the integration of traditional and modern approaches to healthcare and knowledge management. As the field of AI continues to evolve, researchers and practitioners alike will need to work collaboratively to ensure that these powerful technologies are developed and deployed in a responsible and ethical manner.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

TCMBench: A Comprehensive Benchmark for Evaluating Large Language Models in Traditional Chinese Medicine

Wenjing Yue, Xiaoling Wang, Wei Zhu, Ming Guan, Huanran Zheng, Pengfei Wang, Changzhi Sun, Xin Ma

Large language models (LLMs) have performed remarkably well in various natural language processing tasks by benchmarking, including in the Western medical domain. However, the professional evaluation benchmarks for LLMs have yet to be covered in the traditional Chinese medicine(TCM) domain, which has a profound history and vast influence. To address this research gap, we introduce TCM-Bench, an comprehensive benchmark for evaluating LLM performance in TCM. It comprises the TCM-ED dataset, consisting of 5,473 questions sourced from the TCM Licensing Exam (TCMLE), including 1,300 questions with authoritative analysis. It covers the core components of TCMLE, including TCM basis and clinical practice. To evaluate LLMs beyond accuracy of question answering, we propose TCMScore, a metric tailored for evaluating the quality of answers generated by LLMs for TCM related questions. It comprehensively considers the consistency of TCM semantics and knowledge. After conducting comprehensive experimental analyses from diverse perspectives, we can obtain the following findings: (1) The unsatisfactory performance of LLMs on this benchmark underscores their significant room for improvement in TCM. (2) Introducing domain knowledge can enhance LLMs' performance. However, for in-domain models like ZhongJing-TCM, the quality of generated analysis text has decreased, and we hypothesize that their fine-tuning process affects the basic LLM capabilities. (3) Traditional metrics for text generation quality like Rouge and BertScore are susceptible to text length and surface semantic ambiguity, while domain-specific metrics such as TCMScore can further supplement and explain their evaluation results. These findings highlight the capabilities and limitations of LLMs in the TCM and aim to provide a more profound assistance to medical research.

6/4/2024

cs.CL cs.AI

TCMD: A Traditional Chinese Medicine QA Dataset for Evaluating Large Language Models

Ping Yu, Kaitao Song, Fengchen He, Ming Chen, Jianfeng Lu

The recently unprecedented advancements in Large Language Models (LLMs) have propelled the medical community by establishing advanced medical-domain models. However, due to the limited collection of medical datasets, there are only a few comprehensive benchmarks available to gauge progress in this area. In this paper, we introduce a new medical question-answering (QA) dataset that contains massive manual instruction for solving Traditional Chinese Medicine examination tasks, called TCMD. Specifically, our TCMD collects massive questions across diverse domains with their annotated medical subjects and thus supports us in comprehensively assessing the capability of LLMs in the TCM domain. Extensive evaluation of various general LLMs and medical-domain-specific LLMs is conducted. Moreover, we also analyze the robustness of current LLMs in solving TCM QA tasks by introducing randomness. The inconsistency of the experimental results also reveals the shortcomings of current LLMs in solving QA tasks. We also expect that our dataset can further facilitate the development of LLMs in the TCM area.

6/10/2024

cs.CL

📶

CMB: A Comprehensive Medical Benchmark in Chinese

Xidong Wang, Guiming Hardy Chen, Dingjie Song, Zhiyi Zhang, Zhihong Chen, Qingying Xiao, Feng Jiang, Jianquan Li, Xiang Wan, Benyou Wang, Haizhou Li

Large Language Models (LLMs) provide a possibility to make a great breakthrough in medicine. The establishment of a standardized medical benchmark becomes a fundamental cornerstone to measure progression. However, medical environments in different regions have their local characteristics, e.g., the ubiquity and significance of traditional Chinese medicine within China. Therefore, merely translating English-based medical evaluation may result in textit{contextual incongruities} to a local region. To solve the issue, we propose a localized medical benchmark called CMB, a Comprehensive Medical Benchmark in Chinese, designed and rooted entirely within the native Chinese linguistic and cultural framework. While traditional Chinese medicine is integral to this evaluation, it does not constitute its entirety. Using this benchmark, we have evaluated several prominent large-scale LLMs, including ChatGPT, GPT-4, dedicated Chinese LLMs, and LLMs specialized in the medical domain. We hope this benchmark provide first-hand experience in existing LLMs for medicine and also facilitate the widespread adoption and enhancement of medical LLMs within China. Our data and code are publicly available at https://github.com/FreedomIntelligence/CMB.

4/5/2024

cs.CL cs.AI

Chinese Tiny LLM: Pretraining a Chinese-Centric Large Language Model

Xinrun Du, Zhouliang Yu, Songyang Gao, Ding Pan, Yuyang Cheng, Ziyang Ma, Ruibin Yuan, Xingwei Qu, Jiaheng Liu, Tianyu Zheng, Xinchen Luo, Guorui Zhou, Binhang Yuan, Wenhu Chen, Jie Fu, Ge Zhang

In this study, we introduce CT-LLM, a 2B large language model (LLM) that illustrates a pivotal shift towards prioritizing the Chinese language in developing LLMs. Uniquely initiated from scratch, CT-LLM diverges from the conventional methodology by primarily incorporating Chinese textual data, utilizing an extensive corpus of 1,200 billion tokens, including 800 billion Chinese tokens, 300 billion English tokens, and 100 billion code tokens. This strategic composition facilitates the model's exceptional proficiency in understanding and processing Chinese, a capability further enhanced through alignment techniques. Demonstrating remarkable performance on the CHC-Bench, CT-LLM excels in Chinese language tasks, and showcases its adeptness in English through SFT. This research challenges the prevailing paradigm of training LLMs predominantly on English corpora and then adapting them to other languages, broadening the horizons for LLM training methodologies. By open-sourcing the full process of training a Chinese LLM, including a detailed data processing procedure with the obtained Massive Appropriate Pretraining Chinese Corpus (MAP-CC), a well-chosen multidisciplinary Chinese Hard Case Benchmark (CHC-Bench), and the 2B-size Chinese Tiny LLM (CT-LLM), we aim to foster further exploration and innovation in both academia and industry, paving the way for more inclusive and versatile language models.

4/10/2024

cs.CL cs.AI