ChiMed-GPT: A Chinese Medical Large Language Model with Full Training Regime and Better Alignment to Human Preferences

Read original: arXiv:2311.06025 - Published 7/17/2024 by Yuanhe Tian, Ruyi Gan, Yan Song, Jiaxing Zhang, Yongdong Zhang

Overview

There is an increasing demand for superior medical services, which has highlighted discrepancies in the medical infrastructure.
Medical services rely heavily on big data, especially text data, creating a need for effective natural language processing (NLP) solutions tailored to the healthcare domain.
Conventional approaches leveraging pre-trained models have shown promising results, and current large language models (LLMs) offer advanced foundations for medical text processing.
However, most medical LLMs are trained only with supervised fine-tuning (SFT). SFT efficiently teaches LLMs to understand and respond to medical instructions, but it is ineffective at learning domain knowledge and aligning with human preferences.

Plain English Explanation

The paper introduces a new benchmark large language model called ChiMed-GPT that is designed specifically for the Chinese medical domain. This model is trained using a more comprehensive approach that goes beyond just supervised fine-tuning.

The key idea is that while supervised fine-tuning helps language models understand and respond to medical instructions, it is not effective at helping them learn the underlying domain knowledge or align their behavior with human preferences. The researchers propose a training process that includes pre-training, supervised fine-tuning, and reinforcement learning from human feedback (RLHF).

This combined approach allows the ChiMed-GPT model to not only understand medical language, but also develop a deeper understanding of the medical domain and behave in a way that is more aligned with human values and preferences. The researchers evaluate the model on tasks like information extraction, question answering, and dialogue generation, and find that it outperforms general-purpose language models.

Additionally, the researchers probe possible biases by prompting the model to complete attitude scales on discrimination against patients, with the aim of supporting the responsible development of large language models in the medical domain.

Technical Explanation

The paper introduces ChiMed-GPT, a new benchmark large language model (LLM) designed specifically for the Chinese medical domain. The model is trained using a comprehensive approach that includes pre-training, supervised fine-tuning (SFT), and reinforcement learning from human feedback (RLHF).

The researchers argue that while SFT can efficiently empower LLMs to understand and respond to medical instructions, it is ineffective in learning domain knowledge and aligning with human preferences. To address this, they propose a training regime that combines pre-training on a large corpus of medical text, SFT on specific medical tasks, and RLHF to align the model's behavior with human values.
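To make the three stages concrete, the following is a minimal sketch in plain Python of the objectives commonly associated with each one. It is not taken from the paper or its repository. The function names (`token_nll`, `pairwise_reward_loss`) and the toy logits are assumptions for illustration, and the pairwise loss is the standard Bradley-Terry formulation used for reward modeling, which may differ from what ChiMed-GPT actually uses.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def token_nll(logits_per_step, target_ids, loss_mask=None):
    """Mean negative log-likelihood of the target tokens.

    loss_mask[i] == 1 means position i contributes to the loss.
    Pre-training typically trains on every position; SFT commonly
    masks out the prompt so only response tokens are trained on.
    """
    if loss_mask is None:
        loss_mask = [1] * len(target_ids)
    total, count = 0.0, 0
    for logits, target, keep in zip(logits_per_step, target_ids, loss_mask):
        if keep:
            total -= log_softmax(logits)[target]
            count += 1
    return total / count if count else 0.0


def pairwise_reward_loss(reward_chosen, reward_rejected):
    """Bradley-Terry style loss: -log(sigmoid(r_chosen - r_rejected))."""
    diff = reward_chosen - reward_rejected
    # Stable softplus(-diff) = max(-diff, 0) + log(1 + exp(-|diff|))
    return max(-diff, 0.0) + math.log1p(math.exp(-abs(diff)))


# Toy example (hypothetical values): 3 positions, vocabulary of size 3.
logits = [[2.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 2.0]]
targets = [0, 1, 2]

pretrain_loss = token_nll(logits, targets)                   # all tokens
sft_loss = token_nll(logits, targets, loss_mask=[0, 1, 1])   # response only
rm_loss = pairwise_reward_loss(reward_chosen=1.5, reward_rejected=0.2)
print(pretrain_loss, sft_loss, rm_loss)
```

The point of the sketch is the difference in what each stage optimizes: pre-training and SFT both minimize token-level likelihood loss (differing mainly in data and masking), while the preference stage trains on comparisons between a preferred and a rejected response.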

Evaluations on information extraction, question answering, and dialogue generation show that ChiMed-GPT outperforms general-domain LLMs. The researchers also probe possible biases by prompting ChiMed-GPT to complete attitude scales on discrimination against patients, with the aim of supporting the responsible development of LLMs in the medical domain.
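As a rough illustration of how two of these task types are often scored, the sketch below computes an entity-level F1 for information extraction and a simple accuracy for multiple-choice question answering. The summary does not say which metrics or datasets the paper uses, so the metric choices, the function names (`entity_f1`, `choice_accuracy`), and the sample data are all assumptions.

```python
def entity_f1(predicted, gold):
    """F1 over sets of (entity_text, entity_type) pairs for one example."""
    pred_set, gold_set = set(predicted), set(gold)
    true_pos = len(pred_set & gold_set)
    precision = true_pos / len(pred_set) if pred_set else 0.0
    recall = true_pos / len(gold_set) if gold_set else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def choice_accuracy(predictions, answers):
    """Fraction of multiple-choice questions answered correctly."""
    if len(predictions) != len(answers):
        raise ValueError("predictions and answers must have equal length")
    if not answers:
        return 0.0
    correct = sum(p == a for p, a in zip(predictions, answers))
    return correct / len(answers)


# Hypothetical model outputs and references.
pred_entities = [("高血压", "disease"), ("阿司匹林", "drug")]
gold_entities = [("高血压", "disease"), ("糖尿病", "disease")]
f1 = entity_f1(pred_entities, gold_entities)

acc = choice_accuracy(["A", "B", "C", "D"], ["A", "B", "D", "D"])
print(f1, acc)
```

Dialogue generation is usually scored with overlap or model-based metrics, which are left out here to keep the sketch short.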

The code and model for ChiMed-GPT are made publicly available at https://github.com/synlp/ChiMed-GPT.

Critical Analysis

The paper presents a comprehensive approach to training a large language model for the Chinese medical domain, which is a crucial step in addressing the increasing demand for superior medical services and the discrepancies in the medical infrastructure.

One notable aspect of the research is the inclusion of reinforcement learning from human feedback (RLHF) in the training process. This technique helps align the model's behavior with human values and preferences, which is particularly important in the medical domain, where ethical considerations and patient-centric care are paramount.

However, the abstract does not specify which RLHF techniques were used, and it describes the bias analysis only as prompting the model with attitude scales on discrimination against patients. More detail on the scales, prompts, and data involved would help readers judge the strengths and limitations of the approach.
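For readers unfamiliar with attitude scales, which the abstract names as the instrument for the bias analysis, the sketch below shows one common way such a questionnaire is scored once a model's ratings have been collected. The item identifiers, the 1 to 5 rating range, and the reverse-keyed item are hypothetical and are not taken from the paper.

```python
def score_attitude_scale(responses, reverse_items=(), scale_min=1, scale_max=5):
    """Sum Likert ratings, flipping reverse-keyed items.

    responses: dict mapping item id -> integer rating.
    reverse_items: item ids whose wording is reversed, so a high
    rating must be flipped before it is added to the total.
    """
    total = 0
    for item_id, rating in responses.items():
        if not scale_min <= rating <= scale_max:
            raise ValueError(f"rating for {item_id!r} is out of range: {rating}")
        if item_id in reverse_items:
            rating = scale_min + scale_max - rating
        total += rating
    return total


# Hypothetical ratings parsed from a model's answers to three items.
model_ratings = {"q1": 5, "q2": 1, "q3": 3}
total_score = score_attitude_scale(model_ratings, reverse_items={"q2"})
print(total_score)
```

How a total maps to more or less discriminatory attitudes depends on the specific scale, so any interpretation would need the scale's own scoring guide.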

Furthermore, the paper focuses on the Chinese medical domain, which raises questions about the generalizability of the findings to other languages and cultural contexts. It would be valuable to see similar research conducted in other healthcare systems and languages to assess the broader applicability of the proposed methods.

Conclusion

The paper introduces ChiMed-GPT, a new large language model designed specifically for the Chinese medical domain. The model is trained with a comprehensive regime that combines pre-training, supervised fine-tuning, and reinforcement learning from human feedback. This helps the model not only understand medical language but also acquire medical domain knowledge and align its behavior with human values and preferences.

The evaluation results demonstrate the superior performance of ChiMed-GPT compared to general-domain language models, highlighting the importance of developing domain-specific LLMs for critical applications like healthcare. The researchers' analysis of potential biases in the model's behavior also contributes to the responsible development of large language models in the medical domain.

This research represents an important step towards addressing the discrepancies in medical infrastructure and the increasing demand for superior medical services, particularly in the context of the Chinese healthcare system. The release of the ChiMed-GPT code and model also makes it possible for others to build upon this work and further advance the field of medical natural language processing.

Related Papers

ChiMed-GPT: A Chinese Medical Large Language Model with Full Training Regime and Better Alignment to Human Preferences

Yuanhe Tian, Ruyi Gan, Yan Song, Jiaxing Zhang, Yongdong Zhang

Recently, the increasing demand for superior medical services has highlighted the discrepancies in the medical infrastructure. With big data, especially texts, forming the foundation of medical services, there is an exigent need for effective natural language processing (NLP) solutions tailored to the healthcare domain. Conventional approaches leveraging pre-trained models present promising results in this domain and current large language models (LLMs) offer advanced foundation for medical text processing. However, most medical LLMs are trained only with supervised fine-tuning (SFT), even though it efficiently empowers LLMs to understand and respond to medical instructions but is ineffective in learning domain knowledge and aligning with human preference. In this work, we propose ChiMed-GPT, a new benchmark LLM designed explicitly for Chinese medical domain, and undergoes a comprehensive training regime with pre-training, SFT, and RLHF. Evaluations on tasks including information extraction, question answering, and dialogue generation demonstrate ChiMed-GPT's superior performance over general domain LLMs. Furthermore, we analyze possible biases through prompting ChiMed-GPT to perform attitude scales regarding discrimination of patients, so as to contribute to further responsible development of LLMs in the medical domain. The code and model are released at https://github.com/synlp/ChiMed-GPT.

7/17/2024

PediatricsGPT: Large Language Models as Chinese Medical Assistants for Pediatric Applications

Dingkang Yang, Jinjie Wei, Dongling Xiao, Shunli Wang, Tong Wu, Gang Li, Mingcheng Li, Shuaibing Wang, Jiawei Chen, Yue Jiang, Qingyao Xu, Ke Li, Peng Zhai, Lihua Zhang

Developing intelligent pediatric consultation systems offers promising prospects for improving diagnostic efficiency, especially in China, where healthcare resources are scarce. Despite recent advances in Large Language Models (LLMs) for Chinese medicine, their performance is sub-optimal in pediatric applications due to inadequate instruction data and vulnerable training procedures. To address the above issues, this paper builds PedCorpus, a high-quality dataset of over 300,000 multi-task instructions from pediatric textbooks, guidelines, and knowledge graph resources to fulfil diverse diagnostic demands. Upon well-designed PedCorpus, we propose PediatricsGPT, the first Chinese pediatric LLM assistant built on a systematic and robust training pipeline. In the continuous pre-training phase, we introduce a hybrid instruction pre-training mechanism to mitigate the internal-injected knowledge inconsistency of LLMs for medical domain adaptation. Immediately, the full-parameter Supervised Fine-Tuning (SFT) is utilized to incorporate the general medical knowledge schema into the models. After that, we devise a direct following preference optimization to enhance the generation of pediatrician-like humanistic responses. In the parameter-efficient secondary SFT phase, a mixture of universal-specific experts strategy is presented to resolve the competency conflict between medical generalist and pediatric expertise mastery. Extensive results based on the metrics, GPT-4, and doctor evaluations on distinct doctor downstream tasks show that PediatricsGPT consistently outperforms previous Chinese medical LLMs. Our model and dataset will be open-source for community development.

6/4/2024

LawGPT: A Chinese Legal Knowledge-Enhanced Large Language Model

Zhi Zhou, Jiang-Xin Shi, Peng-Xiao Song, Xiao-Wen Yang, Yi-Xuan Jin, Lan-Zhe Guo, Yu-Feng Li

Large language models (LLMs), including both proprietary and open-source models, have showcased remarkable capabilities in addressing a wide range of downstream tasks. Nonetheless, when it comes to practical Chinese legal tasks, these models fail to meet the actual requirements. Proprietary models do not ensure data privacy for sensitive legal cases, while open-source models demonstrate unsatisfactory performance due to their lack of legal knowledge. To address this problem, we introduce LawGPT, the first open-source model specifically designed for Chinese legal applications. LawGPT comprises two key components: legal-oriented pre-training and legal supervised fine-tuning. Specifically, we employ large-scale Chinese legal documents for legal-oriented pre-training to incorporate legal domain knowledge. To further improve the model's performance on downstream legal tasks, we create a knowledge-driven instruction dataset for legal supervised fine-tuning. Our experimental results demonstrate that LawGPT outperforms the open-source LLaMA 7B model. Our code and resources are publicly available at https://github.com/pengxiao-song/LaWGPT and have received 5.7K stars on GitHub.

6/10/2024

HuatuoGPT-II, One-stage Training for Medical Adaption of LLMs

Junying Chen, Xidong Wang, Ke Ji, Anningzhe Gao, Feng Jiang, Shunian Chen, Hongbo Zhang, Dingjie Song, Wenya Xie, Chuyi Kong, Jianquan Li, Xiang Wan, Haizhou Li, Benyou Wang

Adapting a language model into a specific domain, a.k.a `domain adaption', is a common practice when specialized knowledge, e.g. medicine, is not encapsulated in a general language model like Llama2. The challenge lies in the heterogeneity of data across the two training stages, as it varies in languages, genres, or formats. To tackle this and simplify the learning protocol, we propose to transform heterogeneous data, from the both pre-training and supervised stages, into a unified, simple input-output pair format. We validate the new protocol in the domains where proprietary LLMs like ChatGPT perform relatively poorly, such as Traditional Chinese Medicine. The developed model, HuatuoGPT-II, has shown state-of-the-art performance in Chinese medicine domain on a number of benchmarks, e.g. medical licensing exams. It even outperforms proprietary models like ChatGPT and GPT-4 in some aspects, especially in Traditional Chinese Medicine. Expert manual evaluations further validate HuatuoGPT-II's advantages over existing LLMs. Notably, HuatuoGPT-II was benchmarked in a fresh Chinese National Medical Licensing Examination where it achieved the best performance, showcasing not only its effectiveness but also its generalization capabilities.

9/17/2024