HuatuoGPT-II, One-stage Training for Medical Adaption of LLMs

Read original: arXiv:2311.09774 - Published 9/17/2024 by Junying Chen, Xidong Wang, Ke Ji, Anningzhe Gao, Feng Jiang, Shunian Chen, Hongbo Zhang, Dingjie Song, Wenya Xie, Chuyi Kong and 4 others


Overview

Language models like LLaMA2 may not contain specialized knowledge needed for certain domains, like medicine
Adapting a language model to a specific domain, known as "domain adaptation," can be challenging due to differences in data used for pre-training and fine-tuning
This paper proposes a new protocol to transform heterogeneous data into a unified format to simplify the domain adaptation process
The resulting model, HuatuoGPT-II, achieves state-of-the-art performance on Chinese medical benchmarks and outperforms proprietary models like ChatGPT and GPT-4 in some aspects, especially Traditional Chinese Medicine

Plain English Explanation

Adapting Language Models for Specialized Domains

Language models like LLaMA2 are trained on a vast amount of general text data, but may lack the specialized knowledge needed for certain domains, such as traditional Chinese medicine. To address this, researchers often "adapt" the language model to the specific domain through additional training.

However, this domain adaptation process can be challenging because the data used for the initial pre-training and the later fine-tuning often differ greatly in terms of language, genre, or format. To simplify this, the researchers propose a new approach that transforms all the data, from both the pre-training and fine-tuning stages, into a standardized input-output format.

Using this unified data format, the researchers developed a new model called HuatuoGPT-II that achieves state-of-the-art performance on Chinese medical benchmarks. HuatuoGPT-II even outperforms proprietary models like ChatGPT and GPT-4 in some aspects, especially in Traditional Chinese Medicine.

Technical Explanation

The key innovation in this work is the proposed "one-stage training" protocol, which simplifies the domain adaptation process. Domain adaptation traditionally runs in two stages: a pre-training stage on domain text followed by a supervised fine-tuning stage, and the data in the two stages differs in language, genre, and format. Instead, the researchers transform all the data, from both stages, into a unified input-output pair format.

This unified format allows the model to be trained in a single pass over the transformed data, eliminating the need for a separate fine-tuning stage. The researchers validate this approach in domains where proprietary models like ChatGPT perform relatively poorly, such as Traditional Chinese Medicine.
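The idea of a unified input-output format can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the `Pair` type, the function names, the `question`/`answer` record keys, and the simple "Explain: ..." template are hypothetical, and the template is only a stand-in, since this summary does not describe how the authors actually convert raw passages into pairs.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Pair:
    """One training example in the unified input-output format."""
    input_text: str
    output_text: str


def pair_from_passage(passage: str) -> Pair:
    """Turn a raw pre-training passage into a pair.

    Hypothetical placeholder: the first line is treated as a topic and the
    rest as the answer. This is NOT the paper's conversion procedure.
    """
    title, _, body = passage.partition("\n")
    if not body.strip():
        # Single-line passage: no separate title to build a prompt from.
        return Pair("Explain the following topic.", title.strip())
    return Pair(f"Explain: {title.strip()}", body.strip())


def pair_from_sft(record: dict) -> Pair:
    """Turn a supervised record (assumed keys: question, answer) into a pair."""
    return Pair(record["question"].strip(), record["answer"].strip())


def build_unified_dataset(passages, sft_records, seed=0):
    """Merge both data sources into one shuffled list for single-stage training."""
    pairs = [pair_from_passage(p) for p in passages]
    pairs += [pair_from_sft(r) for r in sft_records]
    random.Random(seed).shuffle(pairs)  # mix the two sources together
    return pairs
```

Because both sources end up as the same `Pair` type in one shuffled list, a single training loop can consume them without a stage boundary, which is the property the one-stage protocol relies on.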

The resulting model, HuatuoGPT-II, achieves state-of-the-art performance on a number of Chinese medicine benchmarks, including medical licensing exams. Notably, HuatuoGPT-II even outperforms the proprietary models in certain aspects, particularly in the domain of traditional Chinese medicine.

Critical Analysis

The researchers acknowledge that their proposed "one-stage training" protocol relies on the ability to transform heterogeneous data into a unified format. While they demonstrate the effectiveness of this approach in the Chinese medicine domain, it remains to be seen how well it would generalize to other specialized domains with different data characteristics.

Additionally, the paper does not provide detailed information about the specific data sources, preprocessing steps, or model architectures used. This makes it difficult to fully assess the reproducibility and broader applicability of the proposed methods.

Furthermore, the researchers mention that HuatuoGPT-II was benchmarked on a "fresh" Chinese National Medical Licensing Examination, but it is unclear what this means in terms of the data used for evaluation. More details on the evaluation methodology and datasets would be helpful to understand the true generalization capabilities of the model.

Conclusion

This research presents a novel approach to domain adaptation for language models, focusing on transforming heterogeneous data into a unified format to simplify the learning process. The resulting model, HuatuoGPT-II, has demonstrated state-of-the-art performance on Chinese medicine benchmarks, even outperforming proprietary models like ChatGPT and GPT-4 in certain aspects.

While the proposed protocol shows promise, further research is needed to assess its broader applicability and to address potential limitations. Nonetheless, this work highlights the importance of developing specialized language models for domains where general-purpose models may fall short, and the potential benefits of innovative approaches to domain adaptation.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!


Related Papers


HuatuoGPT-II, One-stage Training for Medical Adaption of LLMs

Junying Chen, Xidong Wang, Ke Ji, Anningzhe Gao, Feng Jiang, Shunian Chen, Hongbo Zhang, Dingjie Song, Wenya Xie, Chuyi Kong, Jianquan Li, Xiang Wan, Haizhou Li, Benyou Wang

Adapting a language model into a specific domain, a.k.a. "domain adaption", is a common practice when specialized knowledge, e.g. medicine, is not encapsulated in a general language model like Llama2. The challenge lies in the heterogeneity of data across the two training stages, as it varies in languages, genres, or formats. To tackle this and simplify the learning protocol, we propose to transform heterogeneous data, from both the pre-training and supervised stages, into a unified, simple input-output pair format. We validate the new protocol in the domains where proprietary LLMs like ChatGPT perform relatively poorly, such as Traditional Chinese Medicine. The developed model, HuatuoGPT-II, has shown state-of-the-art performance in the Chinese medicine domain on a number of benchmarks, e.g. medical licensing exams. It even outperforms proprietary models like ChatGPT and GPT-4 in some aspects, especially in Traditional Chinese Medicine. Expert manual evaluations further validate HuatuoGPT-II's advantages over existing LLMs. Notably, HuatuoGPT-II was benchmarked in a fresh Chinese National Medical Licensing Examination where it achieved the best performance, showcasing not only its effectiveness but also its generalization capabilities.

9/17/2024


ChiMed-GPT: A Chinese Medical Large Language Model with Full Training Regime and Better Alignment to Human Preferences

Yuanhe Tian, Ruyi Gan, Yan Song, Jiaxing Zhang, Yongdong Zhang

Recently, the increasing demand for superior medical services has highlighted the discrepancies in the medical infrastructure. With big data, especially texts, forming the foundation of medical services, there is an exigent need for effective natural language processing (NLP) solutions tailored to the healthcare domain. Conventional approaches leveraging pre-trained models present promising results in this domain, and current large language models (LLMs) offer an advanced foundation for medical text processing. However, most medical LLMs are trained only with supervised fine-tuning (SFT), which efficiently empowers LLMs to understand and respond to medical instructions but is ineffective in learning domain knowledge and aligning with human preference. In this work, we propose ChiMed-GPT, a new benchmark LLM designed explicitly for the Chinese medical domain, which undergoes a comprehensive training regime with pre-training, SFT, and RLHF. Evaluations on tasks including information extraction, question answering, and dialogue generation demonstrate ChiMed-GPT's superior performance over general domain LLMs. Furthermore, we analyze possible biases through prompting ChiMed-GPT to perform attitude scales regarding discrimination of patients, so as to contribute to further responsible development of LLMs in the medical domain. The code and model are released at https://github.com/synlp/ChiMed-GPT.

7/17/2024

PediatricsGPT: Large Language Models as Chinese Medical Assistants for Pediatric Applications

Dingkang Yang, Jinjie Wei, Dongling Xiao, Shunli Wang, Tong Wu, Gang Li, Mingcheng Li, Shuaibing Wang, Jiawei Chen, Yue Jiang, Qingyao Xu, Ke Li, Peng Zhai, Lihua Zhang

Developing intelligent pediatric consultation systems offers promising prospects for improving diagnostic efficiency, especially in China, where healthcare resources are scarce. Despite recent advances in Large Language Models (LLMs) for Chinese medicine, their performance is sub-optimal in pediatric applications due to inadequate instruction data and vulnerable training procedures. To address the above issues, this paper builds PedCorpus, a high-quality dataset of over 300,000 multi-task instructions from pediatric textbooks, guidelines, and knowledge graph resources to fulfil diverse diagnostic demands. Upon well-designed PedCorpus, we propose PediatricsGPT, the first Chinese pediatric LLM assistant built on a systematic and robust training pipeline. In the continuous pre-training phase, we introduce a hybrid instruction pre-training mechanism to mitigate the internal-injected knowledge inconsistency of LLMs for medical domain adaptation. Immediately, the full-parameter Supervised Fine-Tuning (SFT) is utilized to incorporate the general medical knowledge schema into the models. After that, we devise a direct following preference optimization to enhance the generation of pediatrician-like humanistic responses. In the parameter-efficient secondary SFT phase, a mixture of universal-specific experts strategy is presented to resolve the competency conflict between medical generalist and pediatric expertise mastery. Extensive results based on the metrics, GPT-4, and doctor evaluations on distinct doctor downstream tasks show that PediatricsGPT consistently outperforms previous Chinese medical LLMs. Our model and dataset will be open-source for community development.

6/4/2024

Reformulating Domain Adaptation of Large Language Models as Adapt-Retrieve-Revise: A Case Study on Chinese Legal Domain

Zhen Wan, Yating Zhang, Yexiang Wang, Fei Cheng, Sadao Kurohashi

While large language models (LLMs) like GPT-4 have recently demonstrated astonishing zero-shot capabilities in general domain tasks, they often generate content with hallucinations in specific domains such as Chinese law, hindering their application in these areas. This is typically due to the absence of training data that encompasses such a specific domain, preventing GPT-4 from acquiring in-domain knowledge. A pressing challenge is that it's not plausible to continue training LLMs of such scale on in-domain data. This paper introduces a simple and effective domain adaptation framework for GPT-4 by reformulating generation as an adapt-retrieve-revise process. The initial step is to adapt an affordable 7B LLM to the target domain by continuing learning on in-domain data. When solving a task, we leverage the adapted LLM to generate a draft answer given a task query. Then, the draft answer will be used to retrieve supporting evidence candidates from an external in-domain knowledge base. Finally, the draft answer and retrieved evidence are concatenated into a whole prompt to let GPT-4 assess the evidence and revise the draft answer to generate the final answer. Our proposal combines the advantages of the efficiency of adapting a smaller 7B model with the evidence-assessing capability of GPT-4 and effectively prevents GPT-4 from generating hallucinatory content. In the zero-shot setting of four Chinese legal tasks, our method improves accuracy by 33.3% compared to the direct generation by GPT-4. When compared to two stronger retrieval-based baselines, our method outperforms them by 15.4% and 23.9%. Our code will be released.

8/27/2024