meditron-7b

Maintainer: epfl-llm

Last updated 4/29/2024

Property	Value
Run this model	Run on HuggingFace
API spec	View on HuggingFace
Github link	No Github link provided
Paper link	No paper link provided

Model Overview

meditron-7b is a 7-billion-parameter language model adapted to the medical domain from Llama-2-7B. The EPFL LLM Team developed it through continued pretraining on a curated medical corpus that includes PubMed articles and abstracts, a new dataset of medical guidelines, and general-domain data from RedPajama-v1. meditron-7b outperforms Llama-2-7B and PMC-Llama on multiple medical reasoning tasks.

The larger meditron-70b model follows a similar approach, scaling up to 70 billion parameters. It outperforms Llama-2-70B, GPT-3.5 (text-davinci-003), and Flan-PaLM on medical benchmarks.

Model Inputs and Outputs

Inputs

Text-only data: The model takes textual input only, with a context length of up to 2,048 tokens for meditron-7b and 4,096 tokens for meditron-70b.
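Because the prompt and the generated text share one context window, it can help to trim long inputs before sending them. The following is a minimal sketch, assuming the context lengths stated above and a plain list of integers standing in for real tokenizer output; `fit_to_context` and `CONTEXT_LENGTH` are hypothetical names for illustration, not part of any Meditron tooling.

```python
from typing import List

# Context lengths as stated in this overview.
CONTEXT_LENGTH = {"meditron-7b": 2048, "meditron-70b": 4096}


def fit_to_context(token_ids: List[int], model: str, max_new_tokens: int) -> List[int]:
    """Keep the most recent tokens so that prompt + generation fits the window."""
    budget = CONTEXT_LENGTH[model] - max_new_tokens
    if budget <= 0:
        raise ValueError("max_new_tokens leaves no room for the prompt")
    # Drop the oldest tokens first and keep the tail of the prompt.
    return token_ids[-budget:]


# Stand-in for real tokenizer output (assumption: 3,000 prompt tokens).
prompt_ids = list(range(3000))
trimmed = fit_to_context(prompt_ids, "meditron-7b", max_new_tokens=256)
print(len(trimmed))  # 1792
```

Keeping the tail rather than the head is a design choice that suits chat-style prompts, where the latest turns matter most; for a document followed by a question, trimming the middle may work better.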

Outputs

Text generation: The model generates text as output. It is not designed for other output modalities like images or structured data.

Capabilities

The meditron models demonstrate strong performance on a variety of medical reasoning tasks, including medical exam question answering, supporting differential diagnosis, and providing disease information. Their medical domain-specific pretraining allows them to encode and apply relevant medical knowledge more effectively than general language models.

What Can I Use It For?

The meditron models are being made available for further testing and assessment as AI assistants to enhance clinical decision-making and improve access to large language models in healthcare. Potential use cases include:

Medical exam question answering
Supporting differential diagnosis
Providing disease information (symptoms, causes, treatments)
General health information queries

However, the maintainers advise against deploying these models directly in medical applications without extensive testing and alignment with specific use cases, as they have not yet been adapted to deliver medical knowledge appropriately, safely, or within professional constraints.
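For experimenting with the exam question answering use case, a few-shot prompt is a common starting point with base models. The sketch below assembles one from worked examples; the `Question:` / `Answer:` layout and the function names are assumptions chosen for illustration, not the prompt format the maintainers used in their evaluations.

```python
from typing import Dict, List, Tuple

# (question, options keyed by letter, correct letter)
Example = Tuple[str, Dict[str, str], str]


def format_question(question: str, options: Dict[str, str]) -> str:
    """Render one multiple-choice question, ending with an open 'Answer:' slot."""
    lines = [f"Question: {question}"]
    lines += [f"{letter}. {text}" for letter, text in sorted(options.items())]
    lines.append("Answer:")
    return "\n".join(lines)


def build_few_shot_prompt(examples: List[Example], question: str,
                          options: Dict[str, str]) -> str:
    """Join solved examples, then append the unsolved target question."""
    shots = [f"{format_question(q, o)} {a}" for q, o, a in examples]
    return "\n\n".join(shots + [format_question(question, options)])


examples: List[Example] = [
    ("Which vitamin deficiency causes scurvy?",
     {"A": "Vitamin A", "B": "Vitamin C", "C": "Vitamin D"}, "B"),
]
prompt = build_few_shot_prompt(
    examples,
    "Which organ produces insulin?",
    {"A": "Liver", "B": "Pancreas", "C": "Kidney"},
)
print(prompt)
```

The prompt ends on an open `Answer:` line so that the model's next token is the answer letter, which makes the output easy to score.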

Things to Try

The meditron models can generate text directly, which is useful for experimentation, but the maintainers strongly recommend against using them directly in production or in any work that may affect people. They suggest a more controlled, interactive setup instead: deploy the model behind a high-throughput, memory-efficient inference engine, paired with a user interface that supports chat and text generation.

The maintainers have provided a deployment guide using the FastChat platform with the vLLM inference engine, and have collected generations for qualitative analysis through the BetterChatGPT interactive UI.
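Both FastChat and vLLM can expose an OpenAI-compatible HTTP server, so a deployed model can be queried with a plain completion request. The sketch below is not taken from the maintainers' deployment guide: the base URL, port, served model name, and sampling values are all assumptions that depend on how the server was started.

```python
import json
import urllib.request
from typing import Any, Dict


def build_completion_request(prompt: str, model: str = "meditron-7b",
                             max_tokens: int = 256,
                             temperature: float = 0.2) -> Dict[str, Any]:
    """Build the JSON body for an OpenAI-style /completions call."""
    return {
        "model": model,  # assumed name under which the server registered the model
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }


def complete(prompt: str, base_url: str = "http://localhost:8000/v1") -> str:
    """POST the request to a locally running server (assumed address)."""
    payload = json.dumps(build_completion_request(prompt)).encode("utf-8")
    request = urllib.request.Request(
        f"{base_url}/completions",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        body = json.load(response)
    # OpenAI-style completion responses carry the text under choices[0].text.
    return body["choices"][0]["text"]


# Inspect the request body without contacting a server.
print(json.dumps(build_completion_request("List common symptoms of anemia."), indent=2))
```

Calling `complete(...)` requires a running server at the assumed address; the payload builder is kept separate so the request can be checked offline.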

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Models

meditron-70b

epfl-llm

meditron-70b is a 70 billion parameter Large Language Model (LLM) developed by the EPFL LLM Team. It is adapted from the base Llama-2-70B model through continued pretraining on a curated medical corpus, including PubMed articles, abstracts, medical guidelines, and general domain data. This specialized pretraining allows meditron-70b to outperform Llama-2-70B, GPT-3.5, and Flan-PaLM on multiple medical reasoning tasks.

Model inputs and outputs

meditron-70b is a causal decoder-only transformer language model that takes text-only data as input and generates text as output. The model has a context length of 4,096 tokens.

Inputs

Text-only data

Outputs

Generated text

Capabilities

meditron-70b is designed to encode medical knowledge from high-quality sources. However, the model is not yet adapted to safely deliver this knowledge within professional actionable constraints. Extensive use-case alignment, testing, and validation is recommended before deploying meditron-70b in medical applications.

What can I use it for?

Potential use cases for meditron-70b may include medical exam question answering and supporting differential diagnosis, though the model should be used with caution. The EPFL LLM Team is making meditron-70b available for further testing and assessment as an AI assistant to enhance clinical decision-making and expand access to LLMs in healthcare.

Things to try

Researchers and developers are encouraged to experiment with meditron-70b to assess its capabilities and limitations in the medical domain. However, any outputs or applications should be thoroughly reviewed to ensure safety and responsible use of the model.

Text-to-Text

BioMedGPT-LM-7B

PharMolix

BioMedGPT-LM-7B is the first large generative language model based on Llama2 that has been fine-tuned on the biomedical domain. It was trained on over 26 billion tokens from millions of biomedical papers in the S2ORC corpus, allowing it to outperform or match human-level performance on several biomedical question-answering benchmarks. This model was developed by PharMolix, and is the language model component of the larger BioMedGPT-10B open-source project.

Model inputs and outputs

Inputs

Text data, primarily focused on biomedical and scientific topics

Outputs

Generates coherent and informative text in response to prompts, drawing upon its broad knowledge of biomedical concepts and research.

Capabilities

BioMedGPT-LM-7B can be used for a variety of biomedical natural language processing tasks, such as question answering, summarization, and information extraction from scientific literature. Through its strong performance on benchmarks like PubMedQA, the model has demonstrated its ability to understand and reason about complex biomedical topics.

What can I use it for?

The BioMedGPT-LM-7B model is well-suited for research and development projects in the biomedical and healthcare domains. Potential use cases include:

Powering AI assistants to help clinicians and researchers access relevant biomedical information more efficiently
Automating the summarization of scientific papers or clinical notes
Enhancing search and retrieval of biomedical literature
Generating high-quality text for biomedical education and training materials

Things to try

One interesting aspect of BioMedGPT-LM-7B is its ability to generate detailed, fact-based responses on a wide range of biomedical topics. Researchers could experiment with prompting the model to explain complex scientific concepts, describe disease mechanisms, or outline treatment guidelines, and observe the model's ability to provide informative and coherent output. Additionally, the model could be evaluated on its capacity to assist with literature reviews, hypothesis generation, and other knowledge-intensive biomedical tasks.

Text-to-Text

Llama3-OpenBioLLM-70B

aaditya

Llama3-OpenBioLLM-70B is an advanced open-source biomedical large language model developed by Saama AI Labs. It builds upon the powerful foundations of the Meta-Llama-3-70B-Instruct model, incorporating novel training techniques like Direct Preference Optimization to achieve state-of-the-art performance on a wide range of biomedical tasks. Compared to other open-source models like Meditron-70B and proprietary models like GPT-4, it demonstrates superior results on biomedical benchmarks.

Model inputs and outputs

Inputs

Llama3-OpenBioLLM-70B is a text-to-text model, taking in textual inputs only.

Outputs

The model generates fluent and coherent text responses, suitable for a variety of natural language processing tasks in the biomedical domain.

Capabilities

Llama3-OpenBioLLM-70B is designed for specialized performance on biomedical tasks. It excels at understanding and generating domain-specific language, allowing for accurate responses to queries about medical conditions, treatments, and research. The model's advanced training techniques enable it to outperform other open-source and proprietary language models on benchmarks evaluating tasks like medical exam question answering, disease information retrieval, and supporting differential diagnosis.

What can I use it for?

Llama3-OpenBioLLM-70B is well-suited for a variety of biomedical applications, such as powering virtual assistants to enhance clinical decision-making, providing general health information to the public, and supporting research efforts by automating tasks like literature review and hypothesis generation. Its strong performance on biomedical benchmarks suggests it could be a valuable tool for developers and researchers working in the life sciences and healthcare fields.

Things to try

Developers can explore using Llama3-OpenBioLLM-70B as a foundation for building custom biomedical natural language processing applications. The model's specialized knowledge and capabilities could be leveraged to create chatbots, question-answering systems, and text generation tools tailored to the needs of the medical and life sciences communities. Additionally, the model's performance could be further fine-tuned on domain-specific datasets to optimize it for specific biomedical use cases.

Text-to-Text

Llama3-OpenBioLLM-8B

aaditya

Llama3-OpenBioLLM-8B is an advanced open-source language model designed specifically for the biomedical domain. Developed by Saama AI Labs, this model leverages cutting-edge techniques to achieve state-of-the-art performance on a wide range of biomedical tasks. It builds upon the powerful foundations of the Meta-Llama-3-8B model, incorporating the DPO dataset and fine-tuning recipe along with a custom diverse medical instruction dataset. Compared to Llama3-OpenBioLLM-70B, the 8B version has a smaller parameter count but still outperforms other open-source biomedical language models of similar scale. It has also demonstrated better results compared to larger proprietary & open-source models like GPT-3.5 on biomedical benchmarks.

Model inputs and outputs

Inputs

Text data from the biomedical domain, such as research papers, clinical notes, and medical literature.

Outputs

Generated text responses to biomedical queries, questions, and prompts.
Summarization of complex medical information.
Extraction of biomedical entities, such as diseases, symptoms, and treatments.
Classification of medical documents and data.

Capabilities

Llama3-OpenBioLLM-8B can efficiently analyze and summarize clinical notes, extract key medical information, answer a wide range of biomedical questions, and perform advanced clinical entity recognition. The model's strong performance on domain-specific tasks, such as Medical Genetics and PubMedQA, highlights its ability to effectively capture and apply biomedical knowledge.

What can I use it for?

Llama3-OpenBioLLM-8B can be a valuable tool for researchers, clinicians, and developers working in the healthcare and life sciences fields. It can be used to accelerate medical research, improve clinical decision-making, and enhance access to biomedical knowledge. Some potential use cases include:

Summarizing complex medical records and literature
Answering medical queries and providing information to patients or healthcare professionals
Extracting relevant biomedical entities from text
Classifying medical documents and data
Generating medical reports and content

Things to try

One interesting aspect of Llama3-OpenBioLLM-8B is its ability to leverage its deep understanding of medical terminology and context to accurately annotate and categorize clinical entities. This capability can support various downstream applications, such as clinical decision support, pharmacovigilance, and medical research. You could try experimenting with the model's entity recognition abilities on your own biomedical text data to see how it performs.

Another interesting feature is the model's strong performance on biomedical question-answering tasks, such as PubMedQA. You could try prompting the model with a range of medical questions and see how it responds, paying attention to the level of detail and accuracy in the answers.

Text-to-Text