EHRmonize: A Framework for Medical Concept Abstraction from Electronic Health Records using Large Language Models

2407.00242

Published 7/2/2024 by João Matos, Jack Gallifant, Jian Pei, A. Ian Wong

Abstract

Electronic health records (EHRs) contain vast amounts of complex data, but harmonizing and processing this information remains a challenging and costly task requiring significant clinical expertise. While large language models (LLMs) have shown promise in various healthcare applications, their potential for abstracting medical concepts from EHRs remains largely unexplored. We introduce EHRmonize, a framework leveraging LLMs to abstract medical concepts from EHR data. Our study uses medication data from two real-world EHR databases to evaluate five LLMs on two free-text extraction and six binary classification tasks across various prompting strategies. GPT-4o with 10-shot prompting achieved the highest performance across all tasks, matched by Claude-3.5-Sonnet on a subset of them. GPT-4o achieved an accuracy of 97% in identifying generic route names, 82% for generic drug names, and 100% in performing binary classification of antibiotics. While EHRmonize significantly enhances efficiency, reducing annotation time by an estimated 60%, we emphasize that clinician oversight remains essential. Our framework, available as a Python package, offers a promising tool to assist clinicians in EHR data abstraction, potentially accelerating healthcare research and improving data harmonization processes.


Overview

The paper presents a framework called "EHRmonize" for extracting and abstracting medical concepts from electronic health records (EHRs) using large language models (LLMs).
The framework aims to reduce the cost and clinical expertise needed to harmonize EHR data by automating the identification and categorization of relevant medical concepts.
The authors evaluate five LLMs on medication data from two real-world EHR databases, comparing their performance across several prompting strategies.

Plain English Explanation

The paper describes a new system called "EHRmonize" that helps clinicians and researchers organize the information in patients' electronic health records more efficiently. Health records can be long and messy, and the same medication may be written down in many different ways. EHRmonize uses large language models to automatically turn these raw entries into standard medical concepts, for example the generic name of a drug, the route by which it is given, or whether it is an antibiotic.

This can save clinicians considerable time, although the authors stress that a clinician should still review the results. The system was tested on real medication records from two databases, where the best-performing model, GPT-4o, reached 97% accuracy on route names and 82% on generic drug names, and the framework was estimated to cut annotation time by about 60%. By making the data in electronic health records easier to standardize, EHRmonize could speed up healthcare research and improve data harmonization.

Technical Explanation

The EHRmonize framework uses large language models (LLMs) to extract and abstract medical concepts from electronic health records (EHRs). Rather than describing a single fine-tuned model, the study evaluates five pre-trained LLMs under several prompting strategies, including 10-shot prompting, and applies them to medication data to identify and categorize relevant concepts.
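The abstract reports that 10-shot prompting gave the best results. As a minimal sketch of what a k-shot prompt for one abstraction task might look like, the Python below places labeled demonstrations before the query so the model can infer the expected output format. The `Example` class, the `build_few_shot_prompt` function, and the prompt wording are hypothetical illustrations, not EHRmonize's actual templates or API.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Example:
    """One labeled demonstration: a raw EHR value and its abstracted concept."""
    raw: str
    label: str


def build_few_shot_prompt(task: str, examples: Sequence[Example], query: str) -> str:
    """Assemble a k-shot prompt, where k = len(examples).

    Hypothetical format; the templates used by EHRmonize may differ.
    """
    lines = [f"Task: {task}", "Answer with the label only.", ""]
    for ex in examples:
        # Each demonstration shows an input/output pair for the model to imitate.
        lines.append(f"Input: {ex.raw}")
        lines.append(f"Output: {ex.label}")
        lines.append("")
    # The query comes last, with an empty Output slot for the model to fill.
    lines.append(f"Input: {query}")
    lines.append("Output:")
    return "\n".join(lines)


demos = [
    Example("vancomycin 1 g IVPB", "vancomycin"),
    Example("Tylenol 650 mg PO tab", "acetaminophen"),
]
prompt = build_few_shot_prompt("Return the generic drug name.", demos, "Lasix 40mg IV push")
print(prompt)
```

The returned string would then be sent to a model such as GPT-4o; supplying 10 demonstrations instead of 2 would correspond to the 10-shot setting described in the abstract.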

The framework consists of several key components:

Concept Extraction: The LLM reads raw EHR entries, such as free-text medication records, and extracts the relevant concept; the study includes two such free-text extraction tasks.
Concept Normalization: The extracted values are mapped to standardized names, such as generic drug names and generic route names, to ensure consistency and interoperability.
Concept Abstraction: The framework assigns concepts to higher-level categories, for example whether a drug is an antibiotic (one of six binary classification tasks), providing a more concise and meaningful representation of the data.
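To make the normalization and abstraction steps concrete, here is a minimal sketch for two of the concept types named in the abstract: generic route names and antibiotic classification. It assumes a hypothetical design in which common route spellings resolve through a small lookup table and everything else is deferred to an LLM callback; `ROUTE_SYNONYMS`, `normalize_route`, `is_antibiotic`, and `fake_llm` are illustrative names, not part of the EHRmonize package.

```python
from typing import Callable, Dict

# Hypothetical lookup table: common route spellings -> generic route names.
ROUTE_SYNONYMS: Dict[str, str] = {
    "po": "oral",
    "by mouth": "oral",
    "iv": "intravenous",
    "ivpb": "intravenous",
    "iv push": "intravenous",
    "sc": "subcutaneous",
    "subq": "subcutaneous",
}


def normalize_route(raw: str, ask_llm: Callable[[str], str]) -> str:
    """Map a raw route string to a generic route name.

    Known spellings resolve deterministically; anything else is deferred to
    the LLM callback, whose answer should still be reviewed by a clinician.
    """
    key = raw.strip().lower()
    if key in ROUTE_SYNONYMS:
        return ROUTE_SYNONYMS[key]
    return ask_llm(raw).strip().lower()


def is_antibiotic(generic_name: str, ask_llm: Callable[[str], str]) -> bool:
    """Binary abstraction task: parse a yes/no LLM answer into a boolean."""
    answer = ask_llm(f"Is {generic_name} an antibiotic? Answer yes or no.")
    return answer.strip().lower().startswith("yes")


def fake_llm(prompt: str) -> str:
    """Stand-in for a real model call so the sketch runs offline."""
    if prompt.startswith("Is "):
        return "Yes" if "vancomycin" in prompt else "No"
    return "Topical"


print(normalize_route(" IVPB ", fake_llm))    # table hit
print(normalize_route("TOP", fake_llm))       # deferred to the model
print(is_antibiotic("vancomycin", fake_llm))
```

Resolving frequent spellings with a table keeps the common cases cheap and deterministic; this is an assumed design choice for illustration, and the paper itself evaluates the LLMs directly on each task.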

The authors evaluate EHRmonize on medication data from two real-world EHR databases, comparing five LLMs across several prompting strategies. GPT-4o with 10-shot prompting performed best, reaching 97% accuracy on generic route names, 82% on generic drug names, and 100% on binary classification of antibiotics, with Claude-3.5-Sonnet matching it on a subset of tasks. The framework was also estimated to reduce annotation time by about 60%.
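Accuracy figures like these come from comparing model outputs against clinician reference labels. The sketch below computes exact-match accuracy per (database, task) pair, which also makes it easy to check whether performance holds across both databases. The record layout, the placeholder database names `db_a` and `db_b`, and the toy rows are invented for illustration and are not the paper's data; exact string matching is a simplifying assumption, and the paper's scoring of free-text outputs may differ.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Each record: (database, task, predicted label, clinician reference label).
Record = Tuple[str, str, str, str]


def accuracy_by_group(records: Iterable[Record]) -> Dict[Tuple[str, str], float]:
    """Exact-match accuracy (case- and whitespace-insensitive) per (database, task)."""
    hits: Dict[Tuple[str, str], int] = defaultdict(int)
    totals: Dict[Tuple[str, str], int] = defaultdict(int)
    for db, task, pred, ref in records:
        key = (db, task)
        totals[key] += 1
        hits[key] += int(pred.strip().lower() == ref.strip().lower())
    return {key: hits[key] / totals[key] for key in totals}


# Toy rows for illustration only.
toy = [
    ("db_a", "route", "intravenous", "intravenous"),
    ("db_a", "route", "oral", "oral"),
    ("db_a", "route", "oral", "sublingual"),
    ("db_b", "route", "oral", "oral"),
]
result = accuracy_by_group(toy)
print(result)
```

Reporting accuracy per database rather than pooled makes a drop on one data source visible instead of averaging it away.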

Critical Analysis

The authors have presented a promising framework for making EHR data abstraction more efficient. However, several limitations and directions for further research remain:

Dataset Bias: The performance of EHRmonize depends on the data it is evaluated on and on the examples chosen for few-shot prompts. Because the study draws on two EHR databases, it is important to confirm that results hold for patient populations and healthcare settings beyond those represented.
Generalizability: While the authors have demonstrated the effectiveness of EHRmonize on medication data from two EHR databases, it would be valuable to assess its generalizability to other healthcare systems, clinical specialties, geographic regions, and concept types beyond medications.
Interpretability: As with many LLM-based systems, the inner workings of EHRmonize may not be entirely transparent, making it challenging to understand the reasoning behind the model's decisions. Efforts to improve the interpretability of the system could enhance clinician trust and adoption.
Ethical Considerations: The use of AI-powered systems in healthcare raises important ethical questions, such as data privacy, bias, and the potential for unintended consequences. The authors should address these concerns and outline safeguards to ensure the responsible deployment of EHRmonize.

Overall, the EHRmonize framework represents a promising step toward using large language models to streamline EHR data abstraction. With continued refinement and validation, the approach could make a meaningful impact on the field of clinical informatics.

Conclusion

The EHRmonize framework presented in this paper offers a promising approach to automating the extraction and abstraction of medical concepts from electronic health records. By harnessing large language models, the system can help clinicians abstract and harmonize EHR data more efficiently, with the potential to accelerate healthcare research, provided that clinicians continue to oversee the results.

While the authors have demonstrated the effectiveness of their approach, there are still some areas for further research and development, such as addressing potential dataset biases, enhancing the interpretability of the system, and carefully considering the ethical implications of deploying AI-powered tools in healthcare. As the field of clinical informatics continues to evolve, frameworks like EHRmonize have the potential to transform the way medical professionals interact with and make use of electronic health data.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!
