Large Language Models for Integrating Social Determinant of Health Data: A Case Study on Heart Failure 30-Day Readmission Prediction

Read original: arXiv:2407.09688 - Published 7/16/2024 by Chase Fensore, Rodrigo M. Carrillo-Larco, Shivani A. Patel, Alanna A. Morris, Joyce C. Ho

Overview

This paper explores how large language models (LLMs) can be used to integrate social determinants of health (SDOH) data to improve heart failure readmission prediction.
The researchers manually labeled more than 700 variables from two publicly accessible SDOH data sources into five semantic SDOH categories, then benchmarked nine open-source LLMs on reproducing those labels.
They then trained machine learning models to predict 30-day hospital readmission among 39k heart failure patients. Combining standard clinical features with the LLM-annotated Neighborhood and Built Environment variables gave the best prediction performance.

Plain English Explanation

Hospitals often struggle to predict which patients with heart failure are likely to be readmitted within 30 days of being discharged. This research looked at using a special type of AI called a large language model (LLM) to try to improve these predictions.

LLMs are trained on vast amounts of text data and can pick up on complex patterns and relationships. In this case, the researchers did not ask an LLM to predict readmissions directly. Instead, they used LLMs to sort hundreds of variables from public datasets about people's social and environmental circumstances into five broad categories, a job that is slow and tedious to do by hand.

The idea was that once this "social determinants of health" data is organized, it can be fed into a conventional prediction model alongside the usual medical information. Previous research has shown that social factors can have a big impact on a person's health outcomes.

Sure enough, some open-source LLMs sorted the variables accurately without any extra training. And the best-performing readmission model combined standard clinical data with the variables the LLM had labeled as describing a patient's neighborhood and built environment. This implies that incorporating social determinants of health data could help hospitals identify high-risk patients and intervene to prevent unnecessary readmissions.

Technical Explanation

The researchers first manually labeled more than 700 variables from two publicly accessible SDOH data sources, assigning each variable to one of five semantic SDOH categories. They then benchmarked nine open-source LLMs on this classification task, using the manual labels as the reference.

They also investigated how few-shot prompting affects annotation performance and ran a metadata ablation study on the prompts to see which information helps the LLMs annotate variables accurately. According to the abstract, some open-source LLMs can annotate SDOH variables accurately with zero-shot prompting, without any fine-tuning.
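The paper's abstract describes using LLMs to assign each SDOH variable to one of five semantic categories with zero-shot prompting. A minimal sketch of how such a prompt might be built and its answer parsed is shown here. Several things in it are assumptions rather than details from the paper: the category list uses the five Healthy People 2030 SDOH domains (the abstract names only Neighborhood and Built Environment), the prompt wording is invented, and the function names and the example variable are hypothetical. No model is called; the sketch covers only the text going in and the label coming out.

```python
from typing import Optional

# Assumption: the five Healthy People 2030 SDOH domains stand in for the
# paper's "five semantic SDOH categories"; the abstract names only one of them.
SDOH_CATEGORIES = [
    "Economic Stability",
    "Education Access and Quality",
    "Health Care Access and Quality",
    "Neighborhood and Built Environment",
    "Social and Community Context",
]


def build_zero_shot_prompt(variable_name: str, description: Optional[str] = None) -> str:
    """Build a zero-shot classification prompt for one SDOH variable.

    Omitting `description` mimics a simple metadata ablation: the model
    sees only the variable name.
    """
    lines = [
        "Assign the following variable to exactly one category.",
        "Categories:",
    ]
    lines += [f"- {category}" for category in SDOH_CATEGORIES]
    lines.append(f"Variable name: {variable_name}")
    if description:
        lines.append(f"Description: {description}")
    lines.append("Answer with the category name only.")
    return "\n".join(lines)


def parse_category(response: str) -> Optional[str]:
    """Map a free-text model response to a known category, or None."""
    text = response.strip().lower()
    matches = [c for c in SDOH_CATEGORIES if c.lower() in text]
    # Accept only unambiguous answers that mention exactly one category.
    return matches[0] if len(matches) == 1 else None


# Hypothetical variable, for illustration only.
prompt = build_zero_shot_prompt(
    "PCT_NO_VEHICLE", "Percentage of households with no vehicle available"
)
print(prompt)
```

Rejecting responses that mention zero or several categories keeps ambiguous model output from silently becoming a label; such variables could instead be routed to manual review.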

For the prediction task, the researchers trained machine learning models to predict 30-day hospital readmission among 39k heart failure patients and compared the predictive performance of the categorized SDOH variables with that of standard clinical variables. When combined with standard clinical features, the LLM-annotated Neighborhood and Built Environment subset of the SDOH variables showed the best performance.
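Comparing a clinical-only model against one that also uses SDOH features comes down to scoring both on the same patients with the same metric. The sketch below computes AUROC from scratch as the fraction of (readmitted, not readmitted) patient pairs that a model ranks correctly. AUROC is an assumption here, since this summary does not state which metric the paper reports, and the labels and scores are made-up toy values, not results from the study.

```python
from typing import Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as the probability that a random positive outranks a random negative.

    Ties count as half a win (the Mann-Whitney U formulation).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy, made-up data for illustration only; not results from the paper.
readmitted = [1, 0, 1, 0, 0, 1]
scores_clinical = [0.6, 0.5, 0.4, 0.3, 0.7, 0.8]
scores_clinical_plus_sdoh = [0.7, 0.4, 0.45, 0.3, 0.5, 0.9]

print(f"clinical only:   {auroc(readmitted, scores_clinical):.3f}")
print(f"clinical + SDOH: {auroc(readmitted, scores_clinical_plus_sdoh):.3f}")
```

The pairwise loop is quadratic in the number of patients, which is fine for a toy example; on a cohort of tens of thousands a rank-based implementation from a statistics library would be the practical choice.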

The authors frame the work as an end-to-end case study: it evaluates both the feasibility of using LLMs to integrate SDOH data and the utility of the resulting SDOH features for clinical prediction.

Critical Analysis

While the results are promising, the study has clear limits. It is a case study of a single condition and a single outcome, 30-day readmission in a cohort of about 39,000 heart failure patients. Validation on other conditions, outcomes, and patient populations will be needed to establish how well the approach generalizes.

Additionally, the abstract reports which SDOH category helped most (Neighborhood and Built Environment) but not which individual variables within that category drive the gain. Understanding which specific social factors are behind the improved predictions could help inform targeted interventions.

It's also worth noting that the use of LLMs in healthcare comes with its own set of challenges, such as interpretability and potential biases in the training data. The researchers should continue to investigate these issues as they explore further applications of this technology.

Conclusion

This paper demonstrates the potential for large language models to enhance healthcare predictions by organizing social determinants of health data so that it can be used alongside clinical features. The finding that LLM-annotated Neighborhood and Built Environment variables, combined with standard clinical features, gave the best 30-day readmission predictions for heart failure patients suggests that LLMs can help make public SDOH data sources usable at scale.

As the healthcare industry increasingly recognizes the importance of addressing social factors that influence health outcomes, tools like the one developed in this research could become invaluable for identifying high-risk patients and designing more holistic interventions. Further research in this area is warranted and likely to yield important advancements in predictive modeling and personalized care.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Large Language Models for Integrating Social Determinant of Health Data: A Case Study on Heart Failure 30-Day Readmission Prediction

Chase Fensore, Rodrigo M. Carrillo-Larco, Shivani A. Patel, Alanna A. Morris, Joyce C. Ho

Social determinants of health (SDOH), the myriad of circumstances in which people live, grow, and age, play an important role in health outcomes. However, existing outcome prediction models often only use proxies of SDOH as features. Recent open data initiatives present an opportunity to construct a more comprehensive view of SDOH, but manually integrating the most relevant data for individual patients becomes increasingly challenging as the volume and diversity of public SDOH data grows. Large language models (LLMs) have shown promise at automatically annotating structured data. Here, we conduct an end-to-end case study evaluating the feasibility of using LLMs to integrate SDOH data, and the utility of these SDOH features for clinical prediction. We first manually label 700+ variables from two publicly-accessible SDOH data sources to one of five semantic SDOH categories. Then, we benchmark performance of 9 open-source LLMs on this classification task. Finally, we train ML models to predict 30-day hospital readmission among 39k heart failure (HF) patients, and we compare the prediction performance of the categorized SDOH variables with standard clinical variables. Additionally, we investigate the impact of few-shot LLM prompting on LLM annotation performance, and perform a metadata ablation study on prompts to evaluate which information helps LLMs accurately annotate these variables. We find that some open-source LLMs can effectively, accurately annotate SDOH variables with zero-shot prompting without the need for fine-tuning. Crucially, when combined with standard clinical features, the LLM-annotated Neighborhood and Built Environment subset of the SDOH variables shows the best performance predicting 30-day readmission of HF patients.

7/16/2024

SDoH-GPT: Using Large Language Models to Extract Social Determinants of Health (SDoH)

Bernardo Consoli, Xizhi Wu, Song Wang, Xinyu Zhao, Yanshan Wang, Justin Rousseau, Tom Hartvigsen, Li Shen, Huanmei Wu, Yifan Peng, Qi Long, Tianlong Chen, Ying Ding

Extracting social determinants of health (SDoH) from unstructured medical notes depends heavily on labor-intensive annotations, which are typically task-specific, hampering reusability and limiting sharing. In this study we introduced SDoH-GPT, a simple and effective few-shot Large Language Model (LLM) method leveraging contrastive examples and concise instructions to extract SDoH without relying on extensive medical annotations or costly human intervention. It achieved tenfold and twentyfold reductions in time and cost respectively, and superior consistency with human annotators measured by Cohen's kappa of up to 0.92. The innovative combination of SDoH-GPT and XGBoost leverages the strengths of both, ensuring high accuracy and computational efficiency while consistently maintaining 0.90+ AUROC scores. Testing across three distinct datasets has confirmed its robustness and accuracy. This study highlights the potential of leveraging LLMs to revolutionize medical note classification, demonstrating their capability to achieve highly accurate classifications with significantly reduced time and cost.

7/25/2024

Leveraging Open-Source Large Language Models for encoding Social Determinants of Health using an Intelligent Router

Akul Goel, Surya Narayanan Hari, Belinda Waltman, Matt Thomson

Social Determinants of Health (SDOH) play a significant role in patient health outcomes. The Center of Disease Control (CDC) introduced a subset of ICD-10 codes called Z-codes in an attempt to officially recognize and measure SDOH in the health care system. However, these codes are rarely annotated in a patient's Electronic Health Record (EHR), and instead, in many cases, need to be inferred from clinical notes. Previous research has shown that large language models (LLMs) show promise on extracting unstructured data from EHRs. However, with thousands of models to choose from with unique architectures and training sets, it's difficult to choose one model that performs the best on coding tasks. Further, clinical notes contain trusted health information making the use of closed-source language models from commercial vendors difficult, so the identification of open source LLMs that can be run within health organizations and exhibits high performance on SDOH tasks is an urgent problem. Here, we introduce an intelligent routing system for SDOH coding that uses a language model router to direct medical record data to open source LLMs that demonstrate optimal performance on specific SDOH codes. The intelligent routing system exhibits state of the art performance of 97.4% accuracy averaged across 5 codes, including homelessness and food insecurity, on par with closed models such as GPT-4o. In order to train the routing system and validate models, we also introduce a synthetic data generation and validation paradigm to increase the scale of training data without needing privacy protected medical records. Together, we demonstrate an architecture for intelligent routing of inputs to task-optimal language models to achieve high performance across a set of medical coding sub-tasks.

5/31/2024

Extracting Social Determinants of Health from Pediatric Patient Notes Using Large Language Models: Novel Corpus and Methods

Yujuan Fu, Giridhar Kaushik Ramachandran, Nicholas J Dobbins, Namu Park, Michael Leu, Abby R. Rosenberg, Kevin Lybarger, Fei Xia, Ozlem Uzuner, Meliha Yetisgen

Social determinants of health (SDoH) play a critical role in shaping health outcomes, particularly in pediatric populations where interventions can have long-term implications. SDoH are frequently studied in the Electronic Health Record (EHR), which provides a rich repository for diverse patient data. In this work, we present a novel annotated corpus, the Pediatric Social History Annotation Corpus (PedSHAC), and evaluate the automatic extraction of detailed SDoH representations using fine-tuned and in-context learning methods with Large Language Models (LLMs). PedSHAC comprises annotated social history sections from 1,260 clinical notes obtained from pediatric patients within the University of Washington (UW) hospital system. Employing an event-based annotation scheme, PedSHAC captures ten distinct health determinants to encompass living and economic stability, prior trauma, education access, substance use history, and mental health with an overall annotator agreement of 81.9 F1. Our proposed fine-tuning LLM-based extractors achieve high performance at 78.4 F1 for event arguments. In-context learning approaches with GPT-4 demonstrate promise for reliable SDoH extraction with limited annotated examples, with extraction performance at 82.3 F1 for event triggers.

4/5/2024