Leveraging Open-Source Large Language Models for encoding Social Determinants of Health using an Intelligent Router

Read original: arXiv:2405.19631 - Published 5/31/2024 by Akul Goel, Surya Narayanan Hari, Belinda Waltman, Matt Thomson

Overview

  • This paper explores the use of open-source large language models (LLMs) to encode social determinants of health (SDH) as ICD-10 Z-codes inferred from clinical notes, using an intelligent routing system.
  • Instead of relying on a single model, a language model router directs each piece of medical record data to the open-source LLM that performs best on the specific SDH code being assigned.
  • The system reaches 97.4% accuracy averaged across five codes, including homelessness and food insecurity. This is on par with closed models such as GPT-4o, and sensitive clinical notes can stay within the health organization.

Plain English Explanation

The paper focuses on using advanced AI language models to better understand and address the social factors that impact people's health. These "social determinants of health" can include things like a person's income, education, living conditions, and access to healthcare.

The researchers created a system that uses language AI models to read clinical notes and flag social factors such as homelessness or food insecurity. These factors have official diagnosis codes, called Z-codes, but they are rarely recorded in a patient's health record. They usually have to be inferred from the free text of the notes. No single model is best at spotting every factor, so the system uses a "router". The router sends each note to whichever model has proven most accurate for the particular factor being checked.

The goal is to make social factors visible and measurable in patients' health records, which supports a more holistic approach to care. Clinical notes contain sensitive information, so the researchers focus on open-source models that health organizations can run on their own systems. This avoids sending the data to a commercial AI provider.

This work relates to research on extracting social determinants of health from medical data and to the broader use of large language models in healthcare.

Technical Explanation

The paper presents an architecture for SDH coding that routes clinical text to task-optimal open-source LLMs. The key components include:

  1. A pool of open-source LLMs, each evaluated on how well it detects specific SDH codes, such as homelessness and food insecurity, in clinical notes.
  2. A language model router that directs each piece of medical record data to the open-source LLM that performs best on the specific SDH code in question.
  3. A synthetic data generation and validation paradigm that scales up the data used to train the router and validate the models, without requiring privacy-protected medical records.
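The abstract describes a synthetic data generation and validation paradigm, but not how it works. The toy sketch below illustrates the general idea of generating labelled synthetic notes and then filtering them with a validation check. It assumes that simple string templates stand in for the paper's actual generator. The labels, templates, and keyword-based `validate` rule are all hypothetical and are not taken from the paper.

```python
import random
from dataclasses import dataclass
from typing import List

@dataclass
class Example:
    text: str       # synthetic clinical note
    code: str       # hypothetical SDOH label, e.g. "homelessness"
    positive: bool  # True if the note should receive this code

# Hypothetical templates standing in for the paper's (unspecified) generator.
POSITIVE_TEMPLATES = {
    "homelessness": [
        "Patient reports sleeping in a shelter since {month}.",
        "Pt is currently homeless and has been living in a car since {month}.",
    ],
    "food_insecurity": [
        "Patient states the family often skips meals since {month}.",
        "Reports running out of food before the end of the month since {month}.",
    ],
}
NEGATIVE_TEMPLATES = [
    "Patient lives with spouse in own apartment; seen for follow-up in {month}.",
    "No social concerns reported at the visit in {month}.",
]

# Hypothetical validation rule: a positive note must mention at least one cue
# word for its code, and a negative note must mention none.
CUES = {
    "homelessness": ("shelter", "homeless"),
    "food_insecurity": ("meals", "food"),
}

def validate(ex: Example) -> bool:
    """Return True if the note's wording is consistent with its label."""
    has_cue = any(cue in ex.text.lower() for cue in CUES[ex.code])
    return has_cue == ex.positive

def generate(code: str, n: int, seed: int = 0) -> List[Example]:
    """Generate up to n balanced, labelled synthetic notes that pass validation."""
    rng = random.Random(seed)
    months = ["January", "March", "June", "October"]
    kept = []
    for i in range(n):
        positive = i % 2 == 0  # alternate labels to keep the classes balanced
        pool = POSITIVE_TEMPLATES[code] if positive else NEGATIVE_TEMPLATES
        ex = Example(rng.choice(pool).format(month=rng.choice(months)), code, positive)
        if validate(ex):  # discard notes whose text contradicts their label
            kept.append(ex)
    return kept
```

With these fixed templates, every generated note passes validation. A filtering step like this matters more when generation is noisier than simple templates.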

The approach does not search for one model that performs best on every coding task. Instead, it treats each SDH code as a separate sub-task and lets the router choose the best-performing open-source model for it. Every candidate model is open source, so the pipeline can run inside a health organization and clinical notes never need to be sent to a commercial vendor.
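The abstract describes a language model router that directs medical record data to the open-source LLM with the best performance on each SDOH code. The "best model per code" idea can be sketched as a lookup table built from validation accuracy. This table is a simplification, not the paper's router. The model names and accuracy numbers are hypothetical placeholders, not reported results. Treating "accuracy averaged across codes" as an unweighted mean is also an assumption.

```python
from typing import Callable, Dict

# Hypothetical validation accuracy of each candidate open-source model per SDOH
# code. Placeholder numbers for illustration, not results from the paper.
VAL_ACCURACY: Dict[str, Dict[str, float]] = {
    "homelessness":    {"model_a": 0.95, "model_b": 0.98, "model_c": 0.91},
    "food_insecurity": {"model_a": 0.97, "model_b": 0.93, "model_c": 0.96},
}

# A model is any callable that takes (note, code) and answers whether the
# note supports assigning that code.
Model = Callable[[str, str], bool]

def build_routes(val_acc: Dict[str, Dict[str, float]]) -> Dict[str, str]:
    """Map each code to the model with the highest validation accuracy on it."""
    return {code: max(scores, key=scores.get) for code, scores in val_acc.items()}

def expected_accuracy(val_acc: Dict[str, Dict[str, float]],
                      routes: Dict[str, str]) -> float:
    """Unweighted mean, over codes, of the routed model's validation accuracy."""
    return sum(val_acc[code][name] for code, name in routes.items()) / len(routes)

def route(note: str, code: str, routes: Dict[str, str],
          models: Dict[str, Model]) -> bool:
    """Send the note to the task-optimal model for `code` and return its answer."""
    return models[routes[code]](note, code)
```

In use, each entry in `models` would wrap a locally hosted open-source LLM. The paper's router is itself a language model. The static table here only captures the idea of picking a task-optimal model for each code.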

The intelligent routing system reaches 97.4% accuracy averaged across five SDH codes, on par with closed models such as GPT-4o. This suggests that routing among open-source models can match commercial models on medical coding sub-tasks while keeping trusted health information in-house.

Critical Analysis

The paper presents a compelling approach to leveraging LLMs for SDH coding, but several limitations and open questions remain:

  • Synthetic data is used to train the router and validate the models. More extensive testing on real clinical notes is therefore needed to confirm the system's robustness and generalizability.
  • The reported results cover five codes. The full set of SDH Z-codes is larger, so it remains to be seen how well the routing approach scales to the remaining codes.
  • Running open-source models in-house addresses part of the data privacy concern. Algorithmic bias and other ethical considerations in the LLM-powered system still warrant deeper exploration.
  • Integrating the framework into real-world clinical workflows and assessing its long-term impact on patient outcomes will be an important next step.

Additionally, the paper compares the system with closed models such as GPT-4o, but it is less clear how it compares to alternatives such as rule-based or traditional machine learning classifiers. Further research could explore the relative strengths and weaknesses of different techniques in this domain.

Conclusion

This paper offers a promising framework for leveraging open-source LLMs to encode social determinants of health. The system routes clinical notes to the open-source model that performs best on each SDH code. It matches the accuracy of closed models such as GPT-4o while keeping sensitive data within the health organization, which could make social factors easier to measure and address in patient care.

The research highlights the growing role of advanced language AI in the healthcare domain, building on efforts to advance large language models for medical applications and apply these models to complex health challenges. As the field continues to evolve, further exploration of the ethical and practical considerations will be crucial to ensuring these technologies are developed and deployed responsibly.



This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!


Related Papers

Leveraging Open-Source Large Language Models for encoding Social Determinants of Health using an Intelligent Router

Akul Goel, Surya Narayanan Hari, Belinda Waltman, Matt Thomson

Social Determinants of Health (SDOH) play a significant role in patient health outcomes. The Centers for Disease Control and Prevention (CDC) introduced a subset of ICD-10 codes called Z-codes in an attempt to officially recognize and measure SDOH in the health care system. However, these codes are rarely annotated in a patient's Electronic Health Record (EHR), and instead, in many cases, need to be inferred from clinical notes. Previous research has shown that large language models (LLMs) show promise in extracting unstructured data from EHRs. However, with thousands of models to choose from with unique architectures and training sets, it's difficult to choose one model that performs the best on coding tasks. Further, clinical notes contain trusted health information making the use of closed-source language models from commercial vendors difficult, so the identification of open source LLMs that can be run within health organizations and exhibit high performance on SDOH tasks is an urgent problem. Here, we introduce an intelligent routing system for SDOH coding that uses a language model router to direct medical record data to open source LLMs that demonstrate optimal performance on specific SDOH codes. The intelligent routing system exhibits state-of-the-art performance of 97.4% accuracy averaged across 5 codes, including homelessness and food insecurity, on par with closed models such as GPT-4o. In order to train the routing system and validate models, we also introduce a synthetic data generation and validation paradigm to increase the scale of training data without needing privacy-protected medical records. Together, we demonstrate an architecture for intelligent routing of inputs to task-optimal language models to achieve high performance across a set of medical coding sub-tasks.

5/31/2024

Large Language Models for Integrating Social Determinant of Health Data: A Case Study on Heart Failure 30-Day Readmission Prediction

Chase Fensore, Rodrigo M. Carrillo-Larco, Shivani A. Patel, Alanna A. Morris, Joyce C. Ho

Social determinants of health (SDOH), the myriad of circumstances in which people live, grow, and age, play an important role in health outcomes. However, existing outcome prediction models often only use proxies of SDOH as features. Recent open data initiatives present an opportunity to construct a more comprehensive view of SDOH, but manually integrating the most relevant data for individual patients becomes increasingly challenging as the volume and diversity of public SDOH data grows. Large language models (LLMs) have shown promise at automatically annotating structured data. Here, we conduct an end-to-end case study evaluating the feasibility of using LLMs to integrate SDOH data, and the utility of these SDOH features for clinical prediction. We first manually label 700+ variables from two publicly-accessible SDOH data sources to one of five semantic SDOH categories. Then, we benchmark performance of 9 open-source LLMs on this classification task. Finally, we train ML models to predict 30-day hospital readmission among 39k heart failure (HF) patients, and we compare the prediction performance of the categorized SDOH variables with standard clinical variables. Additionally, we investigate the impact of few-shot LLM prompting on LLM annotation performance, and perform a metadata ablation study on prompts to evaluate which information helps LLMs accurately annotate these variables. We find that some open-source LLMs can effectively, accurately annotate SDOH variables with zero-shot prompting without the need for fine-tuning. Crucially, when combined with standard clinical features, the LLM-annotated Neighborhood and Built Environment subset of the SDOH variables shows the best performance predicting 30-day readmission of HF patients.

7/16/2024

SDoH-GPT: Using Large Language Models to Extract Social Determinants of Health (SDoH)

Bernardo Consoli, Xizhi Wu, Song Wang, Xinyu Zhao, Yanshan Wang, Justin Rousseau, Tom Hartvigsen, Li Shen, Huanmei Wu, Yifan Peng, Qi Long, Tianlong Chen, Ying Ding

Extracting social determinants of health (SDoH) from unstructured medical notes depends heavily on labor-intensive annotations, which are typically task-specific, hampering reusability and limiting sharing. In this study we introduced SDoH-GPT, a simple and effective few-shot Large Language Model (LLM) method leveraging contrastive examples and concise instructions to extract SDoH without relying on extensive medical annotations or costly human intervention. It achieved tenfold and twentyfold reductions in time and cost respectively, and superior consistency with human annotators measured by Cohen's kappa of up to 0.92. The innovative combination of SDoH-GPT and XGBoost leverages the strengths of both, ensuring high accuracy and computational efficiency while consistently maintaining 0.90+ AUROC scores. Testing across three distinct datasets has confirmed its robustness and accuracy. This study highlights the potential of leveraging LLMs to revolutionize medical note classification, demonstrating their capability to achieve highly accurate classifications with significantly reduced time and cost.

7/25/2024

Extracting Social Determinants of Health from Pediatric Patient Notes Using Large Language Models: Novel Corpus and Methods

Yujuan Fu, Giridhar Kaushik Ramachandran, Nicholas J Dobbins, Namu Park, Michael Leu, Abby R. Rosenberg, Kevin Lybarger, Fei Xia, Ozlem Uzuner, Meliha Yetisgen

Social determinants of health (SDoH) play a critical role in shaping health outcomes, particularly in pediatric populations where interventions can have long-term implications. SDoH are frequently studied in the Electronic Health Record (EHR), which provides a rich repository for diverse patient data. In this work, we present a novel annotated corpus, the Pediatric Social History Annotation Corpus (PedSHAC), and evaluate the automatic extraction of detailed SDoH representations using fine-tuned and in-context learning methods with Large Language Models (LLMs). PedSHAC comprises annotated social history sections from 1,260 clinical notes obtained from pediatric patients within the University of Washington (UW) hospital system. Employing an event-based annotation scheme, PedSHAC captures ten distinct health determinants to encompass living and economic stability, prior trauma, education access, substance use history, and mental health with an overall annotator agreement of 81.9 F1. Our proposed fine-tuning LLM-based extractors achieve high performance at 78.4 F1 for event arguments. In-context learning approaches with GPT-4 demonstrate promise for reliable SDoH extraction with limited annotated examples, with extraction performance at 82.3 F1 for event triggers.

4/5/2024