Predicting postoperative risks using large language models

Read original: arXiv:2402.17493 - Published 9/4/2024 by Charles Alba, Bing Xue, Joanna Abraham, Thomas Kannampallil, Chenyang Lu

Overview

This paper explores the use of large language models (LLMs) in perioperative care, which refers to the medical care provided to patients before, during, and after surgery.
The researchers investigate how to effectively "prescribe" or utilize pretrained LLMs for various clinical tasks in perioperative care, such as predicting patient outcomes and generating personalized treatment plans.
The paper presents a comprehensive study that examines different approaches to fine-tuning and adapting LLMs to improve their performance on perioperative care tasks.

Plain English Explanation

Large language models (LLMs) are powerful AI systems that have been trained on massive amounts of text data, allowing them to understand and generate human-like language. In this research, the authors explore how these LLMs can be used to improve the quality of medical care for patients undergoing surgery.

The key idea is to "prescribe" or apply LLMs to various clinical tasks in perioperative care, which includes the medical care provided before, during, and after surgery. For example, LLMs could be used to predict a patient's risk of complications after surgery or generate personalized treatment plans based on the patient's medical history and surgical procedure.

The researchers investigate different ways to fine-tune and adapt the pretrained LLMs to make them more effective for these specific healthcare applications. This includes techniques like adapting the models to the healthcare domain or using federated learning to train the models across multiple hospitals.

The ultimate goal is to find the "right dose" of LLM technology - the optimal way to apply these powerful language models to improve patient outcomes and the overall quality of perioperative care.

Technical Explanation

The study uses 84,875 pre-operative clinical notes and their associated surgical cases from 2018 to 2021 to examine how well LLMs predict six postoperative risks under various fine-tuning strategies.

The researchers explore different approaches to fine-tuning and adapting pretrained LLMs to improve their performance on clinical tasks in the perioperative care domain. This includes techniques like:

Domain adaptation: Adapting the LLM to the specific language and concepts used in healthcare through additional pretraining on medical text data.
Federated learning: Training the LLM across multiple hospitals or healthcare institutions to leverage diverse patient data while preserving privacy.
Bias mitigation: Identifying and addressing potential biases in the LLM's predictions that could lead to unfair or inaccurate clinical decision support.
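To make the federated learning item concrete, here is a minimal sketch of federated averaging (FedAvg), one common federated scheme. The summary does not say which algorithm the authors use, so FedAvg is an assumption, as are all function names below. A tiny linear model stands in for the LLM; the point is only that each site trains on its own data and shares weights, never patient records.

```python
from typing import List, Sequence, Tuple

Example = Tuple[Sequence[float], float]  # (features, target)


def local_update(weights: List[float], data: List[Example],
                 lr: float = 0.1, epochs: int = 5) -> List[float]:
    """Train on one site's private data with plain gradient descent.

    Hypothetical stand-in for local fine-tuning: a linear model with
    mean squared error instead of an LLM.
    """
    w = list(weights)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, y in data:
            err = sum(wi * xi for wi, xi in zip(w, x)) - y
            for j, xj in enumerate(x):
                grad[j] += 2.0 * err * xj / len(data)
        w = [wi - lr * g for wi, g in zip(w, grad)]
    return w


def federated_average(site_weights: List[List[float]],
                      site_sizes: List[int]) -> List[float]:
    """Combine site models, weighting each by its number of examples."""
    total = sum(site_sizes)
    dim = len(site_weights[0])
    return [
        sum(w[j] * n for w, n in zip(site_weights, site_sizes)) / total
        for j in range(dim)
    ]


def fed_avg(sites: List[List[Example]], dim: int, rounds: int = 30) -> List[float]:
    """Run FedAvg: only model weights leave each site, never raw data."""
    global_w = [0.0] * dim
    for _ in range(rounds):
        updates = [local_update(global_w, data) for data in sites]
        global_w = federated_average(updates, [len(d) for d in sites])
    return global_w


if __name__ == "__main__":
    # Two hypothetical hospitals whose data follow y = 2*x1 - x2.
    hospital_a = [((1.0, 0.0), 2.0), ((0.0, 1.0), -1.0), ((1.0, 1.0), 1.0)]
    hospital_b = [((2.0, 1.0), 3.0), ((1.0, 2.0), 0.0)]
    print(fed_avg([hospital_a, hospital_b], dim=2))
```

Weighting by site size keeps a small hospital from dominating the shared model. A real deployment would add secure aggregation and handle data that differ in distribution across sites, both of which this sketch omits.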

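The paper also explores the use of LLMs for various perioperative care tasks, such as predicting patient outcomes, generating personalized treatment plans, and providing clinical decision support. The researchers evaluate the adapted LLMs on these tasks and compare them to specialized medical language models and traditional machine learning approaches. According to the abstract, pretrained LLMs outperformed traditional word embeddings by an absolute 38.3% in AUROC and 33.2% in AUPRC, self-supervised fine-tuning added a further 3.2% and 1.5%, and a unified foundation model achieved the highest performance.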
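The paper's abstract reports its comparisons as absolute differences in AUROC and AUPRC. As a rough illustration of what those numbers measure, the sketch below computes both metrics in plain Python and the absolute gain between two models. The function names are hypothetical, the toy scores are invented for illustration, and AUPRC is approximated by average precision under the assumption that scores contain no ties.

```python
from typing import Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive outranks a random negative.

    Ties between a positive and a negative count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Average precision, a common estimate of AUPRC (assumes no tied scores)."""
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one positive label")
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = 0
    total = 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            hits += 1
            total += hits / rank  # precision at this positive's rank
    return total / n_pos


def absolute_gain(labels: Sequence[int], baseline: Sequence[float],
                  candidate: Sequence[float]) -> float:
    """AUROC difference in percentage points (candidate minus baseline)."""
    return 100.0 * (auroc(labels, candidate) - auroc(labels, baseline))


if __name__ == "__main__":
    # Toy data: 1 = complication occurred, 0 = it did not.
    y = [1, 1, 0, 0]
    baseline_scores = [0.9, 0.4, 0.6, 0.1]
    candidate_scores = [0.9, 0.8, 0.2, 0.1]
    print(auroc(y, baseline_scores), average_precision(y, baseline_scores))
    print(absolute_gain(y, baseline_scores, candidate_scores))
```

An absolute gain is a difference in percentage points, not a relative change, which is how the abstract's figures should be read. AUPRC is usually the more informative of the two when the outcome is rare, as postoperative complications often are.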

Critical Analysis

The paper provides a comprehensive and rigorous investigation into the use of large language models for perioperative care. The researchers acknowledge the potential limitations and challenges of this approach, such as the need to address biases in the LLMs and ensure the models' reliability and interpretability in high-stakes clinical settings.

One potential concern is the generalizability of the findings, as the study is focused on a specific healthcare domain (perioperative care). Further research may be needed to understand how these techniques can be applied to other areas of healthcare or adapt to different patient populations and clinical workflows.

Additionally, the paper does not delve deeply into the ethical implications of using LLMs in clinical decision-making, such as the potential for amplifying existing biases or the challenges of ensuring transparency and accountability in these AI-powered systems. These are important considerations that could be explored in future research.

Conclusion

This paper presents a valuable contribution to the growing body of research on the application of large language models in healthcare. By exploring different approaches to fine-tuning and adapting LLMs for perioperative care tasks, the researchers have demonstrated the potential of these powerful AI systems to improve patient outcomes and the quality of medical care.

The findings from this study can serve as a foundation for further research and development in this area, as healthcare organizations and policymakers continue to grapple with the opportunities and challenges of integrating advanced AI technologies into clinical practice.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Predicting postoperative risks using large language models

Charles Alba, Bing Xue, Joanna Abraham, Thomas Kannampallil, Chenyang Lu

Clinical notes recorded during a patient's perioperative journey hold immense informational value. Advances in large language models (LLMs) offer opportunities for bridging this gap. Using 84,875 pre-operative notes and their associated surgical cases from 2018 to 2021, we examine the performance of LLMs in predicting six postoperative risks using various fine-tuning strategies. Pretrained LLMs outperformed traditional word embeddings by an absolute AUROC of 38.3% and AUPRC of 33.2%. Self-supervised fine-tuning further improved performance by 3.2% and 1.5%. Incorporating labels into training further increased AUROC by 1.8% and AUPRC by 2%. The highest performance was achieved with a unified foundation model, with improvements of 3.6% for AUROC and 2.6% for AUPRC compared to self-supervision, highlighting the foundational capabilities of LLMs in predicting postoperative risks, which could be beneficial when deployed for perioperative care.

9/4/2024

CPLLM: Clinical Prediction with Large Language Models

Ofir Ben Shoham, Nadav Rappoport

We present Clinical Prediction with Large Language Models (CPLLM), a method that involves fine-tuning a pre-trained Large Language Model (LLM) for clinical disease and readmission prediction. We utilized quantization and fine-tuned the LLM using prompts. For diagnosis prediction, we predict whether patients will be diagnosed with a target disease during their next visit or in the subsequent diagnosis, leveraging their historical diagnosis records. We compared our results to various baselines, including RETAIN and Med-BERT, the current state-of-the-art model for disease prediction using temporal structured EHR data. In addition, we evaluated CPLLM for patient hospital readmission prediction and compared our method's performance with benchmark baselines. Our experiments have shown that our proposed method, CPLLM, surpasses all the tested models in terms of PR-AUC and ROC-AUC metrics, showing state-of-the-art results for diagnosis prediction and patient hospital readmission prediction. Such a method can be easily implemented and integrated into the clinical process to help care providers estimate the next steps of patients.

5/3/2024

Is larger always better? Evaluating and prompting large language models for non-generative medical tasks

Yinghao Zhu, Junyi Gao, Zixiang Wang, Weibin Liao, Xiaochen Zheng, Lifang Liang, Yasha Wang, Chengwei Pan, Ewen M. Harrison, Liantao Ma

The use of Large Language Models (LLMs) in medicine is growing, but their ability to handle both structured Electronic Health Record (EHR) data and unstructured clinical notes is not well-studied. This study benchmarks various models, including GPT-based LLMs, BERT-based models, and traditional clinical predictive models, for non-generative medical tasks utilizing renowned datasets. We assessed 14 language models (9 GPT-based and 5 BERT-based) and 7 traditional predictive models using the MIMIC dataset (ICU patient records) and the TJH dataset (early COVID-19 EHR data), focusing on tasks such as mortality and readmission prediction, disease hierarchy reconstruction, and biomedical sentence matching, comparing both zero-shot and finetuned performance. Results indicated that LLMs exhibited robust zero-shot predictive capabilities on structured EHR data when using well-designed prompting strategies, frequently surpassing traditional models. However, for unstructured medical texts, LLMs did not outperform finetuned BERT models, which excelled in both supervised and unsupervised tasks. Consequently, while LLMs are effective for zero-shot learning on structured data, finetuned BERT models are more suitable for unstructured texts, underscoring the importance of selecting models based on specific task requirements and data characteristics to optimize the application of NLP technology in healthcare.

7/29/2024

Probabilistic Medical Predictions of Large Language Models

Bowen Gu, Rishi J. Desai, Kueiyu Joshua Lin, Jie Yang

Large Language Models (LLMs) have demonstrated significant potential in clinical applications through prompt engineering, which enables the generation of flexible and diverse clinical predictions. However, they pose challenges in producing prediction probabilities, which are essential for transparency and allowing clinicians to apply flexible probability thresholds in decision-making. While explicit prompt instructions can lead LLMs to provide prediction probability numbers through text generation, LLMs' limitations in numerical reasoning raise concerns about the reliability of these text-generated probabilities. To assess this reliability, we compared explicit probabilities derived from text generation to implicit probabilities calculated based on the likelihood of predicting the correct label token. Experimenting with six advanced open-source LLMs across five medical datasets, we found that the performance of explicit probabilities was consistently lower than implicit probabilities with respect to discrimination, precision, and recall. Moreover, these differences were enlarged on small LLMs and imbalanced datasets, emphasizing the need for cautious interpretation and applications, as well as further research into robust probability estimation methods for LLMs in clinical contexts.

8/22/2024