Health-LLM: Large Language Models for Health Prediction via Wearable Sensor Data

2401.06866

Published 4/30/2024 by Yubin Kim, Xuhai Xu, Daniel McDuff, Cynthia Breazeal, Hae Won Park

Abstract

Large language models (LLMs) are capable of many natural language tasks, yet they are far from perfect. In health applications, grounding and interpreting domain-specific and non-linguistic data is crucial. This paper investigates the capacity of LLMs to make inferences about health based on contextual information (e.g. user demographics, health knowledge) and physiological data (e.g. resting heart rate, sleep minutes). We present a comprehensive evaluation of 12 state-of-the-art LLMs with prompting and fine-tuning techniques on four public health datasets (PMData, LifeSnaps, GLOBEM and AW_FB). Our experiments cover 10 consumer health prediction tasks in mental health, activity, metabolic, and sleep assessment. Our fine-tuned model, HealthAlpaca, exhibits comparable performance to much larger models (GPT-3.5, GPT-4 and Gemini-Pro), achieving the best performance in 8 out of 10 tasks. Ablation studies highlight the effectiveness of context enhancement strategies. Notably, we observe that our context enhancement can yield up to 23.8% improvement in performance. While constructing contextually rich prompts (combining user context, health knowledge and temporal information) exhibits synergistic improvement, the inclusion of health knowledge context in prompts significantly enhances overall performance.

Overview

This paper explores the use of large language models (LLMs) for health prediction using wearable sensor data.
The authors present Health-LLM, a framework that evaluates 12 LLMs, using prompting and fine-tuning, on 10 consumer health prediction tasks drawn from four public datasets.
The paper presents experiments and insights on the effectiveness of this approach, as well as its potential applications and limitations.

Plain English Explanation

Large language models (LLMs) are powerful AI systems that can understand and generate human-like text. In this paper, the researchers investigated whether these LLMs could also be used to analyze data from wearable devices, such as fitness trackers and smartwatches, to make predictions about people's health.

The key idea is that the data collected by wearable sensors, like heart rate, steps taken, and sleep patterns, could contain valuable information about a person's overall health and well-being. The researchers developed a framework called Health-LLM that takes this sensor data, together with context such as user demographics and health knowledge, and uses LLMs to make predictions in areas such as mental health, activity, metabolic health, and sleep.

By leveraging the impressive language understanding capabilities of LLMs, the researchers hoped to find patterns and insights in the sensor data that could lead to better health monitoring and early detection of potential problems. This could be particularly useful for people with chronic health conditions or those who want to proactively manage their well-being.

The paper presents the results of experiments evaluating the performance of Health-LLM and discusses both the promising aspects of this approach as well as some of the challenges and limitations that still need to be addressed.

Technical Explanation

The researchers developed a framework called Health-LLM that uses large language models (LLMs) to analyze data from wearable sensors for health prediction tasks. The key steps in their approach are:

Data Collection: They drew on four public datasets of wearable data (PMData, LifeSnaps, GLOBEM and AW_FB), which include physiological signals such as resting heart rate and sleep minutes.
Data Preprocessing: The sensor data was formatted as text prompts for the LLMs, optionally enriched with user context, health knowledge and temporal information.
LLM Integration: The researchers evaluated 12 LLMs using both prompting and fine-tuning. Their own fine-tuned model is called HealthAlpaca.
Health Prediction: The models were then asked to predict health outcomes for each user from these prompts.
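
As a rough illustration of the preprocessing step, the sketch below shows one way wearable readings could be serialized into a text prompt, along with optional user context and health knowledge. The field names, the `build_prompt` helper, the prompt wording, and the sample values are all hypothetical; they are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class DailyReading:
    """One day of wearable data (field names are hypothetical)."""
    date: str                  # ISO date, e.g. "2024-01-01"
    resting_heart_rate: float  # beats per minute
    sleep_minutes: int
    steps: int


def build_prompt(
    readings: Sequence[DailyReading],
    question: str,
    user_context: Optional[str] = None,
    health_knowledge: Optional[str] = None,
) -> str:
    """Serialize sensor readings and optional context into one text prompt."""
    parts = []
    if user_context:
        parts.append(f"User context: {user_context}")
    if health_knowledge:
        parts.append(f"Health knowledge: {health_knowledge}")
    # Temporal context: list readings in chronological order, one per line.
    # ISO date strings sort correctly as plain text.
    parts.append("Recent readings:")
    for r in sorted(readings, key=lambda r: r.date):
        parts.append(
            f"- {r.date}: resting heart rate {r.resting_heart_rate:.0f} bpm, "
            f"sleep {r.sleep_minutes} min, steps {r.steps}"
        )
    parts.append(f"Question: {question}")
    return "\n".join(parts)


if __name__ == "__main__":
    # Made-up sample values, for illustration only.
    sample = [
        DailyReading("2024-01-02", 61.0, 402, 8450),
        DailyReading("2024-01-01", 63.0, 385, 7200),
    ]
    print(build_prompt(
        sample,
        "Rate this user's sleep quality from 1 to 5.",
        user_context="Age 34, office worker",
    ))
```

Keeping each kind of context as an optional argument makes it easy to switch one on or off, which is the sort of comparison an ablation study would need.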

The experiments covered 10 consumer health prediction tasks in mental health, activity, metabolic, and sleep assessment. The fine-tuned HealthAlpaca model performed comparably to much larger models (GPT-3.5, GPT-4 and Gemini-Pro) and achieved the best performance in 8 of the 10 tasks. Ablation studies showed that context enhancement can improve performance by up to 23.8%, and that adding health knowledge to the prompt significantly enhances overall performance.
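
To make a "best model per task" comparison concrete, here is a minimal sketch of how such a tally could be computed from a table of per-task scores. The `count_best_tasks` helper, the model and task names, and all numbers are placeholders invented for illustration; they are not results or metrics from the paper.

```python
from collections import Counter
from typing import Dict


def count_best_tasks(
    scores: Dict[str, Dict[str, float]],
    higher_is_better: Dict[str, bool],
) -> Counter:
    """Count, for each model, the tasks on which it has the best score.

    scores maps model name -> task name -> metric value.
    higher_is_better maps task name -> True for accuracy-like metrics,
    False for error-like metrics. Ties credit every tied model.
    """
    wins: Counter = Counter()
    for task, maximize in higher_is_better.items():
        # Collect the scores of every model that reported this task.
        task_scores = {m: s[task] for m, s in scores.items() if task in s}
        if not task_scores:
            continue
        values = task_scores.values()
        best = max(values) if maximize else min(values)
        for model, value in task_scores.items():
            if value == best:
                wins[model] += 1
    return wins


# Placeholder numbers, not results from the paper.
example_scores = {
    "model_a": {"stress": 0.62, "sleep_quality": 0.41, "calories": 35.0},
    "model_b": {"stress": 0.58, "sleep_quality": 0.47, "calories": 29.5},
}
example_direction = {"stress": True, "sleep_quality": True, "calories": False}

if __name__ == "__main__":
    print(count_best_tasks(example_scores, example_direction))
```

The per-task direction flag matters because a mix of classification and regression tasks would combine metrics where higher is better with metrics where lower is better.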

Critical Analysis

The paper presents a compelling approach to using large language models for health prediction from wearable sensor data. The researchers have done a thorough job of designing and evaluating their Health-LLM framework, and the results are promising.

However, the paper also acknowledges some limitations and areas for further research. For example, the performance of Health-LLM may be influenced by factors like the quality and diversity of the training data, and the researchers note the need to carefully address potential biases in the data and model.

Additionally, the paper does not fully explore the interpretability and explainability of the LLM-based predictions, which could be an important consideration for real-world medical applications. Further research is needed to understand the inner workings of the model and ensure that its decision-making process is transparent and trustworthy.

Overall, this paper represents an important step forward in the application of large language models to healthcare, and the researchers have laid the groundwork for exciting future developments in this area.

Conclusion

The Health-LLM framework presented in this paper demonstrates the potential of using large language models to analyze wearable sensor data and make predictions about an individual's health and well-being. With fine-tuning and context-rich prompts, a comparatively small model, HealthAlpaca, achieved the best performance in 8 of the 10 health prediction tasks studied.

While the paper highlights the promise of this approach, it also identifies areas for further research and improvement, such as addressing data biases and improving the interpretability of the model's outputs. As the field of healthcare AI continues to evolve, studies like this one will play a crucial role in shaping the future of personalized, data-driven health management.

Related Papers

Towards a Personal Health Large Language Model

Justin Cosentino, Anastasiya Belyaeva, Xin Liu, Nicholas A. Furlotte, Zhun Yang, Chace Lee, Erik Schenck, Yojan Patel, Jian Cui, Logan Douglas Schneider, Robby Bryant, Ryan G. Gomes, Allen Jiang, Roy Lee, Yun Liu, Javier Perez, Jameson K. Rogers, Cathy Speed, Shyam Tailor, Megan Walker, Jeffrey Yu, Tim Althoff, Conor Heneghan, John Hernandez, Mark Malhotra, Leor Stern, Yossi Matias, Greg S. Corrado, Shwetak Patel, Shravya Shetty, Jiening Zhan, Shruthi Prabhakara, Daniel McDuff, Cory Y. McLean

In health, most large language model (LLM) research has focused on clinical tasks. However, mobile and wearable devices, which are rarely integrated into such tasks, provide rich, longitudinal data for personal health monitoring. Here we present Personal Health Large Language Model (PH-LLM), fine-tuned from Gemini for understanding and reasoning over numerical time-series personal health data. We created and curated three datasets that test 1) production of personalized insights and recommendations from sleep patterns, physical activity, and physiological responses, 2) expert domain knowledge, and 3) prediction of self-reported sleep outcomes. For the first task we designed 857 case studies in collaboration with domain experts to assess real-world scenarios in sleep and fitness. Through comprehensive evaluation of domain-specific rubrics, we observed that Gemini Ultra 1.0 and PH-LLM are not statistically different from expert performance in fitness and, while experts remain superior for sleep, fine-tuning PH-LLM provided significant improvements in using relevant domain knowledge and personalizing information for sleep insights. We evaluated PH-LLM domain knowledge using multiple choice sleep medicine and fitness examinations. PH-LLM achieved 79% on sleep and 88% on fitness, exceeding average scores from a sample of human experts. Finally, we trained PH-LLM to predict self-reported sleep quality outcomes from textual and multimodal encoding representations of wearable data, and demonstrate that multimodal encoding is required to match performance of specialized discriminative models. Although further development and evaluation are necessary in the safety-critical personal health domain, these results demonstrate both the broad knowledge and capabilities of Gemini models and the benefit of contextualizing physiological data for personal health applications as done with PH-LLM.

6/11/2024

cs.AI cs.CL

PhysioLLM: Supporting Personalized Health Insights with Wearables and Large Language Models

Cathy Mengying Fang, Valdemar Danry, Nathan Whitmore, Andria Bao, Andrew Hutchison, Cayden Pierce, Pattie Maes

We present PhysioLLM, an interactive system that leverages large language models (LLMs) to provide personalized health understanding and exploration by integrating physiological data from wearables with contextual information. Unlike commercial health apps for wearables, our system offers a comprehensive statistical analysis component that discovers correlations and trends in user data, allowing users to ask questions in natural language and receive generated personalized insights, and guides them to develop actionable goals. As a case study, we focus on improving sleep quality, given its measurability through physiological data and its importance to general well-being. Through a user study with 24 Fitbit watch users, we demonstrate that PhysioLLM outperforms both the Fitbit App alone and a generic LLM chatbot in facilitating a deeper, personalized understanding of health data and supporting actionable steps toward personal health goals.

6/28/2024

cs.HC

Evaluating Large Language Models for Public Health Classification and Extraction Tasks

Joshua Harris, Timothy Laurence, Leo Loman, Fan Grayson, Toby Nonnenmacher, Harry Long, Loes WalsGriffith, Amy Douglas, Holly Fountain, Stelios Georgiou, Jo Hardstaff, Kathryn Hopkins, Y-Ling Chi, Galena Kuyumdzhieva, Lesley Larkin, Samuel Collins, Hamish Mohammed, Thomas Finnie, Luke Hounsome, Steven Riley

Advances in Large Language Models (LLMs) have led to significant interest in their potential to support human experts across a range of domains, including public health. In this work we present automated evaluations of LLMs for public health tasks involving the classification and extraction of free text. We combine six externally annotated datasets with seven new internally annotated datasets to evaluate LLMs for processing text related to: health burden, epidemiological risk factors, and public health interventions. We initially evaluate five open-weight LLMs (7-70 billion parameters) across all tasks using zero-shot in-context learning. We find that Llama-3-70B-Instruct is the highest performing model, achieving the best results on 15/17 tasks (using micro-F1 scores). We see significant variation across tasks with all open-weight LLMs scoring below 60% micro-F1 on some challenging tasks, such as Contact Classification, while all LLMs achieve greater than 80% micro-F1 on others, such as GI Illness Classification. For a subset of 12 tasks, we also evaluate GPT-4 and find comparable results to Llama-3-70B-Instruct, which scores equally or outperforms GPT-4 on 6 of the 12 tasks. Overall, based on these initial results we find promising signs that LLMs may be useful tools for public health experts to extract information from a wide variety of free text sources, and support public health surveillance, research, and interventions.

5/24/2024

cs.CL cs.LG

A Survey of Large Language Models for Healthcare: from Data, Technology, and Applications to Accountability and Ethics

Kai He, Rui Mao, Qika Lin, Yucheng Ruan, Xiang Lan, Mengling Feng, Erik Cambria

The utilization of large language models (LLMs) in the Healthcare domain has generated both excitement and concern due to their ability to effectively respond to free-text queries with certain professional knowledge. This survey outlines the capabilities of the currently developed LLMs for Healthcare and explicates their development process, with the aim of providing an overview of the development roadmap from traditional Pretrained Language Models (PLMs) to LLMs. Specifically, we first explore the potential of LLMs to enhance the efficiency and effectiveness of various Healthcare applications, highlighting both the strengths and limitations. Secondly, we conduct a comparison between the previous PLMs and the latest LLMs, as well as comparing various LLMs with each other. Then we summarize related Healthcare training data, training methods, optimization strategies, and usage. Finally, the unique concerns associated with deploying LLMs in Healthcare settings are investigated, particularly regarding fairness, accountability, transparency and ethics. Our survey provides a comprehensive investigation from the perspectives of both computer science and the Healthcare specialty. Besides the discussion about Healthcare concerns, we support the computer science community by compiling a collection of open source resources, such as accessible datasets, the latest methodologies, code implementations, and evaluation benchmarks, on GitHub. In summary, we contend that a significant paradigm shift is underway, transitioning from PLMs to LLMs. This shift encompasses a move from discriminative AI approaches to generative AI approaches, as well as a shift from model-centered methodologies to data-centered methodologies. Also, we determine that the biggest obstacles to using LLMs in Healthcare are fairness, accountability, transparency and ethics.

6/12/2024

cs.CL