Transforming Wearable Data into Health Insights using Large Language Model Agents

Read original: arXiv:2406.06464 - Published 6/12/2024 by Mike A. Merrill, Akshay Paruchuri, Naghmeh Rezaei, Geza Kovacs, Javier Perez, Yun Liu, Erik Schenck, Nova Hammerquist, Jake Sunshine, Shyam Tailor and 10 others

Overview

Despite the widespread use of wearable health trackers, deriving personalized insights from the data remains a challenge.
The rise of large language model (LLM) agents, which can use tools to reason about and interact with the world, presents an opportunity to enable personalized health analysis at scale.
This paper introduces the Personal Health Insights Agent (PHIA), a system that uses code generation and information retrieval tools to analyze and interpret behavioral health data from wearables.

Plain English Explanation

Wearable devices like fitness trackers and smartwatches are becoming increasingly common. They monitor sleep, exercise, and other health-related behaviors. Turning that data into personalized insights that actually help people improve their health is difficult, however, because it requires non-trivial, open-ended analysis.

The researchers behind this paper saw an opportunity in the recent development of large language model (LLM) agents. These are AI systems that can use tools to reason about and interact with the world. The researchers asked whether such agents could analyze personal health data from wearables and provide personalized insights to users.

To test this idea, the researchers created a system called the Personal Health Insights Agent (PHIA). PHIA uses code generation and information retrieval tools to analyze data from wearable devices and provide users with personalized health insights. The researchers also curated two question-answering datasets, together containing over 4000 health insights questions, to test how well PHIA answers both factual and open-ended questions about personal health.

Technical Explanation

The researchers developed the Personal Health Insights Agent (PHIA), a system that leverages state-of-the-art code generation and information retrieval tools to analyze and interpret behavioral health data from wearable devices.
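To make the idea of an agent with a code generation tool and a retrieval tool more concrete, here is a minimal sketch in Python. It is not PHIA's implementation: the data fields, the tool names (`run_analysis`, `search_knowledge`), and the `scripted_model` function that stands in for the LLM are all hypothetical, chosen only to illustrate the loop of picking a tool, observing its output, and then answering.

```python
from statistics import mean

# Hypothetical daily summaries; field names are illustrative, not the paper's schema.
DAILY = [
    {"date": "2024-05-01", "steps": 8200, "sleep_minutes": 410},
    {"date": "2024-05-02", "steps": 12050, "sleep_minutes": 455},
    {"date": "2024-05-03", "steps": 4300, "sleep_minutes": 380},
    {"date": "2024-05-04", "steps": 10900, "sleep_minutes": 470},
]

def run_analysis(code: str) -> str:
    """Tool 1 (assumed): run generated Python on the data and return `result`."""
    scope = {"daily": DAILY, "mean": mean}
    exec(code, scope)  # a real system would sandbox generated code
    return str(scope["result"])

def search_knowledge(query: str) -> str:
    """Tool 2 (assumed): stand-in for information retrieval; returns a canned string."""
    return f"(retrieved text for: {query})"

TOOLS = {"run_analysis": run_analysis, "search_knowledge": search_knowledge}

def scripted_model(question: str, observations: list) -> tuple:
    """Stand-in for the LLM: chooses the next action from what it has seen so far."""
    if not observations:
        code = "result = mean(r['sleep_minutes'] for r in daily if r['steps'] >= 10000)"
        return ("run_analysis", code)
    if len(observations) == 1:
        return ("search_knowledge", "recommended sleep duration for adults")
    return ("final", f"On days with 10,000+ steps you slept {observations[0]} minutes on average.")

def answer(question: str, max_steps: int = 5) -> str:
    """Agent loop: call tools until the model emits a final answer or the budget runs out."""
    observations = []
    for _ in range(max_steps):
        action, arg = scripted_model(question, observations)
        if action == "final":
            return arg
        observations.append(TOOLS[action](arg))
    return "No answer within the step budget."

print(answer("Do I sleep more on days when I walk a lot?"))
```

In a real agent, an LLM call would replace the scripted policy and the retrieval tool would query an actual knowledge source. The point of the sketch is only that numerical claims come from executed code rather than from the model's free-text guess.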

To evaluate PHIA's performance, the researchers curated two benchmark question-answering datasets that together contain over 4000 health insights questions. One consists of factual numerical health questions, and the other of open-ended health insights questions gathered through crowdsourcing.

Through 650 hours of human and expert evaluation, the researchers found that PHIA could accurately address over 84% of the factual numerical questions and more than 83% of the open-ended, crowd-sourced questions.
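For factual numerical questions, an accuracy figure like the one above is a fraction of questions answered correctly. This summary does not describe how the paper decides that a numerical answer is correct, so the sketch below assumes a simple relative tolerance. The function names, the 1% tolerance, and the toy answers are illustrative only and are not results from the paper.

```python
import math

def is_correct(predicted: float, expected: float, rel_tol: float = 0.01) -> bool:
    """Assumed rule: an answer counts as correct if it is within rel_tol of the reference."""
    return math.isclose(predicted, expected, rel_tol=rel_tol)

def accuracy(pairs) -> float:
    """Fraction of (predicted, expected) pairs judged correct."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("no answers to score")
    hits = sum(is_correct(p, e) for p, e in pairs)
    return hits / len(pairs)

# Toy (predicted, expected) pairs, made up for illustration.
toy_results = [(462.5, 462.5), (7012.0, 7000.0), (55.0, 61.0), (8.1, 8.1)]
print(f"accuracy = {accuracy(toy_results):.2f}")
```

Open-ended questions cannot be scored this way, which is why the study relied on human and expert raters for them.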

Critical Analysis

The paper presents a promising approach to leveraging large language models for personalized health analysis. However, the researchers acknowledge several caveats and areas for further research.

One limitation is that the evaluation datasets were curated and may not fully reflect the real-world complexity and diversity of health-related questions that users might have. Additionally, the paper does not provide a detailed analysis of PHIA's strengths and weaknesses across different types of questions or health domains.

Further research is needed to understand how PHIA's performance might vary with different wearable data sources, user demographics, and health conditions. The researchers also note the importance of addressing privacy and ethical considerations when deploying such a system in real-world settings.

Conclusion

This research demonstrates the potential of large language model agents, like the Personal Health Insights Agent (PHIA), to analyze personal health data from wearables and provide users with personalized insights. By accurately addressing over 84% of factual numerical questions and more than 83% of open-ended questions, PHIA shows promise in helping individuals interpret their own wearable data.

The findings of this paper have implications for advancing behavioral health across the population and paving the way for more accessible, personalized wellness regimens informed by data-driven insights. Further research and development in this area could lead to transformative improvements in how individuals interpret and act on their own health data.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Transforming Wearable Data into Health Insights using Large Language Model Agents

Mike A. Merrill, Akshay Paruchuri, Naghmeh Rezaei, Geza Kovacs, Javier Perez, Yun Liu, Erik Schenck, Nova Hammerquist, Jake Sunshine, Shyam Tailor, Kumar Ayush, Hao-Wei Su, Qian He, Cory Y. McLean, Mark Malhotra, Shwetak Patel, Jiening Zhan, Tim Althoff, Daniel McDuff, Xin Liu

Despite the proliferation of wearable health trackers and the importance of sleep and exercise to health, deriving actionable personalized insights from wearable data remains a challenge because doing so requires non-trivial open-ended analysis of these data. The recent rise of large language model (LLM) agents, which can use tools to reason about and interact with the world, presents a promising opportunity to enable such personalized analysis at scale. Yet, the application of LLM agents in analyzing personal health is still largely untapped. In this paper, we introduce the Personal Health Insights Agent (PHIA), an agent system that leverages state-of-the-art code generation and information retrieval tools to analyze and interpret behavioral health data from wearables. We curate two benchmark question-answering datasets of over 4000 health insights questions. Based on 650 hours of human and expert evaluation we find that PHIA can accurately address over 84% of factual numerical questions and more than 83% of crowd-sourced open-ended questions. This work has implications for advancing behavioral health across the population, potentially enabling individuals to interpret their own wearable data, and paving the way for a new era of accessible, personalized wellness regimens that are informed by data-driven insights.

6/12/2024

PhysioLLM: Supporting Personalized Health Insights with Wearables and Large Language Models

Cathy Mengying Fang, Valdemar Danry, Nathan Whitmore, Andria Bao, Andrew Hutchison, Cayden Pierce, Pattie Maes

We present PhysioLLM, an interactive system that leverages large language models (LLMs) to provide personalized health understanding and exploration by integrating physiological data from wearables with contextual information. Unlike commercial health apps for wearables, our system offers a comprehensive statistical analysis component that discovers correlations and trends in user data, allowing users to ask questions in natural language and receive generated personalized insights, and guides them to develop actionable goals. As a case study, we focus on improving sleep quality, given its measurability through physiological data and its importance to general well-being. Through a user study with 24 Fitbit watch users, we demonstrate that PhysioLLM outperforms both the Fitbit App alone and a generic LLM chatbot in facilitating a deeper, personalized understanding of health data and supporting actionable steps toward personal health goals.

6/28/2024

Towards a Personal Health Large Language Model

Justin Cosentino, Anastasiya Belyaeva, Xin Liu, Nicholas A. Furlotte, Zhun Yang, Chace Lee, Erik Schenck, Yojan Patel, Jian Cui, Logan Douglas Schneider, Robby Bryant, Ryan G. Gomes, Allen Jiang, Roy Lee, Yun Liu, Javier Perez, Jameson K. Rogers, Cathy Speed, Shyam Tailor, Megan Walker, Jeffrey Yu, Tim Althoff, Conor Heneghan, John Hernandez, Mark Malhotra, Leor Stern, Yossi Matias, Greg S. Corrado, Shwetak Patel, Shravya Shetty, Jiening Zhan, Shruthi Prabhakara, Daniel McDuff, Cory Y. McLean

In health, most large language model (LLM) research has focused on clinical tasks. However, mobile and wearable devices, which are rarely integrated into such tasks, provide rich, longitudinal data for personal health monitoring. Here we present Personal Health Large Language Model (PH-LLM), fine-tuned from Gemini for understanding and reasoning over numerical time-series personal health data. We created and curated three datasets that test 1) production of personalized insights and recommendations from sleep patterns, physical activity, and physiological responses, 2) expert domain knowledge, and 3) prediction of self-reported sleep outcomes. For the first task we designed 857 case studies in collaboration with domain experts to assess real-world scenarios in sleep and fitness. Through comprehensive evaluation of domain-specific rubrics, we observed that Gemini Ultra 1.0 and PH-LLM are not statistically different from expert performance in fitness and, while experts remain superior for sleep, fine-tuning PH-LLM provided significant improvements in using relevant domain knowledge and personalizing information for sleep insights. We evaluated PH-LLM domain knowledge using multiple choice sleep medicine and fitness examinations. PH-LLM achieved 79% on sleep and 88% on fitness, exceeding average scores from a sample of human experts. Finally, we trained PH-LLM to predict self-reported sleep quality outcomes from textual and multimodal encoding representations of wearable data, and demonstrate that multimodal encoding is required to match performance of specialized discriminative models. Although further development and evaluation are necessary in the safety-critical personal health domain, these results demonstrate both the broad knowledge and capabilities of Gemini models and the benefit of contextualizing physiological data for personal health applications as done with PH-LLM.

6/11/2024

Health-LLM: Large Language Models for Health Prediction via Wearable Sensor Data

Yubin Kim, Xuhai Xu, Daniel McDuff, Cynthia Breazeal, Hae Won Park

Large language models (LLMs) are capable of many natural language tasks, yet they are far from perfect. In health applications, grounding and interpreting domain-specific and non-linguistic data is crucial. This paper investigates the capacity of LLMs to make inferences about health based on contextual information (e.g. user demographics, health knowledge) and physiological data (e.g. resting heart rate, sleep minutes). We present a comprehensive evaluation of 12 state-of-the-art LLMs with prompting and fine-tuning techniques on four public health datasets (PMData, LifeSnaps, GLOBEM and AW_FB). Our experiments cover 10 consumer health prediction tasks in mental health, activity, metabolic, and sleep assessment. Our fine-tuned model, HealthAlpaca exhibits comparable performance to much larger models (GPT-3.5, GPT-4 and Gemini-Pro), achieving the best performance in 8 out of 10 tasks. Ablation studies highlight the effectiveness of context enhancement strategies. Notably, we observe that our context enhancement can yield up to 23.8% improvement in performance. While constructing contextually rich prompts (combining user context, health knowledge and temporal information) exhibits synergistic improvement, the inclusion of health knowledge context in prompts significantly enhances overall performance.

4/30/2024