Voice EHR: Introducing Multimodal Audio Data for Health

arXiv: 2404.01620

Published 6/4/2024 by James Anibal, Hannah Huth, Ming Li, Lindsey Hazen, Yen Minh Lam, Hang Nguyen, Phuc Hong, Michael Kleinman, Shelley Ost, Christopher Jackson and 18 others

cs.SD cs.AI cs.CY eess.AS

Abstract

Large AI models trained on audio data may have the potential to rapidly classify patients, enhancing medical decision-making and potentially improving outcomes through early detection. Existing technologies depend on limited datasets using expensive recording equipment in high-income, English-speaking countries. This challenges deployment in resource-constrained, high-volume settings where audio data may have a profound impact. This report introduces a novel data type and a corresponding collection system that captures health data through guided questions using only a mobile/web application. This application ultimately results in an audio electronic health record (voice EHR) which may contain complex biomarkers of health from conventional voice/respiratory features, speech patterns, and language with semantic meaning - compensating for the typical limitations of unimodal clinical datasets. This report introduces a consortium of partners for global work, presents the application used for data collection, and showcases the potential of informative voice EHR to advance the scalability and diversity of audio AI.

Overview

• Large AI models trained on audio data could rapidly classify patients and improve medical decision-making through early detection.

• Existing technologies rely on limited datasets from high-income, English-speaking countries, which challenges deployment in resource-constrained, high-volume settings.

• This report introduces a novel data type and collection system that captures health data through guided questions using only a mobile/web application, resulting in an audio electronic health record (voice EHR).

• The voice EHR may contain complex biomarkers of health from voice/respiratory features, speech patterns, and language with semantic meaning, compensating for limitations of unimodal clinical datasets.
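
To make the idea of a voice EHR concrete, here is a minimal Python sketch of how one record might be organized. The schema is an assumption for illustration only. The class names `GuidedResponse` and `VoiceEHR`, their fields, and the 16 kHz sample rate in the usage lines are hypothetical and are not taken from the paper or its application. The sketch simply groups each guided question with its raw audio and transcript, alongside participant metadata.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class GuidedResponse:
    """One answer to a guided question (hypothetical schema, not the paper's)."""
    question_id: str        # hypothetical identifier for the app's prompt
    prompt: str             # text of the guided question shown to the participant
    sample_rate_hz: int     # recording sample rate reported by the device
    samples: List[float]    # mono PCM samples normalized to [-1.0, 1.0]
    transcript: str = ""    # speech-to-text output, if available

    def duration_s(self) -> float:
        """Length of the recording in seconds (0.0 if the sample rate is unknown)."""
        return len(self.samples) / self.sample_rate_hz if self.sample_rate_hz else 0.0


@dataclass
class VoiceEHR:
    """A participant's audio health record: guided responses plus self-reported metadata."""
    participant_id: str
    language: str  # e.g. an ISO 639-1 code such as "vi" or "en"
    metadata: Dict[str, str] = field(default_factory=dict)  # e.g. reported symptoms
    responses: List[GuidedResponse] = field(default_factory=list)

    def add_response(self, response: GuidedResponse) -> None:
        self.responses.append(response)

    def total_audio_s(self) -> float:
        """Total recorded audio across all guided questions."""
        return sum(r.duration_s() for r in self.responses)


# Usage: one free-speech answer (2.0 s) and one cough/breath prompt (0.5 s) at 16 kHz.
record = VoiceEHR(participant_id="p-001", language="en",
                  metadata={"reported_symptom": "cough"})
record.add_response(GuidedResponse("q1", "Please describe how you are feeling today.",
                                   16000, [0.0] * 32000,
                                   "I have had a cough for three days"))
record.add_response(GuidedResponse("q2", "Please take a deep breath and cough.",
                                   16000, [0.0] * 8000))
print(record.total_audio_s())  # 2.5
```

Keeping the transcript next to the raw audio means the same record can feed acoustic, speech-pattern, and language models without re-collecting data.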

Plain English Explanation

The research paper discusses the potential of using large AI models trained on audio data to quickly identify and classify medical conditions in patients. This could help healthcare providers make better decisions and potentially lead to earlier detection and treatment, which could improve patient outcomes.

However, current technologies for collecting and analyzing this kind of audio data have important limitations. They are built on limited datasets recorded with expensive equipment, mostly in high-income, English-speaking countries. This makes them hard to deploy in places with limited resources and high patient volumes, which is exactly where this kind of technology could have the biggest impact.

The researchers introduce a new way to collect audio data that doesn't require any special equipment: just a simple mobile app or website that asks guided questions. The result is a "voice electronic health record" (voice EHR) that may contain valuable information about a person's health, such as how they speak, their breathing patterns, and the meaning of the words they use. Combining these signals may provide a more complete picture of someone's health than clinical datasets that capture only a single type of data.

Technical Explanation

The paper presents a novel data type and collection system for capturing health data through guided questions in a mobile/web application. This results in an "audio electronic health record" (voice EHR) that can contain complex biomarkers of health, including voice/respiratory features, speech patterns, and language with semantic meaning.
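
The three sources of biomarkers named above can be illustrated with a deliberately simple Python sketch. It is not the paper's feature-extraction method, which this summary does not describe. The function names are hypothetical, and the features are stand-ins chosen for clarity: RMS energy and zero-crossing rate for voice/respiratory signal, words per second for speech patterns, and keyword flags for semantic content. Real systems would more likely use learned audio and language embeddings. The final step concatenates the per-modality features into one vector, a basic form of early fusion.

```python
import math
from typing import Dict, List, Sequence


def acoustic_features(samples: Sequence[float]) -> Dict[str, float]:
    """Illustrative voice/respiratory proxies: RMS energy and zero-crossing rate."""
    n = len(samples)
    if n == 0:
        return {"rms": 0.0, "zcr": 0.0}
    rms = math.sqrt(sum(s * s for s in samples) / n)
    # Fraction of adjacent sample pairs whose sign changes.
    crossings = sum(1 for a, b in zip(samples, samples[1:]) if (a >= 0) != (b >= 0))
    zcr = crossings / (n - 1) if n > 1 else 0.0
    return {"rms": rms, "zcr": zcr}


def speech_pattern_features(transcript: str, duration_s: float) -> Dict[str, float]:
    """Illustrative speech-pattern proxy: speaking rate in words per second."""
    words = transcript.split()
    rate = len(words) / duration_s if duration_s > 0 else 0.0
    return {"words_per_s": rate}


def language_features(transcript: str, keywords: Sequence[str]) -> Dict[str, float]:
    """Illustrative semantic proxy: 1.0 if a keyword appears as a word, else 0.0."""
    tokens = {t.strip(".,!?").lower() for t in transcript.split()}
    return {f"mentions_{k}": float(k.lower() in tokens) for k in keywords}


def fuse(*feature_dicts: Dict[str, float]) -> List[float]:
    """Early fusion: merge per-modality features and emit them in sorted-key order."""
    merged: Dict[str, float] = {}
    for d in feature_dicts:
        merged.update(d)  # a repeated key from a later modality overwrites the earlier one
    return [merged[k] for k in sorted(merged)]


# Usage: a toy signal whose every adjacent pair crosses zero, plus a short transcript.
signal = [0.5, -0.5, 0.5, -0.5]
text = "I have a cough and fever"
vec = fuse(
    acoustic_features(signal),
    speech_pattern_features(text, duration_s=2.0),
    language_features(text, ["cough", "pain"]),
)
print(vec)  # [1.0, 0.0, 0.5, 3.0, 1.0]
```

Sorting the keys in `fuse` keeps the vector layout identical across records, which a downstream classifier needs; the order here is `mentions_cough`, `mentions_pain`, `rms`, `words_per_s`, `zcr`.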

The researchers introduce a consortium of partners for global data collection and present the application used to gather voice EHR data. They showcase the potential of this multimodal voice data to address the limitations of existing unimodal clinical datasets and to improve the scalability and diversity of audio AI for medical applications.

Critical Analysis

The paper acknowledges that existing audio-based technologies rely on limited datasets from high-income, English-speaking countries, which can make them difficult to deploy in resource-constrained, high-volume settings. The introduction of a novel data collection system using a mobile/web application is a promising approach to address this challenge.

However, the paper does not provide details on the specific methods used to extract biomarkers from the voice EHR data or the performance of the AI models in classifying medical conditions. Further research and validation would be needed to assess the reliability and accuracy of this approach compared to existing clinical practices.

Additionally, the paper does not discuss potential privacy and ethical concerns related to the collection and use of audio data for medical purposes, which will be an important consideration for widespread deployment and adoption.

Conclusion

This research paper introduces a novel approach to capturing health data through guided questions in a mobile/web application, resulting in an "audio electronic health record" (voice EHR) that may contain valuable biomarkers for medical decision-making. This has the potential to overcome the limitations of existing audio-based technologies, which are often constrained by expensive equipment and limited datasets.

The proposed system could improve the scalability and diversity of audio AI for medical applications, potentially leading to earlier detection and better treatment of various health conditions. However, further research is needed to validate the performance and reliability of this approach, as well as to address potential privacy and ethical concerns.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!
