Use of a Multiscale Vision Transformer to predict Nursing Activities Score from Low Resolution Thermal Videos in an Intensive Care Unit

2406.04364

Published 6/10/2024 by Isaac YL Lee, Thanh Nguyen-Duc, Ryo Ueno, Jesse Smith, Peter Y Chan

Abstract

Excessive caregiver workload in hospital nurses has been implicated in poorer patient care and increased worker burnout. Measurement of this workload in the Intensive Care Unit (ICU) is often done using the Nursing Activities Score (NAS), but this is usually recorded manually and sporadically. Previous work has made use of Ambient Intelligence (AmI) by using computer vision to passively derive caregiver-patient interaction times to monitor staff workload. In this letter, we propose using a Multiscale Vision Transformer (MViT) to passively predict the NAS from low-resolution thermal videos recorded in an ICU. 458 videos were obtained from an ICU in Melbourne, Australia and used to train an MViTv2 model using an indirect prediction and a direct prediction method. The indirect method predicted 1 of 8 potentially identifiable NAS activities from the video before inferring the NAS. The direct method predicted the NAS score immediately from the video. The indirect method yielded an average 5-fold accuracy of 57.21%, an area under the receiver operating characteristic curve (ROC AUC) of 0.865, an F1 score of 0.570 and a mean squared error (MSE) of 28.16. The direct method yielded an MSE of 18.16. We also showed that the MViTv2 outperforms similar models such as R(2+1)D and ResNet50-LSTM under identical settings. This study shows the feasibility of using an MViTv2 to passively predict the NAS in an ICU and monitor staff workload automatically. Our results above also show an increased accuracy in predicting NAS directly versus predicting NAS indirectly. We hope that our study can provide a direction for future work and further improve the accuracy of passive NAS monitoring.

Overview

Excessive workload on hospital nurses can lead to poor patient care and burnout
Nursing Activities Score (NAS) is used to measure nurse workload in Intensive Care Units (ICUs), but is often recorded manually and sporadically
Previous research has used computer vision to passively monitor nurse-patient interactions and staff workload
This paper proposes using a Multiscale Vision Transformer (MViT) to predict NAS from low-resolution thermal videos recorded in an ICU

Plain English Explanation

Nurses in hospitals, especially in the Intensive Care Unit (ICU), often carry a very heavy workload. This can lead to poorer care for patients and cause nurses to burn out. To measure how much work nurses are doing, ICUs often use a score called the Nursing Activities Score (NAS). But this score is usually recorded by hand and only sporadically.

Previous research has tried to use computer vision, or the ability of computers to analyze visual information, to automatically monitor how much time nurses spend interacting with patients. This can help track the nurses' workload without them having to do extra work.

In this study, the researchers propose using a type of computer vision model called a Multiscale Vision Transformer (MViT) to predict the NAS from low-resolution thermal videos recorded in an ICU. Thermal cameras record heat signatures rather than regular color images. The researchers trained the MViT model in two ways: either to predict which NAS activity is taking place and then infer the NAS from it, or to predict the NAS directly from the video.

The results showed that predicting the NAS directly was more accurate than the indirect method: the direct method reached a mean squared error of 18.16, compared with 28.16 for the indirect method (lower is better). The researchers also showed that the MViT model outperformed two other video models, R(2+1)D and ResNet50-LSTM, on this task.

Overall, this study demonstrates that it is possible to use a Multiscale Vision Transformer to automatically monitor nurse workload in the ICU by predicting the NAS score from thermal videos. This could help hospitals better understand and manage nurse workload without adding extra work for the nurses.

Technical Explanation

The researchers obtained 458 low-resolution thermal videos recorded in an ICU in Melbourne, Australia. They used these videos to train a Multiscale Vision Transformer (MViTv2) model to predict the Nursing Activities Score (NAS), which is a metric used to measure nurse workload in the ICU.

The researchers tested two different approaches: an indirect method and a direct method. In the indirect method, the model first predicted one of 8 potential NAS activities from the video, and then used that to infer the overall NAS score. In the direct method, the model predicted the NAS score directly from the video.
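The difference between the two approaches can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the per-activity points in `PLACEHOLDER_NAS_POINTS`, the function names, and the stand-in model outputs are not taken from the paper. The summary also does not say how the NAS is inferred from the predicted activity, so a plain lookup table is assumed here.

```python
import math

# Placeholder NAS points for the 8 activity classes.
# These are made-up values for illustration, NOT the real NAS weights.
PLACEHOLDER_NAS_POINTS = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0, 14.0, 16.0]


def softmax(logits):
    """Convert raw classifier scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_nas_indirect(activity_logits):
    """Indirect route: pick the most likely activity, then look up its points."""
    probs = softmax(activity_logits)
    activity = max(range(len(probs)), key=probs.__getitem__)
    return activity, PLACEHOLDER_NAS_POINTS[activity]


def predict_nas_direct(regression_output):
    """Direct route: the model's single regression output is the NAS estimate."""
    return float(regression_output)


# Stand-in model outputs for one video clip (made-up numbers).
logits = [0.2, 1.9, 0.1, -0.5, 0.3, 0.0, 0.7, -1.2]
activity, nas_indirect = predict_nas_indirect(logits)
nas_direct = predict_nas_direct(13.4)
print(activity, nas_indirect, nas_direct)
```

In this sketch the indirect route can only return one of eight fixed values, while the direct route can return any number. Whether that accounts for the difference in reported error is not stated in the source.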

The indirect method yielded an average 5-fold accuracy of 57.21%, an area under the receiver operating characteristic curve (ROC AUC) of 0.865, an F1 score of 0.570, and a mean squared error (MSE) of 28.16. The direct method had a lower MSE of 18.16, indicating better performance.
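For readers less familiar with these metrics, the following Python sketch shows how accuracy, F1 and MSE can be computed, using only the standard library. The toy labels and scores are invented, and macro-averaging of F1 is an assumption, since the summary does not say which averaging the authors used. The last lines take the square root of the two reported MSE values to express them in the same units as the NAS.

```python
import math


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (averaging scheme is assumed)."""
    scores = []
    for c in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def mse(y_true, y_pred):
    """Mean of squared differences between true and predicted scores."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy activity labels for the classification metrics (made-up data).
true_activity = [0, 1, 2, 2, 1, 0]
pred_activity = [0, 1, 2, 1, 1, 2]
print(accuracy(true_activity, pred_activity))
print(macro_f1(true_activity, pred_activity))

# Toy NAS values for the regression metric (made-up data).
true_nas = [10.0, 20.0, 30.0]
pred_nas = [12.0, 19.0, 27.0]
print(mse(true_nas, pred_nas))

# Root-mean-square error implied by the MSE values reported in the paper.
rmse_direct = math.sqrt(18.16)
rmse_indirect = math.sqrt(28.16)
print(round(rmse_direct, 2), round(rmse_indirect, 2))
```

Taking the square root puts the reported errors at roughly 4.26 NAS points for the direct method and 5.31 for the indirect method.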

The researchers also compared the MViTv2 model to other similar computer vision models, such as R(2+1)D and ResNet50-LSTM, and found that the MViTv2 outperformed them under the same experimental settings.

Critical Analysis

The paper demonstrates the feasibility of using a Multiscale Vision Transformer to passively monitor nurse workload in an ICU setting by predicting the Nursing Activities Score from low-resolution thermal videos. The results are promising, with the direct prediction method achieving a mean squared error of 18.16, against 28.16 for the indirect method.

However, the paper does not provide much detail on the specific NAS activities that the model was trained to predict, or how well it performed on individual activities. This information would be helpful to understand the model's strengths and weaknesses in capturing different aspects of nurse workload.

Additionally, the paper only evaluates the model's performance on a single dataset from one ICU in Australia. It would be important to test the model on a more diverse set of data to ensure its generalizability to other healthcare settings.

The paper also does not address potential privacy concerns or ethical considerations around the use of thermal cameras and AI-based monitoring of healthcare workers. These are important factors that should be carefully considered before deploying such a system in a real-world clinical setting.

Overall, the research presented in this paper is a valuable step towards automating the measurement of nurse workload and could potentially lead to improvements in patient care and healthcare worker well-being. However, further research is needed to fully understand the limitations and implications of this approach.

Conclusion

This study demonstrates the feasibility of using a Multiscale Vision Transformer (MViT) to passively predict the Nursing Activities Score (NAS) from low-resolution thermal videos recorded in an Intensive Care Unit (ICU). The MViTv2 model reached a mean squared error of 18.16 when predicting the NAS directly, outperforming the indirect approach, which reached 28.16.

This research provides a promising direction for the use of computer vision and transformer-based models to automate the monitoring of nurse workload in healthcare settings. By continuously tracking nurse-patient interactions and workload, hospitals may be able to better understand and manage staffing needs, potentially leading to improved patient outcomes and reduced burnout among healthcare workers.

However, further research is needed to address the limitations of this study, such as evaluating the model's performance on a more diverse dataset and considering the ethical implications of automated workload monitoring. Overall, this paper lays the groundwork for future developments in the use of advanced computer vision techniques to support healthcare professionals and enhance patient care.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Interpretable Vital Sign Forecasting with Model Agnostic Attention Maps

Yuwei Liu, Chen Dan, Anubhav Bhatti, Bingjie Shen, Divij Gupta, Suraj Parmar, San Lee

Sepsis is a leading cause of mortality in intensive care units (ICUs), representing a substantial medical challenge. The complexity of analyzing diverse vital signs to predict sepsis further aggravates this issue. While deep learning techniques have been advanced for early sepsis prediction, their 'black-box' nature obscures the internal logic, impairing interpretability in critical settings like ICUs. This paper introduces a framework that combines a deep learning model with an attention mechanism that highlights the critical time steps in the forecasting process, thus improving model interpretability and supporting clinical decision-making. We show that the attention mechanism could be adapted to various black box time series forecasting models such as N-HiTS and N-BEATS. Our method preserves the accuracy of conventional deep learning models while enhancing interpretability through attention-weight-generated heatmaps. We evaluated our model on the eICU-CRD dataset, focusing on forecasting vital signs for sepsis patients. We assessed its performance using mean squared error (MSE) and dynamic time warping (DTW) metrics. We explored the attention maps of N-HiTS and N-BEATS, examining the differences in their performance and identifying crucial factors influencing vital sign forecasting.

5/24/2024

cs.LG cs.AI

SleepVST: Sleep Staging from Near-Infrared Video Signals using Pre-Trained Transformers

Jonathan F. Carter, João Jorge, Oliver Gibson, Lionel Tarassenko

Advances in camera-based physiological monitoring have enabled the robust, non-contact measurement of respiration and the cardiac pulse, which are known to be indicative of the sleep stage. This has led to research into camera-based sleep monitoring as a promising alternative to gold-standard polysomnography, which is cumbersome, expensive to administer, and hence unsuitable for longer-term clinical studies. In this paper, we introduce SleepVST, a transformer model which enables state-of-the-art performance in camera-based sleep stage classification (sleep staging). After pre-training on contact sensor data, SleepVST outperforms existing methods for cardio-respiratory sleep staging on the SHHS and MESA datasets, achieving total Cohen's kappa scores of 0.75 and 0.77 respectively. We then show that SleepVST can be successfully transferred to cardio-respiratory waveforms extracted from video, enabling fully contact-free sleep staging. Using a video dataset of 50 nights, we achieve a total accuracy of 78.8% and a Cohen's $\kappa$ of 0.71 in four-class video-based sleep staging, setting a new state-of-the-art in the domain.

4/8/2024

cs.CV cs.HC

Visual Acuity Prediction on Real-Life Patient Data Using a Machine Learning Based Multistage System

Tobias Schlosser, Frederik Beuth, Trixy Meyer, Arunodhayan Sampath Kumar, Gabriel Stolze, Olga Furashova, Katrin Engelmann, Danny Kowerko

In ophthalmology, intravitreal operative medication therapy (IVOM) is a widespread treatment for diseases related to the age-related macular degeneration (AMD), the diabetic macular edema (DME), as well as the retinal vein occlusion (RVO). However, in real-world settings, patients often suffer from loss of vision on time scales of years despite therapy, whereas the prediction of the visual acuity (VA) and the earliest possible detection of deterioration under real-life conditions is challenging due to heterogeneous and incomplete data. In this contribution, we present a workflow for the development of a research-compatible data corpus fusing different IT systems of the department of ophthalmology of a German maximum care hospital. The extensive data corpus allows predictive statements of the expected progression of a patient and his or her VA in each of the three diseases. For the disease AMD, we found out a significant deterioration of the visual acuity over time. Within our proposed multistage system, we subsequently classify the VA progression into the three groups of therapy winners, stabilizers, and losers (WSL classification scheme). Our OCT biomarker classification using an ensemble of deep neural networks results in a classification accuracy (F1-score) of over 98 %, enabling us to complete incomplete OCT documentations while allowing us to exploit them for a more precise VA modeling process. Our VA prediction requires at least four VA examinations and optionally OCT biomarkers from the same time period to predict the VA progression within a forecasted time frame, whereas our prediction is currently restricted to IVOM / no therapy. We achieve a final prediction accuracy of 69 % in macro average F1-score, while being in the same range as the ophthalmologists with 57.8 and 50 ± 10.7 % F1-score.

6/11/2024

eess.IV cs.CV cs.IR cs.LG

When Training-Free NAS Meets Vision Transformer: A Neural Tangent Kernel Perspective

Qiqi Zhou, Yichen Zhu

This paper investigates the Neural Tangent Kernel (NTK) to search vision transformers without training. In contrast with the previous observation that NTK-based metrics can effectively predict CNNs performance at initialization, we empirically show their inefficacy in the ViT search space. We hypothesize that the fundamental feature learning preference within ViT contributes to the ineffectiveness of applying NTK to NAS for ViT. We both theoretically and empirically validate that NTK essentially estimates the ability of neural networks that learn low-frequency signals, completely ignoring the impact of high-frequency signals in feature learning. To address this limitation, we propose a new method called ViNTK that generalizes the standard NTK to the high-frequency domain by integrating the Fourier features from inputs. Experiments with multiple ViT search spaces on image classification and semantic segmentation tasks show that our method can significantly speed up search costs over prior state-of-the-art NAS for ViT while maintaining similar performance on searched architectures.

5/9/2024

cs.CV cs.AI cs.LG