A generative foundation model for five-class sleep staging with arbitrary sensor input

Read original: arXiv:2408.15253 - Published 8/29/2024 by Hans van Gorp, Merel M. van Gilst, Pedro Fonseca, Fokke B. van Meulen, Johannes P. van Dijk, Sebastiaan Overeem, Ruud J. G. van Sloun

Overview

This paper proposes a deep generative foundation model for fully automatic sleep staging from a variety of sensor signals.
The model is trained on a large dataset of overnight sleep recordings with 36 different signals, including neurological, cardiac, and respiratory data.
The model can perform "zero-shot" inference, meaning it can be used with any combination of sensor signals without retraining.
The model achieves performance on par with human experts using only single-channel EEG, and can also leverage other sensor modalities to maintain accuracy when some signals are missing.
The paper introduces a novel interpretability metric to understand the information gain contributed by each sensor.
The foundation model can be easily extended to incorporate new sensor modalities by training a score estimator on the new input.

Plain English Explanation

The paper describes a new deep learning model that can automatically determine sleep stages from a variety of sensor signals, such as brain waves (EEG), eye movements (EOG), muscle activity (EMG), heart rate, and breathing.

Traditionally, sleep staging is done by human experts who score only a subset of the signals recorded during polysomnography (PSG), namely EEG, EOG, and EMG. The researchers argue that also using the other PSG signals, such as the cardiac and respiratory ones, could make sleep staging more reliable, more resilient to signal loss, and applicable to long-term home recordings.

Their model is a deep generative "foundation" model, meaning it can be used with any combination of sensor signals without needing to be retrained. It was trained on a large dataset of overnight sleep recordings with 36 different signals.

The model can match the performance of human experts using just a single EEG channel. But it also has the flexibility to leverage other sensor modalities, such as those typically used in home sleep recordings (e.g. finger PPG, nasal airflow, chest movement). This helps maintain accuracy even when some signals are missing.

The paper also introduces a novel way to interpret the model's decision-making by quantifying the "information gain" provided by each sensor. This helps understand which signals are most important for sleep staging.

Lastly, the foundation model design allows new sensor modalities to be easily incorporated by training a small "score estimator" module on the new input, without needing to retrain the entire model.

Technical Explanation

The researchers trained a score-based diffusion model with a transformer backbone on a dataset of 1,947 expert-labeled overnight sleep recordings. The dataset contained 36 different signal derivations, including EEG, EOG, EMG, as well as cardiac and respiratory signals.

The key innovation is a novel Bayesian factorization of the score function across the sensor modalities. This allows the model to perform "zero-shot" inference on any combination of sensors, without needing to retrain on specific signal sets.
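
To make the idea of the factorization concrete, here is a minimal sketch of its discrete analogue. It assumes the sensors are conditionally independent given the sleep stage, which is one common way to obtain such a factorization; the paper's exact formulation may differ, and it operates on diffusion-model scores (gradients of log-densities) rather than on class probabilities. The function name `combine_posteriors` and the example numbers are hypothetical.

```python
import numpy as np

# Five sleep stages, in the order used by the example arrays below.
STAGES = ["W", "N1", "N2", "N3", "REM"]


def combine_posteriors(prior, per_sensor_posteriors):
    """Fuse per-sensor posteriors p(x | y_i) into p(x | y_1, ..., y_N).

    Assumption (not taken from the paper): sensors are conditionally
    independent given the stage x. Bayes' rule then gives
        log p(x | y_1..y_N) = (1 - N) log p(x) + sum_i log p(x | y_i) + const.
    Taking gradients of the same identity yields an additive rule for scores.
    """
    prior = np.asarray(prior, dtype=float)
    n_sensors = len(per_sensor_posteriors)
    log_post = (1 - n_sensors) * np.log(prior)
    for posterior in per_sensor_posteriors:
        log_post = log_post + np.log(np.asarray(posterior, dtype=float))
    log_post = log_post - log_post.max()  # avoid overflow before exponentiating
    fused = np.exp(log_post)
    return fused / fused.sum()


# Hypothetical prior and per-sensor posteriors for a single 30-second epoch.
prior = np.array([0.20, 0.10, 0.40, 0.15, 0.15])
eeg_posterior = np.array([0.05, 0.10, 0.70, 0.10, 0.05])
ppg_posterior = np.array([0.10, 0.10, 0.50, 0.20, 0.10])

fused = combine_posteriors(prior, [eeg_posterior, ppg_posterior])
print(dict(zip(STAGES, fused.round(3))))
```

Because each sensor contributes its own term, any subset of sensors can be combined at inference time, and leaving a sensor out simply drops its term. With no sensors at all, the function returns the prior.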

On single-channel EEG, the model reaches the performance ceiling set by inter-rater agreement among human PSG scorers, with a 5-class accuracy of 85.6% and a kappa of 0.791. With a sensor set typical of home sleep recordings (finger PPG, nasal cannula, thoracic belt), accuracy is 79.0% with a kappa of 0.697. Even when combining derivations not typically used for sleep staging, such as tibialis and sternocleidomastoid EMG, the model still achieves 71.0% accuracy and a kappa of 0.575.
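
For readers unfamiliar with the two metrics, the sketch below shows how accuracy and kappa can be computed from epoch-by-epoch labels. It assumes that "kappa" refers to Cohen's kappa and that stages are encoded as integers 0 to 4; the function name and the toy labels are hypothetical and are not the paper's data or evaluation code.

```python
import numpy as np


def accuracy_and_kappa(reference, predicted, n_classes=5):
    """Return (accuracy, Cohen's kappa) for integer-coded stage labels."""
    reference = np.asarray(reference)
    predicted = np.asarray(predicted)

    # Confusion matrix: rows are reference stages, columns are predictions.
    confusion = np.zeros((n_classes, n_classes), dtype=float)
    np.add.at(confusion, (reference, predicted), 1)

    total = confusion.sum()
    p_observed = np.trace(confusion) / total
    # Agreement expected by chance, from the two marginal distributions.
    p_expected = (confusion.sum(axis=1) @ confusion.sum(axis=0)) / total**2
    # Note: undefined (division by zero) when p_expected equals 1.
    kappa = (p_observed - p_expected) / (1.0 - p_expected)
    return p_observed, kappa


# Toy example with ten epochs (0=W, 1=N1, 2=N2, 3=N3, 4=REM).
reference = [0, 0, 1, 1, 2, 2, 3, 3, 4, 4]
predicted = [0, 0, 1, 2, 2, 2, 3, 3, 4, 0]
acc, kappa = accuracy_and_kappa(reference, predicted)
print(f"accuracy={acc:.2f}, kappa={kappa:.2f}")  # accuracy=0.80, kappa=0.75
```

Kappa is lower than accuracy because it discounts the agreement that two scorers would reach by chance given how often each stage occurs.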

The researchers also introduce a novel interpretability metric called "information gain per sensor". This quantifies how much each sensor contributes to the model's sleep stage predictions. Interestingly, they find this metric is linearly correlated with the classification performance.
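
The summary does not give the exact definition of the metric, so the following is only an illustrative sketch of one plausible reading: the reduction in uncertainty about the sleep stage, measured in bits, when going from a prior to the posterior obtained with a given sensor. The function names and numbers are hypothetical and the paper's definition may differ.

```python
import numpy as np


def entropy_bits(p):
    """Shannon entropy of a discrete distribution, in bits."""
    p = np.asarray(p, dtype=float)
    nonzero = p > 0  # treat 0 * log(0) as 0
    return float(-(p[nonzero] * np.log2(p[nonzero])).sum())


def information_gain_bits(prior, posterior):
    """Entropy reduction from prior to posterior (an assumed definition)."""
    return entropy_bits(prior) - entropy_bits(posterior)


def mean_information_gain_bits(prior, posteriors):
    """Average the per-epoch gain over a sequence of posteriors."""
    return float(np.mean([information_gain_bits(prior, q) for q in posteriors]))


# Hypothetical example: a uniform prior over five stages and the posteriors
# that two different sensors produce for the same epoch.
prior = [0.2, 0.2, 0.2, 0.2, 0.2]
sharp_posterior = [0.02, 0.03, 0.90, 0.03, 0.02]   # informative sensor
vague_posterior = [0.15, 0.20, 0.30, 0.20, 0.15]   # weakly informative sensor

print(information_gain_bits(prior, sharp_posterior))
print(information_gain_bits(prior, vague_posterior))
```

Under this reading, a sensor that sharpens the posterior yields a larger gain than one that leaves it close to the prior, which matches the intuition behind the reported correlation with classification performance.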

Lastly, the generative foundation model design allows for easy incorporation of new sensor modalities. This is done by training a small "score estimator" module on the new input, without needing to retrain the entire model.

Critical Analysis

The paper makes a strong case for the benefits of leveraging a wide range of sensor signals for automated sleep staging. The model's flexibility to handle any combination of inputs, while maintaining strong performance, is a key innovation.

However, the paper does not address potential limitations or practical challenges in deploying such a system in real-world settings. For example, the installation and maintenance of multiple sensors for home sleep monitoring may be burdensome for users. There are also open questions about the robustness of the model to noisy or corrupted sensor data in uncontrolled environments.

Additionally, while the interpretability metric provides insights into which sensors are most informative, the paper does not explore whether this knowledge can be used to optimize sensor selection or hardware design. Further research into the relationship between sensor set, model performance, and practical deployment considerations would be valuable.

Lastly, the paper focuses on overall sleep staging accuracy, but does not examine the model's performance on specific sleep stages. Certain stages may be more clinically relevant, and the model's reliability in detecting those stages should be evaluated.

Overall, this work represents an important step forward in automated sleep analysis, but continued research is needed to translate these technical advances into practical, user-friendly solutions for sleep monitoring and assessment.

Conclusion

This paper presents a novel deep generative foundation model for fully automated sleep staging that can leverage a wide variety of sensor signals. The model achieves expert-level performance using just single-channel EEG, while also offering the flexibility to maintain accuracy when using alternative sensor modalities.

The key innovations include a Bayesian factorization approach for "zero-shot" inference on any sensor combination, and a novel interpretability metric that reveals the information gain contributed by each signal. The foundation model design also enables easy incorporation of new sensor types.

These advances have the potential to improve the reliability, resilience, and accessibility of sleep monitoring, especially for long-term home-based applications. However, further research is needed to address practical deployment challenges and optimize the model's performance on clinically relevant sleep stages.

Overall, this work represents an important step towards more comprehensive, flexible, and interpretable automated sleep analysis, with implications for sleep research, clinical diagnosis, and consumer health applications.

Related Papers

A generative foundation model for five-class sleep staging with arbitrary sensor input

Hans van Gorp, Merel M. van Gilst, Pedro Fonseca, Fokke B. van Meulen, Johannes P. van Dijk, Sebastiaan Overeem, Ruud J. G. van Sloun

Gold-standard sleep scoring as performed by human technicians is based on a subset of PSG signals, namely the EEG, EOG, and EMG. The PSG, however, consists of many more signal derivations that could potentially be used to perform sleep staging, including cardiac and respiratory modalities. Leveraging this variety in signals would offer advantages, for example by increasing reliability, resilience to signal loss, and application to long-term non-obtrusive recordings. This paper proposes a deep generative foundation model for fully automatic sleep staging from a plurality of sensors and any combination thereof. We trained a score-based diffusion model with a transformer backbone using a dataset of 1947 expert-labeled overnight sleep recordings with 36 different signals, including neurological, cardiac, and respiratory signals. We achieve zero-shot inference on any sensor set by using a novel Bayesian factorization of the score function across the sensors, i.e., it does not require retraining on specific combinations of signals. On single-channel EEG, our method reaches the performance limit in terms of PSG inter-rater agreement (5-class accuracy 85.6%, kappa 0.791). At the same time, the method offers full flexibility to use any sensor set derived from other modalities, for example, as typically used in home recordings that include finger PPG, nasal cannula and thoracic belt (5-class accuracy 79.0%, kappa of 0.697), or by combining derivations not typically used for sleep staging such as the tibialis and sternocleidomastoid EMG (5-class accuracy 71.0%, kappa of 0.575). Additionally, we propose a novel interpretability metric in terms of information gain per sensor and show that this is linearly correlated with classification performance. Lastly, our foundation model allows for post-hoc addition of entirely new sensor modalities by merely training a score estimator on the novel input.

8/29/2024

SleepVST: Sleep Staging from Near-Infrared Video Signals using Pre-Trained Transformers

Jonathan F. Carter, João Jorge, Oliver Gibson, Lionel Tarassenko

Advances in camera-based physiological monitoring have enabled the robust, non-contact measurement of respiration and the cardiac pulse, which are known to be indicative of the sleep stage. This has led to research into camera-based sleep monitoring as a promising alternative to gold-standard polysomnography, which is cumbersome, expensive to administer, and hence unsuitable for longer-term clinical studies. In this paper, we introduce SleepVST, a transformer model which enables state-of-the-art performance in camera-based sleep stage classification (sleep staging). After pre-training on contact sensor data, SleepVST outperforms existing methods for cardio-respiratory sleep staging on the SHHS and MESA datasets, achieving total Cohen's kappa scores of 0.75 and 0.77 respectively. We then show that SleepVST can be successfully transferred to cardio-respiratory waveforms extracted from video, enabling fully contact-free sleep staging. Using a video dataset of 50 nights, we achieve a total accuracy of 78.8% and a Cohen's $\kappa$ of 0.71 in four-class video-based sleep staging, setting a new state-of-the-art in the domain.

4/8/2024

S4Sleep: Elucidating the design space of deep-learning-based sleep stage classification models

Tiezhi Wang, Nils Strodthoff

Scoring sleep stages in polysomnography recordings is a time-consuming task plagued by significant inter-rater variability. Therefore, it stands to benefit from the application of machine learning algorithms. While many algorithms have been proposed for this purpose, certain critical architectural decisions have not received systematic exploration. In this study, we meticulously investigate these design choices within the broad category of encoder-predictor architectures. We identify robust architectures applicable to both time series and spectrogram input representations. These architectures incorporate structured state space models as integral components and achieve statistically significant performance improvements compared to state-of-the-art approaches on the extensive Sleep Heart Health Study dataset. We anticipate that the architectural insights gained from this study along with the refined methodology for architecture search demonstrated herein will not only prove valuable for future research in sleep staging but also hold relevance for other time series annotation tasks.

8/22/2024

SleepPPG-Net2: Deep learning generalization for sleep staging from photoplethysmography

Shirel Attia, Revital Shani Hershkovich, Alissa Tabakhov, Angeleene Ang, Sharon Haimov, Riva Tauman, Joachim A. Behar

Background: Sleep staging is a fundamental component in the diagnosis of sleep disorders and the management of sleep health. Traditionally, this analysis is conducted in clinical settings and involves a time-consuming scoring procedure. Recent data-driven algorithms for sleep staging, using the photoplethysmogram (PPG) time series, have shown high performance on local test sets but lower performance on external datasets due to data drift. Methods: This study aimed to develop a generalizable deep learning model for the task of four-class (wake, light, deep, and rapid eye movement (REM)) sleep staging from raw PPG physiological time series. Six sleep datasets, totaling 2,574 patient recordings, were used. In order to create a more generalizable representation, we developed and evaluated a deep learning model called SleepPPG-Net2, which employs a multi-source domain training approach. SleepPPG-Net2 was benchmarked against two state-of-the-art models. Results: SleepPPG-Net2 showed consistently higher performance over benchmark approaches, with generalization performance (Cohen's kappa) improving by up to 19%. Performance disparities were observed in relation to age, sex, and sleep apnea severity. Conclusion: SleepPPG-Net2 sets a new standard for staging sleep from raw PPG time series.

4/11/2024