Sensor Data Augmentation from Skeleton Pose Sequences for Improving Human Activity Recognition

Read original: arXiv:2406.16886 - Published 6/26/2024 by Parham Zolfaghari, Vitor Fortes Rey, Lala Ray, Hyun Kim, Sungho Suh, Paul Lukowicz

Overview

This research paper aims to improve wearable sensor-based human activity recognition (HAR), which is held back by a scarcity of labeled data.
The authors propose a data augmentation technique that generates synthetic sensor data directly from 3D skeleton pose sequences.
The generated data is used to train deep learning models for HAR, and the authors report improved performance over baseline methods on the MM-Fit dataset.

Plain English Explanation

Human activity recognition is the process of using sensor data, such as from wearable devices, to automatically identify the activities a person is performing, like walking, running, or eating. This research aims to make these systems more accurate by combining sensor data with information about the person's body position and movements.

The researchers developed a new way to create synthetic sensor data by using the patterns of a person's body movements, captured through a technique called "pose estimation." This allows them to generate additional training data for the activity recognition models, which can help the models learn more robust features and perform better on real-world data.

The key insight is that the way a person's body moves provides important clues about the activity they are performing. By incorporating this information along with the sensor data, the models can learn more comprehensive representations of human activities. This multi-modal approach to activity recognition can lead to significant improvements in accuracy compared to using sensor data alone.

Technical Explanation

The paper proposes a sensor data augmentation method that leverages skeleton pose sequences to generate synthetic sensor data for improving human activity recognition (HAR) models. The authors first use a pre-trained pose estimation model to extract 3D joint positions from video data. They then develop a generative model that learns the mapping between the pose sequences and corresponding sensor data.
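
To make the pose-to-sensor idea concrete, here is a minimal sketch of a simple physics-based baseline, not the learned generative model described above. It approximates an accelerometer signal by taking the second finite difference of one joint's 3D trajectory. The function name `joint_to_acceleration`, the 50 Hz sampling rate, the toy wrist trajectory, and the optional gravity offset are all assumptions made for illustration. A real IMU also reports values in its own rotating sensor frame, which this sketch ignores.

```python
import numpy as np

def joint_to_acceleration(positions, fs, gravity=(0.0, 0.0, 0.0)):
    """Approximate the linear acceleration of one joint from its 3D positions.

    positions: array of shape (T, 3) in metres, sampled at fs Hz.
    Returns an array of shape (T - 2, 3) in m/s^2.
    """
    positions = np.asarray(positions, dtype=float)
    if positions.ndim != 2 or positions.shape[1] != 3 or positions.shape[0] < 3:
        raise ValueError("positions must have shape (T, 3) with T >= 3")
    dt = 1.0 / fs
    # Central second difference: (p[t+1] - 2*p[t] + p[t-1]) / dt^2
    accel = (positions[2:] - 2.0 * positions[1:-1] + positions[:-2]) / dt**2
    # Optional constant offset, e.g. (0, 0, 9.81) to mimic gravity (assumption).
    return accel + np.asarray(gravity, dtype=float)

# Hypothetical wrist joint moving along x with constant acceleration 2 m/s^2.
fs = 50.0
t = np.arange(50) / fs
wrist = np.stack([0.5 * 2.0 * t**2, np.zeros_like(t), np.zeros_like(t)], axis=1)
acc = joint_to_acceleration(wrist, fs)
```

A learned model, as the paper proposes, can in principle capture sensor-specific effects such as noise and mounting orientation that a finite-difference baseline like this one cannot.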

This generative model is trained in an unsupervised manner using a variational autoencoder (VAE) architecture. The VAE learns a latent representation that captures the correlation between the pose and sensor data, allowing it to generate plausible synthetic sensor data conditioned on the pose sequences.
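
The VAE-style training described above can be sketched through its two core pieces: the reparameterization trick and a loss that combines reconstruction error with a KL term. This is a generic sketch in NumPy, not the authors' implementation. The function names, the latent size of 8, the pose feature size of 16, and the `beta` weight are assumptions, and the encoder and decoder networks themselves are omitted.

```python
import numpy as np

def reparameterize(mu, logvar, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, I)."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * logvar) * eps

def kl_to_standard_normal(mu, logvar):
    """KL(N(mu, sigma^2) || N(0, I)), summed over latent dims, averaged over the batch."""
    kl_per_sample = -0.5 * np.sum(1.0 + logvar - mu**2 - np.exp(logvar), axis=1)
    return float(np.mean(kl_per_sample))

def vae_loss(sensor, sensor_hat, mu, logvar, beta=1.0):
    """Mean squared reconstruction error plus a beta-weighted KL term."""
    recon = float(np.mean((sensor - sensor_hat) ** 2))
    return recon + beta * kl_to_standard_normal(mu, logvar)

rng = np.random.default_rng(0)
mu = np.zeros((4, 8))        # hypothetical encoder output: batch of 4, latent size 8
logvar = np.zeros((4, 8))
z = reparameterize(mu, logvar, rng)

# Conditioning on pose: the decoder would receive the latent code together
# with pose features (here a hypothetical 16-dimensional pose embedding).
pose_features = np.zeros((4, 16))
decoder_input = np.concatenate([z, pose_features], axis=1)
```

Conditioning by concatenation is only one common design choice; the summary does not say how the pose sequence is fed to the generator.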

The augmented sensor data is then used, along with the original training data, to train deep learning models for HAR tasks. The paper's abstract describes training the pose-to-sensor network and the activity classifier simultaneously, optimizing both data reconstruction and activity recognition. The authors evaluate their approach on the MM-Fit dataset and report significant performance improvements over baseline methods.
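
Combining synthetic and original data can be sketched as below. This is an illustrative helper, not the paper's pipeline: the function name `mix_real_and_synthetic`, the `synth_ratio` parameter, and the window shape of 100 samples by 3 axes are assumptions. It returns a shuffled training set plus a mask marking which windows are synthetic, which is useful for keeping synthetic data out of validation and test splits.

```python
import numpy as np

def mix_real_and_synthetic(real_x, real_y, synth_x, synth_y, synth_ratio=1.0, seed=0):
    """Return shuffled windows, labels, and a boolean mask of synthetic windows.

    synth_ratio is the number of synthetic windows added per real window,
    capped by how many synthetic windows are available.
    """
    rng = np.random.default_rng(seed)
    n_synth = min(len(synth_x), int(round(synth_ratio * len(real_x))))
    # Subsample synthetic windows without replacement.
    picked = rng.choice(len(synth_x), size=n_synth, replace=False)
    x = np.concatenate([real_x, synth_x[picked]], axis=0)
    y = np.concatenate([real_y, synth_y[picked]], axis=0)
    is_synth = np.concatenate([np.zeros(len(real_x), dtype=bool),
                               np.ones(n_synth, dtype=bool)])
    # Shuffle all three arrays with the same permutation.
    order = rng.permutation(len(x))
    return x[order], y[order], is_synth[order]

# Toy data: 10 real windows (all zeros) and 30 synthetic windows (all ones).
real_x, real_y = np.zeros((10, 100, 3)), np.zeros(10, dtype=int)
synth_x, synth_y = np.ones((30, 100, 3)), np.ones(30, dtype=int)
train_x, train_y, is_synth = mix_real_and_synthetic(
    real_x, real_y, synth_x, synth_y, synth_ratio=0.5
)
```

The ratio of synthetic to real windows is a tunable choice; the summary does not state what ratio the authors used.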

Critical Analysis

The proposed data augmentation method is a novel and promising approach for enhancing human activity recognition systems. By incorporating the rich information contained in skeletal pose data, the authors are able to generate realistic synthetic sensor data that helps the models learn more robust and generalizable representations of human activities.

However, the paper does not explore the limitations of this approach, such as the reliance on accurate pose estimation, which may be challenged in real-world scenarios with occlusions or complex movements. Additionally, the authors do not assess the trade-offs between the computational overhead of the generative model and the performance gains achieved through data augmentation.

Further research could investigate the robustness of the augmented data to sensor noise or missing data, as well as explore the potential of cross-modal learning techniques to better integrate the pose and sensor modalities. Incorporating these considerations could lead to more practical and reliable human activity recognition systems.

Conclusion

This research presents a data augmentation technique that leverages skeleton pose information to improve the performance of human activity recognition models. By generating synthetic sensor data conditioned on pose sequences, the authors enlarge and diversify the training data, and they report significant performance improvements over baseline methods on the MM-Fit dataset.

The key contribution of this work is the demonstration that the fusion of sensor and pose data can yield substantial benefits for activity recognition, highlighting the potential of multi-modal approaches in this domain. As wearable technology and pose estimation continue to advance, this research paves the way for more robust and widely applicable human activity recognition systems.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Sensor Data Augmentation from Skeleton Pose Sequences for Improving Human Activity Recognition

Parham Zolfaghari, Vitor Fortes Rey, Lala Ray, Hyun Kim, Sungho Suh, Paul Lukowicz

The proliferation of deep learning has significantly advanced various fields, yet Human Activity Recognition (HAR) has not fully capitalized on these developments, primarily due to the scarcity of labeled datasets. Despite the integration of advanced Inertial Measurement Units (IMUs) in ubiquitous wearable devices like smartwatches and fitness trackers, which offer self-labeled activity data from users, the volume of labeled data remains insufficient compared to domains where deep learning has achieved remarkable success. Addressing this gap, in this paper, we propose a novel approach to improve wearable sensor-based HAR by introducing a pose-to-sensor network model that generates sensor data directly from 3D skeleton pose sequences. Our method simultaneously trains the pose-to-sensor network and a human activity classifier, optimizing both data reconstruction and activity recognition. Our contributions include the integration of simultaneous training, direct pose-to-sensor generation, and a comprehensive evaluation on the MM-Fit dataset. Experimental results demonstrate the superiority of our framework with significant performance improvements over baseline methods.

6/26/2024

Unsupervised Statistical Feature-Guided Diffusion Model for Sensor-based Human Activity Recognition

Si Zuo, Vitor Fortes Rey, Sungho Suh, Stephan Sigg, Paul Lukowicz

Human activity recognition (HAR) from on-body sensors is a core functionality in many AI applications: from personal health, through sports and wellness to Industry 4.0. A key problem holding up progress in wearable sensor-based HAR, compared to other ML areas, such as computer vision, is the unavailability of diverse and labeled training data. Particularly, while there are innumerable annotated images available in online repositories, freely available sensor data is sparse and mostly unlabeled. We propose an unsupervised statistical feature-guided diffusion model specifically optimized for wearable sensor-based human activity recognition with devices such as inertial measurement unit (IMU) sensors. The method generates synthetic labeled time-series sensor data without relying on annotated training data. Thereby, it addresses the scarcity and annotation difficulties associated with real-world sensor data. By conditioning the diffusion model on statistical information such as mean, standard deviation, Z-score, and skewness, we generate diverse and representative synthetic sensor data. We conducted experiments on public human activity recognition datasets and compared the method to conventional oversampling and state-of-the-art generative adversarial network methods. Experimental results demonstrate that this can improve the performance of human activity recognition and outperform existing techniques.

5/21/2024

Language-centered Human Activity Recognition

Hua Yan, Heng Tan, Yi Ding, Peifei Zhou, Vinod Namboodiri, Yu Yang

Human Activity Recognition (HAR) using Inertial Measurement Unit (IMU) sensors is critical for applications in healthcare, safety, and industrial production. However, variations in activity patterns, device types, and sensor placements create distribution gaps across datasets, reducing the performance of HAR models. To address this, we propose LanHAR, a novel system that leverages Large Language Models (LLMs) to generate semantic interpretations of sensor readings and activity labels for cross-dataset HAR. This approach not only mitigates cross-dataset heterogeneity but also enhances the recognition of new activities. LanHAR employs an iterative re-generation method to produce high-quality semantic interpretations with LLMs and a two-stage training framework that bridges the semantic interpretations of sensor readings and activity labels. This ultimately leads to a lightweight sensor encoder suitable for mobile deployment, enabling any sensor reading to be mapped into the semantic interpretation space. Experiments on four public datasets demonstrate that our approach significantly outperforms state-of-the-art methods in both cross-dataset HAR and new activity recognition. The source code will be made publicly available.

10/2/2024

Non-stationary BERT: Exploring Augmented IMU Data For Robust Human Activity Recognition

Ning Sun, Yufei Wang, Yuwei Zhang, Jixiang Wan, Shenyue Wang, Ping Liu, Xudong Zhang

Human Activity Recognition (HAR) has gained great attention from researchers due to the popularity of mobile devices and the need to observe users' daily activity data for better human-computer interaction. In this work, we collect a human activity recognition dataset called OPPOHAR consisting of phone IMU data. To facilitate the deployment of HAR systems on mobile phones and to achieve user-specific activity recognition, we propose a novel light-weight network called Non-stationary BERT with a two-stage training method. We also propose a simple yet effective data augmentation method to explore the deeper relationship between the accelerometer and gyroscope data from the IMU. The network achieves state-of-the-art performance when tested on various activity recognition datasets, and the data augmentation method demonstrates its wide applicability.

9/26/2024