Wearable-based behaviour interpolation for semi-supervised human activity recognition

2405.15962

Published 5/28/2024 by Haoran Duan, Shidong Wang, Varun Ojha, Shizheng Wang, Yawen Huang, Yang Long, Rajiv Ranjan, Yefeng Zheng

cs.CV

Abstract

While traditional feature engineering for Human Activity Recognition (HAR) involves a trial-and-error process, deep learning has emerged as a preferred method for high-level representations of sensor-based human activities. However, most deep learning-based HAR requires a large amount of labelled data and extracting HAR features from unlabelled data for effective deep learning training remains challenging. We, therefore, introduce a deep semi-supervised HAR approach, MixHAR, which concurrently uses labelled and unlabelled activities. Our MixHAR employs a linear interpolation mechanism to blend labelled and unlabelled activities while addressing both inter- and intra-activity variability. A unique challenge identified is the activity intrusion problem during mixing, for which we propose a mixing calibration mechanism to mitigate it in the feature embedding space. Additionally, we rigorously explored and evaluated the five conventional/popular deep semi-supervised technologies on HAR, acting as the benchmark of deep semi-supervised HAR. Our results demonstrate that MixHAR significantly improves performance, underscoring the potential of deep semi-supervised techniques in HAR.

Overview

This paper introduces a deep semi-supervised approach called MixHAR for Human Activity Recognition (HAR) using both labeled and unlabeled data.
Traditional feature engineering for HAR involves a trial-and-error process, while deep learning has emerged as a preferred method for high-level representations of sensor-based human activities.
However, most deep learning-based HAR requires a large amount of labeled data, and extracting HAR features from unlabeled data for effective deep learning training remains challenging.

Plain English Explanation

The paper discusses a new method called MixHAR that can improve human activity recognition (HAR) using both labeled and unlabeled data. HAR is the process of identifying the activities a person is performing based on sensor data, such as from wearable devices.

Traditionally, creating the features (characteristics) used for HAR has involved a lot of trial and error. But recently, deep learning has become a popular approach, as it can automatically learn high-level representations of the sensor data. The downside is that deep learning usually requires a large amount of labeled data, where the activities are already identified. Extracting useful features from unlabeled data to train deep learning models is still difficult.

MixHAR aims to address this by using both labeled and unlabeled activity data simultaneously. It has a unique way of blending the labeled and unlabeled data to capture the variations in how people perform activities, while also dealing with the challenge of "activity intrusion" (where data from one activity gets mixed into another). The paper compares MixHAR to other semi-supervised deep learning approaches for HAR and shows that it can significantly improve performance.

Technical Explanation

The paper introduces a deep semi-supervised approach called MixHAR for Human Activity Recognition (HAR) that can leverage both labeled and unlabeled activity data. Traditional feature engineering for HAR involves a trial-and-error process, while deep learning has emerged as a preferred method for learning high-level representations from sensor-based human activity data.

However, most deep learning-based HAR requires a large amount of labeled data, and extracting effective HAR features from unlabeled data for deep learning training remains challenging. To address this, MixHAR employs a linear interpolation mechanism to blend labeled and unlabeled activities, while also addressing both inter- and intra-activity variability.
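As a rough illustration of the linear interpolation idea, the sketch below blends a batch of labeled sensor windows with a batch of unlabeled ones in the style of generic mixup. It is not the authors' implementation: the function name `mix_windows`, the Beta-distributed mixing weight, the `alpha` value, the window shape, and the use of uniform guesses as soft labels for the unlabeled batch are all assumptions made for the example.

```python
import numpy as np


def mix_windows(x_a, y_a, x_b, y_b, alpha=0.75, rng=None):
    """Linearly blend two batches of sensor windows and their soft labels.

    x_a, x_b: arrays of shape (batch, time, channels).
    y_a, y_b: arrays of shape (batch, classes) holding label distributions
              (one-hot for labeled data, model guesses for unlabeled data).
    alpha:    Beta distribution parameter (hypothetical default).
    """
    rng = np.random.default_rng() if rng is None else rng
    lam = float(rng.beta(alpha, alpha))      # mixing weight in [0, 1]
    x_mix = lam * x_a + (1.0 - lam) * x_b    # blend the raw signals
    y_mix = lam * y_a + (1.0 - lam) * y_b    # blend the targets the same way
    return x_mix, y_mix, lam


# Toy data: 4 windows of 128 time steps from a 3-axis sensor, 6 activity classes.
rng = np.random.default_rng(0)
x_labeled = rng.normal(size=(4, 128, 3))
y_labeled = np.eye(6)[[0, 1, 2, 3]]          # one-hot labels
x_unlabeled = rng.normal(size=(4, 128, 3))
y_guess = np.full((4, 6), 1.0 / 6.0)         # placeholder guesses, assumed uniform

x_mix, y_mix, lam = mix_windows(x_labeled, y_labeled, x_unlabeled, y_guess, rng=rng)
print(x_mix.shape, y_mix.shape)
```

Because the inputs and the targets are blended with the same weight, each mixed target remains a valid probability distribution over the activity classes.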

A key challenge identified is the "activity intrusion" problem, where data from one activity gets mixed into another during the blending process. The authors propose a mixing calibration mechanism to mitigate this issue in the feature embedding space.
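The summary does not describe how the calibration mechanism works, so the following toy example only illustrates one possible reading of the problem itself: a blend of two activities can land closer to a third activity in the embedding space. The 2-D embeddings, the class centroids, and the helper names `nearest_class` and `intrudes` are hypothetical and are not taken from the paper.

```python
import numpy as np


def nearest_class(z, centroids):
    """Return the index of the centroid closest to embedding z (Euclidean)."""
    return int(np.argmin(np.linalg.norm(centroids - z, axis=1)))


def intrudes(z_a, c_a, z_b, c_b, lam, centroids):
    """True if the mixed embedding is nearest to a class other than c_a or c_b."""
    z_mix = lam * z_a + (1.0 - lam) * z_b
    return nearest_class(z_mix, centroids) not in (c_a, c_b)


# Hypothetical centroids for three activities in a 2-D embedding space.
centroids = np.array([
    [-1.0, 0.0],   # class 0
    [1.0, 0.0],    # class 1
    [0.0, 0.1],    # class 2 sits between the other two
])

# An even blend of class 0 and class 1 falls next to class 2.
print(intrudes(centroids[0], 0, centroids[1], 1, 0.5, centroids))  # True
# A blend dominated by class 0 stays near class 0.
print(intrudes(centroids[0], 0, centroids[1], 1, 0.9, centroids))  # False
```

A check of this kind only detects such cases; how MixHAR corrects them is described in the original paper rather than in this summary.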

Additionally, the paper rigorously explores and evaluates five conventional/popular deep semi-supervised learning techniques on the HAR task, providing a benchmark for deep semi-supervised HAR. The results demonstrate that MixHAR significantly improves performance compared to these other semi-supervised approaches, underscoring the potential of deep semi-supervised techniques for HAR.

Critical Analysis

The paper presents a novel and promising approach to leveraging both labeled and unlabeled data for improved human activity recognition using deep learning. The authors have identified a key challenge in deep learning-based HAR, namely the need for large amounts of labeled data, and have proposed an effective solution in the form of the MixHAR method.

One potential limitation of the research is the specific mixing calibration mechanism used to address the "activity intrusion" problem. While the authors demonstrate its effectiveness, it may be worth exploring alternative approaches or ways to further optimize this component of the method.

Additionally, the paper could have delved deeper into the potential real-world applications and implications of the MixHAR approach. For example, how might this method enable more accessible and widespread deployment of HAR systems, especially in scenarios where labeled data is scarce?

Overall, the paper is a useful contribution to human activity recognition, and the MixHAR method shows promise for improving the performance and accessibility of deep learning-based HAR systems when labeled data is limited.

Conclusion

This paper introduces a deep semi-supervised approach called MixHAR that can effectively leverage both labeled and unlabeled data for improved human activity recognition (HAR) using deep learning. The key innovation is the use of a linear interpolation mechanism to blend labeled and unlabeled activities, while also addressing the challenge of "activity intrusion" through a mixing calibration process.

The results demonstrate that MixHAR significantly outperforms other conventional deep semi-supervised learning techniques for HAR, highlighting the potential of this approach to enable more accessible and effective deep learning-based HAR systems, especially in scenarios where labeled data is scarce. This research represents an important step forward in overcoming the data-hungry nature of deep learning and expanding the applicability of HAR technologies in real-world settings.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Unsupervised Statistical Feature-Guided Diffusion Model for Sensor-based Human Activity Recognition

Si Zuo, Vitor Fortes Rey, Sungho Suh, Stephan Sigg, Paul Lukowicz

Human activity recognition (HAR) from on-body sensors is a core functionality in many AI applications: from personal health, through sports and wellness to Industry 4.0. A key problem holding up progress in wearable sensor-based HAR, compared to other ML areas, such as computer vision, is the unavailability of diverse and labeled training data. Particularly, while there are innumerable annotated images available in online repositories, freely available sensor data is sparse and mostly unlabeled. We propose an unsupervised statistical feature-guided diffusion model specifically optimized for wearable sensor-based human activity recognition with devices such as inertial measurement unit (IMU) sensors. The method generates synthetic labeled time-series sensor data without relying on annotated training data. Thereby, it addresses the scarcity and annotation difficulties associated with real-world sensor data. By conditioning the diffusion model on statistical information such as mean, standard deviation, Z-score, and skewness, we generate diverse and representative synthetic sensor data. We conducted experiments on public human activity recognition datasets and compared the method to conventional oversampling and state-of-the-art generative adversarial network methods. Experimental results demonstrate that this can improve the performance of human activity recognition and outperform existing techniques.

5/21/2024

eess.SP cs.LG

Sensor Data Augmentation from Skeleton Pose Sequences for Improving Human Activity Recognition

Parham Zolfaghari, Vitor Fortes Rey, Lala Ray, Hyun Kim, Sungho Suh, Paul Lukowicz

The proliferation of deep learning has significantly advanced various fields, yet Human Activity Recognition (HAR) has not fully capitalized on these developments, primarily due to the scarcity of labeled datasets. Despite the integration of advanced Inertial Measurement Units (IMUs) in ubiquitous wearable devices like smartwatches and fitness trackers, which offer self-labeled activity data from users, the volume of labeled data remains insufficient compared to domains where deep learning has achieved remarkable success. Addressing this gap, in this paper, we propose a novel approach to improve wearable sensor-based HAR by introducing a pose-to-sensor network model that generates sensor data directly from 3D skeleton pose sequences. Our method simultaneously trains the pose-to-sensor network and a human activity classifier, optimizing both data reconstruction and activity recognition. Our contributions include the integration of simultaneous training, direct pose-to-sensor generation, and a comprehensive evaluation on the MM-Fit dataset. Experimental results demonstrate the superiority of our framework with significant performance improvements over baseline methods.

6/26/2024

eess.SP cs.CV cs.LG

MuJo: Multimodal Joint Feature Space Learning for Human Activity Recognition

Stefan Gerd Fritsch, Cennet Oguz, Vitor Fortes Rey, Lala Ray, Maximilian Kiefer-Emmanouilidis, Paul Lukowicz

Human Activity Recognition is a longstanding problem in AI with applications in a broad range of areas: from healthcare, sports and fitness, security, and human computer interaction to robotics. The performance of HAR in real-world settings is strongly dependent on the type and quality of the input signal that can be acquired. Given an unobstructed, high-quality camera view of a scene, computer vision systems, in particular in conjunction with foundational models (e.g., CLIP), can today fairly reliably distinguish complex activities. On the other hand, recognition using modalities such as wearable sensors (which are often more broadly available, e.g., in mobile phones and smartwatches) is a more difficult problem, as the signals often contain less information and labeled training data is more difficult to acquire. In this work, we show how we can improve HAR performance across different modalities using multimodal contrastive pretraining. Our approach, MuJo (Multimodal Joint Feature Space Learning), learns a multimodal joint feature space with video, language, pose, and IMU sensor data. The proposed approach combines contrastive and multitask learning methods and analyzes different multitasking strategies for learning a compact shared representation. A large dataset with parallel video, language, pose, and sensor data points is also introduced to support the research, along with an analysis of the robustness of the multimodal joint space for modal-incomplete and low-resource data. On the MM-Fit dataset, our model achieves an impressive Macro F1-Score of up to 0.992 with only 2% of the train data and 0.999 when using all available training data for classification tasks. Moreover, in the scenario where the MM-Fit dataset is unseen, we demonstrate a generalization performance of up to 0.638.

6/7/2024

cs.LG cs.CL cs.CV

Feature Fusion for Human Activity Recognition using Parameter-Optimized Multi-Stage Graph Convolutional Network and Transformer Models

Mohammad Belal (Khalifa University of Science and Technology, Abu Dhabi, United Arab Emirates), Taimur Hassan (Abu Dhabi University, Abu Dhabi, United Arab Emirates), Abdelfatah Ahmed (Khalifa University of Science and Technology, Abu Dhabi, United Arab Emirates), Ahmad Aljarah (Khalifa University of Science and Technology, Abu Dhabi, United Arab Emirates), Nael Alsheikh (Khalifa University of Science and Technology, Abu Dhabi, United Arab Emirates), Irfan Hussain (Khalifa University of Science and Technology, Abu Dhabi, United Arab Emirates)

Human activity recognition (HAR) is a crucial area of research that involves understanding human movements using computer and machine vision technology. Deep learning has emerged as a powerful tool for this task, with models such as Convolutional Neural Networks (CNNs) and Transformers being employed to capture various aspects of human motion. One of the key contributions of this work is the demonstration of the effectiveness of feature fusion in improving HAR accuracy by capturing spatial and temporal features, which has important implications for the development of more accurate and robust activity recognition systems. The study uses sensory data from the HuGaDB, PKU-MMD, LARa, and TUG datasets. Two models, PO-MS-GCN and a Transformer, were trained and evaluated, with PO-MS-GCN outperforming state-of-the-art models. HuGaDB and TUG achieved high accuracies and F1-scores, while LARa and PKU-MMD had lower scores. Feature fusion improved results across datasets.

6/26/2024

cs.CV cs.AI