Physical-aware Cross-modal Adversarial Network for Wearable Sensor-based Human Action Recognition

Read original: arXiv:2307.03638 - Published 5/21/2024 by Jianyuan Ni, Hao Tang, Anne H. H. Ngu, Gaowen Liu, Yan Yan


Overview

Wearable sensor-based Human Action Recognition (HAR) is a growing field, but its accuracy still lags behind visual modalities like RGB video and depth data.
While multimodal approaches can improve HAR accuracy, wearable devices are limited to non-visual inputs such as accelerometer and gyroscope readings.
To address this, the researchers propose a novel Physical-aware Cross-modal Adversarial (PCA) framework that uses only accelerometer data from four inertial sensors.

Plain English Explanation

The paper discusses a new way to recognize human actions using data from wearable sensors like smartwatches or fitness trackers. Human action recognition (HAR) is an important task with applications in areas like healthcare, gaming, and robotics.

Currently, the most accurate HAR systems use visual data like video or depth cameras. But wearable devices can only capture non-visual sensor data, like the readings from accelerometers that measure movement. This limits the performance of wearable-based HAR compared to vision-based systems.

To overcome this, the researchers developed a novel PCA framework that takes only accelerometer data from four sensors on the body. It uses this data to generate a synthetic 3D skeleton of the person's body movements. This synthetic skeleton is then combined with the original accelerometer data to improve the accuracy of action recognition.

By generating a virtual skeleton from the sensor data, the framework can capture more detailed information about the person's movements. This allows wearable-based HAR to perform better, even without the benefit of visual inputs like cameras. The researchers tested their approach on three public datasets (Berkeley-MHAD, UTD-MHAD, and MMAct) and found it competitive with previous methods.

Technical Explanation

The key innovation of the PCA framework is its ability to generate a synthetic 3D skeleton sequence from the accelerometer data alone. The researchers propose an "IMU2SKELETON" network that learns to map the time-series accelerometer inputs to corresponding 3D skeleton joint coordinates.
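To make the input and output of such a mapping concrete, here is a minimal shape-level sketch in Python. It is not the paper's IMU2SKELETON architecture: an untrained per-frame linear map stands in for the learned network, and the joint count (`NUM_JOINTS = 20`), the window length, and the function names are assumptions chosen for illustration only.

```python
import random

NUM_SENSORS = 4   # from the summary: four inertial sensors
AXES = 3          # x, y, z accelerometer axes
NUM_JOINTS = 20   # assumption: the real count depends on the dataset's skeleton format


def make_linear_layer(in_dim, out_dim, seed=0):
    """Random, untrained weights; a real model would learn these from data."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]


def imu_to_skeleton(accel_window, weights):
    """Map T frames of accelerometer readings to T frames of 3D joints.

    accel_window: list of T frames, each a flat list of NUM_SENSORS * AXES floats.
    Returns a list of T frames, each a list of NUM_JOINTS [x, y, z] triples.
    """
    skeleton = []
    for frame in accel_window:
        # One linear projection per frame (hypothetical stand-in for the network).
        flat = [sum(w * x for w, x in zip(row, frame)) for row in weights]
        joints = [flat[j * 3:(j + 1) * 3] for j in range(NUM_JOINTS)]
        skeleton.append(joints)
    return skeleton


weights = make_linear_layer(NUM_SENSORS * AXES, NUM_JOINTS * 3)
window = [[0.0] * (NUM_SENSORS * AXES) for _ in range(50)]  # 50 frames of dummy readings
skeleton = imu_to_skeleton(window, weights)
print(len(skeleton), len(skeleton[0]), len(skeleton[0][0]))
```

A per-frame map like this ignores temporal context, which a practical model would almost certainly need; the sketch only shows that 12 accelerometer values per frame go in and one 3D coordinate per joint comes out.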

To improve the quality of the synthetic skeletons, the researchers impose additional physical constraints. Specifically, they note that accelerometer data can be considered the second derivative of the skeleton joint positions over time. By enforcing this physical relationship, the synthetic skeletons better reflect the underlying skeletal dynamics.
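One way to express this second-derivative relationship is a finite-difference consistency penalty. The Python sketch below is an illustration under stated assumptions, not the paper's loss: it handles a single coordinate of a single joint, assumes the accelerometer reading is already expressed in the same frame and units as the skeleton, and ignores gravity and sensor noise. The function names and the mean-squared-error form are hypothetical.

```python
def second_difference(positions, dt):
    """Central finite-difference estimate of acceleration for one coordinate.

    positions: joint coordinate sampled every dt seconds.
    Returns estimates for the interior frames 1 .. len(positions) - 2.
    """
    return [
        (positions[i + 1] - 2.0 * positions[i] + positions[i - 1]) / (dt * dt)
        for i in range(1, len(positions) - 1)
    ]


def physical_consistency_loss(positions, measured_accel, dt):
    """Mean squared gap between estimated and measured acceleration.

    measured_accel has one value per frame; only interior frames are compared
    because the central difference is undefined at the two ends.
    """
    estimated = second_difference(positions, dt)
    interior = measured_accel[1:-1]
    return sum((e - m) ** 2 for e, m in zip(estimated, interior)) / len(estimated)


dt = 0.1
# A joint moving as p(t) = t^2 has constant acceleration 2.
consistent = [(i * dt) ** 2 for i in range(6)]
measured = [2.0] * 6

loss_consistent = physical_consistency_loss(consistent, measured, dt)
loss_static = physical_consistency_loss([0.0] * 6, measured, dt)
print(loss_consistent, loss_static)
```

A skeleton whose motion matches the measured acceleration gets a penalty near zero, while a static skeleton paired with a nonzero reading is penalized, which is the kind of signal a physical constraint can feed back into training.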

The final PCA framework fuses the original accelerometer data with the constrained synthetic skeleton sequence to perform the HAR classification task. This multimodal fusion allows the system to leverage both the direct sensor readings and the inferred skeletal dynamics for improved action recognition accuracy.
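The summary does not say how the two streams are combined, so the following Python sketch shows only one common option, late fusion of class probabilities. The weighted-average rule, the `accel_weight` parameter, and the example logits are all assumptions for illustration and should not be read as the paper's fusion module.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of class scores."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def late_fusion(accel_logits, skeleton_logits, accel_weight=0.5):
    """Blend the per-class probabilities of the two branches.

    accel_weight is a hypothetical mixing coefficient in [0, 1].
    Returns (predicted class index, fused probabilities).
    """
    p_accel = softmax(accel_logits)
    p_skel = softmax(skeleton_logits)
    fused = [
        accel_weight * a + (1.0 - accel_weight) * s
        for a, s in zip(p_accel, p_skel)
    ]
    predicted = max(range(len(fused)), key=fused.__getitem__)
    return predicted, fused


# Made-up scores for three action classes from each branch.
predicted, fused = late_fusion([2.0, 1.0, 0.0], [0.0, 3.0, 0.0])
print(predicted, fused)
```

In this made-up example the accelerometer branch alone favors class 0, but the more confident skeleton branch shifts the fused prediction to class 1, which illustrates how a second modality can change the final decision.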

The researchers evaluated their PCA framework on three publicly available HAR datasets: Berkeley-MHAD, UTD-MHAD, and MMAct. The results demonstrate that PCA can achieve competitive performance compared to previous mono-modal and multimodal HAR methods, despite only using wearable accelerometer data as input.

Critical Analysis

One limitation of the PCA framework is its reliance on having four inertial sensors placed at specific body locations. This may limit its practical deployment, as users would need to wear a custom sensor array rather than using a single wearable device.

Additionally, the physical constraints imposed on the synthetic skeletons, while grounded in theory, may not perfectly capture the complexities of real human skeletal dynamics. Further research could explore alternative ways of regularizing the skeleton generation process.

It would also be valuable to understand how the PCA framework's performance scales with the number and placement of inertial sensors. Exploring sparse sensor configurations or online sensor selection algorithms could make the approach more practical for real-world wearable HAR applications.

Conclusion

The PCA framework proposed in this paper represents an innovative approach to improving the accuracy of wearable sensor-based Human Action Recognition. By generating synthetic skeletal data from accelerometer inputs and fusing it with the original sensor data, the framework can capture richer information about human movements compared to using accelerometers alone.

This research demonstrates the potential of combining physical modeling and multimodal fusion techniques to overcome the limitations of single-sensor wearable devices. As wearable technology continues to advance, approaches like PCA may help bridge the performance gap between vision-based and sensor-based HAR, enabling a wider range of real-world applications.



Related Papers


Physical-aware Cross-modal Adversarial Network for Wearable Sensor-based Human Action Recognition

Jianyuan Ni, Hao Tang, Anne H. H. Ngu, Gaowen Liu, Yan Yan

Wearable sensor-based Human Action Recognition (HAR) has made significant strides in recent times. However, the accuracy performance of wearable sensor-based HAR is currently still lagging behind that of visual modalities-based systems, such as RGB video and depth data. Although diverse input modalities can provide complementary cues and improve the accuracy performance of HAR, wearable devices can only capture limited kinds of non-visual time series input, such as accelerometers and gyroscopes. This limitation hinders the deployment of multimodal simultaneously using visual and non-visual modality data in parallel on current wearable devices. To address this issue, we propose a novel Physical-aware Cross-modal Adversarial (PCA) framework that utilizes only time-series accelerometer data from four inertial sensors for the wearable sensor-based HAR problem. Specifically, we propose an effective IMU2SKELETON network to produce corresponding synthetic skeleton joints from accelerometer data. Subsequently, we imposed additional constraints on the synthetic skeleton data from a physical perspective, as accelerometer data can be regarded as the second derivative of the skeleton sequence coordinates. After that, the original accelerometer as well as the constrained skeleton sequence were fused together to make the final classification. In this way, when individuals wear wearable devices, the devices can not only capture accelerometer data, but can also generate synthetic skeleton sequences for real-time wearable sensor-based HAR applications that need to be conducted anytime and anywhere. To demonstrate the effectiveness of our proposed PCA framework, we conduct extensive experiments on Berkeley-MHAD, UTD-MHAD, and MMAct datasets. The results confirm that the proposed PCA approach has competitive performance compared to the previous methods on the mono sensor-based HAR classification problem.

5/21/2024

Sensor Data Augmentation from Skeleton Pose Sequences for Improving Human Activity Recognition

Parham Zolfaghari, Vitor Fortes Rey, Lala Ray, Hyun Kim, Sungho Suh, Paul Lukowicz

The proliferation of deep learning has significantly advanced various fields, yet Human Activity Recognition (HAR) has not fully capitalized on these developments, primarily due to the scarcity of labeled datasets. Despite the integration of advanced Inertial Measurement Units (IMUs) in ubiquitous wearable devices like smartwatches and fitness trackers, which offer self-labeled activity data from users, the volume of labeled data remains insufficient compared to domains where deep learning has achieved remarkable success. Addressing this gap, in this paper, we propose a novel approach to improve wearable sensor-based HAR by introducing a pose-to-sensor network model that generates sensor data directly from 3D skeleton pose sequences. Our method simultaneously trains the pose-to-sensor network and a human activity classifier, optimizing both data reconstruction and activity recognition. Our contributions include the integration of simultaneous training, direct pose-to-sensor generation, and a comprehensive evaluation on the MM-Fit dataset. Experimental results demonstrate the superiority of our framework with significant performance improvements over baseline methods.

6/26/2024

ALS-HAR: Harnessing Wearable Ambient Light Sensors to Enhance IMU-based HAR

Lala Shakti Swarup Ray, Daniel Geißler, Mengxi Liu, Bo Zhou, Sungho Suh, Paul Lukowicz

Despite the widespread integration of ambient light sensors (ALS) in smart devices commonly used for screen brightness adaptation, their application in human activity recognition (HAR), primarily through body-worn ALS, is largely unexplored. In this work, we developed ALS-HAR, a robust wearable light-based motion activity classifier. Although ALS-HAR achieves comparable accuracy to other modalities, its natural sensitivity to external disturbances, such as changes in ambient light, weather conditions, or indoor lighting, makes it challenging for daily use. To address such drawbacks, we introduce strategies to enhance environment-invariant IMU-based activity classifications through augmented multi-modal and contrastive classifications by transferring the knowledge extracted from the ALS. Our experiments on a real-world activity dataset for three different scenarios demonstrate that while ALS-HAR's accuracy strongly relies on external lighting conditions, cross-modal information can still improve other HAR systems, such as IMU-based classifiers. Even in scenarios where ALS performs insufficiently, the additional knowledge enables improved accuracy and macro F1 score by up to 4.2% and 6.4%, respectively, for IMU-based classifiers and even surpasses multi-modal sensor fusion models in two of our three experiment scenarios. Our research highlights the untapped potential of ALS integration in advancing sensor-based HAR technology, paving the way for practical and efficient wearable ALS-based activity recognition systems with potential applications in healthcare, sports monitoring, and smart indoor environments.

8/23/2024

Fusion and Cross-Modal Transfer for Zero-Shot Human Action Recognition

Abhi Kamboj, Anh Duy Nguyen, Minh Do

Despite living in a multi-sensory world, most AI models are limited to textual and visual interpretations of human motion and behavior. Inertial measurement units (IMUs) provide a salient signal to understand human motion; however, they are challenging to use due to their uninterpretability and scarcity of their data. We investigate a method to transfer knowledge between visual and inertial modalities using the structure of an informative joint representation space designed for human action recognition (HAR). We apply the resulting Fusion and Cross-modal Transfer (FACT) method to a novel setup, where the model does not have access to labeled IMU data during training and is able to perform HAR with only IMU data during testing. Extensive experiments on a wide range of RGB-IMU datasets demonstrate that FACT significantly outperforms existing methods in zero-shot cross-modal transfer.

7/25/2024