FLOW: Fusing and Shuffling Global and Local Views for Cross-User Human Activity Recognition with IMUs

Read original: arXiv:2406.18569 - Published 6/28/2024 by Qi Qiu, Tao Zhu, Furong Duan, Kevin I-Kai Wang, Liming Chen, Mingxing Nie

Overview

• The paper proposes FLOW, an approach to improving cross-user human activity recognition (HAR) from inertial measurement unit (IMU) data, where a model trained on some users must recognize the activities of others.

Plain English Explanation

• The researchers developed a system called FLOW that combines global and local views of sensor data to better recognize human activities, even when the system is used by different people.

• HAR models are typically trained on data from a fixed set of users and degrade when applied to new users. According to the paper's abstract, one primary cause is that IMU readings are expressed in the sensor's local coordinate system, so subtle differences in how each user wears the device shift the data distribution. FLOW pairs this local view with a global view representation that is less sensitive to wearing style.

• By shuffling and combining these global and local representations, FLOW can learn more robust and generalizable features for cross-user HAR. This allows the system to achieve higher accuracy when recognizing activities performed by users it hasn't seen before.

• The researchers demonstrate the effectiveness of FLOW on the OPPORTUNITY and PAMAP2 datasets, reporting that it outperforms current state-of-the-art HAR methods in cross-user settings.
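
To make "cross-user" concrete, the sketch below shows a leave-one-user-out split in Python, a common way to test a model on people it has never seen. It is an illustration only: this summary does not state the paper's exact evaluation protocol, and the (user_id, window, label) tuple layout and the function name leave_one_user_out are assumptions made here.

```python
from collections import defaultdict


def leave_one_user_out(samples):
    """Yield (held_out_user, train, test), one split per user.

    `samples` is a list of (user_id, window, label) tuples. This layout is
    an assumption for the sketch, not a data format taken from the paper.
    """
    by_user = defaultdict(list)
    for user_id, window, label in samples:
        by_user[user_id].append((window, label))

    for held_out in sorted(by_user):
        # Test only on the held-out user; train on everyone else.
        test = by_user[held_out]
        train = [
            item
            for user, items in by_user.items()
            if user != held_out
            for item in items
        ]
        yield held_out, train, test


# Toy data: three hypothetical users with two windows each.
samples = [
    ("u1", [0.1, 0.2], "walk"), ("u1", [0.9, 0.8], "sit"),
    ("u2", [0.2, 0.1], "walk"), ("u2", [0.7, 0.9], "sit"),
    ("u3", [0.3, 0.3], "walk"), ("u3", [0.8, 0.7], "sit"),
]
splits = list(leave_one_user_out(samples))
```

Each split keeps every window from one user out of training, so the reported accuracy reflects generalization to a new person rather than to new windows from a known one.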

Technical Explanation

• FLOW uses a deep learning architecture that consists of two parallel branches: one for extracting global features and one for extracting local features from the IMU sensor data.

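• The local view is the IMU data as recorded in each sensor's own coordinate frame, which varies with how a user wears the device. The global view is a representation extracted from the IMU data that reduces this wearing-style discrepancy. The abstract reports that global view data alone significantly outperforms local view data in cross-user experiments.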

• The two views are fused by a Multi-view Supervised Network (MVFNet) based on shuffling, with the aim of learning features that transfer to activities performed by new users.

• MVFNet supervises the feature extraction of each view through view division and view shuffling, so that the model is less likely to ignore important features.

• The researchers evaluate FLOW on the public OPPORTUNITY and PAMAP2 datasets and report that it outperforms current state-of-the-art methods in cross-user HAR. Related work on sensor-based HAR includes Enhancing Inertial Hand-based HAR through Joint Optimization of Sensor-Fusion and Transformer, Unsupervised Statistical Feature Guided Diffusion Model for Sensor-Based Human Activity Recognition, and Sensor Data Augmentation from Skeleton Pose Sequences for Human Activity Recognition.
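
As an illustration of how a local view and a global view can differ, the Python sketch below rotates sensor-frame readings into a shared world frame using a per-sample orientation quaternion, then concatenates the two views channel-wise. This is a sketch under stated assumptions, not the paper's method: the summary does not say how FLOW derives its global view, the availability of an orientation quaternion is assumed, and plain concatenation stands in for MVFNet's fusion. The names quat_rotate, to_global_view, and fuse_views are hypothetical.

```python
import math


def quat_rotate(q, v):
    """Rotate 3-vector v by the unit quaternion q = (w, x, y, z)."""
    w, x, y, z = q
    vx, vy, vz = v
    # t = 2 * (q_vec x v)
    tx = 2.0 * (y * vz - z * vy)
    ty = 2.0 * (z * vx - x * vz)
    tz = 2.0 * (x * vy - y * vx)
    # v' = v + w * t + (q_vec x t)
    return (
        vx + w * tx + (y * tz - z * ty),
        vy + w * ty + (z * tx - x * tz),
        vz + w * tz + (x * ty - y * tx),
    )


def to_global_view(local_window, orientations):
    """Map each sensor-frame sample to the world frame.

    Assumes each quaternion describes the sensor-to-world rotation.
    """
    return [quat_rotate(q, v) for q, v in zip(orientations, local_window)]


def fuse_views(local_window, global_window):
    """Concatenate channels per time step: 3 local + 3 global = 6.

    Simple concatenation is a stand-in here, not the paper's MVFNet.
    """
    return [tuple(l) + tuple(g) for l, g in zip(local_window, global_window)]


# Two hypothetical users stand still but wear the sensor differently,
# so gravity appears on different local axes.
s = math.sqrt(0.5)
local_a = [(0.0, 0.0, 9.81)]       # user A: sensor aligned with the world frame
quats_a = [(1.0, 0.0, 0.0, 0.0)]   # identity orientation
local_b = [(0.0, 9.81, 0.0)]       # user B: sensor turned 90 degrees about x
quats_b = [(s, s, 0.0, 0.0)]       # 90-degree rotation about the x-axis

global_a = to_global_view(local_a, quats_a)
global_b = to_global_view(local_b, quats_b)
fused_b = fuse_views(local_b, global_b)
```

In this toy case the local readings of the two users differ, while their global readings agree up to floating-point error, which is the kind of wearing-style discrepancy a global view is meant to reduce.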

Critical Analysis

• The paper acknowledges that FLOW's performance may be limited by the quality and diversity of the training data used. Incorporating more diverse user data could further improve the model's cross-user generalization capabilities.

• The authors also note that the shuffling mechanism, while effective, may not be the only way to learn robust global and local feature representations. Exploring alternative techniques for feature fusion and regularization could lead to further improvements.

• Additionally, the paper does not provide a detailed analysis of the computational complexity and inference time of FLOW, which could be relevant for real-world deployments of the system.

Conclusion

• The FLOW system represents a promising approach to improving cross-user HAR by leveraging the complementary strengths of global and local feature representations. The proposed fusion and shuffling mechanisms enable the model to learn more generalizable features, leading to better performance on unseen users.

• This research contributes to ongoing efforts to develop robust and adaptable HAR systems that can be deployed across many users. Related research on IMU-based sensing includes MUJO: Multimodal Joint Feature Space Learning for Human Activity Recognition and IMUsE: IMU-based Facial Expression Capture.



This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

FLOW: Fusing and Shuffling Global and Local Views for Cross-User Human Activity Recognition with IMUs

Qi Qiu, Tao Zhu, Furong Duan, Kevin I-Kai Wang, Liming Chen, Mingxing Nie

Inertial Measurement Unit (IMU) sensors are widely employed for Human Activity Recognition (HAR) due to their portability, energy efficiency, and growing research interest. However, a significant challenge for IMU-HAR models is achieving robust generalization performance across diverse users. This limitation stems from substantial variations in data distribution among individual users. One primary reason for this distribution disparity lies in the representation of IMU sensor data in the local coordinate system, which is susceptible to subtle user variations during IMU wearing. To address this issue, we propose a novel approach that extracts a global view representation based on the characteristics of IMU data, effectively alleviating the data distribution discrepancies induced by wearing styles. To validate the efficacy of the global view representation, we fed both global and local view data into a model for experiments. The results demonstrate that global view data significantly outperforms local view data in cross-user experiments. Furthermore, we propose a Multi-view Supervised Network (MVFNet) based on Shuffling to effectively fuse local view and global view data. It supervises the feature extraction of each view through view division and view shuffling, so as to avoid the model ignoring important features as much as possible. Extensive experiments conducted on OPPORTUNITY and PAMAP2 datasets demonstrate that the proposed algorithm outperforms the current state-of-the-art methods in cross-user HAR.

6/28/2024

Fusion and Cross-Modal Transfer for Zero-Shot Human Action Recognition

Abhi Kamboj, Anh Duy Nguyen, Minh Do

Despite living in a multi-sensory world, most AI models are limited to textual and visual interpretations of human motion and behavior. Inertial measurement units (IMUs) provide a salient signal to understand human motion; however, they are challenging to use due to their uninterpretability and scarcity of their data. We investigate a method to transfer knowledge between visual and inertial modalities using the structure of an informative joint representation space designed for human action recognition (HAR). We apply the resulting Fusion and Cross-modal Transfer (FACT) method to a novel setup, where the model does not have access to labeled IMU data during training and is able to perform HAR with only IMU data during testing. Extensive experiments on a wide range of RGB-IMU datasets demonstrate that FACT significantly outperforms existing methods in zero-shot cross-modal transfer.

7/25/2024

Enhancing Inertial Hand based HAR through Joint Representation of Language, Pose and Synthetic IMUs

Vitor Fortes Rey, Lala Shakti Swarup Ray, Xia Qingxin, Kaishun Wu, Paul Lukowicz

Due to the scarcity of labeled sensor data in HAR, prior research has turned to video data to synthesize Inertial Measurement Units (IMU) data, capitalizing on its rich activity annotations. However, generating IMU data from videos presents challenges for HAR in real-world settings, attributed to the poor quality of synthetic IMU data and its limited efficacy in subtle, fine-grained motions. In this paper, we propose Multi$^3$Net, our novel multi-modal, multitask, and contrastive-based framework approach to address the issue of limited data. Our pretraining procedure uses videos from online repositories, aiming to learn joint representations of text, pose, and IMU simultaneously. By employing video data and contrastive learning, our method seeks to enhance wearable HAR performance, especially in recognizing subtle activities. Our experimental findings validate the effectiveness of our approach in improving HAR performance with IMU data. We demonstrate that models trained with synthetic IMU data generated from videos using our method surpass existing approaches in recognizing fine-grained activities.

7/30/2024

Unsupervised Statistical Feature-Guided Diffusion Model for Sensor-based Human Activity Recognition

Si Zuo, Vitor Fortes Rey, Sungho Suh, Stephan Sigg, Paul Lukowicz

Human activity recognition (HAR) from on-body sensors is a core functionality in many AI applications: from personal health, through sports and wellness to Industry 4.0. A key problem holding up progress in wearable sensor-based HAR, compared to other ML areas, such as computer vision, is the unavailability of diverse and labeled training data. Particularly, while there are innumerable annotated images available in online repositories, freely available sensor data is sparse and mostly unlabeled. We propose an unsupervised statistical feature-guided diffusion model specifically optimized for wearable sensor-based human activity recognition with devices such as inertial measurement unit (IMU) sensors. The method generates synthetic labeled time-series sensor data without relying on annotated training data. Thereby, it addresses the scarcity and annotation difficulties associated with real-world sensor data. By conditioning the diffusion model on statistical information such as mean, standard deviation, Z-score, and skewness, we generate diverse and representative synthetic sensor data. We conducted experiments on public human activity recognition datasets and compared the method to conventional oversampling and state-of-the-art generative adversarial network methods. Experimental results demonstrate that this can improve the performance of human activity recognition and outperform existing techniques.

5/21/2024