SoK: Behind the Accuracy of Complex Human Activity Recognition Using Deep Learning

Original paper: arXiv:2405.00712, published 5/7/2024 by Duc-Anh Nguyen, Nhien-An Le-Khac


Overview

Comprehensive review of deep learning techniques for complex human activity recognition
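Covers various sensor modalities, including wearables, cameras, and inertial measurement units (IMUs), with particular attention to wearables and cameras because of their prevalence
Examines key factors that limit the accuracy of complex activity recognition, such as data variety and model capacity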

Plain English Explanation

This paper provides a thorough overview of how deep learning techniques are used to recognize complex human activities from sensor data. Complex activities refer to those that involve multiple steps or are more nuanced than simple actions like walking or running.

The researchers examine different types of sensors that can be used for this task: wearable devices like smartwatches or fitness trackers, inertial measurement units (IMUs) that track movement, and cameras. They explore how the choice of sensor, the way the data is processed, and the deep learning model architecture can all affect the accuracy of recognizing complex activities.

For example, wearable sensors can provide detailed information about body movements, but may struggle with activities that involve interactions with the environment. Camera-based systems can capture those interactions, but may have trouble with occlusions or privacy concerns. The researchers examine how combining different sensor modalities can help overcome the limitations of any single approach.

The paper also discusses techniques such as federated learning, which trains models without centralizing user data, and self-supervised learning, which may improve performance on complex activities when labeled data is limited.

Overall, this review provides a comprehensive look at the state-of-the-art in deep learning for complex human activity recognition, highlighting key factors that influence the accuracy and real-world applicability of these systems.

Technical Explanation

The paper begins by discussing the importance of human activity recognition (HAR) for applications like healthcare monitoring, smart homes, and security. However, the authors note that most existing research has focused on relatively simple activities, while real-world scenarios often involve more complex, multi-step actions.

To address this gap, the researchers present a Systematisation of Knowledge (SoK) on deep learning techniques for complex HAR. They categorize the sensor modalities used, including wearable devices, cameras, and inertial measurement units (IMUs), giving particular attention to wearables and cameras because of their prevalence. The paper examines how the choice of sensors, data preprocessing, and model architecture can affect the accuracy of complex activity recognition.
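Data preprocessing is one of the accuracy factors named above. The summary does not say which preprocessing methods the paper covers, but a common first step for wearable and IMU streams in HAR is sliding-window segmentation: the continuous signal is cut into fixed-length, possibly overlapping windows, and each window receives a single label. The sketch below is a minimal, stdlib-only illustration under stated assumptions; the function name sliding_windows, the (x, y, z) sample format, and majority-vote window labelling are hypothetical choices, not taken from the paper.

```python
from collections import Counter
from typing import List, Sequence, Tuple

Sample = Tuple[float, float, float]  # hypothetical (x, y, z) accelerometer reading


def sliding_windows(
    samples: Sequence[Sample],
    labels: Sequence[str],
    window_size: int,
    step: int,
) -> List[Tuple[List[Sample], str]]:
    """Cut a labelled sensor stream into fixed-size windows.

    A step smaller than window_size gives overlapping windows. Each window is
    labelled by majority vote over its per-sample labels; on a tie, the label
    that appears first in the window wins. Trailing samples that cannot fill
    a whole window are dropped.
    """
    if len(samples) != len(labels):
        raise ValueError("samples and labels must have the same length")
    if window_size <= 0 or step <= 0:
        raise ValueError("window_size and step must be positive")

    windows: List[Tuple[List[Sample], str]] = []
    for start in range(0, len(samples) - window_size + 1, step):
        end = start + window_size
        # Counter.most_common orders equal counts by first occurrence
        label = Counter(labels[start:end]).most_common(1)[0][0]
        windows.append((list(samples[start:end]), label))
    return windows


if __name__ == "__main__":
    stream = [(float(i), 0.0, 9.8) for i in range(10)]
    activity = ["walk"] * 6 + ["sit"] * 4
    # 4-sample windows with 50% overlap (step of 2)
    for window, label in sliding_windows(stream, activity, window_size=4, step=2):
        print(window[0][0], label)
```

Window length and overlap are themselves accuracy-relevant choices: a window that is too short may capture only one step of a multi-step complex activity, while a very long one may mix several activities under one label.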

For example, the authors discuss how wearable sensors can provide rich motion data, but may struggle with activities involving environmental interactions. Camera-based systems can capture these interactions, but face challenges like occlusions and privacy concerns. The researchers explore how multimodal approaches that combine different sensor types can help overcome the limitations of any single modality.
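One simple way to combine modalities is late (decision-level) fusion, where each modality has its own classifier and only their per-class probabilities are merged. The sketch below assumes two already-trained models, one wearable-based and one camera-based, that output probabilities over the same activity classes; the function names, class names, and the single weight parameter are illustrative assumptions, and the paper may discuss other strategies such as feature-level fusion.

```python
from typing import Dict


def late_fusion(
    wearable_probs: Dict[str, float],
    camera_probs: Dict[str, float],
    wearable_weight: float = 0.5,
) -> Dict[str, float]:
    """Weighted average of two per-class probability distributions.

    A class missing from one modality is treated as probability 0 there.
    """
    if not 0.0 <= wearable_weight <= 1.0:
        raise ValueError("wearable_weight must be in [0, 1]")
    camera_weight = 1.0 - wearable_weight
    classes = set(wearable_probs) | set(camera_probs)
    return {
        c: wearable_weight * wearable_probs.get(c, 0.0)
        + camera_weight * camera_probs.get(c, 0.0)
        for c in classes
    }


def predict(fused: Dict[str, float]) -> str:
    """Return the class with the highest fused probability."""
    return max(fused, key=fused.__getitem__)


if __name__ == "__main__":
    wearable = {"cooking": 0.4, "cleaning": 0.6}  # hypothetical model outputs
    camera = {"cooking": 0.8, "cleaning": 0.2}
    print(predict(late_fusion(wearable, camera)))  # equal weights
    print(predict(late_fusion(wearable, camera, wearable_weight=0.9)))
```

In this toy example, equal weights favour "cooking", while shifting the weight toward the wearable model (for instance, if the camera view were known to be occluded) flips the decision to "cleaning". This is why fusion weights, or learned fusion layers, are a design trade-off rather than a fixed rule.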

The review also covers technical considerations such as federated learning to preserve user privacy, and the potential of self-supervised learning techniques to improve performance on complex activities with limited labeled data.
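Federated learning keeps raw sensor data on each user's device and shares only model updates with a server. The summary does not name a specific algorithm; the sketch below shows only the aggregation step of the widely used Federated Averaging (FedAvg) rule, in which the server averages client parameter vectors weighted by each client's number of local training samples. Model parameters are flattened into plain Python lists here for illustration, and the function name is hypothetical; a real system would also handle local training, communication, and often additional protections such as secure aggregation.

```python
from typing import List, Sequence


def federated_average(
    client_params: Sequence[Sequence[float]],
    client_sizes: Sequence[int],
) -> List[float]:
    """FedAvg aggregation step.

    Each client's flattened parameter vector is weighted by its number of
    local training samples; raw sensor data never reaches this function.
    """
    if not client_params or len(client_params) != len(client_sizes):
        raise ValueError("need at least one client and one sample count per client")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total number of samples must be positive")
    dim = len(client_params[0])
    if any(len(p) != dim for p in client_params):
        raise ValueError("all clients must share the same parameter shape")
    return [
        sum(n * p[i] for p, n in zip(client_params, client_sizes)) / total
        for i in range(dim)
    ]


if __name__ == "__main__":
    # Two hypothetical clients (e.g. two users' phones) with different data volumes
    print(federated_average([[1.0, 2.0], [3.0, 4.0]], [100, 300]))  # prints [2.5, 3.5]
```

Weighting by sample count means a client with more local data pulls the global model further toward its own update, which is one reason heterogeneous users and activity distributions remain a challenge for federated HAR.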

Throughout the paper, the authors highlight the obstacles that limit accuracy in deep learning-based complex HAR and outline potential research directions for building more accurate and robust models.

Critical Analysis

The paper provides a thorough and well-researched review of the state-of-the-art in deep learning for complex human activity recognition. The authors do an excellent job of identifying the key challenges and limitations of existing approaches, and exploring promising directions for future research.

One area that could have been explored in more depth is the potential ethical and privacy concerns around the use of sensor-based activity recognition, particularly in scenarios like healthcare monitoring or security applications. The paper mentions federated learning as a way to preserve user privacy, but a more in-depth discussion of these issues and potential mitigation strategies would have been valuable.

Additionally, while the paper covers a wide range of sensor modalities, the discussion of multimodal approaches could have been expanded further. The authors note the potential benefits of combining different sensor types, but more details on practical implementation and the tradeoffs involved would have been helpful.

Overall, this paper serves as a comprehensive and insightful reference for researchers and practitioners working in the field of complex human activity recognition. The authors have done an excellent job of synthesizing the latest developments and highlighting the key challenges and opportunities in this rapidly evolving area of study.

Conclusion

This review paper provides a thorough analysis of the state-of-the-art in deep learning techniques for complex human activity recognition. The researchers examine the various sensor modalities, data processing methods, and model architectures that can be used to tackle this challenging problem.

The paper highlights the strengths and limitations of different sensor types, such as wearables, cameras, and inertial measurement units, and explores how multimodal approaches can help overcome the shortcomings of any single modality. It also discusses important technical considerations, such as the use of federated learning to preserve user privacy and the potential of self-supervised learning to improve performance on complex activities with limited labeled data.

Overall, this review offers a comprehensive understanding of the key factors that influence the accuracy and real-world applicability of deep learning-based complex human activity recognition systems. The insights provided can serve as a valuable resource for researchers and practitioners working to advance the state-of-the-art in this important field of study.

This summary was produced with help from an AI and may contain inaccuracies; consult the original paper, arXiv:2405.00712, to verify details.


Related Papers

SoK: Behind the Accuracy of Complex Human Activity Recognition Using Deep Learning

Duc-Anh Nguyen, Nhien-An Le-Khac

Human Activity Recognition (HAR) is a well-studied field with research dating back to the 1980s. Over time, HAR technologies have evolved significantly from manual feature extraction, rule-based algorithms, and simple machine learning models to powerful deep learning models, from one sensor type to a diverse array of sensing modalities. The scope has also expanded from recognising a limited set of activities to encompassing a larger variety of both simple and complex activities. However, there still exist many challenges that hinder advancement in complex activity recognition using modern deep learning methods. In this paper, we comprehensively systematise factors leading to inaccuracy in complex HAR, such as data variety and model capacity. Among many sensor types, we give more attention to wearable and camera due to their prevalence. Through this Systematisation of Knowledge (SoK) paper, readers can gain a solid understanding of the development history and existing challenges of HAR, different categorisations of activities, obstacles in deep learning-based complex HAR that impact accuracy, and potential research directions.

5/7/2024

A Comprehensive Methodological Survey of Human Activity Recognition Across Divers Data Modalities

Jungpil Shin, Najmul Hassan, Abu Saleh Musa Miah, Satoshi Nishimura

Human Activity Recognition (HAR) systems aim to understand human behaviour and assign a label to each action, attracting significant attention in computer vision due to their wide range of applications. HAR can leverage various data modalities, such as RGB images and video, skeleton, depth, infrared, point cloud, event stream, audio, acceleration, and radar signals. Each modality provides unique and complementary information suited to different application scenarios. Consequently, numerous studies have investigated diverse approaches for HAR using these modalities. This paper presents a comprehensive survey of the latest advancements in HAR from 2014 to 2024, focusing on machine learning (ML) and deep learning (DL) approaches categorized by input data modalities. We review both single-modality and multi-modality techniques, highlighting fusion-based and co-learning frameworks. Additionally, we cover advancements in hand-crafted action features, methods for recognizing human-object interactions, and activity detection. Our survey includes a detailed dataset description for each modality and a summary of the latest HAR systems, offering comparative results on benchmark datasets. Finally, we provide insightful observations and propose effective future research directions in HAR.

9/17/2024


Towards LLM-Powered Ambient Sensor Based Multi-Person Human Activity Recognition

Xi Chen (M-PSI), Julien Cumin (M-PSI), Fano Ramparany (M-PSI), Dominique Vaufreydaz (M-PSI)

Human Activity Recognition (HAR) is one of the central problems in fields such as healthcare, elderly care, and security at home. However, traditional HAR approaches face challenges including data scarcity, difficulties in model generalization, and the complexity of recognizing activities in multi-person scenarios. This paper proposes a system framework called LAHAR, based on large language models. Utilizing prompt engineering techniques, LAHAR addresses HAR in multi-person scenarios by enabling subject separation and action-level descriptions of events occurring in the environment. We validated our approach on the ARAS dataset, and the results demonstrate that LAHAR achieves comparable accuracy to the state-of-the-art method at higher resolutions and maintains robustness in multi-person scenarios.

7/16/2024

A Critical Analysis on Machine Learning Techniques for Video-based Human Activity Recognition of Surveillance Systems: A Review

Shahriar Jahan, Roknuzzaman, Md Robiul Islam

The upsurge of abnormal activities in crowded locations such as airports, train stations, bus stops, and shopping malls underscores the need for an intelligent surveillance system. An intelligent surveillance system can differentiate between normal and suspicious activities through real-time video analysis, enabling appropriate measures to be taken instantaneously and efficiently according to the level of anomaly. Video-based human activity recognition has intrigued many researchers with its pressing issues and a variety of applications ranging from simple hand gesture recognition to crucial behavior recognition in a surveillance system. This paper provides a critical survey of video-based Human Activity Recognition (HAR) techniques, beginning with an examination of basic approaches for detecting and recognizing suspicious behavior, followed by a critical analysis of machine learning and deep learning techniques such as Convolutional Neural Network (CNN), Recurrent Neural Network (RNN), Hidden Markov Model (HMM), and K-means Clustering. These learning techniques are investigated and compared in detail on the basis of feature extraction techniques, parameter initialization, optimization algorithms, accuracy, and other factors. The purpose of this review is to prioritize positive schemes and to assist researchers with emerging advancements in this field's future endeavors. This paper also pragmatically discusses existing challenges in the field of HAR and examines its prospects.

9/4/2024