Towards LLM-Powered Ambient Sensor Based Multi-Person Human Activity Recognition

Read original: arXiv:2407.09529 - Published 7/16/2024 by Xi Chen (M-PSI), Julien Cumin (M-PSI), Fano Ramparany (M-PSI), Dominique Vaufreydaz (M-PSI)

Overview

Human Activity Recognition (HAR) is a critical problem in healthcare, elderly care, and home security
Traditional HAR approaches face challenges like data scarcity, model generalization, and recognizing activities in multi-person scenarios
This paper proposes a system called LAHAR that uses large language models and prompt engineering to address HAR in multi-person settings

Plain English Explanation

Human activity recognition (HAR) is an important technology used in a variety of applications, such as healthcare, elderly care, and home security. It involves using sensors and machine learning to detect and classify the actions and behaviors of people.

However, traditional HAR approaches have faced some challenges. For example, they may struggle with limited training data, difficulties in making their models work well across different settings, and the complexity of recognizing activities when there are multiple people involved.

To address these challenges, the researchers in this paper developed a new system called LAHAR. LAHAR uses large language models and a technique called "prompt engineering" to enable HAR in multi-person scenarios. The key idea is that the language model is prompted to first separate the different people in the environment, and then describe the actions and events related to each person.

The researchers tested LAHAR on the ARAS dataset and found that, at higher resolutions, its accuracy was comparable to the state-of-the-art method, and that it remained robust in multi-person settings.

Technical Explanation

The paper proposes LAHAR, a system framework that uses large language models and prompt engineering to address the challenges of HAR, particularly in multi-person scenarios.

The LAHAR framework consists of three main components:

Subject Separation: The language model is prompted to identify and separate the different people present in the sensed environment.
Action-Level Description: The language model is then prompted to describe the actions and events related to each individual subject.
Activity Recognition: The textual descriptions generated by the language model are used as input to a downstream activity recognition model.
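
The three components above can be sketched as a small prompt-building pipeline. This is only an illustration under stated assumptions, not the authors' implementation: the `SensorEvent` schema, the sensor names, the prompt wording, and the `llm` callable are all hypothetical, and the language model is replaced by a stub so the example runs offline.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class SensorEvent:
    """One ambient sensor firing (hypothetical schema, not from the paper)."""
    time: str    # e.g. "08:01:05"
    sensor: str  # e.g. "kitchen_motion"
    state: str   # e.g. "ON"


def events_to_text(events: List[SensorEvent]) -> str:
    """Serialize sensor events, one per line, so a language model can read them."""
    return "\n".join(f"{e.time} {e.sensor} {e.state}" for e in events)


def separation_prompt(events: List[SensorEvent], n_residents: int) -> str:
    """Stage 1 (subject separation): ask which resident caused each event."""
    return (
        f"There are {n_residents} residents in the home.\n"
        "Assign each sensor event below to the resident who most likely caused it.\n"
        f"{events_to_text(events)}"
    )


def description_prompt(separated: str) -> str:
    """Stage 2 (action-level description): ask for one description per resident."""
    return (
        "For each resident, describe in one sentence what they are doing, "
        "based on these attributed events.\n"
        f"{separated}"
    )


def run_pipeline(
    events: List[SensorEvent], n_residents: int, llm: Callable[[str], str]
) -> str:
    """Chain both prompts; a stage-3 activity recognizer would consume the result."""
    separated = llm(separation_prompt(events, n_residents))
    return llm(description_prompt(separated))


if __name__ == "__main__":
    demo_events = [
        SensorEvent("08:01:05", "kitchen_motion", "ON"),
        SensorEvent("08:01:07", "bathroom_door", "OPEN"),
    ]
    # Stub model that echoes its prompt; swap in a real LLM client here.
    print(run_pipeline(demo_events, n_residents=2, llm=lambda prompt: prompt))
```

Passing the model as a plain callable keeps the prompt logic testable without network access; any real client can be wrapped in a one-line function with the same signature.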

The researchers evaluated LAHAR on the ARAS dataset, which contains ambient sensor recordings from multi-resident homes. The results showed that LAHAR achieves accuracy comparable to the state-of-the-art method at higher resolutions, while maintaining robustness in multi-person scenarios.
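
To make accuracy in a multi-person setting concrete, the sketch below scores predictions separately for each resident over a sequence of time windows. It is a generic illustration: the resident names and activity labels are invented, and the paper's exact metric and windowing are not described in this summary.

```python
from typing import Dict, Sequence


def accuracy(predicted: Sequence[str], actual: Sequence[str]) -> float:
    """Fraction of time windows whose predicted label matches the ground truth."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("at least one time window is required")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


def per_resident_accuracy(
    predicted: Dict[str, Sequence[str]], actual: Dict[str, Sequence[str]]
) -> Dict[str, float]:
    """Score each resident independently so one person's errors stay visible."""
    return {resident: accuracy(predicted[resident], labels)
            for resident, labels in actual.items()}


if __name__ == "__main__":
    # Invented labels for two residents over four time windows.
    truth = {"resident_1": ["cook", "eat", "eat", "sleep"],
             "resident_2": ["tv", "tv", "tv", "sleep"]}
    guess = {"resident_1": ["cook", "eat", "sleep", "sleep"],
             "resident_2": ["tv", "tv", "tv", "sleep"]}
    print(per_resident_accuracy(guess, truth))
```

Reporting one score per resident, rather than a single pooled number, shows whether a system is robust for every person in the home or only for the one whose activities are easiest to recognize.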

Critical Analysis

The paper presents a promising approach to addressing some of the key challenges in HAR, particularly when it comes to recognizing activities in complex, multi-person environments. The use of large language models and prompt engineering is an interesting and potentially powerful technique that could help overcome issues like data scarcity and model generalization.

However, the paper does not provide much discussion on the potential limitations or caveats of the LAHAR system. For example, it's unclear how the language model's performance might scale to larger, more diverse datasets, or how sensitive the system is to the quality and consistency of the prompts used.

Additionally, while the ARAS dataset is a useful testbed, it may not fully capture the range of real-world HAR scenarios that the system would need to handle. Further evaluation on a broader set of datasets and use cases would be helpful to better understand the strengths and weaknesses of the LAHAR approach.

Conclusion

This paper introduces a novel framework called LAHAR that leverages large language models and prompt engineering to address the challenges of human activity recognition, particularly in multi-person scenarios. The results on the ARAS dataset show that LAHAR achieves accuracy comparable to the state-of-the-art method at higher resolutions while maintaining robustness in multi-person scenarios.

The LAHAR approach represents an interesting and potentially impactful direction for advancing the field of HAR, which has important applications in healthcare, elderly care, and home security. Further research and development of this technology could lead to more capable and flexible activity recognition systems that can better handle the nuances and complexities of real-world human behavior.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Towards LLM-Powered Ambient Sensor Based Multi-Person Human Activity Recognition

Xi Chen (M-PSI), Julien Cumin (M-PSI), Fano Ramparany (M-PSI), Dominique Vaufreydaz (M-PSI)

Human Activity Recognition (HAR) is one of the central problems in fields such as healthcare, elderly care, and security at home. However, traditional HAR approaches face challenges including data scarcity, difficulties in model generalization, and the complexity of recognizing activities in multi-person scenarios. This paper proposes a system framework called LAHAR, based on large language models. Utilizing prompt engineering techniques, LAHAR addresses HAR in multi-person scenarios by enabling subject separation and action-level descriptions of events occurring in the environment. We validated our approach on the ARAS dataset, and the results demonstrate that LAHAR achieves comparable accuracy to the state-of-the-art method at higher resolutions and maintains robustness in multi-person scenarios.

7/16/2024

A Comprehensive Methodological Survey of Human Activity Recognition Across Divers Data Modalities

Jungpil Shin, Najmul Hassan, Abu Saleh Musa Miah, Satoshi Nishimura

Human Activity Recognition (HAR) systems aim to understand human behaviour and assign a label to each action, attracting significant attention in computer vision due to their wide range of applications. HAR can leverage various data modalities, such as RGB images and video, skeleton, depth, infrared, point cloud, event stream, audio, acceleration, and radar signals. Each modality provides unique and complementary information suited to different application scenarios. Consequently, numerous studies have investigated diverse approaches for HAR using these modalities. This paper presents a comprehensive survey of the latest advancements in HAR from 2014 to 2024, focusing on machine learning (ML) and deep learning (DL) approaches categorized by input data modalities. We review both single-modality and multi-modality techniques, highlighting fusion-based and co-learning frameworks. Additionally, we cover advancements in hand-crafted action features, methods for recognizing human-object interactions, and activity detection. Our survey includes a detailed dataset description for each modality and a summary of the latest HAR systems, offering comparative results on benchmark datasets. Finally, we provide insightful observations and propose effective future research directions in HAR.

9/17/2024

ALS-HAR: Harnessing Wearable Ambient Light Sensors to Enhance IMU-based HAR

Lala Shakti Swarup Ray, Daniel Geißler, Mengxi Liu, Bo Zhou, Sungho Suh, Paul Lukowicz

Despite the widespread integration of ambient light sensors (ALS) in smart devices commonly used for screen brightness adaptation, their application in human activity recognition (HAR), primarily through body-worn ALS, is largely unexplored. In this work, we developed ALS-HAR, a robust wearable light-based motion activity classifier. Although ALS-HAR achieves comparable accuracy to other modalities, its natural sensitivity to external disturbances, such as changes in ambient light, weather conditions, or indoor lighting, makes it challenging for daily use. To address such drawbacks, we introduce strategies to enhance environment-invariant IMU-based activity classifications through augmented multi-modal and contrastive classifications by transferring the knowledge extracted from the ALS. Our experiments on a real-world activity dataset for three different scenarios demonstrate that while ALS-HAR's accuracy strongly relies on external lighting conditions, cross-modal information can still improve other HAR systems, such as IMU-based classifiers. Even in scenarios where ALS performs insufficiently, the additional knowledge enables improved accuracy and macro F1 score by up to 4.2% and 6.4%, respectively, for IMU-based classifiers and even surpasses multi-modal sensor fusion models in two of our three experiment scenarios. Our research highlights the untapped potential of ALS integration in advancing sensor-based HAR technology, paving the way for practical and efficient wearable ALS-based activity recognition systems with potential applications in healthcare, sports monitoring, and smart indoor environments.

8/23/2024

SoK: Behind the Accuracy of Complex Human Activity Recognition Using Deep Learning

Duc-Anh Nguyen, Nhien-An Le-Khac

Human Activity Recognition (HAR) is a well-studied field with research dating back to the 1980s. Over time, HAR technologies have evolved significantly from manual feature extraction, rule-based algorithms, and simple machine learning models to powerful deep learning models, from one sensor type to a diverse array of sensing modalities. The scope has also expanded from recognising a limited set of activities to encompassing a larger variety of both simple and complex activities. However, there still exist many challenges that hinder advancement in complex activity recognition using modern deep learning methods. In this paper, we comprehensively systematise factors leading to inaccuracy in complex HAR, such as data variety and model capacity. Among many sensor types, we give more attention to wearable and camera due to their prevalence. Through this Systematisation of Knowledge (SoK) paper, readers can gain a solid understanding of the development history and existing challenges of HAR, different categorisations of activities, obstacles in deep learning-based complex HAR that impact accuracy, and potential research directions.

5/7/2024