Diffusion Model-based Contrastive Learning for Human Activity Recognition

arXiv:2408.05567. Published 8/13/2024 by Chunjing Xiao, Yanhui Han, Wei Yang, Yane Hou, Fangzhan Shi, Kevin Chetty


Overview

Contrastive learning is a self-supervised approach for learning data representations
Diffusion probabilistic models are a class of generative models that can generate realistic samples
This paper proposes CLAR, a diffusion model-based contrastive learning framework for human activity recognition using WiFi channel state information (CSI) data

Plain English Explanation

Contrastive learning is a way of training machine learning models to learn useful representations of data without labeled examples. The model learns to place different views of the same sample close together in its representation space and different samples far apart. This is especially useful for tasks like recognizing human activities from sensor data, where labeled examples are difficult to obtain.
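To make the idea concrete, here is a minimal NumPy sketch of a one-directional InfoNCE loss, a common contrastive objective. It is not taken from the paper; the function name `info_nce_loss`, the temperature of 0.1, and the toy data are assumptions. Row i of `z_a` and row i of `z_b` form a positive pair, and every other row of `z_b` acts as a negative. The optional `pair_weights` argument is a hypothetical hook showing where per-pair weights, like the ones the paper's adaptive weight algorithm assigns to positive pairs, could enter the loss. With the default uniform weights it reduces to the plain average.

```python
import numpy as np

def info_nce_loss(z_a, z_b, temperature=0.1, pair_weights=None):
    """One-directional InfoNCE over a batch of positive pairs (z_a[i], z_b[i]).

    For anchor z_a[i], z_b[i] is the positive and every z_b[j] with j != i
    is a negative. pair_weights is a hypothetical per-pair weight vector;
    None means all pairs count equally.
    """
    # L2-normalise rows so the dot product is cosine similarity
    z_a = z_a / np.linalg.norm(z_a, axis=1, keepdims=True)
    z_b = z_b / np.linalg.norm(z_b, axis=1, keepdims=True)
    logits = z_a @ z_b.T / temperature               # (N, N) similarity matrix
    # Numerically stable row-wise log-softmax
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    per_pair = -np.diag(log_prob)                    # loss of each positive pair
    if pair_weights is None:
        pair_weights = np.ones_like(per_pair)
    w = np.asarray(pair_weights, dtype=float)
    return float((w * per_pair).sum() / w.sum())     # weighted mean over pairs

# Toy usage: 8 random 16-d embeddings, with slightly perturbed copies as positives
rng = np.random.default_rng(0)
anchors = rng.normal(size=(8, 16))
views = anchors + 0.01 * rng.normal(size=anchors.shape)
aligned_loss = info_nce_loss(anchors, views)                      # positives matched
shuffled_loss = info_nce_loss(anchors, rng.permutation(views))    # positives mismatched
print(aligned_loss, shuffled_loss)
```

Matched views give a loss close to zero, while shuffling the positives pushes it up noticeably. That gap is the signal an encoder is trained to widen.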

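This paper uses a type of generative model called a diffusion model to strengthen a contrastive learning approach for recognizing human activities from WiFi signals. Diffusion models work by gradually adding random noise to data and then learning to reverse that process, which lets them generate new, realistic-looking samples. Here the diffusion model serves as a data augmentation tool: it generates diverse synthetic WiFi CSI samples to complement limited training data. This matters because people with different motion habits cause different fluctuations in the signal, and it is hard to collect enough real data to cover them all. The researchers also weight training pairs adaptively, so that more important samples have more influence on what the model learns.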
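The noising half of a diffusion model has a closed form. The sketch below assumes the standard DDPM formulation with a linear noise schedule; the step count, schedule endpoints, and the toy 30 x 200 "CSI window" are illustrative assumptions, not the paper's settings. It samples x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps directly for any step t. The learned reverse (denoising) network is not shown.

```python
import numpy as np

def linear_beta_schedule(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Linear noise schedule (a common DDPM default, assumed here)."""
    return np.linspace(beta_start, beta_end, num_steps)

def q_sample(x0, t, alpha_bar, rng):
    """Draw x_t ~ q(x_t | x_0) in closed form:
    x_t = sqrt(alpha_bar[t]) * x0 + sqrt(1 - alpha_bar[t]) * eps, eps ~ N(0, I)."""
    eps = rng.standard_normal(x0.shape)
    x_t = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps
    return x_t, eps   # eps is the target a noise-prediction network would regress

betas = linear_beta_schedule()
alpha_bar = np.cumprod(1.0 - betas)   # alpha_bar[t] = prod_{s<=t} (1 - beta_s)

rng = np.random.default_rng(0)
# Hypothetical stand-in for one CSI window: 30 subcarriers x 200 time steps
x0 = np.sin(np.linspace(0, 8 * np.pi, 200))[None, :].repeat(30, axis=0)
for step in (0, 250, 999):
    x_t, _ = q_sample(x0, step, alpha_bar, rng)
    corr = np.corrcoef(x0.ravel(), x_t.ravel())[0, 1]
    print(f"t={step:4d}  signal weight={np.sqrt(alpha_bar[step]):.3f}  corr(x0, x_t)={corr:.3f}")
```

By the final step the signal weight sqrt(alpha_bar_T) is below 0.01, so x_T is essentially pure noise. A network trained to predict `eps` is what allows the model to run this process backward and generate new samples.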

Their experiments suggest that this approach achieves significant gains over state-of-the-art methods for activity recognition. The augmented data is intended to help the model generalize to people whose movement habits are not well represented in the training set.

Technical Explanation

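The paper proposes CLAR, a diffusion model-based Contrastive Learning framework for human Activity Recognition using WiFi CSI data. It builds on a standard contrastive learning framework and adds two components aimed at the limited generalization of CSI-based recognition models. These models struggle because individuals with different behavior habits cause different fluctuations in CSI data, and it is difficult to gather enough training data to cover every kind of motion habit.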

The two components are:

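A diffusion model-based, time series-specific augmentation model that generates diverse augmented CSI data to complement limited training data. Typical diffusion models apply conditions directly to the generative process, which can distort CSI data. This model instead splits the condition into high-frequency and low-frequency components and applies them to the generative process with different weights, which the authors report alleviates distortion and yields higher-quality augmented data.
An adaptive weight algorithm that captures differences in sample importance. Typical contrastive learning methods treat all training samples equally; this algorithm adaptively adjusts the weights of positive sample pairs to learn better data representations.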
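The abstract states only that the condition is split into high- and low-frequency components that are applied with different weights. One simple way to realize the split is an FFT low-pass mask along the time axis, sketched below. The cutoff fraction, the weights, the function names, and the choice to recombine the weighted bands into a single conditioning signal are all assumptions for illustration, not details from the paper.

```python
import numpy as np

def split_frequency(cond, cutoff_ratio=0.1):
    """Split a (channels, time) condition into low- and high-frequency parts.

    Keeps the lowest cutoff_ratio fraction of rFFT bins (a hypothetical
    choice) as the low band; the high band is the residual.
    """
    spec = np.fft.rfft(cond, axis=-1)
    n_low = max(1, int(round(cutoff_ratio * spec.shape[-1])))
    low_spec = np.zeros_like(spec)
    low_spec[..., :n_low] = spec[..., :n_low]          # low-pass mask
    low = np.fft.irfft(low_spec, n=cond.shape[-1], axis=-1)
    high = cond - low                                   # so low + high == cond
    return low, high

def weighted_condition(cond, w_low=1.0, w_high=0.3, cutoff_ratio=0.1):
    """Recombine the two bands with separate (illustrative) weights."""
    low, high = split_frequency(cond, cutoff_ratio)
    return w_low * low + w_high * high

# Toy condition: a slow 2 Hz component plus 60 Hz jitter, 3 channels, 256 samples over 1 s
t = np.linspace(0.0, 1.0, 256, endpoint=False)
signal = np.sin(2 * np.pi * 2 * t) + 0.2 * np.sin(2 * np.pi * 60 * t)
cond = np.tile(signal, (3, 1))
low, high = split_frequency(cond)
guided = weighted_condition(cond)   # slow trend kept, jitter scaled by 0.3
```

Because `high` is defined as the residual, the two bands always sum back to the original condition, so any change comes only from the chosen weights. Setting `w_high` below 1 damps fast fluctuations while preserving the slow trend, which is one plausible reading of how frequency-dependent weighting could limit distortion.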

The authors report that CLAR achieves significant gains compared to state-of-the-art methods.

Critical Analysis

The paper offers a novel approach to combining diffusion models with contrastive learning for WiFi-based activity recognition. Its main technical ideas are the frequency-aware conditioning of the diffusion-based augmentation model and the adaptive weighting of positive sample pairs.

One potential limitation is that the paper does not delve deeply into the interpretability of the learned representations. Understanding what specific aspects of the WiFi CSI data the model is focusing on could provide valuable insights for domain experts.

Additionally, the paper focuses on a single modality (WiFi CSI) for activity recognition. Exploring the integration of DMCL with multimodal sensor data, such as accelerometers or cameras, could further improve the model's performance and robustness.

Conclusion

This paper presents CLAR, a diffusion model-based contrastive learning framework for human activity recognition using WiFi CSI data. The method uses a diffusion model to generate diverse, high-quality augmented CSI data and an adaptive weight algorithm to emphasize more important positive pairs, with the goal of learning better data representations.

The experimental results suggest that CLAR achieves significant gains over state-of-the-art methods. This work highlights the potential of combining diffusion models and contrastive learning for sensor-based activity recognition, with possible applications in areas like smart homes, healthcare, and human-computer interaction.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!


Related Papers

Diffusion Model-based Contrastive Learning for Human Activity Recognition

Chunjing Xiao, Yanhui Han, Wei Yang, Yane Hou, Fangzhan Shi, Kevin Chetty

WiFi Channel State Information (CSI)-based activity recognition has sparked numerous studies due to its widespread availability and privacy protection. However, when applied in practical applications, general CSI-based recognition models may face challenges related to the limited generalization capability, since individuals with different behavior habits will cause various fluctuations in CSI data and it is difficult to gather enough training data to cover all kinds of motion habits. To tackle this problem, we design a diffusion model-based Contrastive Learning framework for human Activity Recognition (CLAR) using WiFi CSI. On the basis of the contrastive learning framework, we primarily introduce two components for CLAR to enhance CSI-based activity recognition. To generate diverse augmented data and complement limited training data, we propose a diffusion model-based time series-specific augmentation model. In contrast to typical diffusion models that directly apply conditions to the generative process, potentially resulting in distorted CSI data, our tailored model dissects these conditions into high-frequency and low-frequency components, and then applies these conditions to the generative process with varying weights. This can alleviate data distortion and yield high-quality augmented data. To efficiently capture differences in sample importance, we present an adaptive weight algorithm. Different from typical contrastive learning methods which equally consider all the training samples, this algorithm adaptively adjusts the weights of positive sample pairs for learning better data representations. The experiments suggest that CLAR achieves significant gains compared to state-of-the-art methods.

8/13/2024


Unsupervised Statistical Feature-Guided Diffusion Model for Sensor-based Human Activity Recognition

Si Zuo, Vitor Fortes Rey, Sungho Suh, Stephan Sigg, Paul Lukowicz

Human activity recognition (HAR) from on-body sensors is a core functionality in many AI applications: from personal health, through sports and wellness to Industry 4.0. A key problem holding up progress in wearable sensor-based HAR, compared to other ML areas, such as computer vision, is the unavailability of diverse and labeled training data. Particularly, while there are innumerable annotated images available in online repositories, freely available sensor data is sparse and mostly unlabeled. We propose an unsupervised statistical feature-guided diffusion model specifically optimized for wearable sensor-based human activity recognition with devices such as inertial measurement unit (IMU) sensors. The method generates synthetic labeled time-series sensor data without relying on annotated training data. Thereby, it addresses the scarcity and annotation difficulties associated with real-world sensor data. By conditioning the diffusion model on statistical information such as mean, standard deviation, Z-score, and skewness, we generate diverse and representative synthetic sensor data. We conducted experiments on public human activity recognition datasets and compared the method to conventional oversampling and state-of-the-art generative adversarial network methods. Experimental results demonstrate that this can improve the performance of human activity recognition and outperform existing techniques.

5/21/2024

CDFL: Efficient Federated Human Activity Recognition using Contrastive Learning and Deep Clustering

Ensieh Khazaei, Alireza Esmaeilzehi, Bilal Taha, Dimitrios Hatzinakos

In the realm of ubiquitous computing, Human Activity Recognition (HAR) is vital for the automation and intelligent identification of human actions through data from diverse sensors. However, traditional machine learning approaches by aggregating data on a central server and centralized processing are memory-intensive and raise privacy concerns. Federated Learning (FL) has emerged as a solution by training a global model collaboratively across multiple devices by exchanging their local model parameters instead of local data. However, in realistic settings, sensor data on devices is non-independently and identically distributed (Non-IID). This means that data activity recorded by most devices is sparse, and sensor data distribution for each client may be inconsistent. As a result, typical FL frameworks in heterogeneous environments suffer from slow convergence and poor performance due to deviation of the global model's objective from the global objective. Most FL methods applied to HAR are either designed for overly ideal scenarios without considering the Non-IID problem or present privacy and scalability concerns. This work addresses these challenges, proposing CDFL, an efficient federated learning framework for image-based HAR. CDFL efficiently selects a representative set of privacy-preserved images using contrastive learning and deep clustering, reduces communication overhead by selecting effective clients for global model updates, and improves global model quality by training on privacy-preserved data. Our comprehensive experiments carried out on three public datasets, namely Stanford40, PPMI, and VOC2012, demonstrate the superiority of CDFL in terms of performance, convergence rate, and bandwidth usage compared to state-of-the-art approaches.

7/18/2024

Consistency Based Weakly Self-Supervised Learning for Human Activity Recognition with Wearables

Taoran Sheng, Manfred Huber

While the widely available embedded sensors in smartphones and other wearable devices make it easier to obtain data of human activities, recognizing different types of human activities from sensor-based data remains a difficult research topic in ubiquitous computing. One reason for this is that most of the collected data is unlabeled. However, many current human activity recognition (HAR) systems are based on supervised methods, which heavily rely on the labels of the data. We describe a weakly self-supervised approach in this paper that consists of two stages: (1) In stage one, the model learns from the nature of human activities by projecting the data into an embedding space where similar activities are grouped together; (2) In stage two, the model is fine-tuned in a few-shot learning fashion using the similarity information of the data. This allows downstream classification or clustering tasks to benefit from the embeddings. Experiments on three benchmark datasets demonstrate the framework's effectiveness and show that our approach can help the clustering algorithm achieve performance in identifying and categorizing the underlying human activities comparable to that of purely supervised techniques applied directly to a corresponding fully labeled data set.

8/15/2024