FMLFS: A federated multi-label feature selection based on information theory in IoT environment

Read original: arXiv:2405.00524 - Published 5/2/2024 by Afsaneh Mahanipour, Hana Khamfroush

Overview

This paper presents a novel federated multi-label feature selection (FMLFS) method based on information theory for Internet of Things (IoT) environments.
The method aims to address the challenges of feature selection in multi-label data and the constraints of federated learning in IoT settings.
It utilizes a bi-objective optimization approach to balance the trade-off between relevance and redundancy of features, leveraging the Pareto dominance concept and crowding distance to select the most informative features.

Plain English Explanation

In modern IoT (Internet of Things) environments, devices often collect large amounts of data with multiple target variables or "labels" that need to be predicted. Selecting the most relevant features from this data is a critical step for building accurate machine learning models. However, this feature selection process is complicated by the federated nature of IoT data, where data is distributed across many devices and can't be easily combined.

The FMLFS method proposed in this paper tackles this challenge. It uses an information-theoretic approach to evaluate the usefulness of different features, considering both their relevance to the target labels and how much redundant information they provide. The method employs a bi-objective optimization technique to find the best balance between these two factors, leveraging the concept of Pareto dominance and crowding distance.

This allows FMLFS to identify the most informative subset of features in a federated setting, where data from multiple IoT devices cannot be centralized. By preserving privacy and reducing communication overhead, this federated approach can enable more efficient and effective machine learning on IoT data compared to traditional centralized methods.

Technical Explanation

The FMLFS method begins by computing the mutual information between each feature and the multi-label target variables, which quantifies the relevance of each feature. To measure redundancy, it computes a correlation distance between pairs of features, derived from their mutual information and joint entropy. These metrics are then aggregated on the edge server.
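To make the two measures concrete, here is a minimal sketch for discrete features, using only the Python standard library. The paper's abstract describes the redundancy measure as a correlation distance derived from mutual information and joint entropy; the specific form used below, 1 - I(X;Y)/H(X,Y), is an assumption, as are the function names and the choice to sum relevance over labels. It is not the authors' implementation.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete symbols."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def joint_entropy(x, y):
    """Entropy of the paired variable (X, Y)."""
    return entropy(list(zip(x, y)))


def mutual_information(x, y):
    """I(X;Y) = H(X) + H(Y) - H(X,Y)."""
    return entropy(x) + entropy(y) - joint_entropy(x, y)


def relevance(feature, label_columns):
    """Assumed aggregation: sum of I(feature; label) over all label columns."""
    return sum(mutual_information(feature, lab) for lab in label_columns)


def correlation_distance(x, y):
    """Assumed form: 1 - I(X;Y) / H(X,Y). 0 means fully redundant, 1 means independent."""
    h = joint_entropy(x, y)
    if h == 0:  # both variables are constant; treat them as fully redundant
        return 0.0
    return 1.0 - mutual_information(x, y) / h


# Toy data: f1 copies the label, f2 is independent of it.
f1 = [0, 0, 1, 1]
f2 = [0, 1, 0, 1]
label = [0, 0, 1, 1]
print(relevance(f1, [label]), relevance(f2, [label]))
print(correlation_distance(f1, f2))
```

In a federated deployment, each device would compute quantities like these on its local data and send only the resulting scores, not the raw data, to the edge server.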

These relevance and redundancy measures are then used in a bi-objective optimization problem, where the goal is to simultaneously maximize the relevance and minimize the redundancy of the selected features. The Pareto dominance concept is used to identify the set of non-dominated solutions, representing the optimal trade-offs between these two objectives.
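Pareto dominance can be illustrated with a short sketch. Each candidate is assumed here to be a single feature scored by a hypothetical `(relevance, redundancy)` pair, with relevance maximized and redundancy minimized; the scores are invented for illustration and the function names are not taken from the paper.

```python
def dominates(a, b):
    """True if candidate a dominates b.

    Each candidate is (relevance, redundancy): a must be no worse on both
    objectives and strictly better on at least one.
    """
    no_worse = a[0] >= b[0] and a[1] <= b[1]
    strictly_better = a[0] > b[0] or a[1] < b[1]
    return no_worse and strictly_better


def pareto_front(points):
    """Indices of candidates that no other candidate dominates."""
    return [
        i
        for i, p in enumerate(points)
        if not any(dominates(q, p) for j, q in enumerate(points) if j != i)
    ]


# Illustrative (relevance, redundancy) scores for four features.
scores = [(0.9, 0.8), (0.7, 0.3), (0.5, 0.5), (0.2, 0.1)]
print(pareto_front(scores))  # -> [0, 1, 3]
```

The third feature is dropped because the second one is both more relevant and less redundant. The remaining three are mutually non-dominated: each is better than the others on one objective and worse on the other.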

To select the final feature subset, the authors use the crowding distance metric to favor Pareto-optimal solutions that are well spread out along the front. This helps ensure the selected features provide complementary information. The resulting sorted feature list is then sent back to the IoT devices.
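Crowding distance is commonly computed as in the NSGA-II algorithm, and the sketch below follows that standard definition. Whether FMLFS uses exactly this variant is an assumption. The input points are hypothetical `(relevance, redundancy)` scores for features already on the Pareto front.

```python
from math import inf


def crowding_distance(points):
    """NSGA-II style crowding distance for a list of objective tuples.

    Boundary points on any objective get infinite distance; interior points
    accumulate the normalized gap between their two neighbours per objective.
    """
    n = len(points)
    dist = [0.0] * n
    if n == 0:
        return dist
    for m in range(len(points[0])):
        order = sorted(range(n), key=lambda i: points[i][m])
        lo, hi = points[order[0]][m], points[order[-1]][m]
        dist[order[0]] = dist[order[-1]] = inf
        if hi == lo:  # all values equal on this objective; no spread to measure
            continue
        for k in range(1, n - 1):
            i = order[k]
            gap = points[order[k + 1]][m] - points[order[k - 1]][m]
            dist[i] += gap / (hi - lo)
    return dist


# Illustrative non-dominated (relevance, redundancy) scores.
front = [(0.9, 0.8), (0.7, 0.3), (0.6, 0.25), (0.2, 0.1)]
distances = crowding_distance(front)

# Rank features so that less crowded (more isolated) ones come first.
ranking = sorted(range(len(front)), key=lambda i: distances[i], reverse=True)
print(distances)
print(ranking)
```

A larger distance means a point sits in a sparser region of the front, so ranking by descending distance prefers features whose relevance and redundancy trade-off differs most from that of their neighbours.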

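The proposed FMLFS method is evaluated on three real-world multi-label datasets under two scenarios: (1) transmitting reduced-size datasets to the edge server for use by a centralized classifier, and (2) federated learning on the reduced-size datasets. It is compared with five other feature selection methods on performance, time complexity, and communication cost, and is reported to outperform them while offering a good trade-off across the three metrics.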

Critical Analysis

The FMLFS method provides a principled approach to feature selection in federated multi-label learning, an important problem in the IoT domain. The use of information-theoretic measures and bi-objective optimization is well justified, and the reported results show promising performance improvements.

However, the paper could have provided more details on the specific IoT setting and constraints assumed, as well as the feasibility of implementing the method in real-world IoT deployments. Communication cost is one of the evaluation metrics, but the latency implications of the federated approach on deployed devices remain unclear.

Additionally, the paper does not explore the robustness of FMLFS to different types of data distributions or label correlations, which can significantly impact feature selection. Further research could investigate the method's performance in more heterogeneous federated learning scenarios.

Conclusion

The FMLFS method presented in this paper offers an innovative approach to feature selection for multi-label learning in IoT environments. By leveraging information theory and bi-objective optimization, it can identify the most relevant and non-redundant features while preserving the privacy and communication efficiency benefits of federated learning.

This work advances the state-of-the-art in federated machine learning for IoT applications, where effective feature selection is crucial for building accurate and efficient models. The authors have demonstrated the potential of FMLFS, and future research could further explore its practical applicability and robustness in diverse federated learning scenarios.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

FMLFS: A federated multi-label feature selection based on information theory in IoT environment

Afsaneh Mahanipour, Hana Khamfroush

In certain emerging applications such as health monitoring wearable and traffic monitoring systems, Internet-of-Things (IoT) devices generate or collect a huge amount of multi-label datasets. Within these datasets, each instance is linked to a set of labels. The presence of noisy, redundant, or irrelevant features in these datasets, along with the curse of dimensionality, poses challenges for multi-label classifiers. Feature selection (FS) proves to be an effective strategy in enhancing classifier performance and addressing these challenges. Yet, there is currently no existing distributed multi-label FS method documented in the literature that is suitable for distributed multi-label datasets within IoT environments. This paper introduces FMLFS, the first federated multi-label feature selection method. Here, mutual information between features and labels serves as the relevancy metric, while the correlation distance between features, derived from mutual information and joint entropy, is utilized as the redundancy measure. Following aggregation of these metrics on the edge server and employing Pareto-based bi-objective and crowding distance strategies, the sorted features are subsequently sent back to the IoT devices. The proposed method is evaluated through two scenarios: 1) transmitting reduced-size datasets to the edge server for centralized classifier usage, and 2) employing federated learning with reduced-size datasets. Evaluation across three metrics - performance, time complexity, and communication cost - demonstrates that FMLFS outperforms five other comparable methods in the literature and provides a good trade-off on three real-world datasets.

5/2/2024

Causal Multi-Label Feature Selection in Federated Setting

Yukun Song, Dayuan Cao, Jiali Miao, Shuai Yang, Kui Yu

Multi-label feature selection serves as an effective means for dealing with high-dimensional multi-label data. To achieve satisfactory performance, existing methods for multi-label feature selection often require the centralization of substantial data from multiple sources. However, in a federated setting, centralizing data from all sources and merging them into a single dataset is not feasible. To tackle this issue, in this paper, we study a challenging problem of causal multi-label feature selection in a federated setting and propose a Federated Causal Multi-label Feature Selection (FedCMFS) algorithm with three novel subroutines. Specifically, FedCMFS first uses the FedCFL subroutine that considers the correlations among label-label, label-feature, and feature-feature to learn the relevant features (candidate parents and children) of each class label while preserving data privacy without centralizing data. Second, FedCMFS employs the FedCFR subroutine to selectively recover the missed true relevant features. Finally, FedCMFS utilizes the FedCFC subroutine to remove false relevant features. The extensive experiments on 8 datasets have shown that FedCMFS is effective for causal multi-label feature selection in a federated setting.

8/28/2024

Adaptive Collaborative Correlation Learning-based Semi-Supervised Multi-Label Feature Selection

Yanyong Huang, Li Yang, Dongjie Wang, Ke Li, Xiuwen Yi, Fengmao Lv, Tianrui Li

Semi-supervised multi-label feature selection has recently been developed to solve the curse of dimensionality problem in high-dimensional multi-label data with certain samples missing labels. Although many efforts have been made, most existing methods use a predefined graph approach to capture the sample similarity or the label correlation. In this manner, the presence of noise and outliers within the original feature space can undermine the reliability of the resulting sample similarity graph. It also fails to precisely depict the label correlation due to the existence of unknown labels. Besides, these methods only consider the discriminative power of selected features, while neglecting their redundancy. In this paper, we propose an Adaptive Collaborative Correlation lEarning-based Semi-Supervised Multi-label Feature Selection (Access-MFS) method to address these issues. Specifically, a generalized regression model equipped with an extended uncorrelated constraint is introduced to select discriminative yet irrelevant features and maintain consistency between predicted and ground-truth labels in labeled data, simultaneously. Then, the instance correlation and label correlation are integrated into the proposed regression model to adaptively learn both the sample similarity graph and the label similarity graph, which mutually enhance feature selection performance. Extensive experimental results demonstrate the superiority of the proposed Access-MFS over other state-of-the-art methods.

6/19/2024

FedMFS: Federated Multimodal Fusion Learning with Selective Modality Communication

Liangqi Yuan, Dong-Jun Han, Vishnu Pandi Chellapandi, Stanislaw H. Żak, Christopher G. Brinton

Multimodal federated learning (FL) aims to enrich model training in FL settings where devices are collecting measurements across multiple modalities (e.g., sensors measuring pressure, motion, and other types of data). However, key challenges to multimodal FL remain unaddressed, particularly in heterogeneous network settings: (i) the set of modalities collected by each device will be diverse, and (ii) communication limitations prevent devices from uploading all their locally trained modality models to the server. In this paper, we propose Federated Multimodal Fusion learning with Selective modality communication (FedMFS), a new multimodal fusion FL methodology that can tackle the above mentioned challenges. The key idea is the introduction of a modality selection criterion for each device, which weighs (i) the impact of the modality, gauged by Shapley value analysis, against (ii) the modality model size as a gauge for communication overhead. This enables FedMFS to flexibly balance performance against communication costs, depending on resource constraints and application requirements. Experiments on the real-world ActionSense dataset demonstrate the ability of FedMFS to achieve comparable accuracy to several baselines while reducing the communication overhead by over 4x.

8/21/2024