pFedAFM: Adaptive Feature Mixture for Batch-Level Personalization in Heterogeneous Federated Learning

2404.17847

Published 4/30/2024 by Liping Yi, Han Yu, Chao Ren, Heng Zhang, Gang Wang, Xiaoguang Liu, Xiaoxiao Li


Abstract

Model-heterogeneous personalized federated learning (MHPFL) enables FL clients to train structurally different personalized models on non-independent and identically distributed (non-IID) local data. Existing MHPFL methods focus on achieving client-level personalization, but cannot address batch-level data heterogeneity. To bridge this important gap, we propose a model-heterogeneous personalized Federated learning approach with Adaptive Feature Mixture (pFedAFM) for supervised learning tasks. It consists of three novel designs: 1) A sharing global homogeneous small feature extractor is assigned alongside each client's local heterogeneous model (consisting of a heterogeneous feature extractor and a prediction header) to facilitate cross-client knowledge fusion. The two feature extractors share the local heterogeneous model's prediction header containing rich personalized prediction knowledge to retain personalized prediction capabilities. 2) An iterative training strategy is designed to alternately train the global homogeneous small feature extractor and the local heterogeneous large model for effective global-local knowledge exchange. 3) A trainable weight vector is designed to dynamically mix the features extracted by both feature extractors to adapt to batch-level data heterogeneity. Theoretical analysis proves that pFedAFM can converge over time. Extensive experiments on 2 benchmark datasets demonstrate that it significantly outperforms 7 state-of-the-art MHPFL methods, achieving up to 7.93% accuracy improvement while incurring low communication and computation costs.


Overview

Addresses model-heterogeneous personalized federated learning (MHPFL), a setting in which clients train structurally different personalized models on non-independent and identically distributed (non-IID) local data.
Existing MHPFL methods focus on client-level personalization but cannot address batch-level data heterogeneity.
Introduces a model-heterogeneous personalized Federated learning approach with Adaptive Feature Mixture (pFedAFM) to bridge this gap for supervised learning tasks.

Plain English Explanation

Federated learning (FL) is a technique where multiple devices or clients collaborate to train a shared machine learning model, without sharing their raw data. This is useful when data is sensitive or distributed across many locations. However, a challenge in FL is that the data on each client may be very different (non-IID), making it difficult to train a single model that works well for everyone.
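
To make the idea of training a shared model without sharing raw data concrete, here is a minimal sketch of sample-weighted parameter averaging in the style of FedAvg. It is a generic illustration, not a description of how pFedAFM aggregates its shared extractor; the function name, the flat-list parameter format, and the two example clients are all assumptions made for this sketch.

```python
from typing import Dict, List

Params = Dict[str, List[float]]  # parameter name -> flat list of values (assumed format)


def federated_average(client_params: List[Params], client_sizes: List[int]) -> Params:
    """Average client parameters, weighting each client by its local sample count."""
    total = sum(client_sizes)
    averaged: Params = {}
    for name in client_params[0]:
        length = len(client_params[0][name])
        averaged[name] = [
            sum(p[name][i] * n for p, n in zip(client_params, client_sizes)) / total
            for i in range(length)
        ]
    return averaged


# Two hypothetical clients holding 100 and 300 local samples.
clients = [{"w": [1.0, 2.0]}, {"w": [3.0, 6.0]}]
global_params = federated_average(clients, [100, 300])
```

Only parameters leave the clients here; the raw samples that produced them never do. The client with more samples pulls the average toward its own values.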

pFedAFM addresses this by allowing each client to train their own personalized model, rather than a single shared model. The key innovation is that it uses a combination of a shared "global" feature extractor and a client-specific "local" feature extractor. This allows the model to learn both general patterns (from the global extractor) and personalized patterns (from the local extractor).

Additionally, pFedAFM uses an "adaptive feature mixer" that dynamically combines the features from the global and local extractors. This helps the model adapt to differences in the data distributions across batches, further improving performance on non-IID data.
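
One plausible way to write the mixing step is as an element-wise convex combination. The notation below is assumed for illustration and is not taken from the paper: $\mathcal{G}$ is the shared global extractor with parameters $\theta$, $\mathcal{F}_k$ is client $k$'s local extractor with parameters $\phi_k$, $\mathcal{H}_k$ is the client's prediction header, and $\boldsymbol{\alpha}_k \in [0,1]^d$ is the trainable weight vector. The sketch also assumes both extractors output features of the same dimension $d$.

```latex
\begin{aligned}
\mathbf{r}^{g} &= \mathcal{G}(\mathbf{x};\theta), \qquad
\mathbf{r}^{l}_{k} = \mathcal{F}_{k}(\mathbf{x};\phi_{k}),\\
\mathbf{r}_{k} &= \boldsymbol{\alpha}_{k}\odot\mathbf{r}^{g}
  + (\mathbf{1}-\boldsymbol{\alpha}_{k})\odot\mathbf{r}^{l}_{k},\\
\hat{y} &= \mathcal{H}_{k}(\mathbf{r}_{k}).
\end{aligned}
```

Under this reading, an entry of $\boldsymbol{\alpha}_k$ close to 1 leans on the global feature for that dimension, and an entry close to 0 leans on the local one.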

Technical Explanation

The pFedAFM approach consists of three main components:

Shared Global and Local Feature Extractors: Each client holds a local heterogeneous model (a heterogeneous feature extractor plus a prediction header) together with a small homogeneous feature extractor that is shared globally to facilitate cross-client knowledge fusion. Both feature extractors feed the local model's prediction header, which retains personalized prediction capabilities.
Iterative Training Strategy: An iterative training strategy is designed to alternately train the global homogeneous small feature extractor and the local heterogeneous large model. This enables effective global-local knowledge exchange.
Adaptive Feature Mixer: A trainable weight vector is used to dynamically mix the features extracted by both the global and local feature extractors. This helps the model adapt to batch-level data heterogeneity.
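
The three components above can be sketched in a few lines of plain Python. This is a toy reading under stated assumptions, not the authors' implementation: both extractors are element-wise scalings, the header is a dot product, the loss is squared error, the mixing vector is clipped to [0, 1], and the header is kept frozen while the global extractor trains. The names `Client`, `mix`, and `local_round` are hypothetical.

```python
from typing import List, Sequence, Tuple

Vec = List[float]
Batch = Tuple[Vec, float]


def mix(alpha: Vec, r_global: Vec, r_local: Vec) -> Vec:
    """Element-wise mix: alpha * global features + (1 - alpha) * local features."""
    return [a * g + (1.0 - a) * l for a, g, l in zip(alpha, r_global, r_local)]


class Client:
    """Toy client: extractors are element-wise scalings, the header is a dot product."""

    def __init__(self, dim: int) -> None:
        self.w_global = [0.5] * dim  # small shared extractor (assumed to come from a server)
        self.w_local = [0.5] * dim   # stand-in for the client's own, larger extractor
        self.header = [1.0] * dim    # prediction header used by both extractors
        self.alpha = [0.5] * dim     # trainable mixing vector

    def predict(self, x: Vec) -> float:
        r_g = [w * xi for w, xi in zip(self.w_global, x)]
        r_l = [w * xi for w, xi in zip(self.w_local, x)]
        r = mix(self.alpha, r_g, r_l)
        return sum(v * ri for v, ri in zip(self.header, r))

    def train_global_step(self, x: Vec, y: float, lr: float) -> None:
        """Phase 1 (assumed): update only the global extractor; the header is frozen."""
        pred = sum(v * w * xi for v, w, xi in zip(self.header, self.w_global, x))
        err = pred - y  # derivative of 0.5 * (pred - y) ** 2 with respect to pred
        self.w_global = [w - lr * err * v * xi
                         for w, v, xi in zip(self.w_global, self.header, x)]

    def train_local_step(self, x: Vec, y: float, lr: float) -> None:
        """Phase 2 (assumed): freeze the global extractor; update local extractor, header, alpha."""
        r_g = [w * xi for w, xi in zip(self.w_global, x)]
        r_l = [w * xi for w, xi in zip(self.w_local, x)]
        r = mix(self.alpha, r_g, r_l)
        err = sum(v * ri for v, ri in zip(self.header, r)) - y
        new_w_local = [w - lr * err * v * (1.0 - a) * xi
                       for w, v, a, xi in zip(self.w_local, self.header, self.alpha, x)]
        new_header = [v - lr * err * ri for v, ri in zip(self.header, r)]
        new_alpha = [min(1.0, max(0.0, a - lr * err * v * (g - l)))
                     for a, v, g, l in zip(self.alpha, self.header, r_g, r_l)]
        self.w_local, self.header, self.alpha = new_w_local, new_header, new_alpha


def local_round(client: Client, batches: Sequence[Batch], lr: float = 0.05) -> None:
    """One local round: train the global extractor first, then the local model."""
    for x, y in batches:
        client.train_global_step(x, y, lr)
    for x, y in batches:
        client.train_local_step(x, y, lr)


def mean_squared_error(client: Client, batches: Sequence[Batch]) -> float:
    return sum((client.predict(x) - y) ** 2 for x, y in batches) / len(batches)


# Hypothetical toy data where the target is the sum of the two inputs.
batches = [([1.0, 2.0], 3.0), ([2.0, 1.0], 3.0), ([1.0, 1.0], 2.0)]
client = Client(dim=2)
loss_before = mean_squared_error(client, batches)
for _ in range(50):
    local_round(client, batches)
loss_after = mean_squared_error(client, batches)
```

Because the mixing vector is updated on every batch in the second phase, the balance between global and local features can shift from batch to batch, which is the behavior the adaptive feature mixer is meant to provide. In a full system the updated `w_global` would presumably be sent back for aggregation, a step this sketch leaves out.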

Theoretical analysis shows that pFedAFM can converge over time. Experiments on 2 benchmark datasets demonstrate that pFedAFM significantly outperforms 7 state-of-the-art MHPFL methods, achieving up to 7.93% accuracy improvement while incurring low communication and computation costs.

Critical Analysis

The paper presents a compelling solution to the challenge of training personalized models in federated learning settings with non-IID data. By combining global and local feature extractors, and using an adaptive feature mixer, pFedAFM is able to capture both general and personalized patterns in the data.

One potential limitation is that the approach may still struggle with highly diverse or skewed data distributions across clients. The abstract targets batch-level heterogeneity and reports results on only 2 benchmark datasets, so further research may be needed to test it under more extreme client-level skew.

Additionally, the abstract does not say how pFedAFM scales to larger numbers of clients or more complex model architectures. It reports low communication and computation costs, but whether coordinating the global and local feature extractors stays cheap as the number of clients grows is not established there.

Overall, pFedAFM is a notable contribution to personalized federated learning, alongside related methods such as FedSSA, FedP3, and FedMES. The work is a useful foundation for further research in adaptive clustered federated learning and personalized FL.

Conclusion

The proposed pFedAFM approach extends model-heterogeneous personalized federated learning, in which clients train structurally different personalized models on non-IID local data, to handle batch-level data heterogeneity. Its key designs, namely the shared global feature extractor paired with each client's local feature extractor, the alternating training strategy, and the adaptive feature mixer, let it capture both general and personalized patterns in the data. The abstract reports up to 7.93% accuracy improvement over 7 existing MHPFL methods.

Although the reported evaluation covers only 2 benchmark datasets, pFedAFM is a step forward for federated learning, with potential applications in a wide range of domains where data privacy and personalization are crucial.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Personalized federated learning based on feature fusion

Wolong Xing, Zhenkui Shi, Hongyan Peng, Xiantao Hu, Xianxian Li

Federated learning enables distributed clients to collaborate on training while storing their data locally to protect client privacy. However, due to the heterogeneity of data, models, and devices, the final global model may need to perform better for tasks on each client. Communication bottlenecks, data heterogeneity, and model heterogeneity have been common challenges in federated learning. In this work, we considered a label distribution skew problem, a type of data heterogeneity easily overlooked. In the context of classification, we propose a personalized federated learning approach called pFedPM. In our process, we replace traditional gradient uploading with feature uploading, which helps reduce communication costs and allows for heterogeneous client models. These feature representations play a role in preserving privacy to some extent. We use a hyperparameter $a$ to mix local and global features, which enables us to control the degree of personalization. We also introduced a relation network as an additional decision layer, which provides a non-linear learnable classifier to predict labels. Experimental results show that, with an appropriate setting of $a$, our scheme outperforms several recent FL methods on MNIST, FEMNIST, and CIFAR10 datasets and achieves fewer communications.

6/26/2024

cs.LG cs.CV


Multi-level Personalized Federated Learning on Heterogeneous and Long-Tailed Data

Rongyu Zhang, Yun Chen, Chenrui Wu, Fangxin Wang, Bo Li

Federated learning (FL) offers a privacy-centric distributed learning framework, enabling model training on individual clients and central aggregation without necessitating data exchange. Nonetheless, FL implementations often suffer from non-i.i.d. and long-tailed class distributions across mobile applications, e.g., autonomous vehicles, which leads models to overfit because local training may converge to sub-optimal solutions. In our study, we explore the impact of data heterogeneity on model bias and introduce an innovative personalized FL framework, Multi-level Personalized Federated Learning (MuPFL), which leverages the hierarchical architecture of FL to fully harness computational resources at various levels. This framework integrates three pivotal modules: Biased Activation Value Dropout (BAVD) to mitigate overfitting and accelerate training; Adaptive Cluster-based Model Update (ACMU) to refine local models ensuring coherent global aggregation; and Prior Knowledge-assisted Classifier Fine-tuning (PKCF) to bolster classification and personalize models in accord with skewed local data with shared knowledge. Extensive experiments on diverse real-world datasets for image classification and semantic segmentation validate that MuPFL consistently outperforms state-of-the-art baselines, even under extreme non-i.i.d. and long-tail conditions, enhancing accuracy by as much as 7.39% and accelerating training by up to 80%, marking significant advancements in both efficiency and effectiveness.

5/13/2024

cs.AI


MH-pFLID: Model Heterogeneous personalized Federated Learning via Injection and Distillation for Medical Data Analysis

Luyuan Xie, Manqing Lin, Tianyu Luan, Cong Li, Yuejian Fang, Qingni Shen, Zhonghai Wu

Federated learning is widely used in medical applications for training global models without needing local data access. However, varying computational capabilities and network architectures (system heterogeneity), across clients pose significant challenges in effectively aggregating information from non-independently and identically distributed (non-IID) data. Current federated learning methods using knowledge distillation require public datasets, raising privacy and data collection issues. Additionally, these datasets require additional local computing and storage resources, which is a burden for medical institutions with limited hardware conditions. In this paper, we introduce a novel federated learning paradigm, named Model Heterogeneous personalized Federated Learning via Injection and Distillation (MH-pFLID). Our framework leverages a lightweight messenger model that carries concentrated information to collect the information from each client. We also develop a set of receiver and transmitter modules to receive and send information from the messenger model, so that the information could be injected and distilled with efficiency.

5/14/2024

cs.LG cs.AI

Selective Knowledge Sharing for Personalized Federated Learning Under Capacity Heterogeneity

Zheng Wang, Zheng Wang, Zhaopeng Peng, Zihui Wang, Cheng Wang

Federated Learning (FL) stands to gain significant advantages from collaboratively training capacity-heterogeneous models, enabling the utilization of private data and computing power from low-capacity devices. However, the focus on personalizing capacity-heterogeneous models based on client-specific data has been limited, resulting in suboptimal local model utility, particularly for low-capacity clients. The heterogeneity in both data and device capacity poses two key challenges for model personalization: 1) accurately retaining necessary knowledge embedded within reduced submodels for each client, and 2) effectively sharing knowledge through aggregating size-varying parameters. To this end, we introduce Pa3dFL, a novel framework designed to enhance local model performance by decoupling and selectively sharing knowledge among capacity-heterogeneous models. First, we decompose each layer of the model into general and personal parameters. Then, we maintain uniform sizes for the general parameters across clients and aggregate them through direct averaging. Subsequently, we employ a hyper-network to generate size-varying personal parameters for clients using learnable embeddings. Finally, we facilitate the implicit aggregation of personal parameters by aggregating client embeddings through a self-attention module. We conducted extensive experiments on three datasets to evaluate the effectiveness of Pa3dFL. Our findings indicate that Pa3dFL consistently outperforms baseline methods across various heterogeneity settings. Moreover, Pa3dFL demonstrates competitive communication and computation efficiency compared to baseline approaches, highlighting its practicality and adaptability in adverse system conditions.

6/3/2024

cs.LG cs.AI cs.DC