Federated Active Learning Framework for Efficient Annotation Strategy in Skin-lesion Classification

Read original: arXiv:2406.11310 - Published 6/18/2024 by Zhipeng Deng, Yuqiao Yang, Kenji Suzuki

Overview

Federated Learning (FL) allows multiple institutions to collaboratively train models without sharing private data
Current FL research focuses on improving communication efficiency, privacy protection, and personalization, but assumes ideal data collection
In medical scenarios, data annotation is a critical problem for FL, requiring expertise and labor
Active Learning (AL) has shown promise in reducing data annotation needs for medical image analysis
This paper proposes a Federated Active Learning (FedAL) framework that combines FL and AL to decrease annotation needs while preserving privacy

Plain English Explanation

Federated Learning (FL) is a way for multiple organizations to work together to train machine learning models without sharing the private data they each have. Current FL research has focused on making this process more efficient, protecting privacy, and personalizing the models. But this research has assumed that the data needed to train the models has already been collected and annotated (labeled) in the right way.

In medical scenarios, annotating data (like labeling medical images) can be a big challenge. It requires specialized expertise and a lot of manual work. This is a critical problem for using FL in healthcare. Active Learning (AL) is a technique that has shown promise for reducing the amount of data that needs to be annotated, especially for medical image analysis.

This paper proposes a new "Federated Active Learning" (FedAL) framework that combines FL and AL. In this framework, AL is used periodically and interactively as part of the FL process. The key idea is to use an ensemble of local models from each organization and a global model from the FL process to identify the most informative data samples that need to be annotated. This can decrease the overall annotation effort required while still preserving patient privacy.

The researchers tested this FedAL framework on real-world skin lesion datasets and found that it could achieve state-of-the-art performance while using only 50% of the samples. It also outperformed several other AL methods run under FL and performed comparably to FL trained on the full dataset.

Technical Explanation

The paper proposes a Federated Active Learning (FedAL) framework that combines Federated Learning (FL) and Active Learning (AL) to address the challenge of data annotation in medical scenarios.

In the FedAL framework, the AL process is executed periodically and interactively as part of the FL training. Specifically, the researchers exploit a local model in each participating hospital and a global model acquired from the FL process to construct an ensemble. They then use an ensemble-entropy-based AL strategy to identify the most informative data samples that need to be annotated.
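
The ensemble-entropy selection described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the ensemble is an unweighted average of the local and global models' softmax outputs, and the names `ensemble_entropy`, `select_for_annotation`, and `budget` are hypothetical.

```python
import math


def ensemble_entropy(local_probs, global_probs, eps=1e-12):
    """Shannon entropy of the averaged local/global class probabilities."""
    # Assumption: the ensemble is an unweighted mean of the two softmax vectors.
    mean_probs = [(p + q) / 2.0 for p, q in zip(local_probs, global_probs)]
    # eps guards against log(0) when a class probability is exactly zero.
    return -sum(m * math.log(m + eps) for m in mean_probs)


def select_for_annotation(local_preds, global_preds, budget):
    """Return indices of the `budget` unlabeled samples with highest entropy."""
    scores = [ensemble_entropy(l, g) for l, g in zip(local_preds, global_preds)]
    # Rank sample indices from most to least uncertain.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:budget]


# Toy predictions for four unlabeled images over three lesion classes.
local_preds = [
    [0.98, 0.01, 0.01],  # both models confident and in agreement
    [0.40, 0.35, 0.25],  # both models uncertain
    [0.70, 0.20, 0.10],  # models disagree on the top class
    [0.34, 0.33, 0.33],  # both models close to uniform
]
global_preds = [
    [0.96, 0.02, 0.02],
    [0.30, 0.40, 0.30],
    [0.10, 0.80, 0.10],
    [0.33, 0.34, 0.33],
]

picked = select_for_annotation(local_preds, global_preds, budget=2)
print(picked)  # indices of the samples to send to the annotator
```

In this toy example the two near-uniform samples are selected, while the sample both models classify confidently is ranked last. Averaging before taking the entropy also raises the score of samples on which the local and global models disagree.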

This approach has several advantages:

It can decrease the overall amount of data annotation required compared to standard FL, while maintaining model performance.
It preserves patient privacy by avoiding the need to share raw medical data between institutions.
The ensemble of local and global models provides a more robust and informative basis for the AL sample selection.
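
The way periodic annotation interleaves with federated training can be sketched as follows. This summary does not name the aggregation rule or the annotation schedule, so the sketch assumes FedAvg-style weighted averaging and a fixed period; `fedavg`, `is_al_round`, and `al_period` are hypothetical names, and model parameters are reduced to flat lists of floats for brevity.

```python
def fedavg(client_weights, client_sizes):
    """Average client parameter vectors, weighted by labeled-sample counts."""
    # Assumption: FedAvg-style aggregation; only parameters leave each client,
    # never the raw images.
    total = sum(client_sizes)
    n_params = len(client_weights[0])
    return [
        sum(w[j] * n for w, n in zip(client_weights, client_sizes)) / total
        for j in range(n_params)
    ]


def is_al_round(round_idx, al_period):
    """True when an annotation (AL) step should follow this FL round."""
    # Assumption: AL runs after every `al_period` communication rounds.
    return (round_idx + 1) % al_period == 0


# Two hospitals with 100 and 300 labeled samples, two parameters each.
weights = [[1.0, 2.0], [3.0, 4.0]]
sizes = [100, 300]
global_weights = fedavg(weights, sizes)

# With a period of 5, annotation follows rounds 4 and 9 (0-indexed).
al_rounds = [r for r in range(10) if is_al_round(r, al_period=5)]
print(global_weights, al_rounds)
```

Under these assumptions, each hospital would query new labels only in the AL rounds, using its current local model together with the latest aggregated global model, and would continue ordinary federated training in between.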

The researchers validated the FedAL framework on real-world dermoscopic datasets for skin lesion classification. They found that using only 50% of the full dataset, their FedAL approach could achieve state-of-the-art performance, outperforming several other AL methods used in an FL setting. The FedAL framework also achieved comparable performance to standard FL trained on the full dataset.

Critical Analysis

The key strength of this FedAL framework is its ability to significantly reduce the data annotation burden in medical scenarios while preserving privacy and maintaining model performance. The use of an ensemble of local and global models for the AL sample selection is a clever approach that leverages the complementary strengths of both the individual and federated models.

However, the paper does not discuss the potential limitations or caveats of the FedAL approach in depth. For example, it would be useful to understand how the framework performs under different data distributions or levels of data heterogeneity across the participating institutions. Learning across decentralized, unshared archives is a recognized challenge for FL, and the impact of these factors on FedAL should be explored.

Additionally, while the paper demonstrates strong empirical results, it would be valuable to have a more thorough theoretical analysis of the properties and convergence guarantees of the FedAL framework. Related work such as Federated Attention Consistent Learning has shown the importance of such theoretical underpinnings for FL approaches.

Overall, the FedAL framework represents an exciting advance in the application of FL to medical image analysis. Further research to address the noted limitations and validate the approach on a broader range of medical datasets would help solidify its potential impact.

Conclusion

This paper proposes a novel Federated Active Learning (FedAL) framework that combines Federated Learning and Active Learning to address the challenge of data annotation in medical scenarios. By leveraging an ensemble of local and global models to guide the active learning process, FedAL can substantially reduce the amount of annotated data required while preserving patient privacy and maintaining model performance.

The empirical results on real-world skin lesion datasets are promising, demonstrating the potential of FedAL to advance the state-of-the-art in federated learning for medical image analysis. Further research to explore the theoretical properties and limitations of the framework, as well as its applicability to a wider range of medical domains, could unlock even greater opportunities to improve healthcare outcomes through collaborative, privacy-preserving machine learning.

Related Papers

Federated Active Learning Framework for Efficient Annotation Strategy in Skin-lesion Classification

Zhipeng Deng, Yuqiao Yang, Kenji Suzuki

Federated Learning (FL) enables multiple institutes to train models collaboratively without sharing private data. Current FL research focuses on communication efficiency, privacy protection, and personalization and assumes that the data of FL have already been ideally collected. In medical scenarios, however, data annotation demands both expertise and intensive labor, which is a critical problem in FL. Active learning (AL) has shown promising performance in reducing the number of data annotations in medical image analysis. We propose a federated AL (FedAL) framework in which AL is executed periodically and interactively under FL. We exploit a local model in each hospital and a global model acquired from FL to construct an ensemble. We use ensemble-entropy-based AL as an efficient data-annotation strategy in FL. Therefore, our FedAL framework can decrease the amount of annotated data and preserve patient privacy while maintaining the performance of FL. To our knowledge, this is the first FedAL framework applied to medical images. We validated our framework on real-world dermoscopic datasets. Using only 50% of samples, our framework was able to achieve state-of-the-art performance on a skin-lesion classification task. Our framework performed better than several state-of-the-art AL methods under FL and achieved comparable performance to full-data FL.

6/18/2024

Think Twice Before Selection: Federated Evidential Active Learning for Medical Image Analysis with Domain Shifts

Jiayi Chen, Benteng Ma, Hengfei Cui, Yong Xia

Federated learning facilitates the collaborative learning of a global model across multiple distributed medical institutions without centralizing data. Nevertheless, the expensive cost of annotation on local clients remains an obstacle to effectively utilizing local data. To mitigate this issue, federated active learning methods suggest leveraging local and global model predictions to select a relatively small amount of informative local data for annotation. However, existing methods mainly focus on all local data sampled from the same domain, making them unreliable in realistic medical scenarios with domain shifts among different clients. In this paper, we make the first attempt to assess the informativeness of local data derived from diverse domains and propose a novel methodology termed Federated Evidential Active Learning (FEAL) to calibrate the data evaluation under domain shift. Specifically, we introduce a Dirichlet prior distribution in both local and global models to treat the prediction as a distribution over the probability simplex and capture both aleatoric and epistemic uncertainties by using the Dirichlet-based evidential model. Then we employ the epistemic uncertainty to calibrate the aleatoric uncertainty. Afterward, we design a diversity relaxation strategy to reduce data redundancy and maintain data diversity. Extensive experiments and analysis on five real multi-center medical image datasets demonstrate the superiority of FEAL over the state-of-the-art active learning methods in federated scenarios with domain shifts. The code will be available at https://github.com/JiayiChen815/FEAL.

4/23/2024

A Comprehensive View of Personalized Federated Learning on Heterogeneous Clinical Datasets

Fatemeh Tavakoli, D. B. Emerson, Sana Ayromlou, John Jewell, Amrit Krishnan, Yuchong Zhang, Amol Verma, Fahad Razak

Federated learning (FL) is increasingly being recognized as a key approach to overcoming the data silos that so frequently obstruct the training and deployment of machine-learning models in clinical settings. This work contributes to a growing body of FL research specifically focused on clinical applications along three important directions. First, we expand the FLamby benchmark (du Terrail et al., 2022a) to include a comprehensive evaluation of personalized FL methods and demonstrate substantive performance improvements over the original results. Next, we advocate for a comprehensive checkpointing and evaluation framework for FL to reflect practical settings and provide multiple comparison baselines. To this end, an open-source library aimed at making FL experimentation simpler and more reproducible is released. Finally, we propose an important ablation of PerFCL (Zhang et al., 2022). This ablation results in a natural extension of FENDA (Kim et al., 2016) to the FL setting. Experiments conducted on the FLamby benchmark and GEMINI datasets (Verma et al., 2017) show that the proposed approach is robust to heterogeneous clinical data and often outperforms existing global and personalized FL techniques, including PerFCL.

7/8/2024

A Distributed Privacy Preserving Model for the Detection of Alzheimer's Disease

Paul K. Mandal

In the era of rapidly advancing medical technologies, the segmentation of medical data has become inevitable, necessitating the development of privacy preserving machine learning algorithms that can train on distributed data. Consolidating sensitive medical data is not always an option particularly due to the stringent privacy regulations imposed by the Health Insurance Portability and Accountability Act (HIPAA). In this paper, I introduce a HIPAA compliant framework that can train from distributed data. I then propose a multimodal vertical federated model for Alzheimer's Disease (AD) detection, a serious neurodegenerative condition that can cause dementia, severely impairing brain function and hindering simple tasks, especially without preventative care. This vertical federated learning (VFL) model offers a distributed architecture that enables collaborative learning across diverse sources of medical data while respecting privacy constraints imposed by HIPAA. The VFL architecture proposed herein offers a novel distributed architecture, enabling collaborative learning across diverse sources of medical data while respecting statutory privacy constraints. By leveraging multiple modalities of data, the robustness and accuracy of AD detection can be enhanced. This model not only contributes to the advancement of federated learning techniques but also holds promise for overcoming the hurdles posed by data segmentation in medical research.

9/30/2024