Coupling AI and Citizen Science in Creation of Enhanced Training Dataset for Medical Image Segmentation

Read original: arXiv:2409.03087 - Published 9/6/2024 by Amir Syahmi, Xiangrong Lu, Yinxuan Li, Haoxuan Yao, Hanjun Jiang, Ishita Acharya, Shiyi Wang, Yang Nan, Xiaodan Xing, Guang Yang

Overview

Explores coupling AI and citizen science to create enhanced training datasets for medical image segmentation
Demonstrates how combining automated and crowdsourced approaches can improve the quality and efficiency of training data generation
Highlights the potential of this approach to advance medical image analysis and facilitate more accurate diagnoses

Plain English Explanation

The paper presents a novel approach that combines artificial intelligence (AI) and citizen science to create more robust training datasets for medical image segmentation. Medical image segmentation is the process of dividing an image, such as an X-ray or CT scan, into meaningful regions or structures, which is crucial for accurate diagnosis and treatment planning.

Traditionally, creating high-quality training data for medical image segmentation models has been a labor-intensive and time-consuming task, often requiring expert manual annotation. The researchers in this study propose a solution that leverages the power of AI and the collective intelligence of citizen scientists to streamline the data annotation process.

The key idea is to give citizen scientists (members of the public) an online platform with a built-in AI segmentation assistant, so that many non-expert annotators can label medical images quickly. The labels from different annotators are then merged algorithmically into a single annotation per image, which lets the dataset grow faster than expert-only annotation while aiming to preserve expert-level quality. The researchers also use a generative model to add realistic synthetic images to the dataset. By coupling AI and citizen science in this way, they aim to create larger and more reliable training datasets, which in turn can support more accurate medical image analysis.

Technical Explanation

The researchers developed a framework that integrates AI and crowdsourcing for medical image segmentation. According to the paper's abstract, it has three components. First, a user-friendly online platform lets a diverse group of crowd annotators label medical images. The platform integrates MedSAM, a segmentation AI model, which accelerates the annotation process.

Second, an algorithm merges the crowd-labelled images into a single annotation per image, with the goal of maintaining expert-level quality even though the annotators are members of the public with no specialized medical training. Third, pix2pixGAN, a generative AI model, expands the training dataset with synthetic images that capture realistic morphological features. The resulting enhanced dataset is intended to serve as a pre-processing pipeline that can boost the training of any deep learning segmentation model for medical images.
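
One simple way to combine annotations from several citizen scientists into a single mask is a pixel-wise majority vote. This summary does not describe the merging algorithm the paper actually uses, so the sketch below is only an illustration under that assumption. The function name `merge_masks`, the `min_agreement` parameter, and the toy masks are hypothetical.

```python
from typing import List

Mask = List[List[int]]  # binary mask: 1 = structure of interest, 0 = background


def merge_masks(masks: List[Mask], min_agreement: float = 0.5) -> Mask:
    """Pixel-wise vote: keep a pixel if more than `min_agreement`
    of the annotators marked it. (Assumed rule, not the paper's algorithm.)"""
    if not masks:
        raise ValueError("need at least one mask")
    rows, cols = len(masks[0]), len(masks[0][0])
    for m in masks:
        if len(m) != rows or any(len(r) != cols for r in m):
            raise ValueError("all masks must share the same shape")

    n = len(masks)
    merged: Mask = []
    for r in range(rows):
        row = []
        for c in range(cols):
            votes = sum(m[r][c] for m in masks)  # annotators who marked this pixel
            row.append(1 if votes / n > min_agreement else 0)
        merged.append(row)
    return merged


# Three hypothetical annotators labelling the same 2x3 image.
crowd = [
    [[0, 1, 1], [0, 1, 0]],
    [[0, 1, 0], [0, 1, 0]],
    [[1, 1, 1], [0, 0, 0]],
]
consensus = merge_masks(crowd)
print(consensus)  # [[0, 1, 1], [0, 1, 0]]
```

A majority vote treats every annotator equally. Weighting annotators by their past reliability is a common refinement, but whether the paper does this is not stated here.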

The researchers evaluated their approach on several medical imaging datasets, including brain MRI scans and chest X-rays. They report that training on the enhanced dataset significantly improves segmentation model performance, especially when training data is limited. The merged crowd annotations are intended to match expert-level quality, and the synthetic images help compensate for the shortage of annotated data.
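
Segmentation accuracy is commonly measured with the Dice coefficient, which is twice the overlap between a predicted mask and a reference mask divided by the total number of marked pixels in both. This summary does not state which metric the paper reports, so using Dice here is an assumption. The function name `dice` and the example masks are hypothetical.

```python
from typing import List

Mask = List[List[int]]  # binary mask: 1 = structure of interest, 0 = background


def dice(pred: Mask, truth: Mask) -> float:
    """Dice coefficient: 2 * |pred AND truth| / (|pred| + |truth|)."""
    overlap = sum(
        p & t
        for pred_row, truth_row in zip(pred, truth)
        for p, t in zip(pred_row, truth_row)
    )
    total = sum(map(sum, pred)) + sum(map(sum, truth))
    # Two empty masks agree perfectly by convention.
    return 1.0 if total == 0 else 2 * overlap / total


predicted = [[0, 1, 1], [0, 1, 0]]  # e.g. a model's output
reference = [[0, 1, 0], [0, 1, 0]]  # e.g. an expert annotation
score = dice(predicted, reference)
print(round(score, 2))  # 0.8
```

A score of 1.0 means the two masks are identical and 0.0 means they share no marked pixels.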

Critical Analysis

The researchers acknowledge that their approach relies on the availability of a sufficiently large pool of engaged citizen scientists, which may not always be the case. Additionally, they note that the quality of the citizen-provided annotations can vary, and mechanisms for identifying and addressing low-quality contributions are necessary.
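
One possible mechanism for spotting low-quality contributions is to compare each annotator's mask with a reference mask, such as an expert-labelled control image, and flag annotators whose overlap falls below a threshold. This is a minimal sketch of that idea and not a method taken from the paper. The names `flag_low_quality` and `threshold`, the 0.7 cut-off, and the annotator data are hypothetical.

```python
from typing import Dict, List

Mask = List[List[int]]  # binary mask: 1 = structure of interest, 0 = background


def dice(a: Mask, b: Mask) -> float:
    """Dice overlap between two binary masks of the same shape."""
    overlap = sum(p & q for ra, rb in zip(a, b) for p, q in zip(ra, rb))
    total = sum(map(sum, a)) + sum(map(sum, b))
    return 1.0 if total == 0 else 2 * overlap / total


def flag_low_quality(
    annotations: Dict[str, Mask], reference: Mask, threshold: float = 0.7
) -> List[str]:
    """Return annotators whose Dice score against the reference is below threshold."""
    return sorted(
        name for name, mask in annotations.items() if dice(mask, reference) < threshold
    )


reference = [[0, 1, 1], [0, 1, 0]]  # hypothetical expert-labelled control image
annotations = {
    "alice": [[0, 1, 1], [0, 1, 0]],  # identical to the reference
    "bob": [[0, 1, 0], [0, 1, 0]],    # misses one pixel
    "carol": [[1, 0, 0], [1, 0, 1]],  # no overlap with the reference
}
print(flag_low_quality(annotations, reference))  # ['carol']
```

Flagged annotators could be given extra guidance or have their labels excluded from merging. The right threshold depends on the task and would need to be tuned.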

Another potential limitation is the need for careful task design and clear instructions to ensure that citizen scientists understand the medical context and can provide meaningful refinements to the AI-generated annotations. The researchers suggest that additional training or guidance may be required to help citizen scientists become more effective annotators.

While the results demonstrate the potential of this approach, further research is needed to explore its applicability across a wider range of medical imaging modalities and disease conditions. Additionally, the long-term sustainability and scalability of the citizen science platform would need to be addressed to ensure the continued availability of high-quality training data for medical image segmentation models.

Conclusion

This research paper presents an approach that leverages the complementary strengths of AI and citizen science to enhance the creation of training datasets for medical image segmentation. By combining AI-assisted crowd annotation, algorithmic merging of crowd labels, and synthetic image generation, the researchers show how to produce larger and more accurate training datasets, leading to improved performance of medical image analysis models, particularly when training data is scarce.

This work has the potential to significantly impact the field of medical imaging, as more accurate segmentation can facilitate earlier disease detection, more precise treatment planning, and ultimately, better patient outcomes. The successful implementation of this approach could pave the way for more widespread adoption of AI-powered tools in the medical domain, with citizen scientists playing a crucial role in advancing the state of the art.

Related Papers

Coupling AI and Citizen Science in Creation of Enhanced Training Dataset for Medical Image Segmentation

Amir Syahmi, Xiangrong Lu, Yinxuan Li, Haoxuan Yao, Hanjun Jiang, Ishita Acharya, Shiyi Wang, Yang Nan, Xiaodan Xing, Guang Yang

Recent advancements in medical imaging and artificial intelligence (AI) have greatly enhanced diagnostic capabilities, but the development of effective deep learning (DL) models is still constrained by the lack of high-quality annotated datasets. The traditional manual annotation process by medical experts is time- and resource-intensive, limiting the scalability of these datasets. In this work, we introduce a robust and versatile framework that combines AI and crowdsourcing to improve both the quality and quantity of medical image datasets across different modalities. Our approach utilises a user-friendly online platform that enables a diverse group of crowd annotators to label medical images efficiently. By integrating the MedSAM segmentation AI with this platform, we accelerate the annotation process while maintaining expert-level quality through an algorithm that merges crowd-labelled images. Additionally, we employ pix2pixGAN, a generative AI model, to expand the training dataset with synthetic images that capture realistic morphological features. These methods are combined into a cohesive framework designed to produce an enhanced dataset, which can serve as a universal pre-processing pipeline to boost the training of any medical deep learning segmentation model. Our results demonstrate that this framework significantly improves model performance, especially when training data is limited.

9/6/2024

Rule-based outlier detection of AI-generated anatomy segmentations

Deepa Krishnaswamy, Vamsi Krishna Thiriveedhi, Cosmin Ciausu, David Clunie, Steve Pieper, Ron Kikinis, Andrey Fedorov

There is a dire need for medical imaging datasets with accompanying annotations to perform downstream patient analysis. However, it is difficult to manually generate these annotations, due to the time-consuming nature, and the variability in clinical conventions. Artificial intelligence has been adopted in the field as a potential method to annotate these large datasets, however, a lack of expert annotations or ground truth can inhibit the adoption of these annotations. We recently made a dataset publicly available including annotations and extracted features of up to 104 organs for the National Lung Screening Trial using the TotalSegmentator method. However, the released dataset does not include expert-derived annotations or an assessment of the accuracy of the segmentations, limiting its usefulness. We propose the development of heuristics to assess the quality of the segmentations, providing methods to measure the consistency of the annotations and a comparison of results to the literature. We make our code and related materials publicly available at https://github.com/ImagingDataCommons/CloudSegmentatorResults and interactive tools at https://huggingface.co/spaces/ImagingDataCommons/CloudSegmentatorResults.

6/21/2024

Generative AI Enables Medical Image Segmentation in Ultra Low-Data Regimes

Li Zhang, Basu Jindal, Ahmed Alaa, Robert Weinreb, David Wilson, Eran Segal, James Zou, Pengtao Xie

Semantic segmentation of medical images is pivotal in applications like disease diagnosis and treatment planning. While deep learning has excelled in automating this task, a major hurdle is the need for numerous annotated segmentation masks, which are resource-intensive to produce due to the required expertise and time. This scenario often leads to ultra low-data regimes, where annotated images are extremely limited, posing significant challenges for the generalization of conventional deep learning methods on test images. To address this, we introduce a generative deep learning framework, which uniquely generates high-quality paired segmentation masks and medical images, serving as auxiliary data for training robust models in data-scarce environments. Unlike traditional generative models that treat data generation and segmentation model training as separate processes, our method employs multi-level optimization for end-to-end data generation. This approach allows segmentation performance to directly influence the data generation process, ensuring that the generated data is specifically tailored to enhance the performance of the segmentation model. Our method demonstrated strong generalization performance across 9 diverse medical image segmentation tasks and on 16 datasets, in ultra-low data regimes, spanning various diseases, organs, and imaging modalities. When applied to various segmentation models, it achieved performance improvements of 10-20% (absolute), in both same-domain and out-of-domain scenarios. Notably, it requires 8 to 20 times less training data than existing methods to achieve comparable results. This advancement significantly improves the feasibility and cost-effectiveness of applying deep learning in medical imaging, particularly in scenarios with limited data availability.

9/2/2024

Full-Scale Indexing and Semantic Annotation of CT Imaging: Boosting FAIRness

Hannes Ulrich, Robin Hendel, Santiago Pazmino, Bjorn Bergh, Bjorn Schreiweis

Background: The integration of artificial intelligence into medicine has led to significant advances, particularly in diagnostics and treatment planning. However, the reliability of AI models is highly dependent on the quality of the training data, especially in medical imaging, where varying patient data and evolving medical knowledge pose a challenge to the accuracy and generalizability of given datasets. Results: The proposed approach focuses on the integration and enhancement of clinical computed tomography (CT) image series for better findability, accessibility, interoperability, and reusability. Through an automated indexing process, CT image series are semantically enhanced using the TotalSegmentator framework for segmentation and resulting SNOMED CT annotations. The metadata is standardized with HL7 FHIR resources to enable efficient data recognition and data exchange between research projects. Conclusions: The study successfully integrates a robust process within the UKSH MeDIC, leading to the semantic enrichment of over 230,000 CT image series and over 8 million SNOMED CT annotations. The standardized representation using HL7 FHIR resources improves discoverability and facilitates interoperability, providing a foundation for the FAIRness of medical imaging data. However, developing automated annotation methods that can keep pace with growing clinical datasets remains a challenge to ensure continued progress in large-scale integration and indexing of medical imaging for advanced healthcare AI applications.

6/24/2024