Cross-modal tumor segmentation using generative blending augmentation and self training






Published 4/1/2024 by Guillaume Sallé, Pierre-Henri Conze, Julien Bert, Nicolas Boussion, Dimitris Visvikis, Vincent Jaouen



Objectives: Data scarcity and domain shifts lead to biased training sets that do not accurately represent deployment conditions. A related practical problem is cross-modal image segmentation, where the objective is to segment unlabelled images using previously labelled datasets from other imaging modalities. Methods: We propose a cross-modal segmentation method based on conventional image synthesis boosted by a new data augmentation technique called Generative Blending Augmentation (GBA). GBA leverages a SinGAN model to learn representative generative features from a single training image to realistically diversify tumor appearances. This way, we compensate for image synthesis errors, subsequently improving the generalization power of a downstream segmentation model. The proposed augmentation is further combined with an iterative self-training procedure leveraging pseudo labels at each pass. Results: The proposed solution ranked first for vestibular schwannoma (VS) segmentation during the validation and test phases of the MICCAI CrossMoDA 2022 challenge, with best mean Dice similarity and average symmetric surface distance measures. Conclusion and significance: Local contrast alteration of tumor appearances and iterative self-training with pseudo labels are likely to lead to performance improvements in a variety of segmentation contexts.



Overview

  • Data scarcity and domain shifts can lead to biased training sets that don't accurately represent real-world conditions
  • The researchers address the practical problem of cross-modal image segmentation, where the goal is to segment unlabelled images using previously labelled datasets from other imaging modalities
  • The researchers propose a cross-modal segmentation method that uses a new data augmentation technique called Generative Blending Augmentation (GBA) to improve the generalization of the segmentation model

Plain English Explanation

Training machine learning models often relies on large, high-quality datasets. However, in many real-world scenarios, these datasets may be scarce or come from conditions that differ from the actual deployment environment. This can cause the models to learn biases and fail to perform well when faced with new, unfamiliar data.

The researchers tackle this challenge in the context of medical image segmentation, where the goal is to identify and outline specific structures (like tumors) in images. They explore the scenario where you have labelled images from one type of medical imaging (like MRI), but you want to use that information to segment images from a different type of imaging (like CT scans). This "cross-modal" segmentation task is difficult because the appearance of the structures can vary significantly between imaging modalities.

To address this, the researchers developed a new data augmentation technique called Generative Blending Augmentation (GBA). GBA uses a machine learning model called SinGAN to learn the visual features of a single training image and then generate new, realistic-looking variations of that image. This helps compensate for errors that can occur when synthesizing images, and ultimately improves the performance of the segmentation model.

The researchers also combine GBA with an iterative self-training approach, where the segmentation model is repeatedly fine-tuned using its own predictions on new unlabelled images. This helps the model continuously learn and adapt to the target domain.
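The self-training loop described above can be pictured with a deliberately tiny sketch. Everything in it is an assumption made for illustration: a two-class nearest-centroid classifier on 1D intensities stands in for the segmentation network, and the names `fit_centroids`, `predict`, and `self_train` are hypothetical, not taken from the paper.

```python
import numpy as np

def fit_centroids(x, y):
    """Fit a two-class nearest-centroid model on 1D intensities."""
    return np.array([x[y == 0].mean(), x[y == 1].mean()])

def predict(centroids, x):
    """Label each sample with the class of its closest centroid."""
    return np.argmin(np.abs(x[:, None] - centroids[None, :]), axis=1)

def self_train(x_src, y_src, x_tgt, n_rounds=3):
    """Refit repeatedly on labelled source data plus pseudo-labelled target data."""
    centroids = fit_centroids(x_src, y_src)      # round 0: source labels only
    for _ in range(n_rounds):
        pseudo = predict(centroids, x_tgt)       # pseudo labels from the current model
        x_all = np.concatenate([x_src, x_tgt])
        y_all = np.concatenate([y_src, pseudo])
        centroids = fit_centroids(x_all, y_all)  # retrain with the pseudo labels
    return centroids

# Toy data (assumed): background near 0.2 and tumor near 0.8 in the source,
# with every target intensity shifted by +0.25 to mimic a domain shift.
x_src = np.concatenate([np.linspace(0.1, 0.3, 50), np.linspace(0.7, 0.9, 50)])
y_src = np.concatenate([np.zeros(50, dtype=int), np.ones(50, dtype=int)])
x_tgt = x_src + 0.25
y_tgt = y_src.copy()  # held out; used only to score the result

acc_before = (predict(fit_centroids(x_src, y_src), x_tgt) == y_tgt).mean()
acc_after = (predict(self_train(x_src, y_src, x_tgt), x_tgt) == y_tgt).mean()
print(f"target accuracy before: {acc_before:.2f}, after: {acc_after:.2f}")
```

In this toy setup, each round of pseudo-labelling pulls the decision boundary toward the shifted target intensities. A real pipeline would retrain a segmentation network instead and would usually need some safeguard against reinforcing wrong pseudo labels.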

Technical Explanation

The researchers propose a cross-modal segmentation method that leverages a new data augmentation technique called Generative Blending Augmentation (GBA). GBA uses a SinGAN model to learn representative generative features from a single training image, which are then used to create realistic variations of that image. This helps compensate for potential errors in the image synthesis process and improves the generalization of the downstream segmentation model.

The proposed method starts by training a SinGAN model on a single annotated training image from the source modality (e.g., MRI). This allows the model to learn the inherent visual features and structure of the target object (e.g., a tumor). The SinGAN model is then used to generate new, diverse synthetic images that maintain the overall characteristics of the original training image but exhibit variations in appearance, texture, and contrast.

These synthetic images are blended with the original training image using a weighted combination, creating the GBA augmented dataset. This augmented dataset is used to train the segmentation model, which is further fine-tuned through an iterative self-training process. In each iteration, the segmentation model is used to generate pseudo-labels for unlabelled target domain images, and these pseudo-labels are then used to update the model.
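As a rough illustration of the weighted blending step, the sketch below mixes a generated image into the original one. The restriction to a tumor mask, the `alpha` weight, and the function name `blend_with_mask` are assumptions made for this example, and a simple contrast change stands in for the SinGAN output. The paper's actual blending procedure may differ.

```python
import numpy as np

def blend_with_mask(original, generated, mask, alpha=0.5):
    """Weighted blend of `generated` into `original`, applied only where mask == 1."""
    original = np.asarray(original, dtype=np.float32)
    generated = np.asarray(generated, dtype=np.float32)
    weight = alpha * np.asarray(mask, dtype=np.float32)  # zero outside the tumor
    return (1.0 - weight) * original + weight * generated

# Toy 8x8 "image" with a 3x3 tumor region (all values are made up).
rng = np.random.default_rng(0)
original = rng.uniform(0.0, 1.0, size=(8, 8)).astype(np.float32)
generated = np.clip(original * 1.5, 0.0, 1.0)  # stand-in for a generator output
mask = np.zeros((8, 8), dtype=np.float32)
mask[2:5, 2:5] = 1.0

augmented = blend_with_mask(original, generated, mask, alpha=0.7)
```

Sampling a different `alpha` or a different generated image for each training example would give the segmentation model several appearances of the same tumor while leaving the surrounding anatomy unchanged.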

The researchers evaluated their method on the task of vestibular schwannoma (VS) segmentation as part of the MICCAI CrossMoDA 2022 challenge. Their solution ranked first in both the validation and test phases, achieving the best mean Dice similarity and average symmetric surface distance measures.
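For readers unfamiliar with the two metrics, here is a minimal 2D sketch of the Dice similarity coefficient and one common definition of the average symmetric surface distance (ASSD). The 4-neighbourhood surface definition, the brute-force distance computation, and the function names are assumptions made for this example. The challenge's own evaluation code is not described in the summary and may differ in detail.

```python
import numpy as np

def dice(a, b):
    """Dice similarity coefficient between two binary masks."""
    a, b = np.asarray(a, dtype=bool), np.asarray(b, dtype=bool)
    denom = a.sum() + b.sum()
    return 1.0 if denom == 0 else 2.0 * np.logical_and(a, b).sum() / denom

def surface_points(mask):
    """Coordinates of foreground pixels that touch background (4-neighbourhood)."""
    mask = np.asarray(mask, dtype=bool)
    padded = np.pad(mask, 1, constant_values=False)
    interior = (padded[:-2, 1:-1] & padded[2:, 1:-1] &
                padded[1:-1, :-2] & padded[1:-1, 2:])
    return np.argwhere(mask & ~interior)

def assd(a, b, spacing=(1.0, 1.0)):
    """Average symmetric surface distance; both masks must be non-empty."""
    pa = surface_points(a) * np.asarray(spacing)
    pb = surface_points(b) * np.asarray(spacing)
    if len(pa) == 0 or len(pb) == 0:
        raise ValueError("ASSD is undefined for empty masks")
    # Pairwise Euclidean distances between the two sets of surface points.
    d = np.linalg.norm(pa[:, None, :] - pb[None, :, :], axis=-1)
    return (d.min(axis=1).sum() + d.min(axis=0).sum()) / (len(pa) + len(pb))

# Two 4x4 squares, the second shifted by one pixel to the right.
ref = np.zeros((10, 10), dtype=bool)
ref[2:6, 2:6] = True
pred = np.zeros((10, 10), dtype=bool)
pred[2:6, 3:7] = True

print(dice(ref, pred))  # 0.75
print(assd(ref, pred))  # 0.5
```

Higher Dice is better (1.0 is a perfect overlap), while lower ASSD is better (0.0 means the two surfaces coincide). The brute-force distance matrix is fine for small masks. For 3D volumes a distance transform would be the usual choice.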

Critical Analysis

The researchers' approach of using GBA and iterative self-training shows promising results for cross-modal image segmentation, particularly in the context of medical imaging. By compensating for image synthesis errors and continuously adapting the model to the target domain, the method appears to overcome some of the challenges posed by data scarcity and domain shifts.

However, the paper does not provide a detailed analysis of the method's limitations or potential failure cases. It would be valuable to understand how the performance of the approach might degrade with larger domain shifts, more complex target structures, or limited availability of source domain data. Additionally, the paper does not discuss potential biases that could be introduced by the SinGAN model or the self-training process, which could be an important consideration for sensitive applications like medical diagnostics.

Further research could explore the generalizability of the GBA and self-training approach to other cross-modal segmentation tasks, as well as investigate ways to make the method more robust and transparent. Incorporating techniques for detecting and mitigating biases in the generated data and model outputs could also enhance the reliability and trustworthiness of the system.


Conclusion

The researchers have developed an innovative cross-modal segmentation method that leverages Generative Blending Augmentation and iterative self-training to address the challenges of data scarcity and domain shifts. By compensating for image synthesis errors and continuously adapting the model to the target domain, their approach has demonstrated strong performance on the task of vestibular schwannoma segmentation.

While the results are promising, further research is needed to fully understand the limitations and potential biases of the method. Exploring its generalizability to other cross-modal tasks and incorporating bias detection and mitigation techniques could help unlock the broader applicability of this approach in real-world medical imaging and beyond.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Self-supervised Brain Lesion Generation for Effective Data Augmentation of Medical Images

Jiayu Huo, Sebastien Ourselin, Rachel Sparks





Accurate brain lesion delineation is important for planning neurosurgical treatment. Automatic brain lesion segmentation methods based on convolutional neural networks have demonstrated remarkable performance. However, neural network performance is constrained by the lack of large-scale well-annotated training datasets. In this manuscript, we propose a comprehensive framework to efficiently generate new, realistic samples for training a brain lesion segmentation model. We first train a lesion generator, based on an adversarial autoencoder, in a self-supervised manner. Next, we utilize a novel image composition algorithm, Soft Poisson Blending, to seamlessly combine synthetic lesions and brain images to obtain training samples. Finally, to effectively train the brain lesion segmentation model with augmented images we introduce a new prototype consistence regularization to align real and synthetic features. Our framework is validated by extensive experiments on two public brain lesion segmentation datasets: ATLAS v2.0 and Shift MS. Our method outperforms existing brain image data augmentation schemes. For instance, our method improves the Dice from 50.36% to 60.23% compared to the U-Net with conventional data augmentation techniques for the ATLAS v2.0 dataset.


Enhancing Incomplete Multi-modal Brain Tumor Segmentation with Intra-modal Asymmetry and Inter-modal Dependency

Weide Liu, Jingwen Hou, Xiaoyang Zhong, Huijing Zhan, Jun Cheng, Yuming Fang, Guanghui Yue





Deep learning-based brain tumor segmentation (BTS) models for multi-modal MRI images have seen significant advancements in recent years. However, a common problem in practice is the unavailability of some modalities due to varying scanning protocols and patient conditions, making segmentation from incomplete MRI modalities a challenging issue. Previous methods have attempted to address this by fusing accessible multi-modal features, leveraging attention mechanisms, and synthesizing missing modalities using generative models. However, these methods ignore the intrinsic problems of medical image segmentation, such as the limited availability of training samples, particularly for cases with tumors. Furthermore, these methods require training and deploying a specific model for each subset of missing modalities. To address these issues, we propose a novel approach that enhances the BTS model from two perspectives. Firstly, we introduce a pre-training stage that generates a diverse pre-training dataset covering a wide range of different combinations of tumor shapes and brain anatomy. Secondly, we propose a post-training stage that enables the model to reconstruct missing modalities in the prediction results when only partial modalities are available. To achieve the pre-training stage, we conceptually decouple the MRI image into two parts: 'anatomy' and 'tumor'. We pre-train the BTS model using synthesized data generated from the anatomy and tumor parts across different training samples. ... Extensive experiments demonstrate that our proposed method significantly improves the performance over the baseline and achieves new state-of-the-art results on three brain tumor segmentation datasets: BRATS2020, BRATS2018, and BRATS2015.


Diffusion based Zero-shot Medical Image-to-Image Translation for Cross Modality Segmentation

Zihao Wang, Yingyu Yang, Yuzhou Chen, Tingting Yuan, Maxime Sermesant, Herve Delingette, Ona Wu





Cross-modality image segmentation aims to segment the target modalities using a method designed in the source modality. Deep generative models can translate the target modality images into the source modality, thus enabling cross-modality segmentation. However, a vast body of existing cross-modality image translation methods relies on supervised learning. In this work, we aim to address the challenge of zero-shot learning-based image translation tasks (extreme scenarios in which the target modality is unseen in the training phase). To leverage generative learning for zero-shot cross-modality image segmentation, we propose a novel unsupervised image translation method. The framework learns to translate the unseen source image to the target modality for image segmentation by leveraging the inherent statistical consistency between different modalities for diffusion guidance. Our framework captures identical cross-modality features in the statistical domain, offering diffusion guidance without relying on direct mappings between the source and target domains. This advantage allows our method to adapt to changing source domains without the need for retraining, making it highly practical when sufficient labeled source domain data is not available. The proposed framework is validated in zero-shot cross-modality image segmentation tasks through empirical comparisons with influential generative models, including adversarial-based and diffusion-based models.


GenMix: Combining Generative and Mixture Data Augmentation for Medical Image Classification

Hansang Lee, Haeil Lee, Helen Hong





In this paper, we propose a novel data augmentation technique called GenMix, which combines generative and mixture approaches to leverage the strengths of both methods. While generative models excel at creating new data patterns, they face challenges such as mode collapse in GANs and difficulties in training diffusion models, especially with limited medical imaging data. On the other hand, mixture models enhance class boundary regions but tend to favor the major class in scenarios with class imbalance. To address these limitations, GenMix integrates both approaches to complement each other. GenMix operates in two stages: (1) training a generative model to produce synthetic images, and (2) performing mixup between synthetic and real data. This process improves the quality and diversity of synthetic data while simultaneously benefiting from the new pattern learning of generative models and the boundary enhancement of mixture models. We validate the effectiveness of our method on the task of classifying focal liver lesions (FLLs) in CT images. Our results demonstrate that GenMix enhances the performance of various generative models, including DCGAN, StyleGAN, Textual Inversion, and Diffusion Models. Notably, the proposed method with Textual Inversion outperforms other methods without fine-tuning the diffusion model on the FLL dataset.
