StyDeSty: Min-Max Stylization and Destylization for Single Domain Generalization
Published 6/4/2024 by Songhua Liu, Xin Jin, Xingyi Yang, Jingwen Ye, Xinchao Wang


Single domain generalization (single DG) aims at learning a robust model that generalizes to unseen domains from only one training domain, which makes it a highly ambitious and challenging task. State-of-the-art approaches have mostly relied on data augmentations, such as adversarial perturbation and style enhancement, to synthesize new data and thus increase robustness. Nevertheless, they have largely overlooked the underlying coherence between the augmented domains, which in turn leads to inferior results in real-world scenarios. In this paper, we propose a simple yet effective scheme, termed StyDeSty, to explicitly account for the alignment of the source and pseudo domains in the process of data augmentation, enabling them to interact with each other in a self-consistent manner and further giving rise to a latent domain with strong generalization power. The heart of StyDeSty lies in the interaction between a stylization module, which generates novel stylized samples from the source domain, and a destylization module, which transfers stylized and source samples to a latent domain to learn content-invariant features. The stylization and destylization modules work adversarially and reinforce each other. During inference, the destylization module transforms an input sample with an arbitrary style shift to the latent domain, in which the downstream tasks are carried out. Specifically, the location of the destylization layer within the backbone network is determined by a dedicated neural architecture search (NAS) strategy. We evaluate StyDeSty on multiple benchmarks and demonstrate that it yields encouraging results, outperforming the state of the art by up to 13.44% in classification accuracy. Code is available here:
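
The abstract does not describe how its NAS strategy works, so the following is only a minimal sketch of the underlying search problem, using plain exhaustive search as a simpler stand-in: try every insertion point for a destylization layer in a toy backbone and keep the one with the best score. The backbone layers, the instance-normalization destylization layer, and the invariance score (the negative output gap between a sample and a style-shifted copy) are all hypothetical choices made for illustration; a real search would score candidates on validation data.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Layer = Callable[[Vector], Vector]


def instance_norm(x: Vector, eps: float = 1e-8) -> Vector:
    """Hypothetical destylization layer: remove the mean and scale of x."""
    mean = sum(x) / len(x)
    std = math.sqrt(sum((v - mean) ** 2 for v in x) / len(x) + eps)
    return [(v - mean) / std for v in x]


def forward(layers: Sequence[Layer], x: Vector) -> Vector:
    for layer in layers:
        x = layer(x)
    return x


def insert_at(backbone: Sequence[Layer], layer: Layer, pos: int) -> List[Layer]:
    return list(backbone[:pos]) + [layer] + list(backbone[pos:])


def search_position(
    backbone: Sequence[Layer],
    destyle: Layer,
    score: Callable[[List[Layer]], float],
) -> Tuple[int, float]:
    """Exhaustively score every insertion point 0..len(backbone); higher is better."""
    best_pos, best_score = 0, -math.inf
    for pos in range(len(backbone) + 1):
        s = score(insert_at(backbone, destyle, pos))
        if s > best_score:
            best_pos, best_score = pos, s
    return best_pos, best_score


# Toy backbone (hypothetical): an affine layer, a shifted ReLU, and a square.
backbone: List[Layer] = [
    lambda x: [2.0 * v + 1.0 for v in x],
    lambda x: [max(0.0, v - 1.5) for v in x],
    lambda x: [v * v for v in x],
]

sample = [0.0, 0.1, 0.4, 0.7, 1.0]
styled = [3.0 * v + 5.0 for v in sample]  # hypothetical global style shift


def invariance_score(net: List[Layer]) -> float:
    """Negative largest output gap between the sample and its styled copy."""
    a, b = forward(net, sample), forward(net, styled)
    return -max(abs(u - w) for u, w in zip(a, b))


pos, score = search_position(backbone, instance_norm, invariance_score)
print(pos, score)
```

In this toy setup the search prefers an insertion point before the shifted ReLU, where the style shift is still an affine change that normalization can undo; after the nonlinearity the style and content are entangled and the gap no longer vanishes.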


• This paper presents StyDeSty, a novel approach to single domain generalization that combines stylization and destylization.
• StyDeSty addresses the challenge of single domain generalization, in which a model trained on only one domain must perform well on unseen domains.
• The key idea is a min-max optimization between stylization and destylization, which lets the model handle diverse styles while maintaining performance on the core task.
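
One generic way to write such a min-max objective is shown below. This is a sketch under assumed notation, not the paper's exact loss: S_ψ is a hypothetical stylization module with parameters ψ, D_φ a destylization module, f_θ the task network, ℓ a task loss such as cross-entropy, and 𝒟_s the single source domain.

```latex
\min_{\theta,\,\phi}\;\max_{\psi}\;
\mathbb{E}_{(x,y)\sim\mathcal{D}_s}
\Big[\,\ell\big(f_\theta(D_\phi(x)),\,y\big)
      + \ell\big(f_\theta(D_\phi(S_\psi(x))),\,y\big)\Big]
```

The inner maximization searches for styles that hurt the task, while the outer minimization trains the destylization and task networks so that both source and stylized samples land in a representation on which the task still succeeds. In the actual method the destylization layer sits inside the backbone rather than on the raw input, so D_φ should be read as a simplification.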

Plain English Explanation

StyDeSty is a machine learning technique that helps models perform well on new, unseen data domains, even when they were trained on only a single domain. The main insight is to have the model learn both how to "stylize" an image (i.e., render the same content with a new appearance) and how to "destylize" it (i.e., map an image of any style back into a shared, style-neutral representation). By training on this back-and-forth between stylization and destylization, the model becomes more robust and generalizes better to domains it has not seen before.

This is important because in many real-world applications, we want machine learning models to work reliably across a variety of settings, not just the specific ones they were trained on. StyDeSty provides a way to achieve this "domain generalization" - the ability to handle diverse data distributions without requiring retraining or fine-tuning on each new domain. The key is this interplay between stylization and destylization, which forces the model to learn representations that are flexible enough to handle different styles and visual characteristics.

Technical Explanation

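StyDeSty builds on prior work in domain generalization and style transfer. Its core idea is a min-max optimization between a stylization module and a destylization module. The stylization module generates novel stylized samples from the source domain, while the destylization module transfers both stylized and source samples to a shared latent domain in which content-invariant features are learned. At inference time, the destylization module maps an input with an arbitrary style shift into this latent domain, where the downstream task is performed; its position inside the backbone network is chosen by a dedicated neural architecture search (NAS) strategy.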
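
To make the stylize/destylize idea concrete, here is a minimal sketch under strong assumptions: "style" is modeled as a per-channel scale and offset (as in AdaIN-style transfer), and destylization is replaced by fixed per-channel instance normalization. The paper's destylization module is learned and placed inside the backbone, so this only illustrates why mapping samples to normalized statistics can cancel such style shifts. The names `stylize`, `destylize`, and `max_abs_diff` are hypothetical.

```python
import math
import random
from typing import List

Channel = List[float]  # one feature channel, flattened
Image = List[Channel]  # an image as a list of channels


def stylize(image: Image, gammas: List[float], betas: List[float]) -> Image:
    """Hypothetical style shift: scale and offset each channel (gammas assumed > 0)."""
    return [[g * v + b for v in ch] for ch, g, b in zip(image, gammas, betas)]


def destylize(image: Image, eps: float = 1e-8) -> Image:
    """Stand-in destylization: per-channel instance normalization, which removes
    each channel's mean and standard deviation (its first/second-order "style")."""
    out = []
    for ch in image:
        mean = sum(ch) / len(ch)
        std = math.sqrt(sum((v - mean) ** 2 for v in ch) / len(ch) + eps)
        out.append([(v - mean) / std for v in ch])
    return out


def max_abs_diff(a: Image, b: Image) -> float:
    return max(abs(x - y) for ca, cb in zip(a, b) for x, y in zip(ca, cb))


random.seed(0)
source = [[random.random() for _ in range(16)] for _ in range(3)]  # 3 channels
styled = stylize(source, gammas=[2.0, 0.5, 3.0], betas=[1.0, -0.2, 0.7])

# The raw images differ clearly, but their destylized versions nearly coincide.
print(max_abs_diff(source, styled) > 0.1)
print(max_abs_diff(destylize(source), destylize(styled)) < 1e-4)
```

A fixed normalization can only cancel shifts of this simple affine form; learning the destylization module, as StyDeSty does, is what lets it cope with the richer styles produced by an adversarial stylization module.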

Because the two modules are trained adversarially, they reinforce each other: the stylization module seeks styles that are hard for the current model, and the destylization module learns to neutralize them, so the model keeps its performance on the target task, such as image classification. Experiments on multiple benchmarks show that StyDeSty outperforms prior single domain generalization approaches, improving classification accuracy over the state of the art by up to 13.44%.
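
The adversarial interplay can be illustrated with a toy min-max loop. This is not the paper's algorithm: the data, the two-feature logistic-regression learner, and the finite set of style transforms in `STYLE_SHIFTS` are all hypothetical. Each sample has a content feature that determines its label and a style feature that is spuriously correlated with the label in the source domain. At every step the "stylization" side picks the style transform that maximizes the current loss (inner max), and the learner takes a gradient step on that worst case (outer min).

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Sample = Tuple[float, float, int]  # (content feature, style feature, label in {0, 1})
StyleFn = Callable[[float], float]

# Hypothetical style transforms the adversary can choose from.
STYLE_SHIFTS: List[StyleFn] = [
    lambda s: s,   # keep the source style
    lambda s: -s,  # invert the style feature
]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def loss_and_grad(w: List[float], data: Sequence[Sample]) -> Tuple[float, List[float]]:
    """Mean logistic loss and its gradient for weights [w_content, w_style]."""
    loss, g_c, g_s = 0.0, 0.0, 0.0
    for c, s, y in data:
        p = sigmoid(w[0] * c + w[1] * s)
        q = min(max(p, 1e-12), 1.0 - 1e-12)  # clip only inside the log
        loss -= y * math.log(q) + (1 - y) * math.log(1.0 - q)
        g_c += (p - y) * c
        g_s += (p - y) * s
    n = len(data)
    return loss / n, [g_c / n, g_s / n]


def apply_style(data: Sequence[Sample], style: StyleFn) -> List[Sample]:
    return [(c, style(s), y) for c, s, y in data]


def train(data: Sequence[Sample], steps: int = 200, lr: float = 0.5,
          adversarial: bool = False) -> List[float]:
    w = [0.0, 0.0]
    for _ in range(steps):
        batch = data
        if adversarial:
            # Inner max: the "stylization" side picks the worst-case style.
            candidates = [apply_style(data, f) for f in STYLE_SHIFTS]
            batch = max(candidates, key=lambda d: loss_and_grad(w, d)[0])
        # Outer min: one gradient-descent step on the selected batch.
        _, grad = loss_and_grad(w, batch)
        w = [wi - lr * gi for wi, gi in zip(w, grad)]
    return w


random.seed(0)
data: List[Sample] = []
for i in range(200):
    y = i % 2
    sign = 2 * y - 1
    # Content carries the label (plus noise); style equals the label sign in the source domain.
    data.append((sign + random.gauss(0.0, 0.3), float(sign), y))

w_erm = train(data)
w_minmax = train(data, adversarial=True)
print("ERM     [w_content, w_style]:", w_erm)
print("min-max [w_content, w_style]:", w_minmax)
```

Plain training (ERM) leans on the spurious style feature, whereas the min-max loop keeps the style weight small and relies on the content feature, which is the kind of style-invariant behavior the adversarial setup is meant to encourage.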

Critical Analysis

The authors acknowledge that StyDeSty may not be optimal for domains with extreme distributional shifts, as the stylization-destylization tradeoff has limits. Additionally, the computational overhead of the min-max optimization could be a practical challenge for real-world deployment. Further research is needed to explore more efficient training procedures and to understand the model's robustness to larger domain gaps.

That said, the core insight of leveraging style transfer as a means of domain generalization is compelling and aligns with human intuitions about how we adapt to novel situations. By learning to recognize and manipulate stylistic elements, the model builds more flexible representations that can generalize beyond the training distribution. Extending these principles to other domains beyond computer vision could be a fruitful direction for future work.


StyDeSty introduces a novel approach to single-domain generalization that combines stylization and destylization in a min-max optimization framework. By learning to handle diverse styles while maintaining task performance, the model can generalize more effectively to new, unseen data domains. While there are some practical limitations, the core ideas behind StyDeSty represent an exciting step forward in building more robust and adaptable machine learning systems.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Complex Style Image Transformations for Domain Generalization in Medical Images

Nikolaos Spanos, Anastasios Arsenos, Paraskevi-Antonia Theofilou, Paraskevi Tzouveli, Athanasios Voulodimos, Stefanos Kollias

The absence of well-structured large datasets in medical computer vision results in decreased performance of automated systems and, especially, of deep learning models. Domain generalization techniques aim to approach unknown domains from a single data source. In this paper we introduce a novel framework, named CompStyle, which leverages style transfer and adversarial training, along with high-level input complexity augmentation to effectively expand the domain space and address unknown distributions. State-of-the-art style transfer methods depend on the existence of subdomains within the source dataset. However, this can lead to an inherent dataset bias in the image creation. Input-level augmentation can provide a solution to this problem by widening the domain space in the source dataset and boost performance on out-of-domain distributions. We provide results from experiments on semantic segmentation on prostate data and corruption robustness on cardiac data which demonstrate the effectiveness of our approach. Our method increases performance in both tasks, without added cost to training time or resources.


Grounding Stylistic Domain Generalization with Quantitative Domain Shift Measures and Synthetic Scene Images

Yiran Luo, Joshua Feinglass, Tejas Gokhale, Kuan-Cheng Lee, Chitta Baral, Yezhou Yang

Domain Generalization (DG) is a challenging task in machine learning that requires a coherent ability to comprehend shifts across various domains through extraction of domain-invariant features. DG performance is typically evaluated by performing image classification in domains of various image styles. However, current methodology lacks quantitative understanding about shifts in stylistic domain, and relies on a vast amount of pre-training data, such as ImageNet1K, which are predominantly in photo-realistic style with weakly supervised class labels. Such a data-driven practice could potentially result in spurious correlation and inflated performance on DG benchmarks. In this paper, we introduce a new DG paradigm to address these risks. We first introduce two new quantitative measures ICV and IDD to describe domain shifts in terms of consistency of classes within one domain and similarity between two stylistic domains. We then present SuperMarioDomains (SMD), a novel synthetic multi-domain dataset sampled from video game scenes with more consistent classes and sufficient dissimilarity compared to ImageNet1K. We demonstrate our DG method SMOS. SMOS first uses SMD to train a precursor model, which is then used to ground the training on a DG benchmark. We observe that SMOS contributes to state-of-the-art performance across five DG benchmarks, gaining large improvements to performances on abstract domains along with on-par or slight improvements to those on photo-realistic domains. Our qualitative analysis suggests that these improvements can be attributed to reduced distributional divergence between originally distant domains. Our data are available at .


DGInStyle: Domain-Generalizable Semantic Segmentation with Image Diffusion Models and Stylized Semantic Control

Yuru Jia, Lukas Hoyer, Shengyu Huang, Tianfu Wang, Luc Van Gool, Konrad Schindler, Anton Obukhov

Large, pretrained latent diffusion models (LDMs) have demonstrated an extraordinary ability to generate creative content, specialize to user data through few-shot fine-tuning, and condition their output on other modalities, such as semantic maps. However, are they usable as large-scale data generators, e.g., to improve tasks in the perception stack, like semantic segmentation? We investigate this question in the context of autonomous driving, and answer it with a resounding yes. We propose an efficient data generation pipeline termed DGInStyle. First, we examine the problem of specializing a pretrained LDM to semantically-controlled generation within a narrow domain. Second, we propose a Style Swap technique to endow the rich generative prior with the learned semantic control. Third, we design a Multi-resolution Latent Fusion technique to overcome the bias of LDMs towards dominant objects. Using DGInStyle, we generate a diverse dataset of street scenes, train a domain-agnostic semantic segmentation model on it, and evaluate the model on multiple popular autonomous driving datasets. Our approach consistently increases the performance of several domain generalization methods compared to the previous state-of-the-art methods. Source code and dataset are available at


Causality-inspired Latent Feature Augmentation for Single Domain Generalization

Jian Xu, Chaojie Ji, Yankai Cao, Ye Li, Ruxin Wang

Single domain generalization (Single-DG) intends to develop a generalizable model with only one single training domain to perform well on other unknown target domains. Under the domain-hungry configuration, how to expand the coverage of source domain and find intrinsic causal features across different distributions is the key to enhancing the models' generalization ability. Existing methods mainly depend on the meticulous design of finite image-level transformation techniques and learning invariant features across domains based on statistical correlation between samples and labels in source domain. This makes it difficult to capture stable semantics between source and target domains, which hinders the improvement of the model's generalization performance. In this paper, we propose a novel causality-inspired latent feature augmentation method for Single-DG by learning the meta-knowledge of feature-level transformation based on causal learning and interventions. Instead of strongly relying on the finite image-level transformation, with the learned meta-knowledge, we can generate diverse implicit feature-level transformations in latent space based on the consistency of causal features and diversity of non-causal features, which can better compensate for the domain-hungry defect and reduce the strong reliance on initial finite image-level transformations and capture more stable domain-invariant causal features for generalization. Extensive experiments on several open-access benchmarks demonstrate the outstanding performance of our model over other state-of-the-art single domain generalization and also multi-source domain generalization methods.
