Less but Better: Enabling Generalized Zero-shot Learning Towards Unseen Domains by Intrinsic Learning from Redundant LLM Semantics

2403.14362

Published 5/24/2024 by Jiaqi Yue, Jiancheng Zhao, Chunhui Zhao

Abstract

Generalized zero-shot learning (GZSL) focuses on recognizing seen and unseen classes against domain shift problem (DSP) where data of unseen classes may be misclassified as seen classes. However, existing GZSL is still limited to seen domains. In the current work, we pioneer cross-domain GZSL (CDGZSL) which addresses GZSL towards unseen domains. Different from existing GZSL methods which alleviate DSP by generating features of unseen classes with semantics, CDGZSL needs to construct a common feature space across domains and acquire the corresponding intrinsic semantics shared among domains to transfer from seen to unseen domains. Considering the information asymmetry problem caused by redundant class semantics annotated with large language models (LLMs), we present Meta Domain Alignment Semantic Refinement (MDASR). Technically, MDASR consists of two parts: Inter-class Similarity Alignment (ISA), which eliminates the non-intrinsic semantics not shared across all domains under the guidance of inter-class feature relationships, and Unseen-class Meta Generation (UMG), which preserves intrinsic semantics to maintain connectivity between seen and unseen classes by simulating feature generation. MDASR effectively aligns the redundant semantic space with the common feature space, mitigating the information asymmetry in CDGZSL. The effectiveness of MDASR is demonstrated on the Office-Home and Mini-DomainNet, and we have shared the LLM-based semantics for these datasets as the benchmark.

Overview

Introduces cross-domain generalized zero-shot learning (CDGZSL), which extends GZSL to unseen domains
Refines the redundant class semantics annotated by large language models (LLMs) so that only the intrinsic semantics shared across domains remain
Proposes Meta Domain Alignment Semantic Refinement (MDASR) to mitigate the information asymmetry between the redundant semantic space and the common feature space

Plain English Explanation

This research paper extends generalized zero-shot learning (GZSL), the task of recognizing both seen classes and classes that have no training examples, to domains that were not seen during training. The authors call this setting cross-domain GZSL (CDGZSL). The key insight is that class descriptions annotated with large language models (LLMs) are redundant: only part of what they describe is shared across domains, so they must be refined before they can guide transfer.

The core idea is to keep the intrinsic semantics, meaning the part of each class description that is shared among domains, and to discard the rest. This addresses the information asymmetry problem, which the abstract attributes to the redundant LLM-annotated class semantics not matching the feature space that is common across domains.

By aligning the refined semantics with that common feature space, the researchers report that knowledge transfers from seen to unseen domains on the Office-Home and Mini-DomainNet datasets. This matters because applying a model to domains and classes it has never seen is a common requirement in real-world deployment.

Technical Explanation

The paper introduces Meta Domain Alignment Semantic Refinement (MDASR), which refines redundant LLM semantics so that they align with a feature space common to all domains. According to the abstract, MDASR consists of two parts:

Inter-class Similarity Alignment (ISA): Eliminates the non-intrinsic semantics that are not shared across all domains, under the guidance of inter-class feature relationships.
Unseen-class Meta Generation (UMG): Preserves intrinsic semantics to maintain connectivity between seen and unseen classes by simulating feature generation.
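
The abstract describes Inter-class Similarity Alignment (ISA) as removing non-intrinsic semantics under the guidance of inter-class feature relationships. One way to picture that idea is sketched below in plain Python. This is an illustrative assumption, not the authors' implementation: the function names (`alignment_loss`, `mask_dims`), the squared-error objective, the hard dimension mask, and the toy vectors are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def similarity_matrix(vectors):
    """Pairwise cosine similarity between all class vectors."""
    return [[cosine(u, v) for v in vectors] for u in vectors]


def alignment_loss(class_features, class_semantics):
    """Mean squared gap between inter-class similarity in feature space
    and in semantic space (a hypothetical stand-in objective)."""
    sim_f = similarity_matrix(class_features)
    sim_s = similarity_matrix(class_semantics)
    n = len(sim_f)
    total = sum(
        (sim_f[i][j] - sim_s[i][j]) ** 2 for i in range(n) for j in range(n)
    )
    return total / (n * n)


def mask_dims(semantics, keep):
    """Keep only the semantic dimensions flagged True in `keep`.
    A hard mask is an assumption; a learned refinement would replace it."""
    return [[x for x, k in zip(vec, keep) if k] for vec in semantics]


# Toy data (assumed): 3 classes, 2-D feature prototypes, and 3-D semantics
# whose third dimension is unrelated to the feature relationships.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
semantics = [[1.0, 0.0, 5.0], [0.0, 1.0, 5.0], [1.0, 1.0, -5.0]]

loss_full = alignment_loss(features, semantics)
loss_refined = alignment_loss(features, mask_dims(semantics, [True, True, False]))
print(loss_full > loss_refined)
```

In this toy case, dropping the third semantic dimension makes the inter-class similarities of the semantics match those of the features, so the alignment loss falls to zero. The redundant dimension is what distorted the class relationships in the full semantics.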

The researchers evaluate their approach on the Office-Home and Mini-DomainNet datasets, and they have shared the LLM-based semantics for these datasets as a benchmark. The abstract states that these experiments demonstrate the effectiveness of MDASR but does not give specific figures.

Critical Analysis

The paper presents a compelling solution to the challenge of cross-domain generalized zero-shot learning, which matters for real-world deployment of AI systems. By refining the redundant semantics produced by LLMs, the researchers aim to align the semantic space with a feature space common to all domains, so that knowledge can transfer from seen to unseen domains.

However, the paper does not address the potential limitations of the approach. For example, the dependency on LLM-annotated semantics may limit the applicability of the method to scenarios where such models are not available or feasible to use. Additionally, the scalability of the semantic refinement process as the number of domains grows could be an area for further investigation.

Furthermore, the paper does not explore the ethical implications of using LLMs, which are known to have biases and safety concerns. Addressing these issues could be a valuable direction for future research.

Conclusion

The MDASR method proposed in this paper represents an advance in generalized zero-shot learning by extending it to unseen domains. By refining the redundant semantics from LLMs and keeping only the intrinsic semantics shared among domains, the researchers mitigate the information asymmetry between the semantic space and the common feature space, which supports transfer from seen to unseen domains.

While the paper presents a promising solution, further research is needed to address potential limitations and explore the ethical implications of using LLMs. Overall, this work demonstrates the potential of intrinsic learning and of semantically guided approaches to advance zero-shot learning and to help AI systems adapt to new, previously unseen scenarios.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Towards Generalizing to Unseen Domains with Few Labels

Chamuditha Jayanga Galappaththige, Sanoojan Baliah, Malitha Gunawardhana, Muhammad Haris Khan

We approach the challenge of addressing semi-supervised domain generalization (SSDG). Specifically, our aim is to obtain a model that learns domain-generalizable features by leveraging a limited subset of labelled data alongside a substantially larger pool of unlabeled data. Existing domain generalization (DG) methods which are unable to exploit unlabeled data perform poorly compared to semi-supervised learning (SSL) methods under SSDG setting. Nevertheless, SSL methods have considerable room for performance improvement when compared to fully-supervised DG training. To tackle this underexplored, yet highly practical problem of SSDG, we make the following core contributions. First, we propose a feature-based conformity technique that matches the posterior distributions from the feature space with the pseudo-label from the model's output space. Second, we develop a semantics alignment loss to learn semantically-compatible representations by regularizing the semantic structure in the feature space. Our method is plug-and-play and can be readily integrated with different SSL-based SSDG baselines without introducing any additional parameters. Extensive experimental results across five challenging DG benchmarks with four strong SSL baselines suggest that our method provides consistent and notable gains in two different SSDG settings.

5/8/2024

cs.CV

Visual-Augmented Dynamic Semantic Prototype for Generative Zero-Shot Learning

Wenjin Hou, Shiming Chen, Shuhuang Chen, Ziming Hong, Yan Wang, Xuetao Feng, Salman Khan, Fahad Shahbaz Khan, Xinge You

Generative Zero-shot learning (ZSL) learns a generator to synthesize visual samples for unseen classes, which is an effective way to advance ZSL. However, existing generative methods rely on the conditions of Gaussian noise and the predefined semantic prototype, which limit the generator only optimized on specific seen classes rather than characterizing each visual instance, resulting in poor generalizations (e.g., overfitting to seen classes). To address this issue, we propose a novel Visual-Augmented Dynamic Semantic prototype method (termed VADS) to boost the generator to learn accurate semantic-visual mapping by fully exploiting the visual-augmented knowledge into semantic conditions. In detail, VADS consists of two modules: (1) Visual-aware Domain Knowledge Learning module (VDKL) learns the local bias and global prior of the visual features (referred to as domain visual knowledge), which replace pure Gaussian noise to provide richer prior noise information; (2) Vision-Oriented Semantic Updation module (VOSU) updates the semantic prototype according to the visual representations of the samples. Ultimately, we concatenate their output as a dynamic semantic prototype, which serves as the condition of the generator. Extensive experiments demonstrate that our VADS achieves superior CZSL and GZSL performances on three prominent datasets and outperforms other state-of-the-art methods with averaging increases by 6.4%, 5.9% and 4.2% on SUN, CUB and AWA2, respectively.

4/24/2024

cs.CV

Multimodal Unsupervised Domain Generalization by Retrieving Across the Modality Gap

Christopher Liao, Christian So, Theodoros Tsiligkaridis, Brian Kulis

Domain generalization (DG) is an important problem that learns a model which generalizes to unseen test domains leveraging one or more source domains, under the assumption of shared label spaces. However, most DG methods assume access to abundant source data in the target label space, a requirement that proves overly stringent for numerous real-world applications, where acquiring the same label space as the target task is prohibitively expensive. For this setting, we tackle the multimodal version of the unsupervised domain generalization (MUDG) problem, which uses a large task-agnostic unlabeled source dataset during finetuning. Our framework does not explicitly assume any relationship between the source dataset and target task. Instead, it relies only on the premise that the source dataset can be accurately and efficiently searched in a joint vision-language space. We make three contributions in the MUDG setting. Firstly, we show theoretically that cross-modal approximate nearest neighbor search suffers from low recall due to the large distance between text queries and the image centroids used for coarse quantization. Accordingly, we propose paired k-means, a simple clustering algorithm that improves nearest neighbor recall by storing centroids in query space instead of image space. Secondly, we propose an adaptive text augmentation scheme for target labels designed to improve zero-shot accuracy and diversify retrieved image data. Lastly, we present two simple but effective components to further improve downstream target accuracy. We compare against state-of-the-art name-only transfer, source-free DG and zero-shot (ZS) methods on their respective benchmarks and show consistent improvement in accuracy on 20 diverse datasets. Code is available: https://github.com/Chris210634/mudg

5/30/2024

cs.CV cs.LG

Exploring Data Efficiency in Zero-Shot Learning with Diffusion Models

Zihan Ye, Shreyank N. Gowda, Xiaobo Jin, Xiaowei Huang, Haotian Xu, Yaochu Jin, Kaizhu Huang

Zero-Shot Learning (ZSL) aims to enable classifiers to identify unseen classes by enhancing data efficiency at the class level. This is achieved by generating image features from pre-defined semantics of unseen classes. However, most current approaches heavily depend on the number of samples from seen classes, i.e. they do not consider instance-level effectiveness. In this paper, we demonstrate that limited seen examples generally result in deteriorated performance of generative models. To overcome these challenges, we propose ZeroDiff, a Diffusion-based Generative ZSL model. This unified framework incorporates diffusion models to improve data efficiency at both the class and instance levels. Specifically, for instance-level effectiveness, ZeroDiff utilizes a forward diffusion chain to transform limited data into an expanded set of noised data. For class-level effectiveness, we design a two-branch generation structure that consists of a Diffusion-based Feature Generator (DFG) and a Diffusion-based Representation Generator (DRG). DFG focuses on learning and sampling the distribution of cross-entropy-based features, whilst DRG learns the supervised contrastive-based representation to boost the zero-shot capabilities of DFG. Additionally, we employ three discriminators to evaluate generated features from various aspects and introduce a Wasserstein-distance-based mutual learning loss to transfer knowledge among discriminators, thereby enhancing guidance for generation. Demonstrated through extensive experiments on three popular ZSL benchmarks, our ZeroDiff not only achieves significant improvements over existing ZSL methods but also maintains robust performance even with scarce training data. Code will be released upon acceptance.

6/6/2024

cs.CV cs.LG