Bridging the Projection Gap: Overcoming Projection Bias Through Parameterized Distance Learning

2309.01390

Published 4/3/2024 by Chong Zhang, Mingyu Jin, Qinkai Yu, Haochen Xue, Shreyank N Gowda, Xiaobo Jin

Abstract

Generalized zero-shot learning (GZSL) aims to recognize samples from both seen and unseen classes using only seen class samples for training. However, GZSL methods are prone to bias towards seen classes during inference due to the projection function being learned from seen classes. Most methods focus on learning an accurate projection, but bias in the projection is inevitable. We address this projection bias by proposing to learn a parameterized Mahalanobis distance metric for robust inference. Our key insight is that the distance computation during inference is critical, even with a biased projection. We make two main contributions: (1) We extend the VAEGAN (Variational Autoencoder & Generative Adversarial Networks) architecture with two branches to separately output the projection of samples from seen and unseen classes, enabling more robust distance learning. (2) We introduce a novel loss function to optimize the Mahalanobis distance representation and reduce projection bias. Extensive experiments on four datasets show that our approach outperforms state-of-the-art GZSL techniques with improvements of up to 3.5% on the harmonic mean metric.

Overview

  • Generalized zero-shot learning (GZSL) aims to recognize samples from both seen and unseen classes using only seen class samples for training.
  • GZSL methods often suffer from bias towards seen classes due to the projection function being learned from seen classes.
  • Most GZSL methods focus on learning an accurate projection, but bias in the projection is inevitable.

Plain English Explanation

Imagine you're a teacher who wants to teach students how to recognize different types of animals. Normally, you'd show the students examples of the animals and let them practice identifying them. But what if you wanted the students to also be able to recognize animals they've never seen before?

That's the challenge addressed in generalized zero-shot learning (GZSL). The goal is to train a system to recognize samples (like animal photos) from both "seen" classes (animals the system has been trained on) and "unseen" classes (new animals the system hasn't seen before).

The problem is that the system tends to be biased towards the seen classes, because the function that projects samples into a representation space is learned only from seen-class examples. Even a projection that is accurate on the seen classes will still be biased when applied to unseen ones.

The researchers in this paper propose a new approach to address this projection bias. Their key insight is that the way the distances between samples are computed during inference is critical, even if the projection itself is biased.

Technical Explanation

The researchers' main contributions are:

  1. They extend the VAEGAN (Variational Autoencoder & Generative Adversarial Networks) architecture with two branches that separately output the projections of samples from seen and unseen classes. This enables more robust distance learning.

  2. They introduce a novel loss function to optimize the Mahalanobis distance representation and reduce projection bias.

The Mahalanobis distance classically measures how far a point lies from a distribution, taking into account the correlations between dimensions. In its parameterized form, as used in metric learning, the distance between two vectors x and y is sqrt((x - y)^T M (x - y)) for a learned positive semi-definite matrix M. By learning such a metric, the system can more robustly handle the biased projection learned from the seen class samples.
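A minimal sketch of a parameterized Mahalanobis distance and nearest-prototype classification, in pure Python. It assumes the common factorization M = L^T L, which keeps M positive semi-definite for any real matrix L, and assumes each class is represented by a single prototype vector in the projected space. The paper's actual parameterization, class representations, and training loss are not described in this summary, and the names `mahalanobis`, `classify`, and `prototypes` are hypothetical.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]
Matrix = Sequence[Sequence[float]]


def mat_vec(L: Matrix, v: Vector) -> List[float]:
    """Multiply a k x d matrix L by a length-d vector v."""
    return [sum(l_ij * v_j for l_ij, v_j in zip(row, v)) for row in L]


def mahalanobis(x: Vector, y: Vector, L: Matrix) -> float:
    """Parameterized Mahalanobis distance with M = L^T L (an assumption).

    d_M(x, y) = sqrt((x - y)^T L^T L (x - y)) = ||L (x - y)||_2,
    so M is positive semi-definite for any real L.
    """
    diff = [a - b for a, b in zip(x, y)]
    projected = mat_vec(L, diff)
    return math.sqrt(sum(p * p for p in projected))


def classify(x: Vector, prototypes: Dict[str, Vector], L: Matrix) -> str:
    """Hypothetical nearest-prototype rule: pick the class closest under d_M."""
    return min(prototypes, key=lambda c: mahalanobis(x, prototypes[c], L))


if __name__ == "__main__":
    sample = [1.0, 0.0]
    prototypes = {"seen": [0.0, 0.0], "unseen": [1.0, 2.0]}
    identity = [[1.0, 0.0], [0.0, 1.0]]   # reduces to Euclidean distance
    reweighted = [[1.0, 0.0], [0.0, 0.1]]  # down-weights the 2nd dimension
    print(classify(sample, prototypes, identity))    # seen
    print(classify(sample, prototypes, reweighted))  # unseen
```

With L set to the identity, the distance reduces to the Euclidean one and the sample is assigned to the "seen" prototype; down-weighting the second dimension flips the decision to "unseen". Learning L lets training decide which directions of the projected space to trust, which is one way a learned metric can offset a biased projection.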

The researchers evaluate their approach on four datasets and show it outperforms state-of-the-art GZSL techniques, with improvements of up to 3.5% on the harmonic mean metric.
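In the standard GZSL evaluation protocol, the harmonic mean is computed from the per-class average top-1 accuracy on seen-class test samples (S) and unseen-class test samples (U) as H = 2SU / (S + U). The sketch below assumes that protocol, which this summary does not restate, and its function names are illustrative. Because H collapses when either accuracy is low, a model biased towards seen classes scores poorly even if its seen-class accuracy is high.

```python
from typing import Hashable, List, Sequence


def per_class_accuracy(
    y_true: Sequence[Hashable],
    y_pred: Sequence[Hashable],
    classes: Sequence[Hashable],
) -> float:
    """Average of per-class top-1 accuracies over `classes`.

    Classes with no test samples are skipped so they do not distort the mean.
    """
    accuracies: List[float] = []
    for c in classes:
        indices = [i for i, label in enumerate(y_true) if label == c]
        if not indices:
            continue
        correct = sum(1 for i in indices if y_pred[i] == c)
        accuracies.append(correct / len(indices))
    return sum(accuracies) / len(accuracies) if accuracies else 0.0


def harmonic_mean(acc_seen: float, acc_unseen: float) -> float:
    """H = 2 * S * U / (S + U); defined as 0 when both accuracies are 0."""
    if acc_seen + acc_unseen == 0:
        return 0.0
    return 2 * acc_seen * acc_unseen / (acc_seen + acc_unseen)


if __name__ == "__main__":
    # A seen-biased model: strong on seen classes, weak on unseen ones.
    s, u = 0.8, 0.2
    print((s + u) / 2)          # arithmetic mean: 0.5
    print(harmonic_mean(s, u))  # ≈ 0.32
```

Under this protocol, S comes from calling `per_class_accuracy` on the seen-class test samples with `classes` set to the seen classes, while predictions range over all classes (seen and unseen); U is computed the same way on the unseen-class test samples.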

Critical Analysis

The paper acknowledges that while their approach helps mitigate projection bias, some bias may still be present in the learned distance metric. Further research could explore ways to reduce this bias even further.

Additionally, the experiments are limited to standard GZSL benchmark datasets. It would be valuable to see how the approach performs on more diverse, real-world datasets with greater class imbalance and other challenges.

Overall, the researchers present a promising technique for addressing a key limitation of GZSL methods. By shifting the focus to robust distance computation, they've made an important contribution to this active area of research.

Conclusion

This paper introduces a novel approach to generalized zero-shot learning that tackles the issue of projection bias. By learning a parameterized Mahalanobis distance metric, the system can more effectively recognize samples from both seen and unseen classes, even with a biased projection. On four benchmark datasets, it outperforms state-of-the-art GZSL methods by up to 3.5% on the harmonic mean metric, suggesting this technique could be useful in applications that require recognizing novel concepts.



This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Less but Better: Enabling Generalized Zero-shot Learning Towards Unseen Domains by Intrinsic Learning from Redundant LLM Semantics

Jiaqi Yue, Jiancheng Zhao, Chunhui Zhao

Generalized zero-shot learning (GZSL) focuses on recognizing seen and unseen classes against domain shift problem (DSP) where data of unseen classes may be misclassified as seen classes. However, existing GZSL is still limited to seen domains. In the current work, we pioneer cross-domain GZSL (CDGZSL) which addresses GZSL towards unseen domains. Different from existing GZSL methods which alleviate DSP by generating features of unseen classes with semantics, CDGZSL needs to construct a common feature space across domains and acquire the corresponding intrinsic semantics shared among domains to transfer from seen to unseen domains. Considering the information asymmetry problem caused by redundant class semantics annotated with large language models (LLMs), we present Meta Domain Alignment Semantic Refinement (MDASR). Technically, MDASR consists of two parts: Inter-class Similarity Alignment (ISA), which eliminates the non-intrinsic semantics not shared across all domains under the guidance of inter-class feature relationships, and Unseen-class Meta Generation (UMG), which preserves intrinsic semantics to maintain connectivity between seen and unseen classes by simulating feature generation. MDASR effectively aligns the redundant semantic space with the common feature space, mitigating the information asymmetry in CDGZSL. The effectiveness of MDASR is demonstrated on the Office-Home and Mini-DomainNet, and we have shared the LLM-based semantics for these datasets as the benchmark.

5/24/2024

Visual-Augmented Dynamic Semantic Prototype for Generative Zero-Shot Learning

Wenjin Hou, Shiming Chen, Shuhuang Chen, Ziming Hong, Yan Wang, Xuetao Feng, Salman Khan, Fahad Shahbaz Khan, Xinge You

Generative Zero-shot learning (ZSL) learns a generator to synthesize visual samples for unseen classes, which is an effective way to advance ZSL. However, existing generative methods rely on the conditions of Gaussian noise and the predefined semantic prototype, which limit the generator only optimized on specific seen classes rather than characterizing each visual instance, resulting in poor generalizations (e.g., overfitting to seen classes). To address this issue, we propose a novel Visual-Augmented Dynamic Semantic prototype method (termed VADS) to boost the generator to learn accurate semantic-visual mapping by fully exploiting the visual-augmented knowledge into semantic conditions. In detail, VADS consists of two modules: (1) Visual-aware Domain Knowledge Learning module (VDKL) learns the local bias and global prior of the visual features (referred to as domain visual knowledge), which replace pure Gaussian noise to provide richer prior noise information; (2) Vision-Oriented Semantic Updation module (VOSU) updates the semantic prototype according to the visual representations of the samples. Ultimately, we concatenate their output as a dynamic semantic prototype, which serves as the condition of the generator. Extensive experiments demonstrate that our VADS achieves superior CZSL and GZSL performances on three prominent datasets and outperforms other state-of-the-art methods with average increases of 6.4%, 5.9% and 4.2% on SUN, CUB and AWA2, respectively.

4/24/2024

Evolutionary Generalized Zero-Shot Learning

Dubing Chen, Chenyi Jiang, Haofeng Zhang

Attribute-based Zero-Shot Learning (ZSL) has revolutionized the ability of models to recognize new classes not seen during training. However, with the advancement of large-scale models, the expectations have risen. Beyond merely achieving zero-shot generalization, there is a growing demand for universal models that can continually evolve in expert domains using unlabeled data. To address this, we introduce a scaled-down instantiation of this challenge: Evolutionary Generalized Zero-Shot Learning (EGZSL). This setting allows a low-performing zero-shot model to adapt to the test data stream and evolve online. We elaborate on three challenges of this special task, i.e., catastrophic forgetting, initial prediction bias, and evolutionary data class bias. Moreover, we propose targeted solutions for each challenge, resulting in a generic method capable of continuous evolution from a given initial IGZSL model. Experiments on three popular GZSL benchmark datasets demonstrate that our model can learn from the test data stream while other baselines fail. Code is available at https://github.com/cdb342/EGZSL.

5/14/2024

Double Descent and Other Interpolation Phenomena in GANs

Lorenzo Luzi, Yehuda Dar, Richard Baraniuk

We study overparameterization in generative adversarial networks (GANs) that can interpolate the training data. We show that overparameterization can improve generalization performance and accelerate the training process. We study the generalization error as a function of latent space dimension and identify two main behaviors, depending on the learning setting. First, we show that overparameterized generative models that learn distributions by minimizing a metric or $f$-divergence do not exhibit double descent in generalization errors; specifically, all the interpolating solutions achieve the same generalization error. Second, we develop a novel pseudo-supervised learning approach for GANs where the training utilizes pairs of fabricated (noise) inputs in conjunction with real output samples. Our pseudo-supervised setting exhibits double descent (and in some cases, triple descent) of generalization errors. We combine pseudo-supervision with overparameterization (i.e., overly large latent space dimension) to accelerate training while matching or even surpassing generalization performance without pseudo-supervision. While our analysis focuses mostly on linear models, we also apply important insights for improving generalization of nonlinear, multilayer GANs.

5/2/2024