Adjusting Logit in Gaussian Form for Long-Tailed Visual Recognition

Read original: arXiv:2305.10648 - Published 7/19/2024 by Mengke Li, Yiu-ming Cheung, Yang Lu, Zhikai Hu, Weichao Lan, Hui Huang


Overview

Real-world data often have a long-tailed distribution, making it challenging for deep neural networks to correctly classify the tail classes.
Existing methods have tried to address this problem by reducing classifier bias, but the authors found that training directly on long-tailed data leads to an uneven embedding space, in which the embedding space of the head classes severely compresses that of the tail classes.
This paper explores the long-tailed visual recognition problem from the perspective of feature-level representation, introducing a feature augmentation technique to balance the embedding distribution.

Plain English Explanation

In the real world, data often follow a long-tailed distribution: a few common "head" classes account for most of the examples, while many rare "tail" classes have only a handful of examples each. This is a problem for deep neural networks, which struggle to correctly classify the rare tail classes.
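To make "long-tailed" concrete, benchmark splits are often built by shrinking per-class counts exponentially from the largest head class to the smallest tail class. The sketch below is illustrative only: the helper name `long_tailed_counts` and the values (500 examples for the largest class, 10 classes, an imbalance ratio of 100) are assumptions, not settings taken from this paper.

```python
def long_tailed_counts(n_max: int, num_classes: int, imbalance_ratio: float) -> list:
    """Hypothetical helper: per-class sample counts with an exponential long tail.

    Class 0 (the largest head class) keeps n_max samples and the last class
    keeps about n_max / imbalance_ratio samples.
    """
    counts = []
    for i in range(num_classes):
        # Fraction decays smoothly from 1 (head) to 1 / imbalance_ratio (tail).
        fraction = imbalance_ratio ** (-i / (num_classes - 1))
        counts.append(max(1, round(n_max * fraction)))
    return counts


counts = long_tailed_counts(n_max=500, num_classes=10, imbalance_ratio=100)
# Share of all examples held by the three most frequent classes.
head_share = sum(counts[:3]) / sum(counts)
print(counts, f"top-3 classes hold {head_share:.0%} of the data")
```

With these assumed settings the largest class keeps 500 examples and the smallest keeps 5, so a network trained on all the data sees the head class 100 times as often as the tail class.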

Previous research has tried to solve this problem by adjusting the neural network's decision-making process (its classifier) to be less biased toward the common "head" classes. However, the authors of this paper found that the way the network represents the data in its internal "embedding space" is also a problem: the head classes take up so much of that space that the tail classes are squeezed into a compressed region, which makes it harder for the network to learn to classify them properly.

To address this issue, the authors introduce a technique called "feature augmentation." This involves deliberately perturbing the feature representations of each class with Gaussian noise of a different amplitude per class, so that the embedding space becomes more balanced. Building on these perturbed features, they also propose two new "logit adjustment" methods to further improve the model's performance.

Once the embedding space is more balanced, the authors show that the bias toward head classes can be removed simply by retraining the network's classifier on class-balanced samples. This leads to better performance on long-tailed benchmark datasets than state-of-the-art methods.

Technical Explanation

The authors introduce a feature augmentation technique to address the long-tailed visual recognition problem. They observe that existing methods focus on reducing classifier bias and assume the learned features are representative enough, but training directly on long-tailed data produces an uneven embedding space in which the head classes severely compress the embedding space of the tail classes, hindering subsequent classifier learning.

To balance the embedding distribution, the authors perturb the features of different classes with varying amplitudes in Gaussian form. Based on these perturbed features, they propose two novel "logit adjustment" methods to further improve model performance with modest computational overhead.
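The perturbation rule above can be illustrated with a small sketch. It assumes cosine-similarity logits, an amplitude that rises on a log scale from 0 for the most frequent class to 1 for the rarest, and non-negative noise drawn from |N(0, σ)| with σ = 1/3 and clipped at 1. All of these choices, and the function names `class_amplitudes` and `gaussian_perturbed_logits`, are hypothetical; the repository linked in the abstract defines the paper's actual formulation.

```python
import math
import random


def class_amplitudes(class_counts):
    """Hypothetical schedule: 0 for the most frequent class, 1 for the rarest.

    Amplitudes are interpolated on a log scale of the class counts.
    """
    logs = [math.log(n) for n in class_counts]
    hi, lo = max(logs), min(logs)
    span = hi - lo if hi > lo else 1.0  # avoid division by zero if balanced
    return [(hi - v) / span for v in logs]


def gaussian_perturbed_logits(logits, amplitudes, sigma=1.0 / 3.0, rng=random):
    """Subtract class-scaled, non-negative Gaussian noise from each logit.

    Rarer classes (larger amplitude) get larger perturbations, so during
    training their logits are spread over a wider 'cloud'.
    """
    perturbed = []
    for z, a in zip(logits, amplitudes):
        # |N(0, sigma)| clipped to at most 1 keeps each perturbation bounded.
        noise = min(abs(rng.gauss(0.0, sigma)), 1.0)
        perturbed.append(z - a * noise)
    return perturbed


counts = [500, 150, 45, 15, 5]               # assumed long-tailed class counts
amps = class_amplitudes(counts)
cosine_logits = [0.8, 0.1, 0.3, -0.2, 0.05]  # assumed cosine similarities
noisy = gaussian_perturbed_logits(cosine_logits, amps, rng=random.Random(0))
print([round(a, 2) for a in amps])
print([round(z, 3) for z in noisy])
```

Under this assumed schedule the head class's logit passes through unchanged, while the rarest class's logit can be pushed down by up to one full unit, so the amount of perturbation tracks how rare each class is.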

This calibrates the distorted embedding spaces of all classes. In the resulting balanced embedding space, the classifier's bias can be removed by simply retraining the classifier on class-balanced sampled data. Extensive experiments on benchmark datasets show that this approach outperforms state-of-the-art methods.
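The retraining step relies on class-balanced sampling. A minimal sketch of one common way to do it, assuming two-stage sampling: pick a class uniformly at random, then pick one of that class's examples uniformly, with replacement. The function name `class_balanced_sampler` and the toy labels are hypothetical, and a real training loop would more likely plug an equivalent sampler into its data loader than build a plain list of indices.

```python
import random
from collections import defaultdict


def class_balanced_sampler(labels, num_samples, rng=random):
    """Hypothetical helper: draw indices so every class is equally likely.

    Stage 1 picks a class uniformly; stage 2 picks one of that class's
    examples uniformly (with replacement, so rare examples repeat).
    """
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    classes = sorted(by_class)
    return [rng.choice(by_class[rng.choice(classes)]) for _ in range(num_samples)]


labels = [0] * 900 + [1] * 90 + [2] * 10  # assumed long-tailed toy labels
picked = class_balanced_sampler(labels, num_samples=3000, rng=random.Random(42))
freq = {c: sum(1 for i in picked if labels[i] == c) for c in (0, 1, 2)}
print(freq)
```

With these toy labels, the 10-example tail class is drawn about as often as the 900-example head class, so each tail example is repeated many times. Keeping the feature extractor frozen while only the classifier is retrained on such batches is an assumption of this sketch, in line with the classifier-only retraining the paragraph describes.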

Critical Analysis

The authors acknowledge that their feature augmentation approach introduces a modest computational overhead compared to other methods. They also note that the proposed logit adjustment techniques are heuristic in nature and may not be optimal.

Additionally, the paper does not explore the potential trade-offs between improving tail class performance and maintaining head class performance. It would be valuable to understand the limits of the feature augmentation approach and under what conditions it may lead to unintended consequences.

Further research could investigate more principled ways of learning a balanced embedding space, perhaps by incorporating techniques like curvature-aware feature learning or contrastive learning with rebalanced objectives. Exploring the applicability of this approach to other domains, such as long-tailed text classification or long-tail image generation, could also yield valuable insights.

Conclusion

This paper presents a novel feature augmentation technique to address the long-tailed visual recognition problem. By balancing the embedding distribution, the authors show that a simple class-balanced retraining of the classifier can eliminate the bias towards head classes, leading to improved performance on tail classes.

The technique introduces a modest computational overhead but demonstrates superior results compared to state-of-the-art methods. Further research is needed to explore more principled ways of learning balanced feature representations and investigate the broader applicability of this approach to other domains.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!


Related Papers


Adjusting Logit in Gaussian Form for Long-Tailed Visual Recognition

Mengke Li, Yiu-ming Cheung, Yang Lu, Zhikai Hu, Weichao Lan, Hui Huang

It is not uncommon that real-world data are distributed with a long tail. For such data, the learning of deep neural networks becomes challenging because it is hard to classify tail classes correctly. In the literature, several existing methods have addressed this problem by reducing classifier bias, provided that the features obtained with long-tailed data are representative enough. However, we find that training directly on long-tailed data leads to uneven embedding space. That is, the embedding space of head classes severely compresses that of tail classes, which is not conducive to subsequent classifier learning. This paper therefore studies the problem of long-tailed visual recognition from the perspective of feature level. We introduce feature augmentation to balance the embedding distribution. The features of different classes are perturbed with varying amplitudes in Gaussian form. Based on these perturbed features, two novel logit adjustment methods are proposed to improve model performance at a modest computational overhead. Subsequently, the distorted embedding spaces of all classes can be calibrated. In such balanced-distributed embedding spaces, the biased classifier can be eliminated by simply retraining the classifier with class-balanced sampling data. Extensive experiments conducted on benchmark datasets demonstrate the superior performance of the proposed method over the state-of-the-art ones. Source code is available at https://github.com/Keke921/GCLLoss.

7/19/2024

Geometric Prior Guided Feature Representation Learning for Long-Tailed Classification

Yanbiao Ma, Licheng Jiao, Fang Liu, Shuyuan Yang, Xu Liu, Puhua Chen

Real-world data are long-tailed, and the lack of tail samples leads to a significant limitation in the generalization ability of the model. Although numerous approaches of class re-balancing perform well for moderate class imbalance problems, additional knowledge needs to be introduced to help the tail class recover the underlying true distribution when the observed distribution from a few tail samples does not represent its true distribution properly, thus allowing the model to learn valuable information outside the observed domain. In this work, we propose to leverage the geometric information of the feature distribution of the well-represented head class to guide the model to learn the underlying distribution of the tail class. Specifically, we first systematically define the geometry of the feature distribution and the similarity measures between the geometries, and discover four phenomena regarding the relationship between the geometries of different feature distributions. Then, based on these four phenomena, feature uncertainty representation is proposed to perturb the tail features by utilizing the geometry of the head class feature distribution. It aims to make the perturbed features cover the underlying distribution of the tail class as much as possible, thus improving the model's generalization performance in the test domain. Finally, we design a three-stage training scheme enabling feature uncertainty modeling to be successfully applied. Experiments on CIFAR-10/100-LT, ImageNet-LT, and iNaturalist2018 show that our proposed approach outperforms other similar methods on most metrics. In addition, the experimental phenomena we discovered are able to provide new perspectives and theoretical foundations for subsequent studies.

9/4/2024

A Systematic Review on Long-Tailed Learning

Chongsheng Zhang, George Almpanidis, Gaojuan Fan, Binquan Deng, Yanbo Zhang, Ji Liu, Aouaidjia Kamel, Paolo Soda, João Gama

Long-tailed data is a special type of multi-class imbalanced data with a very large amount of minority/tail classes that have a very significant combined influence. Long-tailed learning aims to build high-performance models on datasets with long-tailed distributions, which can identify all the classes with high accuracy, in particular the minority/tail classes. It is a cutting-edge research direction that has attracted a remarkable amount of research effort in the past few years. In this paper, we present a comprehensive survey of latest advances in long-tailed visual learning. We first propose a new taxonomy for long-tailed learning, which consists of eight different dimensions, including data balancing, neural architecture, feature enrichment, logits adjustment, loss function, bells and whistles, network optimization, and post hoc processing techniques. Based on our proposed taxonomy, we present a systematic review of long-tailed learning methods, discussing their commonalities and alignable differences. We also analyze the differences between imbalance learning and long-tailed learning approaches. Finally, we discuss prospects and future directions in this field.

8/2/2024

Text-Guided Mixup Towards Long-Tailed Image Categorization

Richard Franklin, Jiawei Yao, Deyang Zhong, Qi Qian, Juhua Hu

In many real-world applications, the frequency distribution of class labels for training data can exhibit a long-tailed distribution, which challenges traditional approaches of training deep neural networks that require heavy amounts of balanced data. Gathering and labeling data to balance out the class label distribution can be both costly and time-consuming. Many existing solutions that enable ensemble learning, re-balancing strategies, or fine-tuning applied to deep neural networks are limited by the inherent problem of few class samples across a subset of classes. Recently, vision-language models like CLIP have been observed as effective solutions to zero-shot or few-shot learning by grasping a similarity between vision and language features for image and text pairs. Considering that large pre-trained vision-language models may contain valuable side textual information for minor classes, we propose to leverage text supervision to tackle the challenge of long-tailed learning. Concretely, we propose a novel text-guided mixup technique that takes advantage of the semantic relations between classes recognized by the pre-trained text encoder to help alleviate the long-tailed problem. Our empirical study on benchmark long-tailed tasks demonstrates the effectiveness of our proposal with a theoretical guarantee. Our code is available at https://github.com/rsamf/text-guided-mixup.

9/6/2024