OTTER: Improving Zero-Shot Classification via Optimal Transport

Read original: arXiv:2404.08461 - Published 4/15/2024 by Changho Shin, Jitian Zhao, Sonia Cromp, Harit Vishwakarma, Frederic Sala

Overview

This paper introduces OTTER, a zero-shot classification approach that uses optimal transport to adjust the predictions of a pretrained model.
Zero-shot classification aims to classify data into categories that have no labeled training samples.
OTTER targets a specific weakness of zero-shot models: the label distribution they inherit from pretraining often does not match the downstream task. It corrects this using only an estimate of the downstream label distribution.

Plain English Explanation

In machine learning, zero-shot classification means asking a model to classify data into categories for which it has seen no labeled training examples. This is challenging because the model must generalize from what it learned during pretraining alone.

The OTTER method proposed in this paper aims to improve zero-shot classification by using a mathematical technique called optimal transport. Rather than retraining the model, OTTER adjusts its predictions so that the share of inputs assigned to each class matches an estimate of how common each class is in the downstream task. No labeled data from that task is needed.

By using optimal transport, OTTER is able to overcome a limitation of existing zero-shot models, which inherit a mismatched label distribution from unbalanced web-scale pretraining data. The paper reports average accuracy gains of 4.8% on zero-shot image classification and 15.9% on zero-shot text classification, with better results than baselines such as Prior Matching on 17 of 21 datasets.

Technical Explanation

The key idea of the OTTER method is to use optimal transport to adjust a pretrained model's predictions rather than its parameters. Specifically, the authors formulate label assignment as an optimal transport problem, where the goal is to find the most "efficient" way to "move" the inputs onto the classes while matching an estimated label distribution for the downstream task.
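One way to make such a transport problem concrete is sketched below. The notation is introduced here for illustration and is not taken from the summary: n inputs x_i, K classes, an estimated label distribution ν, a pretrained model p_θ, and a cost matrix C. Using the negative log-probability as the cost is an assumption of this sketch.

```latex
\begin{aligned}
\pi^\star &= \arg\min_{\pi \in \mathbb{R}_{\ge 0}^{n \times K}}
  \sum_{i=1}^{n} \sum_{j=1}^{K} \pi_{ij}\, C_{ij}
  \quad \text{s.t.} \quad
  \sum_{j=1}^{K} \pi_{ij} = \tfrac{1}{n} \;\; \forall i, \qquad
  \sum_{i=1}^{n} \pi_{ij} = \nu_j \;\; \forall j, \\
C_{ij} &= -\log p_\theta(y = j \mid x_i), \qquad
\hat{y}_i = \arg\max_{j} \pi^\star_{ij}.
\end{aligned}
```

Under these assumptions, each input carries mass 1/n, each class j must receive total mass ν_j, and an input is labeled with the class that receives most of its mass.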

Because this procedure needs neither labeled downstream data nor knowledge of the true label balance in the pretraining distribution, it fits the zero-shot setting. The authors characterize the improvement theoretically under mild conditions, bound the error caused by a misspecified label distribution, and report that OTTER beats baselines such as Prior Matching on 17 of 21 datasets.
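To illustrate the idea of adjusting predictions with optimal transport, here is a minimal sketch in Python with NumPy. It is not the authors' implementation. The function name `ot_adjust`, the negative log-probability cost, the entropic regularization, and the parameters `reg` and `n_iter` are all assumptions made for this example. It uses Sinkhorn iterations, a standard approximate solver for entropy-regularized optimal transport.

```python
import numpy as np


def _logsumexp(a, axis):
    # Numerically stable log(sum(exp(a))) along one axis.
    m = np.max(a, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(a - m), axis=axis))


def ot_adjust(probs, label_dist, reg=0.1, n_iter=500):
    """Hypothetical helper: reassign labels so class totals match label_dist.

    probs      : (n, k) array of zero-shot class probabilities, one row per input.
    label_dist : length-k estimate of the downstream label distribution
                 (assumed strictly positive).
    reg        : entropic regularization strength (an assumption of this sketch).
    """
    probs = np.asarray(probs, dtype=float)
    label_dist = np.asarray(label_dist, dtype=float)
    n, k = probs.shape

    # Assumed cost: assigning input i to class j is cheap when p_ij is high.
    cost = -np.log(np.clip(probs, 1e-12, None))
    log_kernel = -cost / reg

    log_mu = np.full(n, -np.log(n))                 # each input has mass 1/n
    log_nu = np.log(label_dist / label_dist.sum())  # target mass per class

    f = np.zeros(n)
    g = np.zeros(k)
    for _ in range(n_iter):
        # Log-domain Sinkhorn: alternately rescale rows, then columns.
        f = log_mu - _logsumexp(log_kernel + g[None, :], axis=1)
        g = log_nu - _logsumexp(log_kernel + f[:, None], axis=0)

    plan = np.exp(log_kernel + f[:, None] + g[None, :])
    # Label each input with the class that receives most of its mass.
    return plan.argmax(axis=1), plan


# Toy example: the model prefers class 0 for every input,
# but the downstream task is assumed to be balanced.
probs = np.array([[0.90, 0.10],
                  [0.80, 0.20],
                  [0.60, 0.40],
                  [0.55, 0.45]])
labels, plan = ot_adjust(probs, label_dist=[0.5, 0.5])
print(probs.argmax(axis=1))  # naive prediction: [0 0 0 0]
print(labels)                # adjusted prediction: [0 0 1 1]
```

In this toy case the two inputs about which the model is least confident are moved to class 1, so the predicted class totals match the assumed balanced label distribution.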

The paper also explores the use of unbalanced optimal transport, which can handle cases where the input data and class prototypes have different statistical properties. This extension of the basic optimal transport formulation further boosts the performance of OTTER in challenging zero-shot scenarios.

Critical Analysis

The OTTER method represents a promising approach to zero-shot classification, but the paper does not address some potential limitations and areas for future research:

The paper focuses on standard zero-shot classification benchmarks, but it's unclear how well OTTER would perform in more realistic, long-tailed scenarios where the data distribution is highly skewed.
The optimal transport formulation used in OTTER relies on an estimate of the downstream label distribution, which may not be available or accurate in all real-world settings. The paper bounds the error caused by misspecification, but ways to estimate this distribution directly from unlabeled data could further expand the applicability of the method.
The computational complexity of the optimal transport optimization may limit the scalability of OTTER to large-scale problems. Investigating more efficient optimization techniques could help address this issue.

Despite these potential limitations, the OTTER method represents an interesting and promising direction for improving zero-shot classification performance, and the paper provides a solid technical foundation for future work in this area.

Conclusion

The OTTER method introduces an approach to zero-shot classification that uses optimal transport to adjust a pretrained model's predictions. By formulating label assignment as an optimal transport problem constrained by an estimated downstream label distribution, OTTER corrects the label distribution mismatch inherited from pretraining, leading to improved classification performance.

The use of unbalanced optimal transport further enhances OTTER's ability to handle challenging zero-shot scenarios. While the paper does not address all potential limitations of the method, it represents an important step forward in the field of zero-shot learning and opens up new avenues for future research.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

OTTER: Improving Zero-Shot Classification via Optimal Transport

Changho Shin, Jitian Zhao, Sonia Cromp, Harit Vishwakarma, Frederic Sala

Popular zero-shot models suffer due to artifacts inherited from pretraining. A particularly detrimental artifact, caused by unbalanced web-scale pretraining data, is mismatched label distribution. Existing approaches that seek to repair the label distribution are not suitable in zero-shot settings, as they have incompatible requirements such as access to labeled downstream task data or knowledge of the true label balance in the pretraining distribution. We sidestep these challenges and introduce a simple and lightweight approach to adjust pretrained model predictions via optimal transport. Our technique requires only an estimate of the label distribution of a downstream task. Theoretically, we characterize the improvement produced by our procedure under certain mild conditions and provide bounds on the error caused by misspecification. Empirically, we validate our method in a wide array of zero-shot image and text classification tasks, improving accuracy by 4.8% and 15.9% on average, and beating baselines like Prior Matching -- often by significant margins -- in 17 out of 21 datasets.

4/15/2024

OTMatch: Improving Semi-Supervised Learning with Optimal Transport

Zhiquan Tan, Kaipeng Zheng, Weiran Huang

Semi-supervised learning has made remarkable strides by effectively utilizing a limited amount of labeled data while capitalizing on the abundant information present in unlabeled data. However, current algorithms often prioritize aligning image predictions with specific classes generated through self-training techniques, thereby neglecting the inherent relationships that exist within these classes. In this paper, we present a new approach called OTMatch, which leverages semantic relationships among classes by employing an optimal transport loss function to match distributions. We conduct experiments on many standard vision and language datasets. The empirical results show that our method improves over the baseline, demonstrating the effectiveness and superiority of our approach in harnessing semantic relationships to enhance learning performance in a semi-supervised setting.

5/31/2024

SP$^2$OT: Semantic-Regularized Progressive Partial Optimal Transport for Imbalanced Clustering

Chuyu Zhang, Hui Ren, Xuming He

Deep clustering, which learns representation and semantic clustering without label information, poses a great challenge for deep learning-based approaches. Despite significant progress in recent years, most existing methods focus on uniformly distributed datasets, significantly limiting the practical applicability of their methods. In this paper, we propose a more practical problem setting named deep imbalanced clustering, where the underlying classes exhibit an imbalanced distribution. To address this challenge, we introduce a novel optimal transport-based pseudo-label learning framework. Our framework formulates pseudo-label generation as a Semantic-regularized Progressive Partial Optimal Transport (SP$^2$OT) problem, which progressively transports each sample to imbalanced clusters under several prior distribution and semantic relation constraints, thus generating high-quality and imbalance-aware pseudo-labels. To solve SP$^2$OT, we develop a Majorization-Minimization-based optimization algorithm. To be more precise, we employ the strategy of majorization to reformulate the SP$^2$OT problem into a Progressive Partial Optimal Transport problem, which can be transformed into an unbalanced optimal transport problem with augmented constraints and can be solved efficiently by a fast matrix scaling algorithm. Experiments on various datasets, including a human-curated long-tailed CIFAR100, challenging ImageNet-R, and large-scale subsets of fine-grained iNaturalist2018 datasets, demonstrate the superiority of our method.

4/5/2024

Online Zero-Shot Classification with CLIP

Qi Qian, Juhua Hu

Vision-language pre-training such as CLIP enables zero-shot transfer that can classify images according to the candidate class names. While CLIP demonstrates an impressive zero-shot performance on diverse downstream tasks, the distribution from the target data has not been leveraged sufficiently. In this work, we study a novel online zero-shot transfer scenario, where each image arrives in a random order for classification and is visited only once to obtain prediction immediately without storing its representation. Compared with the vanilla zero-shot classification, the proposed framework preserves its flexibility for online service while considering the statistics of the arrived images as the side information to capture the distribution of target data, which can help improve the performance of real-world applications. To tackle the challenge of effective online optimization, we first develop online label learning to model the target data distribution. Then, the proxy of each class in the vision space is further optimized with the proposed online proxy learning method to mitigate the modality gap between images and text. The convergence of both online strategies can be theoretically guaranteed. By combining the predicted label from the online label learning and proxy learning, our online zero-shot transfer method (OnZeta) achieves 78.94% accuracy on ImageNet without accessing the entire data set. Moreover, extensive experiments on other 13 downstream tasks with different vision encoders show a more than 3% improvement on average, which demonstrates the effectiveness of our proposal. Code is available at https://github.com/idstcv/OnZeta.

8/27/2024