Uncertainty-aware self-training with expectation maximization basis transformation

Read original: arXiv:2405.01175 - Published 5/3/2024 by Zijia Wang, Wenbin Yang, Zhisong Liu, Zhen Jia

Overview

This paper presents a self-training approach, "Uncertainty-aware self-training with expectation maximization basis transformation", designed to improve model performance when labeled data is limited.
The key ideas are to use uncertainty estimates from both the model and the dataset to guide self-training, and to apply Expectation-Maximization (EM) to smooth pseudo-labels rather than relying on over-confident hard labels.
The method is evaluated on image classification and semantic segmentation, where the authors report improvements of 1 to 3 percentage points over confidence-aware self-training baselines.

Plain English Explanation

Machine learning models often require large amounts of labeled data to achieve high performance. However, obtaining labeled data can be time-consuming and expensive. Self-training is a technique that can help address this problem by allowing models to learn from both labeled and unlabeled data.

In this paper, the researchers introduce a new self-training approach that takes into account the uncertainty of the model's predictions. The basic idea is to only use the unlabeled data points that the model is most confident about, rather than blindly using all the unlabeled data. This helps the model avoid incorporating noisy or unreliable information during the self-training process.

To further improve the effectiveness of self-training, the researchers also propose using an expectation maximization (EM) basis transformation. This technique helps the model better leverage the information contained in the unlabeled data by projecting it onto a more informative subspace.

By combining these two key ideas, uncertainty awareness and EM basis transformation, the researchers develop a self-training method that outperforms other confidence-aware self-training techniques on the benchmarks tested. This suggests that their approach is a promising direction for improving the performance of machine learning models when labeled data is scarce.

Technical Explanation

The core of the proposed method is a self-training framework that leverages both labeled and unlabeled data. The key innovations are:

Uncertainty-aware sample selection: Instead of using all unlabeled data points, the method selects only the most confident predictions made by the model. This is done by estimating the uncertainty of the model's predictions and only incorporating the data points with the lowest uncertainty.
Expectation maximization (EM) basis transformation: The researchers use an EM-based technique to project the unlabeled data onto a more informative subspace. This helps the model better capture the underlying structure of the data and more effectively learn from the unlabeled samples.
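
The uncertainty-aware selection step can be illustrated with a minimal sketch. It assumes predictive entropy as the uncertainty measure and a fixed fraction of samples to keep; the summary does not say which measure or threshold the paper uses, so the function names and the `keep_fraction` parameter here are hypothetical.

```python
import numpy as np

def predictive_entropy(probs):
    """Shannon entropy (in nats) of each row of class probabilities."""
    p = np.clip(probs, 1e-12, 1.0)  # avoid log(0)
    return -np.sum(p * np.log(p), axis=1)

def select_confident(probs, keep_fraction=0.5):
    """Return indices of the lowest-entropy rows and their argmax pseudo-labels.

    keep_fraction is an illustrative assumption, not a value from the paper.
    """
    entropy = predictive_entropy(probs)
    n_keep = max(1, int(round(keep_fraction * len(probs))))
    keep_idx = np.argsort(entropy)[:n_keep]  # most certain predictions first
    return keep_idx, probs[keep_idx].argmax(axis=1)

# Model outputs for four unlabeled samples over three classes.
probs = np.array([
    [0.98, 0.01, 0.01],  # confident
    [0.40, 0.35, 0.25],  # uncertain
    [0.05, 0.90, 0.05],  # confident
    [0.34, 0.33, 0.33],  # nearly uniform
])
keep_idx, pseudo_labels = select_confident(probs, keep_fraction=0.5)
print(keep_idx, pseudo_labels)
```

Only the two peaked predictions are kept; the near-uniform rows are held back so they cannot inject noisy pseudo-labels into the next training round.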

The self-training process alternates between two steps:

Model training: The model is trained on the labeled data and the most confident unlabeled data points selected based on the uncertainty estimates.
EM basis transformation: The unlabeled data is transformed using the EM basis, and the transformed data is used to update the model in the next iteration of training.
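
The alternation described above can be sketched on toy data. This is not the authors' implementation: the paper works with deep networks and a basis extraction network, whereas this sketch assumes each "basis" is simply a class prototype vector, uses a softmax over negative squared distances as the E-step, and filters by an entropy threshold. All names (`e_step`, `m_step`, `self_train`, `max_entropy`) are hypothetical.

```python
import numpy as np

def e_step(x, bases, temperature=1.0):
    """Soft labels: softmax over negative squared distances to each basis."""
    d2 = ((x[:, None, :] - bases[None, :, :]) ** 2).sum(axis=2)
    logits = -d2 / temperature
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    resp = np.exp(logits)
    return resp / resp.sum(axis=1, keepdims=True)

def m_step(x, resp):
    """Update each basis as the responsibility-weighted mean of the data."""
    weights = resp / resp.sum(axis=0, keepdims=True)
    return weights.T @ x

def self_train(x_lab, y_lab, x_unlab, n_classes, n_rounds=5, max_entropy=0.3):
    # Initial bases: class means of the labeled data.
    bases = np.stack([x_lab[y_lab == c].mean(axis=0) for c in range(n_classes)])
    one_hot = np.eye(n_classes)
    for _ in range(n_rounds):
        resp = e_step(x_unlab, bases)  # smoothed (soft) pseudo-labels
        entropy = -(resp * np.log(np.clip(resp, 1e-12, 1.0))).sum(axis=1)
        keep = entropy < max_entropy   # drop uncertain samples
        hard = one_hot[resp[keep].argmax(axis=1)]  # soft -> hard labels
        x_all = np.vstack([x_lab, x_unlab[keep]])
        r_all = np.vstack([one_hot[y_lab], hard])
        bases = m_step(x_all, r_all)   # retrain on labeled + selected data
    return bases, e_step(x_unlab, bases)

# Toy data: two well-separated 2-D clusters, four labeled points in total.
rng = np.random.default_rng(0)
centers = np.array([[-2.0, 0.0], [2.0, 0.0]])
y_true = np.repeat([0, 1], 50)
x_unlab = centers[y_true] + 0.5 * rng.standard_normal((100, 2))
y_lab = np.array([0, 0, 1, 1])
x_lab = centers[y_lab] + 0.5 * rng.standard_normal((4, 2))

bases, soft_labels = self_train(x_lab, y_lab, x_unlab, n_classes=2)
accuracy = (soft_labels.argmax(axis=1) == y_true).mean()
print(bases, accuracy)
```

The design point the sketch tries to convey is the ordering: soft labels are estimated first, uncertain samples are filtered out, and only then are the remaining soft labels hardened and used to update the bases.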

The researchers evaluate their method on image classification and semantic segmentation benchmarks. The results show that their uncertainty-aware, EM-based self-training approach improves on other confidence-aware self-training algorithms by 1 to 3 percentage points depending on the dataset, particularly when the amount of labeled data is limited.

Critical Analysis

The proposed method appears to be a well-designed and well-executed approach to improving self-training for machine learning models. The key ideas of uncertainty-aware sample selection and EM basis transformation seem well-motivated and the experimental results are promising.

One potential limitation of the work is that the EM basis transformation step may be computationally expensive, especially for large-scale datasets. The researchers do not provide much information on the scalability of their approach or the computational resources required. This could be an important consideration for real-world applications.

Additionally, the paper does not delve into the theoretical underpinnings of the EM basis transformation or provide a deeper analysis of why it is effective for self-training. A more comprehensive discussion of the underlying principles and assumptions could help strengthen the contribution.

It would also be interesting to see the method evaluated on a wider range of tasks and datasets to better understand its general applicability and robustness. Comparing the approach to other techniques that aim to leverage ensemble diversity or improve interpretability could also provide additional insights.

Conclusion

This paper presents a self-training approach that combines uncertainty-aware sample selection and expectation maximization basis transformation to leverage both labeled and unlabeled data. The experimental results show that the method outperforms other confidence-aware self-training techniques, particularly in low-data regimes.

The key innovations of this work, namely the use of uncertainty estimates and the EM basis transformation, offer a promising direction for improving the performance of machine learning models when labeled data is scarce. While the paper raises a few potential concerns, the overall contribution is a valuable addition to the growing body of research on self-training and semi-supervised learning.

Related Papers

Uncertainty-aware self-training with expectation maximization basis transformation

Zijia Wang, Wenbin Yang, Zhisong Liu, Zhen Jia

Self-training is a powerful approach to deep learning. The key process is to find a pseudo-label for modeling. However, previous self-training algorithms suffer from the over-confidence issue brought by the hard labels, even some confidence-related regularizers cannot comprehensively catch the uncertainty. Therefore, we propose a new self-training framework to combine uncertainty information of both model and dataset. Specifically, we propose to use Expectation-Maximization (EM) to smooth the labels and comprehensively estimate the uncertainty information. We further design a basis extraction network to estimate the initial basis from the dataset. The obtained basis with uncertainty can be filtered based on uncertainty information. It can then be transformed into the real hard label to iteratively update the model and basis in the retraining process. Experiments on image classification and semantic segmentation show the advantages of our methods among confidence-aware self-training algorithms with 1-3 percentage improvement on different datasets.

5/3/2024

Awareness of uncertainty in classification using a multivariate model and multi-views

Alexey Kornaev, Elena Kornaeva, Oleg Ivanov, Ilya Pershin, Danis Alukaev

One of the ways to make artificial intelligence more natural is to give it some room for doubt. Two main questions should be resolved in that way. First, how to train a model to estimate uncertainties of its own predictions? And then, what to do with the uncertain predictions if they appear? First, we proposed an uncertainty-aware negative log-likelihood loss for the case of N-dimensional multivariate normal distribution with spherical variance matrix to the solution of N-classes classification tasks. The loss is similar to the heteroscedastic regression loss. The proposed model regularizes uncertain predictions, and trains to calculate both the predictions and their uncertainty estimations. The model fits well with the label smoothing technique. Second, we expanded the limits of data augmentation at the training and test stages, and made the trained model to give multiple predictions for a given number of augmented versions of each test sample. Given the multi-view predictions together with their uncertainties and confidences, we proposed several methods to calculate final predictions, including mode values and bin counts with soft and hard weights. For the latter method, we formalized the model tuning task in the form of multimodal optimization with non-differentiable criteria of maximum accuracy, and applied particle swarm optimization to solve the tuning task. The proposed methodology was tested using CIFAR-10 dataset with clean and noisy labels and demonstrated good results in comparison with other uncertainty estimation methods related to sample selection, co-teaching, and label smoothing.

4/17/2024

Just rotate it! Uncertainty estimation in closed-source models via multiple queries

Konstantinos Pitas, Julyan Arbel

We propose a simple and effective method to estimate the uncertainty of closed-source deep neural network image classification models. Given a base image, our method creates multiple transformed versions and uses them to query the top-1 prediction of the closed-source model. We demonstrate significant improvements in the calibration of uncertainty estimates compared to the naive baseline of assigning 100% confidence to all predictions. While we initially explore Gaussian perturbations, our empirical findings indicate that natural transformations, such as rotations and elastic deformations, yield even better-calibrated predictions. Furthermore, through empirical results and a straightforward theoretical analysis, we elucidate the reasons behind the superior performance of natural transformations over Gaussian noise. Leveraging these insights, we propose a transfer learning approach that further improves our calibration results.

5/24/2024

Beyond Human Data: Scaling Self-Training for Problem-Solving with Language Models

Avi Singh, John D. Co-Reyes, Rishabh Agarwal, Ankesh Anand, Piyush Patil, Xavier Garcia, Peter J. Liu, James Harrison, Jaehoon Lee, Kelvin Xu, Aaron Parisi, Abhishek Kumar, Alex Alemi, Alex Rizkowsky, Azade Nova, Ben Adlam, Bernd Bohnet, Gamaleldin Elsayed, Hanie Sedghi, Igor Mordatch, Isabelle Simpson, Izzeddin Gur, Jasper Snoek, Jeffrey Pennington, Jiri Hron, Kathleen Kenealy, Kevin Swersky, Kshiteej Mahajan, Laura Culp, Lechao Xiao, Maxwell L. Bileschi, Noah Constant, Roman Novak, Rosanne Liu, Tris Warkentin, Yundi Qian, Yamini Bansal, Ethan Dyer, Behnam Neyshabur, Jascha Sohl-Dickstein, Noah Fiedel

Fine-tuning language models (LMs) on human-generated data remains a prevalent practice. However, the performance of such models is often limited by the quantity and diversity of high-quality human data. In this paper, we explore whether we can go beyond human data on tasks where we have access to scalar feedback, for example, on math problems where one can verify correctness. To do so, we investigate a simple self-training method based on expectation-maximization, which we call ReST$^{EM}$, where we (1) generate samples from the model and filter them using binary feedback, (2) fine-tune the model on these samples, and (3) repeat this process a few times. Testing on advanced MATH reasoning and APPS coding benchmarks using PaLM-2 models, we find that ReST$^{EM}$ scales favorably with model size and significantly surpasses fine-tuning only on human data. Overall, our findings suggest self-training with feedback can substantially reduce dependence on human-generated data.

4/19/2024