Adaptive Robust Learning using Latent Bernoulli Variables

2312.00585

Published 6/17/2024 by Aleksandr Karakulev (Uppsala University, Sweden), Dave Zachariah (Uppsala University, Sweden), Prashant Singh (Uppsala University, Sweden, Science for Life Laboratory, Sweden)

stat.ML cs.LG

Adaptive Robust Learning using Latent Bernoulli Variables

Abstract

We present an adaptive approach for robust learning from corrupted training sets. We identify corrupted and non-corrupted samples with latent Bernoulli variables and thus formulate the learning problem as maximization of the likelihood where latent variables are marginalized. The resulting problem is solved via variational inference, using an efficient Expectation-Maximization based method. The proposed approach improves over the state-of-the-art by automatically inferring the corruption level, while adding minimal computational overhead. We demonstrate our robust learning method and its parameter-free nature on a wide variety of machine learning tasks including online learning and deep learning where it adapts to different levels of noise and maintains high prediction accuracy.

Create account to get full access

Overview

This paper proposes a new adaptive and parameter-free robust learning algorithm based on latent Bernoulli variables.
The algorithm is designed to handle data with adversarial or heavy-tailed noise, without requiring manual tuning of hyperparameters.
The authors demonstrate the effectiveness of their approach on several real-world datasets, showing that it outperforms existing robust learning methods.

Plain English Explanation

In machine learning, there are often situations where the data we have is not perfect - it may contain noise, errors, or even intentional adversarial attacks that try to mislead the learning algorithm. This can make it challenging to train accurate models, as the noise and errors can skew the results.

The authors of this paper have developed a new machine learning technique that is designed to be more robust to these types of imperfections in the data. Their approach uses something called "latent Bernoulli variables" to help the algorithm adapt and learn effectively even in the presence of noisy or adversarial data.

Essentially, the algorithm is able to identify which parts of the data are likely to be problematic or unreliable, and then adjust its learning process accordingly. This means it can still learn a good model without being overly influenced by the noisy or corrupted parts of the data.

[This relates to research on improving the reliability of black-box variational inference and robust distribution learning under local and global adversarial corruptions.]

The authors tested their algorithm on several real-world datasets and found that it outperformed existing robust learning methods. This suggests that their approach could be very useful in a wide range of machine learning applications where the data quality is not perfect.

Technical Explanation

The key idea behind the authors' approach is to introduce latent Bernoulli variables that indicate whether each data point is "good" (reliable) or "bad" (corrupted by noise or adversarial attacks). These latent variables are then jointly optimized with the model parameters during the learning process.

[This relates to research on outlier-robust Kalman filtering through generalized Bayes and leveraging offline data for linear latent bandits.]

The authors formulate the problem as a robust optimization task, where the objective is to minimize the expected loss under the worst-case distribution of the latent Bernoulli variables. They show that this can be efficiently solved using a novel stochastic gradient descent algorithm that alternates between updating the model parameters and the latent variables.

The key advantage of this approach is that it is completely parameter-free - the algorithm automatically learns the optimal level of robustness from the data, without requiring any manual tuning of hyperparameters. This makes it more accessible and easier to use in practice compared to existing robust learning methods.

Critical Analysis

The authors provide a thorough experimental evaluation of their proposed algorithm, demonstrating its effectiveness on a range of real-world datasets. However, they do not explore the theoretical properties of the method in depth, such as its statistical convergence guarantees or its robustness to different types of noise or adversarial attacks.

[This relates to research on Bayesian approaches to online learning in contextual restless bandits.]

Additionally, the authors do not discuss potential limitations or failure modes of their approach, such as how it might perform in the presence of very high levels of noise or adversarial attacks, or how it scales to large-scale problems. Further research would be needed to fully understand the strengths and weaknesses of this method.

Conclusion

Overall, the authors have presented a promising new approach for robust machine learning that adapts to the data without requiring manual parameter tuning. This could have significant practical benefits in a wide range of applications where data quality is a concern. However, additional research is needed to fully characterize the method's theoretical properties and its performance under various challenging conditions.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

BLoB: Bayesian Low-Rank Adaptation by Backpropagation for Large Language Models

Yibin Wang, Haizhou Shi, Ligong Han, Dimitris Metaxas, Hao Wang

Large Language Models (LLMs) often suffer from overconfidence during inference, particularly when adapted to downstream domain-specific tasks with limited data. Previous work addresses this issue by employing approximate Bayesian estimation after the LLMs are trained, enabling them to quantify uncertainty. However, such post-training approaches' performance is severely limited by the parameters learned during training. In this paper, we go beyond post-training Bayesianization and propose Bayesian Low-Rank Adaptation by Backpropagation (BLoB), an algorithm that continuously and jointly adjusts both the mean and covariance of LLM parameters throughout the whole fine-tuning process. Our empirical results verify the effectiveness of BLoB in terms of generalization and uncertainty estimation, when evaluated on both in-distribution and out-of-distribution data.

6/19/2024

cs.LG cs.AI cs.CL stat.ML

🤯

A Framework for Improving the Reliability of Black-box Variational Inference

Manushi Welandawe, Michael Riis Andersen, Aki Vehtari, Jonathan H. Huggins

Black-box variational inference (BBVI) now sees widespread use in machine learning and statistics as a fast yet flexible alternative to Markov chain Monte Carlo methods for approximate Bayesian inference. However, stochastic optimization methods for BBVI remain unreliable and require substantial expertise and hand-tuning to apply effectively. In this paper, we propose Robust and Automated Black-box VI (RABVI), a framework for improving the reliability of BBVI optimization. RABVI is based on rigorously justified automation techniques, includes just a small number of intuitive tuning parameters, and detects inaccurate estimates of the optimal variational approximation. RABVI adaptively decreases the learning rate by detecting convergence of the fixed--learning-rate iterates, then estimates the symmetrized Kullback--Leibler (KL) divergence between the current variational approximation and the optimal one. It also employs a novel optimization termination criterion that enables the user to balance desired accuracy against computational cost by comparing (i) the predicted relative decrease in the symmetrized KL divergence if a smaller learning were used and (ii) the predicted computation required to converge with the smaller learning rate. We validate the robustness and accuracy of RABVI through carefully designed simulation studies and on a diverse set of real-world model and data examples.

5/17/2024

stat.ML cs.LG

Latent Variable Sequence Identification for Cognitive Models with Neural Bayes Estimation

Ti-Fen Pan, Jing-Jing Li, Bill Thompson, Anne Collins

Extracting time-varying latent variables from computational cognitive models is a key step in model-based neural analysis, which aims to understand the neural correlates of cognitive processes. However, existing methods only allow researchers to infer latent variables that explain subjects' behavior in a relatively small class of cognitive models. For example, a broad class of relevant cognitive models with analytically intractable likelihood is currently out of reach from standard techniques, based on Maximum a Posteriori parameter estimation. Here, we present an approach that extends neural Bayes estimation to learn a direct mapping between experimental data and the targeted latent variable space using recurrent neural networks and simulated datasets. We show that our approach achieves competitive performance in inferring latent variable sequences in both tractable and intractable models. Furthermore, the approach is generalizable across different computational models and is adaptable for both continuous and discrete latent spaces. We then demonstrate its applicability in real world datasets. Our work underscores that combining recurrent neural networks and simulation-based inference to identify latent variable sequences can enable researchers to access a wider class of cognitive models for model-based neural analyses, and thus test a broader set of theories.

6/24/2024

cs.LG stat.ML

🔍

Robust Distribution Learning with Local and Global Adversarial Corruptions

Sloan Nietert, Ziv Goldfeld, Soroosh Shafiee

We consider learning in an adversarial environment, where an $varepsilon$-fraction of samples from a distribution $P$ are arbitrarily modified (*global* corruptions) and the remaining perturbations have average magnitude bounded by $rho$ (*local* corruptions). Given access to $n$ such corrupted samples, we seek a computationally efficient estimator $hat{P}_n$ that minimizes the Wasserstein distance $mathsf{W}_1(hat{P}_n,P)$. In fact, we attack the fine-grained task of minimizing $mathsf{W}_1(Pi_# hat{P}_n, Pi_# P)$ for all orthogonal projections $Pi in mathbb{R}^{d times d}$, with performance scaling with $mathrm{rank}(Pi) = k$. This allows us to account simultaneously for mean estimation ($k=1$), distribution estimation ($k=d$), as well as the settings interpolating between these two extremes. We characterize the optimal population-limit risk for this task and then develop an efficient finite-sample algorithm with error bounded by $sqrt{varepsilon k} + rho + d^{O(1)}tilde{O}(n^{-1/k})$ when $P$ has bounded moments of order $2+delta$, for constant $delta > 0$. For data distributions with bounded covariance, our finite-sample bounds match the minimax population-level optimum for large sample sizes. Our efficient procedure relies on a novel trace norm approximation of an ideal yet intractable 2-Wasserstein projection estimator. We apply this algorithm to robust stochastic optimization, and, in the process, uncover a new method for overcoming the curse of dimensionality in Wasserstein distributionally robust optimization.

6/11/2024

cs.LG stat.ML