Pseudo-Bayesian Optimization

2310.09766

Published 6/21/2024 by Haoxian Chen, Henry Lam

🛠️

Abstract

Bayesian Optimization is a popular approach for optimizing expensive black-box functions. Its key idea is to use a surrogate model to approximate the objective and, importantly, quantify the associated uncertainty that allows a sequential search of query points that balance exploitation-exploration. Gaussian process (GP) has been a primary candidate for the surrogate model, thanks to its Bayesian-principled uncertainty quantification power and modeling flexibility. However, its challenges have also spurred an array of alternatives whose convergence properties could be more opaque. Motivated by these, we study in this paper an axiomatic framework that elicits the minimal requirements to guarantee black-box optimization convergence that could apply beyond GP-based methods. Moreover, we leverage the design freedom in our framework, which we call Pseudo-Bayesian Optimization, to construct empirically superior algorithms. In particular, we show how using simple local regression, and a suitable randomized prior construction to quantify uncertainty, not only guarantees convergence but also consistently outperforms state-of-the-art benchmarks in examples ranging from high-dimensional synthetic experiments to realistic hyperparameter tuning and robotic applications.

Create account to get full access

Overview

Bayesian Optimization is a popular approach for optimizing expensive black-box functions
The key idea is to use a surrogate model to approximate the objective and quantify the associated uncertainty
Gaussian processes have been a primary candidate for the surrogate model, but alternative methods have emerged
This paper presents an axiomatic framework to study the minimal requirements for guaranteed convergence in black-box optimization, beyond Gaussian processes

Plain English Explanation

Bayesian Optimization is a technique used to optimize expensive or complex functions that don't have a simple mathematical formula. The basic idea is to build a simpler "surrogate" model that approximates the real function, and then use that model to guide the search for the optimal input. Importantly, the surrogate model also provides an estimate of the uncertainty associated with its predictions.

This uncertainty information allows the optimization process to balance exploitation (searching near the current best guess) and exploration (searching in new areas to find better solutions). Gaussian processes have been a popular choice for the surrogate model, thanks to their ability to quantify uncertainty in a principled Bayesian way. However, other alternatives have emerged, and their convergence properties may not always be clear.

To address this, the researchers in this paper developed a general framework called Pseudo-Bayesian Optimization that identifies the minimal requirements for guaranteed convergence in black-box optimization, beyond just Gaussian processes. They then leveraged the flexibility of this framework to construct new algorithms that outperform state-of-the-art methods on a variety of tasks, from high-dimensional synthetic experiments to hyperparameter tuning and robotic applications.

Technical Explanation

The paper presents a theoretical framework for Pseudo-Bayesian Optimization, which generalizes the standard Bayesian Optimization approach. The key idea is to identify the minimal requirements for the surrogate model and acquisition function to guarantee convergence in black-box optimization, without relying on the specific properties of Gaussian processes.

The authors show that by using a simple local regression model and a suitable randomized prior construction to quantify uncertainty, they can ensure convergence while also outperforming state-of-the-art Bayesian Optimization methods across a range of benchmarks. This includes high-dimensional synthetic functions, hyperparameter tuning for machine learning models, and robotic control tasks.

The theoretical analysis reveals that the core requirements are: 1) the surrogate model should provide a valid uncertainty quantification, and 2) the acquisition function should balance exploration and exploitation in a principled way. The authors demonstrate how these conditions can be satisfied using their Pseudo-Bayesian Optimization framework, which is more flexible than the standard Gaussian process approach.

Critical Analysis

The paper presents a compelling theoretical framework and empirical results for a generalized Bayesian Optimization approach. The authors carefully define the minimal requirements for guaranteed convergence, going beyond the specific assumptions of Gaussian processes.

One potential limitation is that the theoretical analysis assumes the existence of a "global minimizer" of the black-box function, which may not always be the case in practice. Additionally, the paper does not deeply explore the computational efficiency of the proposed algorithms compared to standard Bayesian Optimization methods.

Further research could investigate the performance of Pseudo-Bayesian Optimization on a wider variety of real-world optimization problems, including those with non-convex or multi-modal objective functions. It would also be valuable to understand how the framework might be extended to handle constraints, multi-objective optimization, or other practical considerations.

Overall, the paper makes a valuable contribution by providing a more general theoretical foundation for black-box optimization, and demonstrating the potential advantages of the Pseudo-Bayesian Optimization approach.

Conclusion

This paper presents an axiomatic framework called Pseudo-Bayesian Optimization that generalizes the standard Bayesian Optimization approach. The key innovation is to identify the minimal requirements for the surrogate model and acquisition function to guarantee convergence in black-box optimization, without relying on the specific properties of Gaussian processes.

The authors show that by using a simple local regression model and a suitable randomized prior construction, they can not only ensure convergence but also outperform state-of-the-art Bayesian Optimization methods across a range of benchmarks. This includes high-dimensional synthetic functions, hyperparameter tuning for machine learning models, and robotic control tasks.

The broader significance of this work is that it provides a more flexible and robust foundation for black-box optimization, with potential applications in fields like machine learning, engineering, and scientific research, where complex and expensive-to-evaluate functions are commonly encountered.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Provably Efficient Bayesian Optimization with Unbiased Gaussian Process Hyperparameter Estimation

Huong Ha, Vu Nguyen, Hung Tran-The, Hongyu Zhang, Xiuzhen Zhang, Anton van den Hengel

Gaussian process (GP) based Bayesian optimization (BO) is a powerful method for optimizing black-box functions efficiently. The practical performance and theoretical guarantees of this approach depend on having the correct GP hyperparameter values, which are usually unknown in advance and need to be estimated from the observed data. However, in practice, these estimations could be incorrect due to biased data sampling strategies used in BO. This can lead to degraded performance and break the sub-linear global convergence guarantee of BO. To address this issue, we propose a new BO method that can sub-linearly converge to the objective function's global optimum even when the true GP hyperparameters are unknown in advance and need to be estimated from the observed data. Our method uses a multi-armed bandit technique (EXP3) to add random data points to the BO process, and employs a novel training loss function for the GP hyperparameter estimation process that ensures consistent estimation. We further provide theoretical analysis of our proposed method. Finally, we demonstrate empirically that our method outperforms existing approaches on various synthetic and real-world problems.

6/7/2024

stat.ML cs.LG

🛠️

Principled Preferential Bayesian Optimization

Wenjie Xu, Wenbin Wang, Yuning Jiang, Bratislav Svetozarevic, Colin N. Jones

We study the problem of preferential Bayesian optimization (BO), where we aim to optimize a black-box function with only preference feedback over a pair of candidate solutions. Inspired by the likelihood ratio idea, we construct a confidence set of the black-box function using only the preference feedback. An optimistic algorithm with an efficient computational method is then developed to solve the problem, which enjoys an information-theoretic bound on the total cumulative regret, a first-of-its-kind for preferential BO. This bound further allows us to design a scheme to report an estimated best solution, with a guaranteed convergence rate. Experimental results on sampled instances from Gaussian processes, standard test functions, and a thermal comfort optimization problem all show that our method stably achieves better or competitive performance as compared to the existing state-of-the-art heuristics, which, however, do not have theoretical guarantees on regret bounds or convergence.

5/30/2024

cs.LG

Efficient Black-box Adversarial Attacks via Bayesian Optimization Guided by a Function Prior

Shuyu Cheng, Yibo Miao, Yinpeng Dong, Xiao Yang, Xiao-Shan Gao, Jun Zhu

This paper studies the challenging black-box adversarial attack that aims to generate adversarial examples against a black-box model by only using output feedback of the model to input queries. Some previous methods improve the query efficiency by incorporating the gradient of a surrogate white-box model into query-based attacks due to the adversarial transferability. However, the localized gradient is not informative enough, making these methods still query-intensive. In this paper, we propose a Prior-guided Bayesian Optimization (P-BO) algorithm that leverages the surrogate model as a global function prior in black-box adversarial attacks. As the surrogate model contains rich prior information of the black-box one, P-BO models the attack objective with a Gaussian process whose mean function is initialized as the surrogate model's loss. Our theoretical analysis on the regret bound indicates that the performance of P-BO may be affected by a bad prior. Therefore, we further propose an adaptive integration strategy to automatically adjust a coefficient on the function prior by minimizing the regret bound. Extensive experiments on image classifiers and large vision-language models demonstrate the superiority of the proposed algorithm in reducing queries and improving attack success rates compared with the state-of-the-art black-box attacks. Code is available at https://github.com/yibo-miao/PBO-Attack.

5/30/2024

cs.LG cs.AI cs.CR cs.CV stat.ML

🧠

A Study of Bayesian Neural Network Surrogates for Bayesian Optimization

Yucen Lily Li, Tim G. J. Rudner, Andrew Gordon Wilson

Bayesian optimization is a highly efficient approach to optimizing objective functions which are expensive to query. These objectives are typically represented by Gaussian process (GP) surrogate models which are easy to optimize and support exact inference. While standard GP surrogates have been well-established in Bayesian optimization, Bayesian neural networks (BNNs) have recently become practical function approximators, with many benefits over standard GPs such as the ability to naturally handle non-stationarity and learn representations for high-dimensional data. In this paper, we study BNNs as alternatives to standard GP surrogates for optimization. We consider a variety of approximate inference procedures for finite-width BNNs, including high-quality Hamiltonian Monte Carlo, low-cost stochastic MCMC, and heuristics such as deep ensembles. We also consider infinite-width BNNs, linearized Laplace approximations, and partially stochastic models such as deep kernel learning. We evaluate this collection of surrogate models on diverse problems with varying dimensionality, number of objectives, non-stationarity, and discrete and continuous inputs. We find: (i) the ranking of methods is highly problem dependent, suggesting the need for tailored inductive biases; (ii) HMC is the most successful approximate inference procedure for fully stochastic BNNs; (iii) full stochasticity may be unnecessary as deep kernel learning is relatively competitive; (iv) deep ensembles perform relatively poorly; (v) infinite-width BNNs are particularly promising, especially in high dimensions.

5/9/2024

cs.LG stat.ML