Unsupervised Training of Convex Regularizers using Maximum Likelihood Estimation

2404.05445

Published 4/9/2024 by Hong Ye Tan, Ziruo Cai, Marcelo Pereyra, Subhadip Mukherjee, Junqi Tang, Carola-Bibiane Schönlieb


Abstract

Unsupervised learning is a training approach for situations where ground truth data is unavailable, such as inverse imaging problems. We present an unsupervised Bayesian training approach to learning convex neural network regularizers using a fixed noisy dataset, based on a dual Markov chain estimation method. Compared to classical supervised adversarial regularization methods, which have access to both clean images and unlimited noisy copies, we demonstrate close performance on natural image Gaussian deconvolution and Poisson denoising tasks.


Overview

This paper presents a method for unsupervised training of convex regularizers using maximum likelihood estimation.
The authors propose a novel approach to learn convex neural network regularizers directly from a fixed set of noisy observations, without access to clean ground-truth images and without relying on hand-crafted penalty functions.
The method targets inverse imaging problems; the experiments cover natural image Gaussian deconvolution and Poisson denoising.

Plain English Explanation

Regularization is a central technique for solving inverse problems such as image deblurring and denoising, where recovering a clean image from degraded, noisy measurements is often ill-posed. A penalty term (the regularizer) is added to the data-fitting objective to favor plausible solutions. Traditionally, researchers have relied on hand-crafted penalty functions, such as L1 or L2 regularization, which may not always be the best fit for the problem at hand.
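As a concrete picture of a hand-crafted penalty, the sketch below solves a toy 1D deblurring problem by minimizing ½‖Ax - y‖² + (λ/2)‖x‖², i.e. an L2 (Tikhonov) regularizer, with plain gradient descent, and checks the result against the closed-form solution. The moving-average blur, test signal, noise level, and λ are illustrative assumptions, not settings from the paper.

```python
import numpy as np


def blur_matrix(n: int, width: int = 2) -> np.ndarray:
    """Toy 1D moving-average blur operator A (illustrative assumption)."""
    A = np.zeros((n, n))
    for i in range(n):
        lo, hi = max(0, i - width), min(n, i + width + 1)
        A[i, lo:hi] = 1.0 / (hi - lo)
    return A


def tikhonov_gd(A: np.ndarray, y: np.ndarray, lam: float, steps: int = 5000) -> np.ndarray:
    """Minimize 0.5*||A x - y||^2 + 0.5*lam*||x||^2 by gradient descent."""
    # Step size 1/L, where L = ||A||_2^2 + lam is the gradient's Lipschitz constant.
    L = np.linalg.norm(A, 2) ** 2 + lam
    x = np.zeros(A.shape[1])
    for _ in range(steps):
        grad = A.T @ (A @ x - y) + lam * x  # data-fit gradient + penalty gradient
        x -= grad / L
    return x


rng = np.random.default_rng(0)
n = 64
A = blur_matrix(n)
x_true = np.sin(np.linspace(0, 3 * np.pi, n))
y = A @ x_true + 0.05 * rng.standard_normal(n)  # blurred, noisy measurement

lam = 0.1
x_gd = tikhonov_gd(A, y, lam)
# Closed form for this quadratic objective: (A^T A + lam I) x = A^T y.
x_cf = np.linalg.solve(A.T @ A + lam * np.eye(n), A.T @ y)
print(np.max(np.abs(x_gd - x_cf)))
```

The hand-crafted L2 penalty makes the problem strongly convex, so gradient descent converges to the unique minimizer; learned regularizers replace this fixed penalty with a data-driven one.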

This paper introduces a way to learn the regularizer directly from data in an unsupervised manner, using only noisy measurements and never clean images. The key idea is to treat the regularizer as defining a prior probability distribution over clean images, and then use maximum likelihood estimation to find the regularizer under which the observed noisy data are most probable.
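The principle can be seen in a toy case with a closed-form answer. Assume, purely for illustration, a scalar quadratic regularizer R_θ(x) = θx²/2 (equivalently a Gaussian prior on x with precision θ) and noisy observations y = x + noise with known standard deviation σ. Marginally y ~ N(0, 1/θ + σ²), so maximizing the likelihood of the noisy samples gives θ̂ = 1/(mean(y²) - σ²), computed without ever seeing a clean x. The function name and numbers are hypothetical; the paper itself learns convex neural network regularizers for images, where no such closed form is expected.

```python
import numpy as np


def mle_prior_precision(y: np.ndarray, sigma: float) -> float:
    """Hypothetical toy: MLE of theta for R_theta(x) = theta * x**2 / 2 from y = x + N(0, sigma^2).

    Marginally y ~ N(0, 1/theta + sigma^2), so the likelihood is maximized at
    1/theta = mean(y^2) - sigma^2 (requires mean(y^2) > sigma^2).
    """
    v = np.mean(y ** 2) - sigma ** 2
    if v <= 0:
        raise ValueError("noise variance explains all of the data; no finite MLE")
    return 1.0 / v


rng = np.random.default_rng(1)
theta_true, sigma, n = 0.25, 0.5, 100_000
x_clean = rng.normal(0.0, 1.0 / np.sqrt(theta_true), n)  # never shown to the estimator
y = x_clean + rng.normal(0.0, sigma, n)                   # the only training data
theta_hat = mle_prior_precision(y, sigma)
print(theta_hat)
```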

The authors show that regularizers learned this way, from noisy images alone, perform close to supervised methods that are trained with access to clean images, on image deblurring (Gaussian deconvolution) and Poisson denoising tasks. This matters in practice because clean ground-truth images are often unavailable in imaging applications.

Technical Explanation

The proposed method takes a Bayesian view: a convex neural network regularizer R_θ defines a prior density over clean images proportional to exp(-R_θ(x)), and its parameters θ are chosen by maximizing the marginal likelihood of the noisy observations, obtained by integrating the measurement model over all possible clean images.
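One way to write this setup, in standard Bayesian notation of our own choosing (a hedged sketch of the idea, not the paper's exact formulation), uses noisy data y, a likelihood p(y | x) given by the measurement model (for Gaussian deconvolution with blur operator A and noise level σ, p(y | x) ∝ exp(-‖Ax - y‖²/(2σ²))), and a regularizer R_θ with parameters θ:

```latex
\begin{aligned}
p(x \mid \theta) &= \frac{\exp(-R_\theta(x))}{Z(\theta)}, \qquad Z(\theta) = \int \exp(-R_\theta(x))\,dx, \\
p(y \mid \theta) &= \int p(y \mid x)\, p(x \mid \theta)\,dx, \qquad \hat\theta \in \arg\max_\theta \log p(y \mid \theta), \\
\nabla_\theta \log p(y \mid \theta) &= \mathbb{E}_{x \sim p(x \mid \theta)}\!\left[\nabla_\theta R_\theta(x)\right]
  - \mathbb{E}_{x \sim p(x \mid y, \theta)}\!\left[\nabla_\theta R_\theta(x)\right].
\end{aligned}
```

The last line follows by differentiating the logarithms of the integral and of Z(θ) under the integral sign. For a dataset of independent noisy observations, the log-likelihoods, and hence the gradients, are summed. Neither expectation is available in closed form for a neural-network R_θ, so both must be estimated by sampling: one from the prior and one from the posterior given y.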

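Because this marginal likelihood and its gradient involve integrals over all possible clean images, they cannot be computed exactly. The abstract describes the estimator as a dual Markov chain method; in schemes of this kind, two Markov chains (one targeting the posterior over clean images given the noisy data, one targeting the prior defined by the regularizer) supply Monte Carlo estimates of the likelihood gradient, and the regularizer parameters are updated iteratively by stochastic gradient ascent.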
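The dual-chain idea can be sketched on a hypothetical scalar toy where the answer is known in closed form: R_θ(x) = θx²/2 and y = x + N(0, σ²), for which the exact MLE is 1/(mean(y²) - σ²). Below, one set of unadjusted Langevin (ULA) chains targets the posterior p(x | y_i, θ) for each noisy sample, another set targets the prior p(x | θ), and θ takes stochastic-gradient ascent steps using the standard identity that the gradient of the log marginal likelihood equals the prior expectation of ∇θR minus its posterior expectation. The quadratic regularizer, step sizes, chain counts, and function name are illustrative assumptions; this is not the authors' implementation, which works with convex neural network regularizers on images.

```python
import numpy as np


def dual_chain_mle(y, sigma, theta0=0.5, iters=4000, delta=0.01,
                   gamma_post=0.02, gamma_prior=0.2, n_prior=2000, seed=0):
    """Stochastic-approximation MLE of theta for the toy R_theta(x) = theta * x**2 / 2.

    Uses d/dtheta mean_i log p(y_i | theta)
        = E_prior[dR/dtheta] - mean_i E_post_i[dR/dtheta],  with dR/dtheta = x**2 / 2,
    estimating both expectations with warm-started ULA chains (hypothetical toy setup).
    """
    rng = np.random.default_rng(seed)
    theta = theta0
    x_post = y.astype(float).copy()  # one posterior chain per noisy sample, started at the data
    x_prior = rng.normal(0.0, 1.0 / np.sqrt(theta), n_prior)  # prior chains
    history = []
    for _ in range(iters):
        # ULA step for the posterior, potential U(x) = (x - y)^2 / (2 sigma^2) + theta x^2 / 2
        grad_post = (x_post - y) / sigma**2 + theta * x_post
        x_post = (x_post - gamma_post * grad_post
                  + np.sqrt(2 * gamma_post) * rng.standard_normal(x_post.shape))
        # ULA step for the prior, potential U(x) = theta x^2 / 2
        x_prior = (x_prior - gamma_prior * theta * x_prior
                   + np.sqrt(2 * gamma_prior) * rng.standard_normal(n_prior))
        # Monte Carlo estimate of the likelihood gradient, then a projected ascent step
        g = np.mean(x_prior**2) / 2 - np.mean(x_post**2) / 2
        theta = max(theta + delta * g, 1e-3)  # theta > 0 keeps the prior normalizable
        history.append(theta)
    return float(np.mean(history[iters // 2:]))  # average the iterates after burn-in


rng = np.random.default_rng(1)
theta_true, sigma, n = 0.25, 0.5, 2000
y = rng.normal(0.0, 1.0 / np.sqrt(theta_true), n) + rng.normal(0.0, sigma, n)
theta_hat = dual_chain_mle(y, sigma)
theta_closed_form = 1.0 / (np.mean(y**2) - sigma**2)  # exact MLE for this toy model
print(theta_hat, theta_closed_form)
```

The ULA step sizes introduce a small bias in the chains' stationary variances, so the estimate lands near, not exactly on, the closed-form MLE. Keeping the chains warm-started between parameter updates, rather than restarting them, is what makes one Langevin step per update affordable.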

The authors evaluate the approach on natural image Gaussian deconvolution and Poisson denoising. According to the abstract, the regularizers trained only on a fixed noisy dataset perform close to classical supervised adversarial regularization methods, which have access to both clean images and unlimited noisy copies.

Critical Analysis

A central modeling choice, and a potential limitation, is the restriction to convex regularizers. Convexity keeps the resulting reconstruction problems well behaved, but it limits how expressive the learned regularizer can be; relaxing this assumption, for example to weakly convex regularizers, is a natural avenue for future research.

Additionally, the computational cost of the method can be high, especially for large-scale problems, because every parameter update relies on Markov chain sampling over images. Faster samplers or cheaper approximations of the likelihood gradient are possible ways to reduce this cost.

Overall, the paper presents a novel and promising approach for unsupervised learning of regularizers, with potential applications to inverse imaging problems where clean ground-truth data is unavailable. However, further research is needed to address the limitations and explore the full potential of this method.

Conclusion

This paper introduces a novel approach for unsupervised training of convex regularizers using maximum likelihood estimation. The key idea is to treat a convex neural network regularizer as a prior over clean images and to fit its parameters by maximizing the likelihood of noisy observations alone, using a dual Markov chain estimation method.

On natural image Gaussian deconvolution and Poisson denoising, the learned regularizers perform close to supervised adversarial regularization methods that have access to clean images. While the method has some limitations, it represents an important step towards learned regularization in settings where clean ground-truth data is unavailable.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers


Quantile-based Maximum Likelihood Training for Outlier Detection

Masoud Taghikhah, Nishant Kumar, Siniša Šegvić, Abouzar Eslami, Stefan Gumhold

Discriminative learning effectively predicts the true object class for image classification. However, it often results in false positives for outliers, posing critical concerns in applications like autonomous driving and video surveillance systems. Previous attempts to address this challenge involved training image classifiers through contrastive learning using actual outlier data or synthesizing outliers for self-supervised learning. Furthermore, unsupervised generative modeling of inliers in pixel space has shown limited success for outlier detection. In this work, we introduce a quantile-based maximum likelihood objective for learning the inlier distribution to improve the outlier separation during inference. Our approach fits a normalizing flow to pre-trained discriminative features and detects the outliers according to the evaluated log-likelihood. The experimental evaluation demonstrates the effectiveness of our method as it surpasses the performance of the state-of-the-art unsupervised methods for outlier detection. The results are also competitive compared with a recent self-supervised approach for outlier detection. Our work reduces the dependency on well-sampled negative training data, which is especially important for domains like medical diagnostics or remote sensing.

6/4/2024

cs.CV cs.LG


Calibration-Aware Bayesian Learning

Jiayi Huang, Sangwoo Park, Osvaldo Simeone

Deep learning models, including modern systems like large language models, are well known to offer unreliable estimates of the uncertainty of their decisions. In order to improve the quality of the confidence levels, also known as calibration, of a model, common approaches entail the addition of either data-dependent or data-independent regularization terms to the training loss. Data-dependent regularizers have been recently introduced in the context of conventional frequentist learning to penalize deviations between confidence and accuracy. In contrast, data-independent regularizers are at the core of Bayesian learning, enforcing adherence of the variational distribution in the model parameter space to a prior density. The former approach is unable to quantify epistemic uncertainty, while the latter is severely affected by model misspecification. In light of the limitations of both methods, this paper proposes an integrated framework, referred to as calibration-aware Bayesian neural networks (CA-BNNs), that applies both regularizers while optimizing over a variational distribution as in Bayesian learning. Numerical results validate the advantages of the proposed approach in terms of expected calibration error (ECE) and reliability diagrams.

4/15/2024

cs.LG eess.SP

Learned Regularization for Inverse Problems: Insights from a Spectral Model

Martin Burger, Samira Kabri

In this chapter we provide a theoretically founded investigation of state-of-the-art learning approaches for inverse problems from the point of view of spectral reconstruction operators. We give an extended definition of regularization methods and their convergence in terms of the underlying data distributions, which paves the way for future theoretical studies. Based on a simple spectral learning model previously introduced for supervised learning, we investigate some key properties of different learning paradigms for inverse problems, which can be formulated independently of specific architectures. In particular, we investigate the regularization properties, bias, and critical dependence on training data distributions. Moreover, our framework allows us to highlight and compare the specific behavior of the different paradigms in the infinite-dimensional limit.

6/5/2024

cs.LG cs.NA

Weakly Convex Regularisers for Inverse Problems: Convergence of Critical Points and Primal-Dual Optimisation

Zakhar Shumaylov, Jeremy Budd, Subhadip Mukherjee, Carola-Bibiane Schönlieb

Variational regularisation is the primary method for solving inverse problems, and recently there has been considerable work leveraging deeply learned regularisation for enhanced performance. However, few results exist addressing the convergence of such regularisation, particularly within the context of critical points as opposed to global minimisers. In this paper, we present a generalised formulation of convergent regularisation in terms of critical points, and show that this is achieved by a class of weakly convex regularisers. We prove convergence of the primal-dual hybrid gradient method for the associated variational problem, and, given a Kurdyka-Łojasiewicz condition, an $\mathcal{O}(\log k/k)$ ergodic convergence rate. Finally, applying this theory to learned regularisation, we prove universal approximation for input weakly convex neural networks (IWCNN), and show empirically that IWCNNs can lead to improved performance of learned adversarial regularisers for computed tomography (CT) reconstruction.

6/18/2024

cs.CV cs.LG stat.ML