Scalable Subsampling Inference for Deep Neural Networks

arXiv:2405.08276

Published 5/15/2024 by Kejin Wu, Dimitris N. Politis

Abstract

Deep neural networks (DNNs) have received increasing attention in machine learning applications in the last several years. Recently, a non-asymptotic error bound has been developed to measure the performance of the fully connected DNN estimator with ReLU activation functions for estimating regression models. The paper at hand gives a small improvement on the current error bound based on the latest results on the approximation ability of DNNs. More importantly, however, a non-random subsampling technique--scalable subsampling--is applied to construct a 'subagged' DNN estimator. Under regularity conditions, it is shown that the subagged DNN estimator is computationally efficient without sacrificing accuracy for either estimation or prediction tasks. Beyond point estimation/prediction, we propose different approaches to build confidence and prediction intervals based on the subagged DNN estimator. In addition to being asymptotically valid, the proposed confidence/prediction intervals appear to work well in finite samples. All in all, the scalable subsampling DNN estimator offers the complete package in terms of statistical inference, i.e., (a) computational efficiency; (b) point estimation/prediction accuracy; and (c) allowing for the construction of practically useful confidence and prediction intervals.

Overview

Deep neural networks (DNNs) have become increasingly popular in machine learning applications.
Recent work developed a non-asymptotic error bound for fully connected DNN estimators with ReLU activation functions in regression modeling.
This paper slightly improves that bound and proposes a new "subagged" DNN estimator built with a non-random subsampling technique called "scalable subsampling".
The subagged DNN estimator is computationally efficient without sacrificing accuracy for estimation or prediction tasks.
The paper also presents methods to construct valid confidence and prediction intervals based on the subagged DNN estimator.

Plain English Explanation

Deep neural networks (DNNs) are a powerful type of machine learning model that has become very popular in recent years. Earlier research established an error bound that measures how well a specific kind of DNN (fully connected, with ReLU activations) performs when used to estimate regression models.

The researchers start by making a small improvement to the current best-known error bound for this type of DNN model. But more importantly, they introduce a new technique called "scalable subsampling" that allows them to create a more efficient version of the DNN estimator, called a "subagged" DNN estimator.

The subagged DNN estimator is faster to train than a DNN fitted to the full dataset, and under the paper's regularity conditions it does not sacrifice accuracy when making estimates or predictions. Beyond just point estimates, the researchers also show how to use the subagged DNN to construct confidence intervals and prediction intervals, which provide a measure of uncertainty around the estimates and predictions.

Overall, this subagged DNN estimator offers the complete package - it's efficient to compute, maintains high accuracy, and enables meaningful statistical inferences to be drawn from the results. The innovations in this paper represent an important advance in the practical use of deep learning methods for real-world modeling and prediction tasks.

Technical Explanation

The paper proposes a new "subagged" deep neural network (DNN) estimator that builds on the latest research around the approximation ability of DNNs.

The authors start by deriving a non-asymptotic error bound for a fully connected DNN estimator with ReLU activation functions, improving slightly on the current state-of-the-art. However, the main contribution is the application of a non-random subsampling technique called "scalable subsampling" to construct the subagged DNN estimator.
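
The subsample-then-average idea can be illustrated with a minimal sketch. It assumes the usual block form of scalable subsampling: each subsample is a block of b consecutive observations, and consecutive blocks start h positions apart, so no random draws are involved. The names scalable_subsample_indices, subagged_fit and fit_line are hypothetical, and the base learner here is a least-squares line standing in for the DNN purely to keep the example dependency-free; it is not the paper's estimator.

```python
import random
from statistics import fmean


def scalable_subsample_indices(n, b, h):
    """Deterministic blocks of size b whose start points are h apart (assumed form)."""
    if not (1 <= b <= n) or h < 1:
        raise ValueError("need 1 <= b <= n and h >= 1")
    q = (n - b) // h + 1  # number of complete blocks that fit in n observations
    return [range(j * h, j * h + b) for j in range(q)]


def fit_line(xs, ys):
    """Stand-in base learner (NOT a DNN): ordinary least-squares line."""
    mx, my = fmean(xs), fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return lambda x: my + slope * (x - mx)


def subagged_fit(xs, ys, b, h, fit=fit_line):
    """Fit one model per block and average their predictions (subagging)."""
    blocks = scalable_subsample_indices(len(xs), b, h)
    models = [fit([xs[i] for i in blk], [ys[i] for i in blk]) for blk in blocks]
    return lambda x: fmean([m(x) for m in models])


# Usage on synthetic data: y = 2x + 1 plus small Gaussian noise.
rng = random.Random(0)
xs = [rng.uniform(-1, 1) for _ in range(1000)]
ys = [2.0 * x + 1.0 + rng.gauss(0, 0.1) for x in xs]
predict = subagged_fit(xs, ys, b=200, h=200)  # h == b gives 5 non-overlapping blocks
print(predict(0.5))  # should be close to the true value 2.0
```

Each base model sees only b observations, and the fits are independent of one another, so they can run in parallel; this is the intuition behind the computational savings. Choosing h equal to b gives non-overlapping blocks, while h smaller than b lets blocks overlap.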

Under certain regularity conditions, the subagged DNN estimator is shown to be computationally efficient without compromising the accuracy of either estimation or prediction tasks. This builds on prior work on robust deep learning from weakly dependent data and modeling sampling distributions of test statistics.

Beyond just point estimates, the paper also proposes methods to construct asymptotically valid confidence and prediction intervals based on the subagged DNN estimator. This allows for richer statistical inference, going beyond what is typically provided by standard deep learning approaches.
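
The summary does not spell out how the paper builds its intervals, so the following is only a generic illustration of how per-subsample estimates can feed an interval, not the authors' construction. It assumes non-overlapping subsamples of i.i.d. data (so the q subsample predictions at a fixed input are roughly independent) and negligible bias, and it uses a plain normal approximation. The function name subsample_normal_interval is hypothetical.

```python
from statistics import NormalDist, fmean, stdev


def subsample_normal_interval(estimates, level=0.95):
    """Normal-approximation interval around the average of subsample estimates.

    Assumes the estimates are approximately independent with a common
    distribution and negligible bias; this is an illustrative shortcut,
    not the interval construction from the paper.
    """
    q = len(estimates)
    if q < 2:
        raise ValueError("need at least two subsample estimates")
    center = fmean(estimates)                   # the subagged point estimate
    z = NormalDist().inv_cdf(0.5 + level / 2)   # e.g. about 1.96 for level=0.95
    half_width = z * stdev(estimates) / q ** 0.5
    return center - half_width, center + half_width


# Usage: predictions of five subsample models at one input point.
lo, hi = subsample_normal_interval([1.9, 2.0, 2.1, 2.0, 2.0])
print(lo, hi)
```

An interval of this kind targets the regression function value at the chosen input. A prediction interval for a new response would also have to account for the noise in that response, which this sketch does not attempt.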

Critical Analysis

The paper makes a valuable contribution by providing a computationally efficient variant of DNN estimators that maintains high accuracy. The use of scalable subsampling to create the subagged estimator is an interesting and novel technique that builds on prior work in Bayesian neural network surrogates for optimization and probabilistic survival analysis.

However, the paper does not provide extensive empirical validation of the proposed methods. While the theoretical results are promising, more real-world experiments and comparisons to alternative approaches would help demonstrate the practical benefits of the subagged DNN estimator.

Additionally, the paper does not delve into the potential limitations or failure modes of the subagged estimator. For example, it's unclear how the method would perform in the presence of severe data shifts or other distribution changes that could impact the underlying DNN model.

Overall, this is a well-executed piece of research that makes a meaningful advance in the field of DNN-based statistical inference. The ideas presented here are likely to find application in a variety of machine learning domains where both predictive accuracy and quantifiable uncertainty are important.

Conclusion

This paper introduces a new "subagged" deep neural network (DNN) estimator that is computationally efficient while maintaining high accuracy for both estimation and prediction tasks. By applying a non-random subsampling technique called "scalable subsampling", the researchers construct an estimator that, under regularity conditions, is cheaper to compute than a DNN trained on the full sample without sacrificing accuracy.

Beyond just point estimates, the paper also shows how to use the subagged DNN estimator to build valid confidence and prediction intervals, enabling richer statistical inference. This represents an important advance in the practical application of deep learning methods, where quantifying uncertainty is crucial for real-world decision making.

Overall, the subagged DNN estimator offers a compelling solution for machine learning practitioners who need to balance computational efficiency, predictive accuracy, and statistical rigor. While further empirical validation would be valuable, this work demonstrates the potential for innovative techniques to push the boundaries of what is possible with deep neural networks.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Scalable Bayesian Inference in the Era of Deep Learning: From Gaussian Processes to Deep Neural Networks

Javier Antoran

Large neural networks trained on large datasets have become the dominant paradigm in machine learning. These systems rely on maximum likelihood point estimates of their parameters, precluding them from expressing model uncertainty. This may result in overconfident predictions and it prevents the use of deep learning models for sequential decision making. This thesis develops scalable methods to equip neural networks with model uncertainty. In particular, we leverage the linearised Laplace approximation to equip pre-trained neural networks with the uncertainty estimates provided by their tangent linear models. This turns the problem of Bayesian inference in neural networks into one of Bayesian inference in conjugate Gaussian-linear models. Alas, the cost of this remains cubic in either the number of network parameters or in the number of observations times output dimensions. By assumption, neither are tractable. We address this intractability by using stochastic gradient descent (SGD) -- the workhorse algorithm of deep learning -- to perform posterior sampling in linear models and their convex duals: Gaussian processes. With this, we turn back to linearised neural networks, finding the linearised Laplace approximation to present a number of incompatibilities with modern deep learning practices -- namely, stochastic optimisation, early stopping and normalisation layers -- when used for hyperparameter learning. We resolve these and construct a sample-based EM algorithm for scalable hyperparameter learning with linearised neural networks. We apply the above methods to perform linearised neural network inference with ResNet-50 (25M parameters) trained on Imagenet (1.2M observations and 1000 output dimensions). Additionally, we apply our methods to estimate uncertainty for 3d tomographic reconstructions obtained with the deep image prior network.

5/1/2024

stat.ML cs.LG

Model Free Prediction with Uncertainty Assessment

Yuling Jiao, Lican Kang, Jin Liu, Heng Peng, Heng Zuo

Deep nonparametric regression, characterized by the utilization of deep neural networks to learn target functions, has emerged as a focus of research attention in recent years. Despite considerable progress in understanding convergence rates, the absence of asymptotic properties hinders rigorous statistical inference. To address this gap, we propose a novel framework that transforms the deep estimation paradigm into a platform conducive to conditional mean estimation, leveraging the conditional diffusion model. Theoretically, we develop an end-to-end convergence rate for the conditional diffusion model and establish the asymptotic normality of the generated samples. Consequently, we are equipped to construct confidence regions, facilitating robust statistical inference. Furthermore, through numerical experiments, we empirically validate the efficacy of our proposed methodology.

6/18/2024

stat.ML cs.LG

Sparse deep neural networks for nonparametric estimation in high-dimensional sparse regression

Dongya Wu, Xin Li

Generalization theory has been established for sparse deep neural networks under high-dimensional regime. Beyond generalization, parameter estimation is also important since it is crucial for variable selection and interpretability of deep neural networks. Current theoretical studies concerning parameter estimation mainly focus on two-layer neural networks, which is due to the fact that the convergence of parameter estimation heavily relies on the regularity of the Hessian matrix, while the Hessian matrix of deep neural networks is highly singular. To avoid the unidentifiability of deep neural networks in parameter estimation, we propose to conduct nonparametric estimation of partial derivatives with respect to inputs. We first show that model convergence of sparse deep neural networks is guaranteed in that the sample complexity only grows with the logarithm of the number of parameters or the input dimension when the $\ell_{1}$-norm of parameters is well constrained. Then by bounding the norm and the divergence of partial derivatives, we establish that the convergence rate of nonparametric estimation of partial derivatives scales as $\mathcal{O}(n^{-1/4})$, a rate which is slower than the model convergence rate $\mathcal{O}(n^{-1/2})$. To the best of our knowledge, this study combines nonparametric estimation and parametric sparse deep neural networks for the first time. As nonparametric estimation of partial derivatives is of great significance for nonlinear variable selection, the current results show the promising future for the interpretability of deep neural networks.

6/27/2024

stat.ML cs.LG

ZigZag: Universal Sampling-free Uncertainty Estimation Through Two-Step Inference

Nikita Durasov, Nik Dorndorf, Hieu Le, Pascal Fua

Whereas the ability of deep networks to produce useful predictions has been amply demonstrated, estimating the reliability of these predictions remains challenging. Sampling approaches such as MC-Dropout and Deep Ensembles have emerged as the most popular ones for this purpose. Unfortunately, they require many forward passes at inference time, which slows them down. Sampling-free approaches can be faster but suffer from other drawbacks, such as lower reliability of uncertainty estimates, difficulty of use, and limited applicability to different types of tasks and data. In this work, we introduce a sampling-free approach that is generic and easy to deploy, while producing reliable uncertainty estimates on par with state-of-the-art methods at a significantly lower computational cost. It is predicated on training the network to produce the same output with and without additional information about it. At inference time, when no prior information is given, we use the network's own prediction as the additional information. We then take the distance between the predictions with and without prior information as our uncertainty measure. We demonstrate our approach on several classification and regression tasks. We show that it delivers results on par with those of Ensembles but at a much lower computational cost.

5/28/2024

cs.LG cs.CV