Robust deep learning from weakly dependent data

arXiv:2405.05081

Published 5/9/2024 by William Kengne, Modou Wade

Abstract

Recent developments in deep learning have established some theoretical properties of deep neural network estimators. However, most of the existing works on this topic are restricted to bounded loss functions or (sub)-Gaussian or bounded input. This paper considers robust deep learning from weakly dependent observations, with unbounded loss function and unbounded input/output. It is only assumed that the output variable has a finite moment of order $r$, with $r > 1$. Non-asymptotic bounds for the expected excess risk of the deep neural network estimator are established under strong mixing and $\psi$-weak dependence assumptions on the observations. We derive a relationship between these bounds and $r$, and when the data have moments of any order (that is, $r=\infty$), the convergence rate is close to some well-known results. When the target predictor belongs to the class of Hölder smooth functions with sufficiently large smoothness index, the rate of the expected excess risk for exponentially strongly mixing data is close to or the same as those obtained with i.i.d. samples. Applications to robust nonparametric regression and robust nonparametric autoregression are considered. The simulation study for models with heavy-tailed errors shows that robust estimators with absolute loss and Huber loss function outperform the least squares method.

Overview

This paper explores robust deep learning from weakly dependent observations, with unbounded loss function and unbounded input/output.
It establishes non-asymptotic bounds for the expected excess risk of deep neural network estimators under strong mixing and ψ-weak dependence assumptions.
The paper investigates how these bounds depend on r, the order of the finite moment assumed for the output variable, and gives the convergence rate when the data have moments of any order.
Applications to robust nonparametric regression and robust nonparametric autoregression are considered, and a simulation study shows that robust estimators outperform the least squares method in models with heavy-tailed errors.

Plain English Explanation

This paper looks at deep learning models that can handle messy, unpredictable data. Most existing deep learning research has focused on data with bounded values or predictable patterns. However, in the real world, we often encounter data that doesn't fit those neat assumptions - it might have extreme values, complex dependencies, or unpredictable behavior.

The researchers in this paper developed a deep learning approach that can work with this kind of "wild" data. They established mathematical bounds on how well the deep learning model can perform, even when the data has unbounded inputs and outputs, and when the observations depend on one another rather than being independent draws.

Importantly, the paper shows how these guarantees depend on how "messy" the data is, measured by r, the highest order of moment the output is assumed to have (fewer finite moments means heavier tails). In the best case, where moments of every order exist, the convergence rate is close to well-known results. The researchers also applied this approach to two specific problems: robust nonparametric regression and robust nonparametric autoregression.

In simulations, the deep learning models with robust loss functions (like absolute loss and Huber loss) outperformed the traditional least squares method when dealing with data that had heavy-tailed errors. This suggests the new deep learning approach could be very useful for real-world applications where the data is unpredictable and messy, like stock market prediction or medical diagnosis.

Technical Explanation

The paper establishes non-asymptotic bounds for the expected excess risk of deep neural network estimators under strong mixing and ψ-weak dependence assumptions on the observations. This means the researchers derived finite-sample mathematical expressions that quantify how well the deep learning model will perform, even when the observations are statistically dependent on one another.
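To make "expected excess risk" concrete, here is one common way to write it down. The symbols below ($\ell$, $\mathcal{H}$, $\widehat{h}_n$, and so on) are illustrative notation chosen for this summary and may not match the paper's; in particular, the paper may define the target predictor over a specific function class rather than over all predictors.

```latex
% Illustrative notation (an assumption, not taken from the paper):
% (X_i, Y_i), i = 1, ..., n : observations from a stationary process
% \ell : loss function,  \mathcal{H} : class of deep neural network predictors
% (X_0, Y_0) : a new observation, independent of the training sample
\[
\begin{aligned}
R(h) &= \mathbb{E}\big[\ell\big(h(X_0), Y_0\big)\big],
&\qquad
\widehat{R}_n(h) &= \frac{1}{n}\sum_{i=1}^{n} \ell\big(h(X_i), Y_i\big), \\
\widehat{h}_n &\in \operatorname*{arg\,min}_{h \in \mathcal{H}} \widehat{R}_n(h),
&\qquad
\text{expected excess risk} &= \mathbb{E}\big[R(\widehat{h}_n)\big] - \inf_{h} R(h).
\end{aligned}
\]
```

A non-asymptotic bound is then an inequality on the last quantity that holds for every sample size $n$, rather than only in the limit.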

Importantly, these bounds are derived without assuming a bounded loss function or (sub-)Gaussian or bounded inputs. Instead, the only assumption is that the output variable has a finite moment of order r, where r > 1. This allows the model to handle "heavy-tailed" data with extreme values.
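As a rough illustration of what such data can look like, the sketch below simulates a nonlinear autoregression with Student's t noise. The model, the function name `simulate_heavy_tailed_ar`, and all parameter values are hypothetical choices for this summary, not the paper's simulation design. A Student's t variable with `df` degrees of freedom has finite absolute moments only of order below `df`, so `df=1.5` gives a finite moment of order r for some r > 1 but an infinite variance.

```python
import numpy as np


def simulate_heavy_tailed_ar(n, df=1.5, burn_in=200, seed=0):
    """Simulate Y_t = 0.8 * tanh(Y_{t-1}) + eps_t with Student's t noise.

    Hypothetical example model (not from the paper). Because the
    regression function is bounded, Y_t inherits the moments of eps_t:
    finite for orders below `df`, infinite from `df` upward.
    """
    rng = np.random.default_rng(seed)
    eps = rng.standard_t(df, size=n + burn_in)
    y = np.zeros(n + burn_in)
    for t in range(1, n + burn_in):
        y[t] = 0.8 * np.tanh(y[t - 1]) + eps[t]
    # Drop the burn-in so the returned path is close to stationarity.
    return y[burn_in:]


series = simulate_heavy_tailed_ar(1000)
# Autoregression as a prediction problem: predict Y_t from Y_{t-1}.
inputs, targets = series[:-1], series[1:]
print("largest |Y_t| in the sample:", np.max(np.abs(series)))
print("median  |Y_t| in the sample:", np.median(np.abs(series)))
```

Consecutive values of `series` are dependent by construction, and occasional extreme values are typical of noise with so few finite moments.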

The paper explores the relationship between these bounds and the value of r, as well as the convergence rate when the data have moments of any order (i.e., r = ∞). When the target function belongs to a class of Hölder smooth functions with sufficiently large smoothness, the convergence rate for exponentially strongly mixing data is shown to be close to or the same as the rate for independent and identically distributed (i.i.d.) samples.
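For readers unfamiliar with the term, the following is the standard textbook definition of the strong mixing ($\alpha$-mixing) coefficients of a stationary process $(Z_t)_{t \in \mathbb{Z}}$, together with a common form of the exponential decay condition. The constants $C$, $c$, and $\gamma$ are illustrative, and the paper's exact formulation may differ.

```latex
% Standard definition of strong mixing coefficients; the exponential
% decay condition is a common form, assumed here for illustration.
\[
\alpha(j) = \sup\Big\{ \big| P(A \cap B) - P(A)\,P(B) \big| \;:\;
A \in \sigma(Z_t,\ t \le 0),\ B \in \sigma(Z_t,\ t \ge j) \Big\},
\qquad j \ge 1,
\]
\[
\text{exponentially strongly mixing:}\quad
\alpha(j) \le C \exp\!\big(-c\, j^{\gamma}\big)
\quad \text{for some constants } C, c, \gamma > 0 .
\]
```

Intuitively, $\alpha(j)$ measures how far events separated by a gap of $j$ time steps are from being independent; i.i.d. data correspond to $\alpha(j) = 0$ for all $j \ge 1$.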

The researchers apply this robust deep learning approach to two specific problems: nonparametric regression and nonparametric autoregression. Simulation results demonstrate that the robust estimators with absolute loss and Huber loss functions outperform the traditional least squares method when dealing with heavy-tailed errors.
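The three loss functions compared in the simulations can be sketched as follows, using their standard textbook definitions. The threshold `delta=1.0` is an arbitrary illustrative value, not the one used in the paper, and the function names are chosen for this summary.

```python
import numpy as np


def squared_loss(residual):
    """Least squares loss: grows quadratically in the residual."""
    return np.asarray(residual, dtype=float) ** 2


def absolute_loss(residual):
    """Absolute (L1) loss: grows linearly in the residual."""
    return np.abs(np.asarray(residual, dtype=float))


def huber_loss(residual, delta=1.0):
    """Huber loss: quadratic for |residual| <= delta, linear beyond it.

    `delta` is an illustrative default (an assumption, not from the paper).
    """
    r = np.asarray(residual, dtype=float)
    abs_r = np.abs(r)
    quadratic = 0.5 * r ** 2
    linear = delta * (abs_r - 0.5 * delta)
    return np.where(abs_r <= delta, quadratic, linear)


# Compare how each loss penalises a small and a large residual.
for r in (0.5, 10.0):
    print(r, float(squared_loss(r)), float(absolute_loss(r)), float(huber_loss(r)))
```

For a residual of 10 the squared loss is 100, while the absolute and Huber losses are 10 and 9.5. A single extreme observation therefore dominates a least squares objective far more than a robust one, which is the intuition behind the simulation results.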

Critical Analysis

The paper makes important contributions by developing a deep learning framework that can handle messy, unpredictable data. This is a significant advancement over previous deep learning research that relied on restrictive assumptions about the data.

However, the paper does not explore the practical limitations of this approach. For example, it's unclear how the required assumptions about strong mixing and ψ-weak dependence would be verified in real-world scenarios. Additionally, the simulation study is limited in scope, and it would be helpful to see the approach tested on more diverse and realistic datasets.

Furthermore, the paper does not discuss the computational complexity of the proposed deep learning models, which could be a crucial factor in their practical applicability, especially for large-scale problems. Insights into the trade-offs between model complexity, training efficiency, and robustness would be valuable.

Overall, the theoretical contributions of this paper are significant, but more work is needed to understand the practical implications and limitations of the proposed robust deep learning approach.

Conclusion

This paper presents a novel deep learning framework that can handle messy, unpredictable data with unbounded loss functions and unbounded input/output. The researchers derive non-asymptotic bounds for the expected excess risk of deep neural network estimators under weak dependence assumptions, and they show that, for exponentially strongly mixing data, the convergence rate can be close to or the same as the rate for i.i.d. samples when the target function is sufficiently smooth.

The application of this robust deep learning approach to nonparametric regression and autoregression tasks, along with the simulation results showing improved performance over traditional least squares, suggests that this work could have important implications for real-world problems involving complex, heavy-tailed data, such as stock market prediction or medical diagnosis. Further research is needed to address the practical limitations and explore the trade-offs between model complexity, training efficiency, and robustness.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Deep learning from strongly mixing observations: Sparse-penalized regularization and minimax optimality

William Kengne, Modou Wade

The explicit regularization and optimality of deep neural network estimators from independent data have made considerable progress recently. The study of such properties on dependent data is still a challenge. In this paper, we carry out deep learning from strongly mixing observations, and deal with the squared and a broad class of loss functions. We consider sparse-penalized regularization for the deep neural network predictor. For a general framework that includes regression estimation, classification, time series prediction, $\cdots$, an oracle inequality for the expected excess risk is established and a bound on the class of Hölder smooth functions is provided. For nonparametric regression from strong mixing data and sub-exponential error, we provide an oracle inequality for the $L_2$ error and investigate an upper bound of this error on a class of Hölder composition functions. For the specific case of nonparametric autoregression with Gaussian and Laplace errors, a lower bound of the $L_2$ error on this Hölder composition class is established. Up to a logarithmic factor, this bound matches its upper bound; so, the deep neural network estimator attains the minimax optimal rate.

6/13/2024

stat.ML cs.LG

Learning with little mixing

Ingvar Ziemann, Stephen Tu

We study square loss in a realizable time-series framework with martingale difference noise. Our main result is a fast rate excess risk bound which shows that whenever a trajectory hypercontractivity condition holds, the risk of the least-squares estimator on dependent data matches the iid rate order-wise after a burn-in time. In comparison, many existing results in learning from dependent data have rates where the effective sample size is deflated by a factor of the mixing-time of the underlying process, even after the burn-in time. Furthermore, our results allow the covariate process to exhibit long range correlations which are substantially weaker than geometric ergodicity. We call this phenomenon learning with little mixing, and present several examples for when it occurs: bounded function classes for which the $L^2$ and $L^{2+\epsilon}$ norms are equivalent, ergodic finite state Markov chains, various parametric models, and a broad family of infinite dimensional $\ell^2(\mathbb{N})$ ellipsoids. By instantiating our main result to system identification of nonlinear dynamics with generalized linear model transitions, we obtain a nearly minimax optimal excess risk bound after only a polynomial burn-in time.

6/14/2024

cs.LG stat.ML

Sharp Rates in Dependent Learning Theory: Avoiding Sample Size Deflation for the Square Loss

Ingvar Ziemann, Stephen Tu, George J. Pappas, Nikolai Matni

In this work, we study statistical learning with dependent ($\beta$-mixing) data and square loss in a hypothesis class $\mathscr{F}\subset L_{\Psi_p}$ where $\Psi_p$ is the norm $\|f\|_{\Psi_p} \triangleq \sup_{m\geq 1} m^{-1/p} \|f\|_{L^m}$ for some $p\in [2,\infty]$. Our inquiry is motivated by the search for a sharp noise interaction term, or variance proxy, in learning with dependent data. Absent any realizability assumption, typical non-asymptotic results exhibit variance proxies that are deflated multiplicatively by the mixing time of the underlying covariates process. We show that whenever the topologies of $L^2$ and $\Psi_p$ are comparable on our hypothesis class $\mathscr{F}$ -- that is, $\mathscr{F}$ is a weakly sub-Gaussian class: $\|f\|_{\Psi_p} \lesssim \|f\|_{L^2}^\eta$ for some $\eta\in (0,1]$ -- the empirical risk minimizer achieves a rate that only depends on the complexity of the class and second order statistics in its leading term. Our result holds whether the problem is realizable or not and we refer to this as a "near mixing-free rate", since direct dependence on mixing is relegated to an additive higher order term. We arrive at our result by combining the above notion of a weakly sub-Gaussian class with mixed tail generic chaining. This combination allows us to compute sharp, instance-optimal rates for a wide range of problems. Examples that satisfy our framework include sub-Gaussian linear regression, more general smoothly parameterized function classes, finite hypothesis classes, and bounded smoothness classes.

6/14/2024

cs.LG stat.ML

High-dimensional robust regression under heavy-tailed data: Asymptotics and Universality

Urte Adomaityte, Leonardo Defilippis, Bruno Loureiro, Gabriele Sicuro

We investigate the high-dimensional properties of robust regression estimators in the presence of heavy-tailed contamination of both the covariates and response functions. In particular, we provide a sharp asymptotic characterisation of M-estimators trained on a family of elliptical covariate and noise data distributions including cases where second and higher moments do not exist. We show that, despite being consistent, the Huber loss with optimally tuned location parameter $\delta$ is suboptimal in the high-dimensional regime in the presence of heavy-tailed noise, highlighting the necessity of further regularisation to achieve optimal performance. This result also uncovers the existence of a transition in $\delta$ as a function of the sample complexity and contamination. Moreover, we derive the decay rates for the excess risk of ridge regression. We show that, while it is both optimal and universal for covariate distributions with finite second moment, its decay rate can be considerably faster when the covariates' second moment does not exist. Finally, we show that our formulas readily generalise to a richer family of models and data distributions, such as generalised linear estimation with arbitrary convex regularisation trained on mixture models.

6/3/2024

cs.LG stat.ML