Stationarity without mean reversion in improper Gaussian processes

Read original: arXiv:2310.02877 - Published 5/16/2024 by Luca Ambrogioni

Overview

Gaussian Process (GP) regression is a powerful machine learning technique, but its behavior depends on the choice of covariance function.
Stationary covariance functions are commonly used, but non-periodic ones are always mean-reverting, so they can exhibit pathological behavior when applied to data that does not relax to a fixed global mean value.
This paper introduces a family of non-positive kernels, defined through improper GP priors with infinite variance, that yield Gaussian processes which are stationary but not mean-reverting, avoiding this limitation of standard stationary covariance functions.

Plain English Explanation

Gaussian Process (GP) regression is a powerful machine learning technique that can be used for tasks like predicting trends or modeling data with constraints. The behavior of a GP model depends on the choice of covariance function, which determines how the model "sees" the relationship between different data points.

Commonly used stationary covariance functions, like the squared exponential or Matérn kernels, are "mean-reverting": away from the observed data, the model's predictions drift back to a fixed average value. This suits data that fluctuates around a stable level.

However, this mean-reverting behavior can be problematic if the data you're trying to model doesn't actually have a fixed global mean. In these cases, the GP model may exhibit undesirable behavior.
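The mean reversion described above can be seen in a few lines of NumPy. The following is a minimal sketch, not code from the paper: the helper names `sq_exp_kernel` and `gp_posterior_mean`, the toy data, and the hyperparameter values are all assumptions chosen for illustration. It fits a zero-mean GP with a squared exponential kernel to data that sits well above zero and then predicts far from the data.

```python
import numpy as np


def sq_exp_kernel(a, b, lengthscale=1.0, variance=1.0):
    """Squared exponential kernel between 1-D input arrays a and b."""
    d = a[:, None] - b[None, :]
    return variance * np.exp(-0.5 * (d / lengthscale) ** 2)


def gp_posterior_mean(x_train, y_train, x_test, noise=1e-2):
    """Standard zero-mean GP regression posterior mean."""
    K = sq_exp_kernel(x_train, x_train) + noise * np.eye(len(x_train))
    K_star = sq_exp_kernel(x_test, x_train)
    return K_star @ np.linalg.solve(K, y_train)


# Toy data (assumed for illustration): a slow upward trend far from zero.
x_train = np.linspace(0.0, 5.0, 20)
y_train = 3.0 + 0.5 * x_train

# Predict at the edge of the data, slightly beyond it, and far beyond it.
x_test = np.array([5.0, 7.0, 15.0])
mean = gp_posterior_mean(x_train, y_train, x_test)
print(mean)
```

Near the data the prediction follows the observations, but at the farthest test point it collapses to the prior mean of zero, even though nothing in the data suggests the trend should return there.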

This paper introduces a new family of non-positive kernels that can be used to define Gaussian processes that are stationary but not mean-reverting. These kernels closely resemble the standard stationary kernels, but they don't force the predictions to revert to a fixed mean. This allows the GP model to better capture trends in data that don't have a global mean.

Technical Explanation

The key insight of this paper is the use of non-positive kernels to define Gaussian processes that are stationary but not mean-reverting. The authors show that it is possible to construct such kernels that closely resemble commonly used stationary kernels, like the squared exponential and Matérn classes.

These non-positive kernels can only be defined in a limit regime where the prior is improper, with infinite variance. However, the authors demonstrate that the resulting posterior distributions can still be computed analytically, with a simple correction to the usual GP formulas.
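To give a feel for how an infinite-variance prior can still lead to a finite, closed-form posterior, here is a hedged sketch of a classical and much simpler construction. It is not the paper's kernel family and not its exact correction formula. It assumes a constant offset added to an ordinary squared exponential GP, with the offset's prior variance taken to infinity; in that limit the posterior mean becomes the usual formula applied to the data minus an estimated level, plus that level. The function names and toy data are assumptions for illustration.

```python
import numpy as np


def sq_exp_kernel(a, b, lengthscale=1.0, variance=1.0):
    """Squared exponential kernel between 1-D input arrays a and b."""
    d = a[:, None] - b[None, :]
    return variance * np.exp(-0.5 * (d / lengthscale) ** 2)


def posterior_mean_flat_offset(x_train, y_train, x_test, noise=1e-2):
    """Posterior mean in the limit of an infinite-variance constant offset.

    Assumption: this is the classical constant-offset limit, used only as
    an illustration; it is not the construction proposed in the paper.
    """
    n = len(x_train)
    K = sq_exp_kernel(x_train, x_train) + noise * np.eye(n)
    K_star = sq_exp_kernel(x_test, x_train)
    ones = np.ones(n)
    K_inv_y = np.linalg.solve(K, y_train)
    K_inv_1 = np.linalg.solve(K, ones)
    # Generalized least squares estimate of the unknown level.
    offset = (ones @ K_inv_y) / (ones @ K_inv_1)
    # Usual GP formula on the residuals, plus the estimated level.
    mean = offset + K_star @ (K_inv_y - offset * K_inv_1)
    return mean, offset


# Toy data (assumed for illustration): a slow upward trend far from zero.
x_train = np.linspace(0.0, 5.0, 20)
y_train = 3.0 + 0.5 * x_train
x_test = np.array([5.0, 7.0, 15.0])

mean, offset = posterior_mean_flat_offset(x_train, y_train, x_test)
shifted_mean, _ = posterior_mean_flat_offset(x_train, y_train + 10.0, x_test)
print(mean, offset)
```

In this sketch, shifting every observation by a constant shifts the predictions by the same constant, so no fixed global mean is built into the prior. Far from the data the prediction settles at the level estimated from the observations, which is a weaker property than the non-reverting behavior the paper's kernels are designed to provide.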

The authors evaluate their approach on both synthetic and real-world data, showing that the non-mean-reverting GP models can effectively capture trends in data that do not relax to a fixed global mean, while retaining many of the favorable properties of ordinary smooth stationary kernels.

Critical Analysis

The authors provide a comprehensive analysis of the proposed non-positive kernels and their behavior, including a discussion of the theoretical underpinnings and the practical implications. They acknowledge that the use of improper priors with infinite variance may raise some technical concerns that require further investigation.

Additionally, the authors note that the non-mean-reverting behavior of the proposed kernels may not be suitable for all applications, and that the choice of kernel should be carefully considered based on the characteristics of the data and the specific modeling objectives.

Further research could explore the robustness of these non-positive kernels in the presence of noise or other sources of uncertainty, as well as their potential applications in more complex modeling scenarios.

Conclusion

This paper introduces a new family of non-positive kernels that can be used to define stationary but non-mean-reverting Gaussian Process models. These kernels address a known limitation of standard stationary covariance functions, which can exhibit pathological behavior when applied to data that does not relax to a fixed global mean.

The authors demonstrate that the proposed non-positive kernels can effectively capture trends in real-world data while retaining many of the favorable properties of ordinary smooth stationary kernels. This work expands the toolbox of available covariance functions for Gaussian Process regression, providing a flexible approach that can better accommodate a wider range of data characteristics.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Stationarity without mean reversion in improper Gaussian processes

Luca Ambrogioni

The behavior of a GP regression depends on the choice of covariance function. Stationary covariance functions are preferred in machine learning applications. However, (non-periodic) stationary covariance functions are always mean reverting and can therefore exhibit pathological behavior when applied to data that does not relax to a fixed global mean value. In this paper we show that it is possible to use improper GP priors with infinite variance to define processes that are stationary but not mean reverting. To this aim, we make use of non-positive kernels that can only be defined in this limit regime. The resulting posterior distributions can be computed analytically and involve a simple correction of the usual formulas. The main contribution of the paper is the introduction of a large family of smooth non-reverting covariance functions that closely resemble the kernels commonly used in the GP literature (e.g. squared exponential and Matérn class). By analyzing both synthetic and real data, we demonstrate that these non-positive kernels solve some known pathologies of mean reverting GP regression while retaining most of the favorable properties of ordinary smooth stationary kernels.

5/16/2024

Stationary Kernels and Gaussian Processes on Lie Groups and their Homogeneous Spaces II: non-compact symmetric spaces

Iskander Azangulov, Andrei Smolensky, Alexander Terenin, Viacheslav Borovitskiy

Gaussian processes are arguably the most important class of spatiotemporal models within machine learning. They encode prior information about the modeled function and can be used for exact or approximate Bayesian learning. In many applications, particularly in physical sciences and engineering, but also in areas such as geostatistics and neuroscience, invariance to symmetries is one of the most fundamental forms of prior information one can consider. The invariance of a Gaussian process' covariance to such symmetries gives rise to the most natural generalization of the concept of stationarity to such spaces. In this work, we develop constructive and practical techniques for building stationary Gaussian processes on a very large class of non-Euclidean spaces arising in the context of symmetries. Our techniques make it possible to (i) calculate covariance kernels and (ii) sample from prior and posterior Gaussian processes defined on such spaces, both in a practical manner. This work is split into two parts, each involving different technical considerations: part I studies compact spaces, while part II studies non-compact spaces possessing certain structure. Our contributions make the non-Euclidean Gaussian process models we study compatible with well-understood computational techniques available in standard Gaussian process software packages, thereby making them accessible to practitioners.

9/16/2024

A New Reliable & Parsimonious Learning Strategy Comprising Two Layers of Gaussian Processes, to Address Inhomogeneous Empirical Correlation Structures

Gargi Roy, Dalia Chakrabarty

We present a new strategy for learning the functional relation between a pair of variables, while addressing inhomogeneities in the correlation structure of the available data, by modelling the sought function as a sample function of a non-stationary Gaussian Process (GP), that nests within itself multiple other GPs, each of which we prove can be stationary, thereby establishing sufficiency of two GP layers. In fact, a non-stationary kernel is envisaged, with each hyperparameter set as dependent on the sample function drawn from the outer non-stationary GP, such that a new sample function is drawn at every pair of input values at which the kernel is computed. However, such a model cannot be implemented, and we substitute this by recalling that the average effect of drawing different sample functions from a given GP is equivalent to that of drawing a sample function from each of a set of GPs that are rendered different, as updated during the equilibrium stage of the undertaken inference (via MCMC). The kernel is fully non-parametric, and it suffices to learn one hyperparameter per layer of GP, for each dimension of the input variable. We illustrate this new learning strategy on a real dataset.

4/22/2024

Wiener Chaos in Kernel Regression: Towards Untangling Aleatoric and Epistemic Uncertainty

T. Faulwasser, O. Molodchyk

Gaussian Processes (GPs) are a versatile method that enables different approaches towards learning for dynamics and control. Gaussianity assumptions appear in two dimensions in GPs: The positive semi-definite kernel of the underlying reproducing kernel Hilbert space is used to construct the co-variance of a Gaussian distribution over functions, while measurement noise (i.e. data corruption) is usually modeled as i.i.d. additive Gaussians. In this note, we generalize the setting and consider kernel ridge regression with additive i.i.d. non-Gaussian measurement noise. To apply the usual kernel trick, we rely on the representation of the uncertainty via polynomial chaos expansions, which are series expansions for random variables of finite variance introduced by Norbert Wiener. We derive and discuss the analytic $\mathcal{L}^2$ solution to the arising Wiener kernel regression. Considering a polynomial dynamic system as a numerical example, we show that our approach allows us to distinguish the uncertainty that stems from the noise in the data samples from the total uncertainty encoded in the GP posterior distribution.

9/14/2024