Wilsonian Renormalization of Neural Network Gaussian Processes

2405.06008

Published 5/13/2024 by Jessica N. Howard, Ro Jefferson, Anindita Maiti, Zohar Ringel

Abstract

Separating relevant and irrelevant information is key to any modeling process or scientific inquiry. Theoretical physics offers a powerful tool for achieving this in the form of the renormalization group (RG). Here we demonstrate a practical approach to performing Wilsonian RG in the context of Gaussian Process (GP) Regression. We systematically integrate out the unlearnable modes of the GP kernel, thereby obtaining an RG flow of the Gaussian Process in which the data plays the role of the energy scale. In simple cases, this results in a universal flow of the ridge parameter, which becomes input-dependent in the richer scenario in which non-Gaussianities are included. In addition to being analytically tractable, this approach goes beyond structural analogies between RG and neural networks by providing a natural connection between RG flow and learnable vs. unlearnable modes. Studying such flows may improve our understanding of feature learning in deep neural networks, and identify potential universality classes in these models.

Overview

This paper explores the application of Wilsonian renormalization, a concept from theoretical physics, to the study of neural network Gaussian processes.
The authors integrate out the unlearnable modes of the GP kernel, which yields an RG flow of the Gaussian process in which the data plays the role of the energy scale.
The research connects the scaling laws and universal statistical structure of complex systems to the behavior of neural networks, building on prior work in integrated generative modeling and multi-layer random feature approximations.

Plain English Explanation

The paper explores how a concept from theoretical physics called Wilsonian renormalization can be applied to understand the behavior of neural networks. Wilsonian renormalization is a way of simplifying complex systems by focusing on the most important features at different scales.

The authors show how this approach can be used to study neural networks, which are complex systems that learn to perform tasks by adjusting the strengths of connections between artificial neurons. By applying Wilsonian renormalization, the researchers were able to gain insights into how the performance of neural networks scales with the size and complexity of the problem they are trying to solve.

This work builds on previous research that has looked at the underlying scaling laws and universal statistical structure of complex systems, as well as studies on integrated generative modeling and multi-layer random feature approximations in neural networks. By connecting these different areas of research, the authors hope to develop a better understanding of how neural networks work and how they can be improved.

Technical Explanation

The paper presents a Wilsonian renormalization approach for analyzing neural network Gaussian processes (NNGPs) in the setting of GP regression. It sits alongside related work on scaling and renormalization in high-dimensional regression and on physics-integrated generative modeling with attentive planar normalizing flows.

The core idea is to use Wilsonian renormalization to coarse-grain the NNGP: the unlearnable modes of the GP kernel are systematically integrated out, leaving the learnable modes as the relevant degrees of freedom, with the data playing the role that the energy scale plays in physics. In simple cases this produces a universal flow of the ridge parameter, which becomes input-dependent once non-Gaussianities are included.
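
For orientation, the ridge parameter mentioned in the abstract enters the standard GP regression predictor as written below. The notation ($K$ for the kernel matrix on the training inputs $X$, $\sigma^2$ for the ridge, $y$ for the training targets, $x_*$ for a test input) is assumed here for illustration and is not taken from the paper.

```latex
\bar{f}(x_*) = k(x_*, X)\,\bigl(K + \sigma^2 I\bigr)^{-1} y,
\qquad K_{ij} = k(x_i, x_j)
```

In this notation, a flow of the ridge parameter would correspond to $\sigma^2$ being replaced by an effective value as kernel modes are integrated out.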

The technical approach involves:

Defining a Wilsonian effective action for the NNGP that captures the relevant degrees of freedom at different scales.
Deriving renormalization group (RG) flow equations that describe how the effective action changes under coarse-graining.
Analyzing the fixed points and scaling behavior of the RG flow to understand the asymptotic performance of the NNGP.
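
The steps above can be illustrated with a toy calculation in the kernel eigenbasis. This is a minimal sketch under stated assumptions, not the authors' derivation: the power-law spectrum, the learnability ratio `lambda / (lambda + ridge / n)` (the common equivalent-kernel heuristic), the 0.5 threshold, and the additive update in which the eigenvalues of the integrated-out modes are added to the ridge are all simplifications chosen here, and every function name is hypothetical.

```python
# Toy sketch: ridge flow from integrating out low-variance kernel modes.
# Assumptions (not from the paper summary): a power-law spectrum, the
# equivalent-kernel learnability heuristic, and an additive ridge update.


def power_law_spectrum(num_modes, decay=2.0):
    """Hypothetical kernel eigenvalues lambda_k = k**(-decay), k = 1..num_modes."""
    return [k ** (-decay) for k in range(1, num_modes + 1)]


def learnability(eigenvalue, ridge, num_data):
    """Shrinkage factor of one mode: near 1 is learnable, near 0 is not."""
    return eigenvalue / (eigenvalue + ridge / num_data)


def effective_ridge(spectrum, ridge, cutoff):
    """Ridge after integrating out every mode at position >= cutoff (0-based)."""
    return ridge + sum(spectrum[cutoff:])


def ridge_flow(spectrum, ridge):
    """Effective ridge for each cutoff, from 'keep all modes' to 'keep none'."""
    return [effective_ridge(spectrum, ridge, c)
            for c in range(len(spectrum), -1, -1)]


spectrum = power_law_spectrum(num_modes=50)
bare_ridge = 0.1
flow = ridge_flow(spectrum, bare_ridge)

for num_data in (10, 100, 1000):
    # The spectrum is decreasing, so the learnable modes are the leading ones.
    learnable = sum(
        1 for lam in spectrum if learnability(lam, bare_ridge, num_data) > 0.5
    )
    ridge_eff = effective_ridge(spectrum, bare_ridge, learnable)
    print(f"n={num_data}: {learnable} learnable modes, "
          f"effective ridge {ridge_eff:.4f}")
```

Under these assumptions, more data makes more modes learnable, so fewer modes are integrated out and the effective ridge moves back toward its bare value. That is one way to picture the data acting as the scale of the flow.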

The authors validate their theoretical predictions through numerical experiments, demonstrating the power of the Wilsonian renormalization approach for understanding and improving neural network performance in high-dimensional regression tasks.

Critical Analysis

The Wilsonian renormalization approach presented in the paper provides a promising new tool for analyzing the behavior of neural networks. By connecting neural networks to concepts from theoretical physics, the authors are able to gain insights into the scaling laws and universal statistical structure that govern the performance of these complex systems.

One key strength of the work is the ability to derive scaling relations that describe how neural network performance changes as the problem dimensionality and other parameters are varied. This could be particularly useful for understanding the limitations of multi-layer random feature approximations and designing more effective neural network architectures for high-dimensional regression tasks.

However, the paper does not address some potential limitations of the Wilsonian renormalization approach. For example, the method relies on a number of assumptions and approximations, and it is not clear how robust the results are to deviations from these assumptions. Additionally, the computational overhead of the Wilsonian renormalization procedure may limit its practical applicability, especially for large-scale neural network models.

Further research is needed to explore the boundaries of the Wilsonian renormalization framework and to investigate how it can be integrated with other techniques for analyzing and improving neural network performance, such as inverse exact renormalization group flows and physics-integrated generative modeling. By combining these complementary approaches, researchers may be able to develop a more comprehensive understanding of the underlying principles that govern the behavior of neural networks.

Conclusion

This paper presents a novel application of Wilsonian renormalization to the study of neural network Gaussian processes, demonstrating how this concept from theoretical physics can provide valuable insights into the scaling behavior and performance of these complex systems.

The work builds on a growing body of research exploring the scaling laws and universal statistical structure of complex systems, as well as integrated generative modeling and multi-layer random feature approximations in neural networks. By connecting these different areas of research, the authors have developed a new framework for understanding and potentially improving the performance of neural networks in high-dimensional regression tasks.

While the Wilsonian renormalization approach presented in the paper shows promise, further research is needed to fully explore its capabilities and limitations. By continuing to bridge the gap between physics and machine learning, researchers may be able to unlock new insights and design more effective neural network models for a wide range of applications.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Bayesian RG Flow in Neural Network Field Theories

Jessica N. Howard, Marc S. Klinger, Anindita Maiti, Alexander G. Stapleton

The Neural Network Field Theory correspondence (NNFT) is a mapping from neural network (NN) architectures into the space of statistical field theories (SFTs). The Bayesian renormalization group (BRG) is an information-theoretic coarse graining scheme that generalizes the principles of the Exact Renormalization Group (ERG) to arbitrarily parameterized probability distributions, including those of NNs. In BRG, coarse graining is performed in parameter space with respect to an information-theoretic distinguishability scale set by the Fisher information metric. In this paper, we unify NNFT and BRG to form a powerful new framework for exploring the space of NNs and SFTs, which we coin BRG-NNFT. With BRG-NNFT, NN training dynamics can be interpreted as inducing a flow in the space of SFTs from the information-theoretic `IR' $\rightarrow$ `UV'. Conversely, applying an information-shell coarse graining to the trained network's parameters induces a flow in the space of SFTs from the information-theoretic `UV' $\rightarrow$ `IR'. When the information-theoretic cutoff scale coincides with a standard momentum scale, BRG is equivalent to ERG. We demonstrate the BRG-NNFT correspondence on two analytically tractable examples. First, we construct BRG flows for trained, infinite-width NNs, of arbitrary depth, with generic activation functions. As a special case, we then restrict to architectures with a single infinitely-wide layer, scalar outputs, and generalized cos-net activations. In this case, we show that BRG coarse-graining corresponds exactly to the momentum-shell ERG flow of a free scalar SFT. Our analytic results are corroborated by a numerical experiment in which an ensemble of asymptotically wide NNs are trained and subsequently renormalized using an information-shell BRG scheme.

5/29/2024

cs.LG

Scaling and renormalization in high-dimensional regression

Alexander Atanasov, Jacob A. Zavatone-Veth, Cengiz Pehlevan

This paper presents a succinct derivation of the training and generalization performance of a variety of high-dimensional ridge regression models using the basic tools of random matrix theory and free probability. We provide an introduction and review of recent results on these topics, aimed at readers with backgrounds in physics and deep learning. Analytic formulas for the training and generalization errors are obtained in a few lines of algebra directly from the properties of the $S$-transform of free probability. This allows for a straightforward identification of the sources of power-law scaling in model performance. We compute the generalization error of a broad class of random feature models. We find that in all models, the $S$-transform corresponds to the train-test generalization gap, and yields an analogue of the generalized-cross-validation estimator. Using these techniques, we derive fine-grained bias-variance decompositions for a very general class of random feature models with structured covariates. These novel results allow us to discover a scaling regime for random feature models where the variance due to the features limits performance in the overparameterized setting. We also demonstrate how anisotropic weight structure in random feature models can limit performance and lead to nontrivial exponents for finite-width corrections in the overparameterized setting. Our results extend and provide a unifying perspective on earlier models of neural scaling laws.

6/27/2024

stat.ML cs.LG

The Inverse of Exact Renormalization Group Flows as Statistical Inference

David S. Berman, Marc S. Klinger

We build on the view of the Exact Renormalization Group (ERG) as an instantiation of Optimal Transport described by a functional convection-diffusion equation. We provide a new information theoretic perspective for understanding the ERG through the intermediary of Bayesian Statistical Inference. This connection is facilitated by the Dynamical Bayesian Inference scheme, which encodes Bayesian inference in the form of a one parameter family of probability distributions solving an integro-differential equation derived from Bayes' law. In this note, we demonstrate how the Dynamical Bayesian Inference equation is, itself, equivalent to a diffusion equation which we dub Bayesian Diffusion. Identifying the features that define Bayesian Diffusion, and mapping them onto the features that define the ERG, we obtain a dictionary outlining how renormalization can be understood as the inverse of statistical inference.

5/2/2024

cs.AI

Physics-integrated generative modeling using attentive planar normalizing flow based variational autoencoder

Sheikh Waqas Akhtar

Physics-integrated generative modeling is a class of hybrid or grey-box modeling in which we augment the data-driven model with the physics knowledge governing the data distribution. The use of physics knowledge allows the generative model to produce output in a controlled way, so that the output, by construction, complies with the physical laws. It imparts improved generalization ability to extrapolate beyond the training distribution as well as improved interpretability because the model is partly grounded in firm domain knowledge. In this work, we aim to improve the fidelity of reconstruction and robustness to noise in the physics integrated generative model. To this end, we use a variational autoencoder as a generative model. To improve the reconstruction results of the decoder, we propose to learn the latent posterior distribution of both the physics as well as the trainable data-driven components using planar normalizing flow. Normalizing flow based posterior distribution harnesses the inherent dynamical structure of the data distribution, hence the learned model gets closer to the true underlying data distribution. To improve the robustness of generative model against noise injected in the model, we propose a modification in the encoder part of the normalizing flow based VAE. We designed the encoder to incorporate scaled dot product attention based contextual information in the noisy latent vector which will mitigate the adverse effect of noise in the latent vector and make the model more robust. We empirically evaluated our models on a human locomotion dataset [33] and the results validate the efficacy of our proposed models in terms of improvement in reconstruction quality as well as robustness against noise injected in the model.

4/19/2024

cs.LG cs.AI stat.ML