Towards Lower Bounds on the Depth of ReLU Neural Networks

Read original: arXiv:2105.14835 - Published 7/18/2024 by Christoph Hertrich, Amitabh Basu, Marco Di Summa, Martin Skutella

🧠

Overview

This paper provides a mathematical analysis of the class of functions that can be represented by neural networks with ReLU (Rectified Linear Unit) activations and a given architecture.
The authors investigate whether the class of exactly representable functions strictly increases by adding more layers, without restrictions on the size of the network.
As a byproduct, the paper settles an old conjecture about piecewise linear functions.
The paper also presents upper bounds on the sizes of neural networks required to represent functions with logarithmic depth.

Plain English Explanation

The researchers in this paper wanted to better understand the types of mathematical functions that can be represented by neural networks with a specific architecture and activation function. Neural networks with ReLU activations are commonly used in machine learning, and there are theorems that suggest a single hidden layer is enough to represent any function.

However, the authors used techniques from optimization, geometry, and algebra to show that this is not always the case. They investigated whether adding more layers to the network actually increases the set of functions that can be represented exactly, without any restrictions on the size of the network.

As a side result, the paper also solved an old problem about piecewise linear functions - functions that are made up of different linear pieces. Additionally, the researchers found upper limits on how large neural networks need to be to represent functions with a certain level of complexity.

Technical Explanation

The paper uses a combination of mathematical tools, including mixed-integer optimization, polyhedral theory, and tropical geometry, to analyze the class of functions that can be represented by neural networks with ReLU activations.

The authors investigate whether adding more layers to the network strictly increases the set of functions that can be represented exactly, without any restrictions on the size of the network. This is in contrast to the universal approximation theorems, which suggest that a single hidden layer is sufficient for learning any function.

As a byproduct of their investigations, the paper settles an old conjecture by Wang and Sun (2005) about the structure of piecewise linear functions. The authors also present upper bounds on the sizes of neural networks required to represent functions with logarithmic depth, building on previous work on the growth of parameters in ReLU networks.

Critical Analysis

The paper provides a rigorous mathematical analysis of the representational power of neural networks with ReLU activations, which is an important theoretical question in the field of deep learning. The authors' use of techniques from optimization, geometry, and algebra demonstrates their technical expertise and the depth of their investigation.

However, it's important to note that the results in this paper are primarily of theoretical interest. While the authors' findings challenge the universality of single-layer neural networks, the practical implications for real-world machine learning applications may be limited. The upper bounds on network size for representing functions with logarithmic depth are also of more theoretical than practical interest.

Additionally, the paper does not address the issue of trainability - even if a function can be represented by a neural network with a certain architecture, it may still be difficult to actually train the network to learn that function. Further research is needed to understand the interplay between representational capacity and learnability in deep neural networks.

Conclusion

This paper provides a significant mathematical counterbalance to the widespread belief that a single hidden layer is sufficient for learning any function. By using advanced techniques from optimization, geometry, and algebra, the authors demonstrate that the class of functions exactly representable by ReLU neural networks can indeed strictly increase with the addition of more layers.

While the results are primarily of theoretical interest, the paper contributes to a better understanding of the representational power of deep neural networks. The authors' insights into piecewise linear functions and the size of networks required for certain types of functions also add valuable knowledge to the field. As deep learning continues to advance, this type of rigorous mathematical analysis will be essential for pushing the boundaries of what is possible with neural network architectures.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Follow @aimodelsfyi on 𝕏 →

Related Papers

🧠

Towards Lower Bounds on the Depth of ReLU Neural Networks

Christoph Hertrich, Amitabh Basu, Marco Di Summa, Martin Skutella

We contribute to a better understanding of the class of functions that can be represented by a neural network with ReLU activations and a given architecture. Using techniques from mixed-integer optimization, polyhedral theory, and tropical geometry, we provide a mathematical counterbalance to the universal approximation theorems which suggest that a single hidden layer is sufficient for learning any function. In particular, we investigate whether the class of exactly representable functions strictly increases by adding more layers (with no restrictions on size). As a by-product of our investigations, we settle an old conjecture about piecewise linear functions by Wang and Sun (2005) in the affirmative. We also present upper bounds on the sizes of neural networks required to represent functions with logarithmic depth.

7/18/2024

🧠

On Minimal Depth in Neural Networks

Juan L. Valerdi

A characterization of the representability of neural networks is relevant to comprehend their success in artificial intelligence. This study investigate two topics on ReLU neural network expressivity and their connection with a conjecture related to the minimum depth required for representing any continuous piecewise linear (CPWL) function. The topics are the minimal depth representation of the sum and max operations, as well as the exploration of polytope neural networks. For the sum operation, we establish a sufficient condition on the minimal depth of the operands to find the minimal depth of the operation. In contrast, regarding the max operation, a comprehensive set of examples is presented, demonstrating that no sufficient conditions, depending solely on the depth of the operands, would imply a minimal depth for the operation. The study also examine the minimal depth relationship between convex CPWL functions. On polytope neural networks, we investigate basic depth properties from Minkowski sums, convex hulls, number of vertices, faces, affine transformations, and indecomposable polytopes. More significant findings include depth characterization of polygons; identification of polytopes with an increasing number of vertices, exhibiting small depth and others with arbitrary large depth; and most notably, the minimal depth of simplices, which is strictly related to the minimal depth conjecture in ReLU networks.

6/10/2024

Implicit Hypersurface Approximation Capacity in Deep ReLU Networks

Jonatan Vallin, Karl Larsson, Mats G. Larson

We develop a geometric approximation theory for deep feed-forward neural networks with ReLU activations. Given a $d$-dimensional hypersurface in $mathbb{R}^{d+1}$ represented as the graph of a $C^2$-function $phi$, we show that a deep fully-connected ReLU network of width $d+1$ can implicitly construct an approximation as its zero contour with a precision bound depending on the number of layers. This result is directly applicable to the binary classification setting where the sign of the network is trained as a classifier, with the network's zero contour as a decision boundary. Our proof is constructive and relies on the geometrical structure of ReLU layers provided in [doi:10.48550/arXiv.2310.03482]. Inspired by this geometrical description, we define a new equivalent network architecture that is easier to interpret geometrically, where the action of each hidden layer is a projection onto a polyhedral cone derived from the layer's parameters. By repeatedly adding such layers, with parameters chosen such that we project small parts of the graph of $phi$ from the outside in, we, in a controlled way, construct a network that implicitly approximates the graph over a ball of radius $R$. The accuracy of this construction is controlled by a discretization parameter $delta$ and we show that the tolerance in the resulting error bound scales as $(d-1)R^{3/2}delta^{1/2}$ and the required number of layers is of order $dbig(frac{32R}{delta}big)^{frac{d+1}{2}}$.

7/8/2024

Approximation Error and Complexity Bounds for ReLU Networks on Low-Regular Function Spaces

Owen Davis, Gianluca Geraci, Mohammad Motamed

In this work, we consider the approximation of a large class of bounded functions, with minimal regularity assumptions, by ReLU neural networks. We show that the approximation error can be bounded from above by a quantity proportional to the uniform norm of the target function and inversely proportional to the product of network width and depth. We inherit this approximation error bound from Fourier features residual networks, a type of neural network that uses complex exponential activation functions. Our proof is constructive and proceeds by conducting a careful complexity analysis associated with the approximation of a Fourier features residual network by a ReLU network.

5/14/2024