Approximation by non-symmetric networks for cross-domain learning

Read original: arXiv:2305.03890 - Published 9/17/2024 by Hrushikesh Mhaskar

🔎

Overview

Machine learning has driven research into the approximation capabilities of various processes like neural networks and kernel methods.
This paper explores the approximation power of kernel-based networks using non-symmetric kernels, which is motivated by applications like invariant learning and transfer learning.
The researchers take a general approach, considering families of kernels like generalized translation networks and rotated zonal function kernels.
Unlike traditional kernel methods, the kernels used here are not required to be positive definite.
The paper provides estimates on the accuracy of approximating functions in a Sobolev class using ReLU^r networks, where r is not necessarily an integer.

Plain English Explanation

Machine learning has led to a lot of research on how well different techniques can approximate or represent various types of data and functions. This paper looks at a specific type of machine learning model called kernel-based networks, which use a mathematical function called a "kernel" to capture patterns in the data.

Typically, these kernel-based methods require the kernel to have certain properties, like being "positive definite." However, this paper takes a more general approach and considers kernels that don't have this restriction. The researchers are motivated by applications like invariant learning and transfer learning, where these more flexible kernels might be useful.

The paper provides mathematical analysis showing how well these kernel-based networks can approximate certain types of functions, even when the kernels don't have the usual positive definite property. This could be helpful for developing new machine learning models that can work with a wider range of data and problems, including those involving non-metric spaces.

Technical Explanation

The paper examines the approximation capabilities of kernel-based networks using non-symmetric kernels. Unlike traditional kernel methods, the kernels used here are not required to be positive definite. The researchers consider a family of kernels, including generalized translation networks (which encompass neural networks and translation invariant kernels) and rotated zonal function kernels.

The key technical contribution is obtaining estimates on the accuracy of uniformly approximating functions in a Sobolev class using ReLU^r networks, where r is not necessarily an integer. This allows for more flexible activation functions beyond the standard ReLU (rectified linear unit). The general results apply to functions with relatively low smoothness compared to the input dimensionality.

The analysis involves a combination of techniques, including leveraging the properties of singular value decomposition and generalizing the notion of approximation to include a wider family of kernels. This allows the researchers to characterize the approximation power of these kernel-based networks in a more comprehensive way.

Critical Analysis

The paper presents a thorough theoretical analysis of the approximation capabilities of kernel-based networks using non-symmetric kernels. This is an important contribution, as relaxing the positive definite requirement on the kernels opens up new possibilities for machine learning models to handle a wider range of data and applications.

However, the paper does not provide any empirical evaluation or experiments demonstrating the practical benefits of these techniques. While the mathematical analysis is rigorous, it would be helpful to see how the proposed methods perform on real-world problems compared to standard kernel methods or neural networks.

Additionally, the paper does not discuss potential limitations or challenges that may arise when using these non-symmetric kernels in practice. For example, it's unclear how the choice of kernel family or hyperparameters might affect the approximation performance, or how robust these methods are to noise or other real-world complexities.

Further research could explore the implementation and empirical evaluation of these kernel-based networks, as well as investigate potential extensions or variations that could enhance their applicability and practical impact.

Conclusion

This paper presents a theoretical study of the approximation capabilities of kernel-based networks using non-symmetric kernels. By considering a broader class of kernels beyond the traditional positive definite ones, the researchers have laid the groundwork for developing more flexible and powerful machine learning models.

The mathematical analysis provides insights into the accuracy of approximating certain types of functions, which could be valuable for applications like invariant learning, transfer learning, and synthetic aperture radar imaging. While the paper does not include empirical evaluations, the theoretical foundations established here could inspire future research to explore the practical implementation and real-world performance of these techniques.

Overall, this work contributes to the ongoing efforts to push the boundaries of machine learning's expressive power and expand the toolbox of available methods for tackling complex data and learning challenges.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Follow @aimodelsfyi on 𝕏 →

Related Papers

🔎

Approximation by non-symmetric networks for cross-domain learning

Hrushikesh Mhaskar

For the past 30 years or so, machine learning has stimulated a great deal of research in the study of approximation capabilities (expressive power) of a multitude of processes, such as approximation by shallow or deep neural networks, radial basis function networks, and a variety of kernel based methods. Motivated by applications such as invariant learning, transfer learning, and synthetic aperture radar imaging, we initiate in this paper a general approach to study the approximation capabilities of kernel based networks using non-symmetric kernels. While singular value decomposition is a natural instinct to study such kernels, we consider a more general approach to include the use of a family of kernels, such as generalized translation networks (which include neural networks and translation invariant kernels as special cases) and rotated zonal function kernels. Naturally, unlike traditional kernel based approximation, we cannot require the kernels to be positive definite. In particular, we obtain estimates on the accuracy of uniform approximation of functions in a Sobolev class by ReLU$^r$ networks when $r$ is not necessarily an integer. Our general results apply to the approximation of functions with small smoothness compared to the dimension of the input space.

9/17/2024

🧠

Multi-layer random features and the approximation power of neural networks

Rustem Takhanov

A neural architecture with randomly initialized weights, in the infinite width limit, is equivalent to a Gaussian Random Field whose covariance function is the so-called Neural Network Gaussian Process kernel (NNGP). We prove that a reproducing kernel Hilbert space (RKHS) defined by the NNGP contains only functions that can be approximated by the architecture. To achieve a certain approximation error the required number of neurons in each layer is defined by the RKHS norm of the target function. Moreover, the approximation can be constructed from a supervised dataset by a random multi-layer representation of an input vector, together with training of the last layer's weights. For a 2-layer NN and a domain equal to an $n-1$-dimensional sphere in ${mathbb R}^n$, we compare the number of neurons required by Barron's theorem and by the multi-layer features construction. We show that if eigenvalues of the integral operator of the NNGP decay slower than $k^{-n-frac{2}{3}}$ where $k$ is an order of an eigenvalue, then our theorem guarantees a more succinct neural network approximation than Barron's theorem. We also make some computational experiments to verify our theoretical findings. Our experiments show that realistic neural networks easily learn target functions even when both theorems do not give any guarantees.

4/29/2024

🤿

Constructive Universal Approximation Theorems for Deep Joint-Equivariant Networks by Schur's Lemma

Sho Sonoda, Yuka Hashimoto, Isao Ishikawa, Masahiro Ikeda

We present a unified constructive universal approximation theorem covering a wide range of learning machines including both shallow and deep neural networks based on the group representation theory. Constructive here means that the distribution of parameters is given in a closed-form expression (called the ridgelet transform). Contrary to the case of shallow models, expressive power analysis of deep models has been conducted in a case-by-case manner. Recently, Sonoda et al. (2023a,b) developed a systematic method to show a constructive approximation theorem from scalar-valued joint-group-invariant feature maps, covering a formal deep network. However, each hidden layer was formalized as an abstract group action, so it was not possible to cover real deep networks defined by composites of nonlinear activation function. In this study, we extend the method for vector-valued joint-group-equivariant feature maps, so to cover such real networks.

5/24/2024

🤖

Symmetric Kernels with Non-Symmetric Data: A Data-Agnostic Learnability Bound

Itay Lavie, Zohar Ringel

Kernel ridge regression (KRR) and Gaussian processes (GPs) are fundamental tools in statistics and machine learning with recent applications to highly over-parameterized deep neural networks. The ability of these tools to learn a target function is directly related to the eigenvalues of their kernel sampled on the input data. Targets having support on higher eigenvalues are more learnable. While kernels are often highly symmetric objects, the data is often not. Thus kernel symmetry seems to have little to no bearing on the above eigenvalues or learnability, making spectral analysis on real-world data challenging. Here, we show that contrary to this common lure, one may use eigenvalues and eigenfunctions associated with highly idealized data-measures to bound learnability on realistic data. As a demonstration, we give a theoretical lower bound on the sample complexity of copying heads for kernels associated with generic transformers acting on natural language.

6/6/2024