Effective Learning with Node Perturbation in Deep Neural Networks

2310.00965

Published 5/28/2024 by Sander Dalm, Marcel van Gerven, Nasir Ahmad

🤿

Abstract

Backpropagation (BP) remains the dominant and most successful method for training parameters of deep neural network models. However, BP relies on two computationally distinct phases, does not provide a satisfactory explanation of biological learning, and can be challenging to apply for training of networks with discontinuities or noisy node dynamics. By comparison, node perturbation (NP) proposes learning by the injection of noise into network activations, and subsequent measurement of the induced loss change. NP relies on two forward (inference) passes, does not make use of network derivatives, and has been proposed as a model for learning in biological systems. However, standard NP is highly data inefficient and unstable due to its unguided noise-based search process. In this work, we investigate different formulations of NP and relate it to the concept of directional derivatives as well as combining it with a decorrelating mechanism for layer-wise inputs. We find that a closer alignment with directional derivatives together with input decorrelation at every layer strongly enhances performance of NP learning with large improvements in parameter convergence and much higher performance on the test data, approaching that of BP. Furthermore, our novel formulation allows for application to noisy systems in which the noise process itself is inaccessible.

Create account to get full access

Overview

Backpropagation (BP) is the dominant and most successful method for training deep neural network models, but it has limitations
Node Perturbation (NP) proposes an alternative learning approach that injects noise into network activations and measures the impact on loss
Standard NP is inefficient and unstable, but this paper investigates methods to improve its performance

Plain English Explanation

Artificial neural networks are powerful machine learning models that can learn complex patterns in data. To train these networks, researchers often use a technique called backpropagation. Backpropagation is very successful, but it has some downsides - it requires two separate computational steps, doesn't fully explain how biological brains learn, and can be tricky to apply to networks with discontinuities or noisy dynamics.

An alternative approach is called node perturbation. With node perturbation, the network is trained by injecting random noise into the activations of its nodes (artificial neurons) and measuring how this changes the overall loss. Node perturbation only requires two forward passes through the network and doesn't use derivative information like backpropagation. It's also been proposed as a model for how biological brains might learn.

However, standard node perturbation is quite inefficient and unstable, because the random noise-based search process is unguided. This paper explores ways to improve node perturbation by aligning it more closely with directional derivatives and adding a decorrelation mechanism for the layer inputs. These innovations significantly boost the performance of node perturbation, allowing it to approach the accuracy of backpropagation. The new formulation also enables applying node perturbation to noisy systems where the noise process itself is unknown.

Technical Explanation

This paper investigates different formulations of the node perturbation (NP) learning algorithm and relates it to the concept of directional derivatives. NP proposes learning by injecting noise into network activations and measuring the induced change in loss. Standard NP is highly data inefficient and unstable due to its unguided noise-based search process.

The authors explore aligning NP more closely with directional derivatives, which provide information about the gradient of the loss function. They also introduce a decorrelating mechanism for the layer-wise inputs to the network. This decorrelation helps to ensure that the noise injected at each layer has a meaningful impact on the loss.

Through these innovations, the authors are able to significantly enhance the performance of NP learning. The modified NP approach approaches the accuracy of backpropagation (BP) on test data, while maintaining NP's advantages of only requiring two forward passes and not using derivative information.

Importantly, the new NP formulation also allows for application to noisy systems where the noise process itself is inaccessible. This makes it a promising alternative to BP in settings with discontinuities or unknown noise dynamics, which can be challenging for BP.

Critical Analysis

The paper provides a thorough investigation of node perturbation and introduces several interesting techniques to improve its performance. The alignment with directional derivatives and the input decorrelation mechanism are clever ideas that seem to address key limitations of standard NP.

That said, the paper does not provide a comprehensive comparison of NP to other gradient-free optimization methods, such as evolutionary strategies or reinforcement learning approaches. It would be helpful to understand how the enhanced NP technique compares to these other alternatives, especially in terms of sample efficiency and scalability to large networks.

Additionally, the paper does not explore the biological plausibility of the NP learning rule in depth. While it is proposed as a model for biological learning, more discussion of the neurophysiological mechanisms and constraints would strengthen the connection to neuroscience.

Overall, this is a well-executed study that makes a strong case for the potential of node perturbation as a viable alternative to backpropagation, especially in challenging domains. The authors have laid the groundwork for further research and development of these types of noise-based learning algorithms.

Conclusion

This paper presents an enhanced formulation of the node perturbation (NP) learning algorithm, which proposes an alternative to the dominant backpropagation (BP) technique for training deep neural networks. By aligning NP more closely with directional derivatives and incorporating a decorrelating mechanism for layer inputs, the authors are able to significantly improve the performance of NP, allowing it to approach the accuracy of BP on test data.

Crucially, the new NP formulation also enables application to noisy systems where the noise process itself is unknown, making it a promising alternative to BP in settings with discontinuities or complex dynamics. While more research is needed to fully understand the biological plausibility and scalability of NP, this work represents an important step forward in exploring alternative learning algorithms for artificial neural networks.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Perturbation-based Learning for Recurrent Neural Networks

Jesus Garcia Fernandez, Sander Keemink, Marcel van Gerven

Recurrent neural networks (RNNs) hold immense potential for computations due to their Turing completeness and sequential processing capabilities, yet existing methods for their training encounter efficiency challenges. Backpropagation through time (BPTT), the prevailing method, extends the backpropagation (BP) algorithm by unrolling the RNN over time. However, this approach suffers from significant drawbacks, including the need to interleave forward and backward phases and store exact gradient information. Furthermore, BPTT has been shown to struggle with propagating gradient information for long sequences, leading to vanishing gradients. An alternative strategy to using gradient-based methods like BPTT involves stochastically approximating gradients through perturbation-based methods. This learning approach is exceptionally simple, necessitating only forward passes in the network and a global reinforcement signal as feedback. Despite its simplicity, the random nature of its updates typically leads to inefficient optimization, limiting its effectiveness in training neural networks. In this study, we present a new approach to perturbation-based learning in RNNs whose performance is competitive with BPTT, while maintaining the inherent advantages over gradient-based learning. To this end, we extend the recently introduced activity-based node perturbation (ANP) method to operate in the time domain, leading to more efficient learning and generalization. Subsequently, we conduct a range of experiments to validate our approach. Our results show similar performance, convergence time and scalability when compared to BPTT, strongly outperforming standard node perturbation and weight perturbation methods. These findings suggest that perturbation-based learning methods offer a versatile alternative to gradient-based methods for training RNNs which can be ideally suited for neuromorphic applications

5/27/2024

cs.LG

📉

Contribute to balance, wire in accordance: Emergence of backpropagation from a simple, bio-plausible neuroplasticity rule

Xinhao Fan, Shreesh P Mysore

Backpropagation (BP) has been pivotal in advancing machine learning and remains essential in computational applications and comparative studies of biological and artificial neural networks. Despite its widespread use, the implementation of BP in the brain remains elusive, and its biological plausibility is often questioned due to inherent issues such as the need for symmetry of weights between forward and backward connections, and the requirement of distinct forward and backward phases of computation. Here, we introduce a novel neuroplasticity rule that offers a potential mechanism for implementing BP in the brain. Similar in general form to the classical Hebbian rule, this rule is based on the core principles of maintaining the balance of excitatory and inhibitory inputs as well as on retrograde signaling, and operates over three progressively slower timescales: neural firing, retrograde signaling, and neural plasticity. We hypothesize that each neuron possesses an internal state, termed credit, in addition to its firing rate. After achieving equilibrium in firing rates, neurons receive credits based on their contribution to the E-I balance of postsynaptic neurons through retrograde signaling. As the network's credit distribution stabilizes, connections from those presynaptic neurons are strengthened that significantly contribute to the balance of postsynaptic neurons. We demonstrate mathematically that our learning rule precisely replicates BP in layered neural networks without any approximations. Simulations on artificial neural networks reveal that this rule induces varying community structures in networks, depending on the learning rate. This simple theoretical framework presents a biologically plausible implementation of BP, with testable assumptions and predictions that may be evaluated through biological experiments.

5/24/2024

cs.LG cs.NE

Two Tales of Single-Phase Contrastive Hebbian Learning

Rasmus Kj{ae}r H{o}ier, Christopher Zach

The search for ``biologically plausible'' learning algorithms has converged on the idea of representing gradients as activity differences. However, most approaches require a high degree of synchronization (distinct phases during learning) and introduce substantial computational overhead, which raises doubts regarding their biological plausibility as well as their potential utility for neuromorphic computing. Furthermore, they commonly rely on applying infinitesimal perturbations (nudges) to output units, which is impractical in noisy environments. Recently it has been shown that by modelling artificial neurons as dyads with two oppositely nudged compartments, it is possible for a fully local learning algorithm named ``dual propagation'' to bridge the performance gap to backpropagation, without requiring separate learning phases or infinitesimal nudging. However, the algorithm has the drawback that its numerical stability relies on symmetric nudging, which may be restrictive in biological and analog implementations. In this work we first provide a solid foundation for the objective underlying the dual propagation method, which also reveals a surprising connection with adversarial robustness. Second, we demonstrate how dual propagation is related to a particular adjoint state method, which is stable regardless of asymmetric nudging.

6/26/2024

cs.LG cs.NE

Towards Biologically Plausible Computing: A Comprehensive Comparison

Changze Lv, Yufei Gu, Zhengkang Guo, Zhibo Xu, Yixin Wu, Feiran Zhang, Tianyuan Shi, Zhenghua Wang, Ruicheng Yin, Yu Shang, Siqi Zhong, Xiaohua Wang, Muling Wu, Wenhao Liu, Tianlong Li, Jianhao Zhu, Cenyuan Zhang, Zixuan Ling, Xiaoqing Zheng

Backpropagation is a cornerstone algorithm in training neural networks for supervised learning, which uses a gradient descent method to update network weights by minimizing the discrepancy between actual and desired outputs. Despite its pivotal role in propelling deep learning advancements, the biological plausibility of backpropagation is questioned due to its requirements for weight symmetry, global error computation, and dual-phase training. To address this long-standing challenge, many studies have endeavored to devise biologically plausible training algorithms. However, a fully biologically plausible algorithm for training multilayer neural networks remains elusive, and interpretations of biological plausibility vary among researchers. In this study, we establish criteria for biological plausibility that a desirable learning algorithm should meet. Using these criteria, we evaluate a range of existing algorithms considered to be biologically plausible, including Hebbian learning, spike-timing-dependent plasticity, feedback alignment, target propagation, predictive coding, forward-forward algorithm, perturbation learning, local losses, and energy-based learning. Additionally, we empirically evaluate these algorithms across diverse network architectures and datasets. We compare the feature representations learned by these algorithms with brain activity recorded by non-invasive devices under identical stimuli, aiming to identify which algorithm can most accurately replicate brain activity patterns. We are hopeful that this study could inspire the development of new biologically plausible algorithms for training multilayer networks, thereby fostering progress in both the fields of neuroscience and machine learning.

6/26/2024

cs.NE