Trainable Highly-expressive Activation Functions

Read original: arXiv:2407.07564 - Published 7/12/2024 by Irit Chelly, Shahaf E. Finder, Shira Ifergane, Oren Freifeld

Overview

This paper introduces DiTAC, a trainable, highly expressive activation function for deep neural networks, built on an efficient diffeomorphic transformation called CPAB.
Its parameters are learned during training, so the network can adapt its nonlinear transformation to the task at hand.
This contrasts with fixed activation functions such as ReLU or GELU, whose shape cannot adapt and may therefore limit expressiveness.
The authors report that DiTAC outperforms existing fixed and trainable activation functions in semantic segmentation, image generation, regression, and image classification, while adding only a negligible number of trainable parameters.

Plain English Explanation

Neural networks use special functions called activation functions to introduce nonlinearity and enable them to learn complex patterns in data. Traditionally, network designers have relied on a handful of fixed activation functions like ReLU or tanh. However, these standard options may not be the most suitable for every task or dataset.
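To see why the nonlinearity matters, the sketch below stacks two toy one-dimensional linear layers, first without and then with a ReLU in between. The weights are made-up values chosen only for illustration. Without an activation, the two layers collapse into a single linear map; with ReLU, they no longer do.

```python
def linear(w, b):
    """Return the 1D affine map x -> w * x + b."""
    return lambda x: w * x + b

def relu(x):
    """Standard fixed ReLU activation."""
    return max(0.0, x)

# Two toy "layers" with arbitrary illustrative weights.
layer1 = linear(2.0, -1.0)
layer2 = linear(-3.0, 0.5)

def stacked_linear(x):
    """Two linear layers with no activation in between."""
    return layer2(layer1(x))

def collapsed(x):
    """The single affine map the stack above reduces to:
    -3 * (2x - 1) + 0.5 = -6x + 3.5."""
    return -6.0 * x + 3.5

def stacked_relu(x):
    """The same two layers with a ReLU in between."""
    return layer2(relu(layer1(x)))

for x in [-1.0, 0.0, 0.5, 1.0, 2.0]:
    print(x, stacked_linear(x), collapsed(x), stacked_relu(x))
```

For inputs where the first layer's output is negative, the ReLU version departs from the collapsed linear map, which is what lets deeper networks represent non-linear patterns.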

The researchers in this paper propose an approach in which the activation function itself is learned during training rather than chosen in advance. Their trainable activation function is highly flexible and expressive, so the network can adapt its nonlinear transformations to the problem at hand.

This is like having a set of paint brushes that can change shape and size as you use them, rather than being stuck with a fixed set of brushes. The network can "paint" the best possible representation of the data by adaptively adjusting its activation functions.

The authors show that their trainable activation functions outperform standard choices like ReLU or tanh on several benchmark tests. This suggests that allowing the network to learn its own activation functions can lead to better performance on a variety of machine learning tasks.

Technical Explanation

The key innovation in this paper is the introduction of a new class of trainable, highly-expressive activation functions. Unlike standard fixed activation functions, these new functions can be automatically learned during the training process.
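As a minimal sketch of what "learned during training" means for an activation function, the example below fits a single shape parameter by gradient descent. It uses a leaky-ReLU-style function with a trainable negative-side slope, which is a much simpler stand-in and not the function proposed in the paper. The data, the target slope of 0.25, the learning rate, and all names are assumptions made for illustration.

```python
def act(x, slope):
    """Leaky-ReLU-style activation; `slope` is the trainable parameter."""
    return x if x > 0 else slope * x

def d_act_d_slope(x):
    """Derivative of act(x, slope) with respect to slope."""
    return 0.0 if x > 0 else x

# Toy data generated with an illustrative "true" slope of 0.25.
xs = [-2.0, -1.5, -1.0, -0.5, 0.5, 1.0, 1.5, 2.0]
ys = [act(x, 0.25) for x in xs]

def mse(slope):
    """Mean squared error of the activation output against the targets."""
    return sum((act(x, slope) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

slope = 1.0          # start from the identity function
lr = 0.1             # learning rate (assumed value)
initial_loss = mse(slope)

for _ in range(200):
    # Gradient of the MSE with respect to the activation's own parameter.
    grad = sum(
        2.0 * (act(x, slope) - y) * d_act_d_slope(x)
        for x, y in zip(xs, ys)
    ) / len(xs)
    slope -= lr * grad

final_loss = mse(slope)
print("learned slope:", slope)
print("loss before and after:", initial_loss, final_loss)
```

In a real network the same gradient step would update the activation parameters and the layer weights together, typically through an automatic differentiation framework rather than a hand-written derivative.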

According to the paper's abstract, the proposed function, DiTAC, is built on an efficient diffeomorphic transformation called CPAB rather than on a fixed formula. It adds only a negligible number of trainable parameters, yet it lets the network learn nonlinear transformations suited to the task at hand instead of being limited to a predefined set of options. This matters because different layers may benefit from different activation functions.
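The paper's abstract describes the proposed function, DiTAC, as based on a diffeomorphic transformation called CPAB. The sketch below illustrates the general idea of such a transformation in one dimension, under the assumption that it is obtained by integrating a continuous piecewise-affine velocity field whose knot values are the trainable parameters. It uses plain Euler integration and hypothetical names and values (`theta`, the interval [-3, 3], the step count), and it does not reproduce the authors' implementation or show how DiTAC applies the transformation inside a network.

```python
def make_velocity(theta, lo, hi):
    """Build a continuous piecewise-affine velocity field on [lo, hi].

    theta holds the velocity at equally spaced knots; the first and last
    entries are expected to be 0 so the field is continuous with the
    zero velocity used outside the interval.
    """
    n_cells = len(theta) - 1
    width = (hi - lo) / n_cells

    def v(x):
        if x <= lo or x >= hi:
            return 0.0                      # identity outside the interval
        pos = (x - lo) / width
        i = min(int(pos), n_cells - 1)      # index of the cell containing x
        frac = pos - i
        return (1.0 - frac) * theta[i] + frac * theta[i + 1]

    return v

def flow(x, v, n_steps=200):
    """Approximate the time-1 flow of dx/dt = v(x) with Euler steps."""
    h = 1.0 / n_steps
    for _ in range(n_steps):
        x = x + h * v(x)
    return x

# Illustrative parameter values; in training these would be learned.
theta = [0.0, 1.0, -0.5, 0.8, 0.0]
velocity = make_velocity(theta, lo=-3.0, hi=3.0)

def transform(x):
    """Smooth, order-preserving warp of the real line defined by theta."""
    return flow(x, velocity)

for x in [-4.0, -3.0, -1.5, 0.0, 1.5, 3.0, 4.0]:
    print(x, transform(x))
```

Because the map comes from integrating a velocity field with small steps, it stays strictly increasing and invertible for any moderate choice of `theta`, so a handful of parameters can describe a wide range of smooth shapes.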

To enable efficient training of the activation function parameters alongside the network weights, the authors develop a progressive training scheme that gradually increases the complexity of the activation function. This helps the network find good local optima and avoid getting stuck in poor configurations.

Experiments on semantic segmentation, image generation, regression, and image classification demonstrate the effectiveness of the trainable activation function. The authors report that it outperforms existing activation functions, both fixed and trainable, often by a substantial margin, which highlights the benefit of letting the network learn its own nonlinear transformations.

Critical Analysis

The proposed trainable activation functions represent a promising direction for improving the expressive power and adaptability of neural networks. By allowing the activation functions to be learned, the network can discover more suitable nonlinear transformations for the task at hand, going beyond the limitations of fixed activation functions.

However, the authors acknowledge that the increased flexibility comes at the cost of additional training complexity and hyperparameters to tune. The progressive training scheme helps, but the overall training process may be more involved compared to using standard activations.

Additionally, the paper does not explore the interpretability or explainability of the learned activation functions. Understanding the specific characteristics and behaviors of the discovered nonlinearities could provide valuable insights, but this aspect is not addressed.

Further research could explore the generalization of trainable activation functions to other neural network architectures and tasks, as well as investigate their robustness and stability properties. Comparisons to more recently proposed activation functions could also provide additional insights.

Conclusion

This paper introduces a new class of trainable, highly-expressive activation functions that can be automatically learned during the training of deep neural networks. By allowing the network to discover the most suitable nonlinear transformations for the task at hand, the authors demonstrate consistent performance improvements over standard fixed activation functions like ReLU and tanh.

While the increased flexibility comes with additional training complexity, the results suggest that enabling neural networks to adaptively learn their own activation functions is a promising direction for enhancing their expressive power and performance. Further research in this area could lead to more efficient and adaptable neural network models, with potential applications across a wide range of domains.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Trainable Highly-expressive Activation Functions

Irit Chelly, Shahaf E. Finder, Shira Ifergane, Oren Freifeld

Nonlinear activation functions are pivotal to the success of deep neural nets, and choosing the appropriate activation function can significantly affect their performance. Most networks use fixed activation functions (e.g., ReLU, GELU, etc.), and this choice might limit their expressiveness. Furthermore, different layers may benefit from diverse activation functions. Consequently, there has been a growing interest in trainable activation functions. In this paper, we introduce DiTAC, a trainable highly-expressive activation function based on an efficient diffeomorphic transformation (called CPAB). Despite introducing only a negligible number of trainable parameters, DiTAC enhances model expressiveness and performance, often yielding substantial improvements. It also outperforms existing activation functions (regardless whether the latter are fixed or trainable) in tasks such as semantic segmentation, image generation, regression problems, and image classification. Our code is available at https://github.com/BGU-CS-VIL/DiTAC.

7/12/2024

Nonlinearity Enhanced Adaptive Activation Function

David Yevick

A simply implemented activation function with even cubic nonlinearity is introduced that increases the accuracy of neural networks without substantial additional computational resources. This is partially enabled through an apparent tradeoff between convergence and accuracy. The activation function generalizes the standard RELU function by introducing additional degrees of freedom through optimizable parameters that enable the degree of nonlinearity to be adjusted. The associated accuracy enhancement is quantified in the context of the MNIST digit data set through a comparison with standard techniques.

4/1/2024

Activation Function Optimization Scheme for Image Classification

Abdur Rahman, Lu He, Haifeng Wang

Activation function has a significant impact on the dynamics, convergence, and performance of deep neural networks. The search for a consistent and high-performing activation function has always been a pursuit during deep learning model development. Existing state-of-the-art activation functions are manually designed with human expertise except for Swish. Swish was developed using a reinforcement learning-based search strategy. In this study, we propose an evolutionary approach for optimizing activation functions specifically for image classification tasks, aiming to discover functions that outperform current state-of-the-art options. Through this optimization framework, we obtain a series of high-performing activation functions denoted as Exponential Error Linear Unit (EELU). The developed activation functions are evaluated for image classification tasks from two perspectives: (1) five state-of-the-art neural network architectures, such as ResNet50, AlexNet, VGG16, MobileNet, and Compact Convolutional Transformer which cover computationally heavy to light neural networks, and (2) eight standard datasets, including CIFAR10, Imagenette, MNIST, Fashion MNIST, Beans, Colorectal Histology, CottonWeedID15, and TinyImageNet which cover from typical machine vision benchmark, agricultural image applications to medical image applications. Finally, we statistically investigate the generalization of the resultant activation functions developed through the optimization scheme. With a Friedman test, we conclude that the optimization scheme is able to generate activation functions that outperform the existing standard ones in 92.8% cases among 28 different cases studied, and $-x \cdot \mathrm{erf}(e^{-x})$ is found to be the best activation function for image classification generated by the optimization scheme.

9/10/2024

Adaptive Parametric Activation

Konstantinos Panagiotis Alexandridis, Jiankang Deng, Anh Nguyen, Shan Luo

The activation function plays a crucial role in model optimisation, yet the optimal choice remains unclear. For example, the Sigmoid activation is the de-facto activation in balanced classification tasks, however, in imbalanced classification, it proves inappropriate due to bias towards frequent classes. In this work, we delve deeper in this phenomenon by performing a comprehensive statistical analysis in the classification and intermediate layers of both balanced and imbalanced networks and we empirically show that aligning the activation function with the data distribution, enhances the performance in both balanced and imbalanced tasks. To this end, we propose the Adaptive Parametric Activation (APA) function, a novel and versatile activation function that unifies most common activation functions under a single formula. APA can be applied in both intermediate layers and attention layers, significantly outperforming the state-of-the-art on several imbalanced benchmarks such as ImageNet-LT, iNaturalist2018, Places-LT, CIFAR100-LT and LVIS and balanced benchmarks such as ImageNet1K, COCO and V3DET. The code is available at https://github.com/kostas1515/AGLU.

7/12/2024