Activation Function Optimization Scheme for Image Classification

Read original: arXiv:2409.04915 - Published 9/10/2024 by Abdur Rahman, Lu He, Haifeng Wang

Overview

Activation functions are critical to the performance of deep neural networks.
Existing state-of-the-art activation functions are manually designed, except for Swish, which was found with a reinforcement-learning-based search.
This study proposes an evolutionary approach to optimize activation functions specifically for image classification tasks.
The resulting activation functions, called Exponential Error Linear Unit (EELU), are evaluated on five network architectures and eight datasets.

Plain English Explanation

The effectiveness of a deep learning model, such as a neural network, depends heavily on the activation function it uses. An activation function is a mathematical operation that determines how a neuron in the network responds to its inputs. Researchers have manually designed various activation functions over the years, trying to find ones that work well.

However, this manual design process can be time-consuming and limited. In this study, the researchers tried a different approach: using an evolutionary algorithm, a problem-solving technique inspired by natural selection, to automatically optimize activation functions for image classification tasks.

Through this optimization process, the researchers developed a new family of activation functions called Exponential Error Linear Unit (EELU). They evaluated these EELU functions on a variety of common image classification neural network models and datasets, and found that they often outperformed the existing standard activation functions.

The key idea is that by using an automated optimization approach, the researchers were able to discover activation functions that are better suited for image classification problems than the ones that were manually designed. This shows the potential of using advanced techniques like evolutionary algorithms to improve the core components of deep learning models.

Technical Explanation

The researchers proposed an evolutionary optimization framework to discover high-performing activation functions for image classification tasks. They evaluated the resulting activation functions, called Exponential Error Linear Unit (EELU), on five state-of-the-art neural network architectures (ResNet50, AlexNet, VGG16, MobileNet, and Compact Convolutional Transformer) and eight standard image classification datasets (CIFAR10, Imagenette, MNIST, Fashion MNIST, Beans, Colorectal Histology, CottonWeedID15, and TinyImageNet).
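As a rough illustration of what an evolutionary search over activation functions can look like, here is a minimal sketch. It is not the authors' procedure: this summary does not describe their search space, variation operators, or fitness measure, so the building blocks (`UNARY`, `BINARY`), the helpers (`make_candidate`, `to_function`, `mutate`, `evolve`), and the stand-in score `toy_fitness` are all hypothetical. In a real search, fitness would more plausibly be the validation accuracy of a network trained with the candidate function.

```python
import math
import random

# Hypothetical building blocks; the paper's actual search space is not given here.
UNARY = {
    "identity": lambda x: x,
    "neg": lambda x: -x,
    "tanh": math.tanh,
    "erf": math.erf,
    "exp_neg": lambda x: math.exp(min(-x, 50.0)),  # clamped to avoid overflow
}
BINARY = {
    "add": lambda a, b: a + b,
    "mul": lambda a, b: a * b,
    "max": max,
}


def make_candidate(rng):
    """A candidate is a (unary, unary, binary) triple of operator names."""
    return (rng.choice(list(UNARY)), rng.choice(list(UNARY)), rng.choice(list(BINARY)))


def to_function(candidate):
    """Turn a triple into the scalar function f(x) = binary(u1(x), u2(x))."""
    u1, u2, b = candidate
    return lambda x: BINARY[b](UNARY[u1](x), UNARY[u2](x))


def mutate(candidate, rng):
    """Replace one randomly chosen gene with an operator of the same kind."""
    genes = list(candidate)
    i = rng.randrange(3)
    pool = BINARY if i == 2 else UNARY
    genes[i] = rng.choice(list(pool))
    return tuple(genes)


def toy_fitness(candidate):
    """Stand-in score so the loop runs: negative squared error against ReLU
    on a small grid. A real search would train and validate a network."""
    f = to_function(candidate)
    grid = [i / 4 for i in range(-12, 13)]
    return -sum((f(x) - max(0.0, x)) ** 2 for x in grid)


def evolve(fitness, generations=20, pop_size=16, keep=4, seed=0):
    """Keep the best `keep` candidates each generation and refill the
    population with mutated copies of them."""
    rng = random.Random(seed)
    population = [make_candidate(rng) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[:keep]
        children = [mutate(rng.choice(parents), rng) for _ in range(pop_size - keep)]
        population = parents + children
    return max(population, key=fitness)


best = evolve(toy_fitness)
print(best, toy_fitness(best))
```

Because the top candidates are carried over unchanged each generation, the best score never gets worse as the search proceeds. The expensive part in practice is the fitness evaluation, which this sketch deliberately replaces with a cheap placeholder.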

Using a Friedman test, the researchers found that the activation functions generated by their optimization scheme outperformed existing standard activation functions in 92.8% of the 28 cases studied. The best-performing activation function discovered was $-x \cdot \mathrm{erf}(e^{-x})$, which the researchers recommend for image classification tasks.
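To make the formula concrete, the following sketch evaluates $-x \cdot \mathrm{erf}(e^{-x})$ for scalar inputs using only the Python standard library. The function name `eelu` and the overflow clamp are our own choices for illustration, not something taken from the paper.

```python
import math


def eelu(x: float) -> float:
    """Evaluate -x * erf(exp(-x)) for a scalar input (name is hypothetical)."""
    # exp(-x) overflows a float for very negative x. erf is already 1.0 to
    # double precision far below this clamp, so the result is unchanged.
    u = math.exp(min(-x, 700.0))
    return -x * math.erf(u)


# Print a few sample values across negative and positive inputs.
for x in (-3.0, -1.0, 0.0, 1.0, 3.0):
    print(f"{x:+.1f} -> {eelu(x):+.4f}")
```

A framework implementation would apply the same expression elementwise to tensors; the scalar version here is only meant to show the shape of the function.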

Critical Analysis

The researchers acknowledge that their study is focused on image classification tasks, and the performance of the EELU activation functions on other types of deep learning problems (e.g. natural language processing, time series analysis) remains to be evaluated. Additionally, the computational cost of the evolutionary optimization process is not discussed, which could be a practical limitation for some applications.

While the results demonstrate the potential of automated activation function optimization, further research is needed to understand the underlying reasons why certain functions perform better than others. Exploring the mathematical properties and the inductive biases of the discovered activation functions could provide insights into their effectiveness.
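As a small first step in that direction, the limiting behaviour and derivative of the recommended function can be worked out directly from its formula. This is our own calculation, not taken from the paper:

$$f(x) = -x \cdot \mathrm{erf}(e^{-x}), \qquad f'(x) = -\mathrm{erf}(e^{-x}) + \frac{2}{\sqrt{\pi}}\, x\, e^{-x}\, e^{-e^{-2x}}$$

As $x \to -\infty$, $\mathrm{erf}(e^{-x}) \to 1$, so $f(x) \approx -x$ and $f'(x) \to -1$. As $x \to +\infty$, $\mathrm{erf}(e^{-x}) \approx \frac{2}{\sqrt{\pi}} e^{-x}$, so both $f(x)$ and $f'(x)$ tend to $0$. In other words, the function behaves roughly like $-x$ on the negative side and flattens toward zero on the positive side, with a smooth transition in between.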

Additionally, the study does not compare the EELU functions to more recently proposed activation functions that also use automated search techniques, such as Swish. Expanding the benchmarking to include a wider range of state-of-the-art activation functions would provide a more comprehensive evaluation.

Conclusion

This study presents a promising evolutionary approach for optimizing activation functions for image classification tasks. The resulting Exponential Error Linear Unit (EELU) activation functions were found to outperform existing standard activation functions in a large majority of the cases evaluated.

The ability to automatically discover highly effective activation functions has the potential to accelerate the development of high-performing deep learning models, especially for specialized domains like computer vision. Further research is needed to explore the broader applicability of this optimization scheme and the underlying characteristics of the discovered activation functions.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Activation Function Optimization Scheme for Image Classification

Abdur Rahman, Lu He, Haifeng Wang

Activation function has a significant impact on the dynamics, convergence, and performance of deep neural networks. The search for a consistent and high-performing activation function has always been a pursuit during deep learning model development. Existing state-of-the-art activation functions are manually designed with human expertise except for Swish. Swish was developed using a reinforcement learning-based search strategy. In this study, we propose an evolutionary approach for optimizing activation functions specifically for image classification tasks, aiming to discover functions that outperform current state-of-the-art options. Through this optimization framework, we obtain a series of high-performing activation functions denoted as Exponential Error Linear Unit (EELU). The developed activation functions are evaluated for image classification tasks from two perspectives: (1) five state-of-the-art neural network architectures, such as ResNet50, AlexNet, VGG16, MobileNet, and Compact Convolutional Transformer which cover computationally heavy to light neural networks, and (2) eight standard datasets, including CIFAR10, Imagenette, MNIST, Fashion MNIST, Beans, Colorectal Histology, CottonWeedID15, and TinyImageNet which cover from typical machine vision benchmark, agricultural image applications to medical image applications. Finally, we statistically investigate the generalization of the resultant activation functions developed through the optimization scheme. With a Friedman test, we conclude that the optimization scheme is able to generate activation functions that outperform the existing standard ones in 92.8% cases among 28 different cases studied, and $-x \cdot \mathrm{erf}(e^{-x})$ is found to be the best activation function for image classification generated by the optimization scheme.

9/10/2024

A Method on Searching Better Activation Functions

Haoyuan Sun, Zihao Wu, Bo Xia, Pu Chang, Zibin Dong, Yifu Yuan, Yongzhe Chang, Xueqian Wang

The success of artificial neural networks (ANNs) hinges greatly on the judicious selection of an activation function, introducing non-linearity into network and enabling them to model sophisticated relationships in data. However, the search of activation functions has largely relied on empirical knowledge in the past, lacking theoretical guidance, which has hindered the identification of more effective activation functions. In this work, we offer a proper solution to such issue. Firstly, we theoretically demonstrate the existence of the worst activation function with boundary conditions (WAFBC) from the perspective of information entropy. Furthermore, inspired by the Taylor expansion form of information entropy functional, we propose the Entropy-based Activation Function Optimization (EAFO) methodology. EAFO methodology presents a novel perspective for designing static activation functions in deep neural networks and the potential of dynamically optimizing activation during iterative training. Utilizing EAFO methodology, we derive a novel activation function from ReLU, known as Correction Regularized ReLU (CRReLU). Experiments conducted with vision transformer and its variants on CIFAR-10, CIFAR-100 and ImageNet-1K datasets demonstrate the superiority of CRReLU over existing corrections of ReLU. Extensive empirical studies on task of large language model (LLM) fine-tuning, CRReLU exhibits superior performance compared to GELU, suggesting its broader potential for practical applications.

5/24/2024

Activation function optimization method: Learnable series linear units (LSLUs)

Chuan Feng, Xi Lin, Shiping Zhu, Hongkang Shi, Maojie Tang, Hua Huang

Effective activation functions introduce non-linear transformations, providing neural networks with stronger fitting capabilities, which help them better adapt to real data distributions. Huawei Noah's Lab believes that dynamic activation functions are more suitable than static activation functions for enhancing the non-linear capabilities of neural networks. Tsinghua University's related research also suggests using dynamically adjusted activation functions. Building on the ideas of using fine-tuned activation functions from Tsinghua University and Huawei Noah's Lab, we propose a series-based learnable activation function called LSLU (Learnable Series Linear Units). This method simplifies deep learning networks while improving accuracy. This method introduces learnable parameters $\theta$ and $\omega$ to control the activation function, adapting it to the current layer's training stage and improving the model's generalization. The principle is to increase non-linearity in each activation layer, boosting the network's overall non-linearity. We evaluate LSLU's performance on CIFAR10, CIFAR100, and specific task datasets (e.g., Silkworm), validating its effectiveness. The convergence behavior of the learnable parameters $\theta$ and $\omega$, as well as their effects on generalization, are analyzed. Our empirical results show that LSLU enhances the generalization ability of the original model in various tasks while speeding up training. In VanillaNet training, parameter $\theta$ initially decreases, then increases before stabilizing, while $\omega$ shows an opposite trend. Ultimately, LSLU achieves a 3.17% accuracy improvement on CIFAR100 for VanillaNet (Table 3). Codes are available at https://github.com/vontran2021/Learnable-series-linear-units-LSLU.

9/16/2024

Efficient Search for Customized Activation Functions with Gradient Descent

Lukas Strack, Mahmoud Safari, Frank Hutter

Different activation functions work best for different deep learning models. To exploit this, we leverage recent advancements in gradient-based search techniques for neural architectures to efficiently identify high-performing activation functions for a given application. We propose a fine-grained search cell that combines basic mathematical operations to model activation functions, allowing for the exploration of novel activations. Our approach enables the identification of specialized activations, leading to improved performance in every model we tried, from image classification to language models. Moreover, the identified activations exhibit strong transferability to larger models of the same type, as well as new datasets. Importantly, our automated process for creating customized activation functions is orders of magnitude more efficient than previous approaches. It can easily be applied on top of arbitrary deep learning pipelines and thus offers a promising practical avenue for enhancing deep learning architectures.

8/14/2024