An algorithmic framework for the optimization of deep neural networks architectures and hyperparameters

2303.12797

Published 5/15/2024 by Julie Keisler (EDF R&D OSIRIS, EDF R&D, CRIStAL), El-Ghazali Talbi (CRIStAL), Sandra Claudel (EDF R&D OSIRIS, EDF R&D), Gilles Cabriel (EDF R&D OSIRIS, EDF R&D)

cs.NE cs.AI cs.LG


Abstract

In this paper, we propose an algorithmic framework to automatically generate efficient deep neural networks and optimize their associated hyperparameters. The framework is based on evolving directed acyclic graphs (DAGs), defining a more flexible search space than the existing ones in the literature. It allows mixtures of different classical operations: convolutions, recurrences and dense layers, but also more newfangled operations such as self-attention. Based on this search space we propose neighbourhood and evolution search operators to optimize both the architecture and hyper-parameters of our networks. These search operators can be used with any metaheuristic capable of handling mixed search spaces. We tested our algorithmic framework with an evolutionary algorithm on a time series prediction benchmark. The results demonstrate that our framework was able to find models outperforming the established baseline on numerous datasets.


Overview

The paper proposes an algorithmic framework to automatically generate efficient deep neural networks and optimize their associated hyperparameters.
The framework is based on evolving directed acyclic graphs (DAGs), which define a more flexible search space than existing approaches.
It allows for a mixture of different operations, including convolutions, recurrences, dense layers, and self-attention.
The framework defines neighborhood and evolution search operators that optimize both the architecture and the hyperparameters of the networks, and that work with any metaheuristic capable of handling mixed search spaces.
The authors tested the framework with an evolutionary algorithm on a time series prediction benchmark, where it found models that outperformed the established baseline on numerous datasets.

Plain English Explanation

The paper describes a method for automatically designing and optimizing deep neural networks. Rather than relying on human experts to manually configure the network architecture and hyperparameters, the researchers developed an algorithmic framework that can do this automatically.

The key innovation is the use of directed acyclic graphs (DAGs) to represent the network structure. This allows for a more flexible and diverse set of network designs, including not just the traditional layers like convolutions and dense layers, but also more advanced operations like self-attention.

The framework then uses optimization techniques, like evolutionary algorithms, to explore this space of possible network designs and find the ones that perform the best on a given task. This includes not just the network architecture, but also the hyperparameters that control how the network trains and operates.

By automating this process, the researchers were able to find network designs that outperformed established baselines on a time series prediction task. This could be a powerful tool for rapidly developing high-performing neural networks without the need for extensive manual tuning.

Technical Explanation

The core of the framework is a directed acyclic graph (DAG) encoding of the network architecture. A candidate network is a DAG that can freely mix convolutions, recurrences, dense layers, and self-attention within a single model. According to the authors, this defines a more flexible search space than those previously proposed in the literature.
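To make the DAG idea concrete, here is a minimal sketch of one way such an architecture could be encoded. It assumes an upper-triangular adjacency matrix, which is a common way to guarantee acyclicity; the summary does not say which encoding the authors use. The names `Node`, `DagNetwork`, `OPERATIONS`, and the hyperparameter keys are all hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical operation vocabulary, taken from the layer families the summary lists.
OPERATIONS = ("conv", "recurrent", "dense", "self_attention")


@dataclass
class Node:
    """One layer of the network: an operation plus its own hyperparameters."""
    op: str
    hparams: dict = field(default_factory=dict)


@dataclass
class DagNetwork:
    """Architecture stored as nodes plus an adjacency matrix (assumed encoding).

    adj[i][j] == 1 means the output of node i feeds node j. Allowing only
    edges with i < j guarantees that the graph has no cycles.
    """
    nodes: list
    adj: list

    def is_valid(self) -> bool:
        n = len(self.nodes)
        if len(self.adj) != n or any(len(row) != n for row in self.adj):
            return False
        # No edge may point backwards or from a node to itself.
        for i in range(n):
            for j in range(i + 1):
                if self.adj[i][j]:
                    return False
        # Every node except the first needs at least one incoming edge.
        return all(any(self.adj[i][j] for i in range(j)) for j in range(1, n))

    def predecessors(self, j: int) -> list:
        """Indices of the nodes whose outputs feed node j."""
        return [i for i in range(j) if self.adj[i][j]]


# A small network mixing all four operation types: the convolution feeds two
# parallel branches (self-attention and recurrent) that merge in a dense layer.
example = DagNetwork(
    nodes=[
        Node("conv", {"kernel_size": 3}),
        Node("self_attention", {"heads": 4}),
        Node("recurrent", {"units": 32}),
        Node("dense", {"units": 1}),
    ],
    adj=[
        [0, 1, 1, 0],
        [0, 0, 0, 1],
        [0, 0, 0, 1],
        [0, 0, 0, 0],
    ],
)
```

The point of the sketch is that a graph, unlike a fixed chain of layers, can express branches and merges, and that architecture (the edges and operations) and hyperparameters (the per-node dictionaries) live in the same object, so a single search procedure can modify both.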

To optimize the architecture and the hyperparameters jointly, the researchers define neighborhood and evolution search operators over this search space. The operators are not tied to one optimizer: they can be used with any metaheuristic capable of handling mixed search spaces. In the experiments, the framework was paired with an evolutionary algorithm on a time series prediction benchmark, where it found models that outperformed the established baseline on numerous datasets.
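The following sketch illustrates what a neighborhood operator and a simple evolutionary loop over such a mixed search space might look like. It is not the authors' algorithm: it uses mutation only (no crossover), elitist truncation selection, and a placeholder `toy_loss` in place of training a network and measuring validation error. All function names, the candidate layout, and the constants are assumptions made for illustration.

```python
import copy
import random

OPERATIONS = ("conv", "recurrent", "dense", "self_attention")


def random_candidate(rng, n_nodes=4):
    """Sample a random DAG in which each node j > 0 has at least one earlier input."""
    adj = [[0] * n_nodes for _ in range(n_nodes)]
    for j in range(1, n_nodes):
        adj[rng.randrange(j)][j] = 1
    return {
        "ops": [rng.choice(OPERATIONS) for _ in range(n_nodes)],
        "units": [rng.choice((8, 16, 32, 64)) for _ in range(n_nodes)],
        "adj": adj,
    }


def mutate(candidate, rng):
    """Neighborhood move: change one operation, one hyperparameter, or one edge."""
    child = copy.deepcopy(candidate)
    n = len(child["ops"])
    move = rng.choice(("op", "units", "edge"))
    node = rng.randrange(n)
    if move == "op":
        child["ops"][node] = rng.choice(OPERATIONS)
    elif move == "units":
        # Halve or double the layer width, clipped to [4, 256].
        factor = rng.choice((0.5, 2))
        child["units"][node] = int(min(256, max(4, child["units"][node] * factor)))
    else:
        # Toggle a forward edge i -> j (i < j) so the graph stays acyclic.
        j = rng.randrange(1, n)
        i = rng.randrange(j)
        only_input = sum(child["adj"][k][j] for k in range(j)) == 1
        if child["adj"][i][j] and only_input:
            return child  # removing the only input would disconnect node j
        child["adj"][i][j] ^= 1
    return child


def toy_loss(candidate):
    """Placeholder for 'train the network and return its validation error'."""
    size_penalty = abs(sum(candidate["units"]) - 96) / 96
    edge_penalty = 0.05 * sum(map(sum, candidate["adj"]))
    return size_penalty + edge_penalty


def evolve(loss_fn, generations=30, population_size=10, seed=0):
    """Mutation-only evolutionary loop with elitist truncation selection."""
    rng = random.Random(seed)
    population = [random_candidate(rng) for _ in range(population_size)]
    for _ in range(generations):
        offspring = [mutate(rng.choice(population), rng) for _ in range(population_size)]
        # Keep the best individuals among parents and offspring.
        population = sorted(population + offspring, key=loss_fn)[:population_size]
    return population[0]


best = evolve(toy_loss)
```

Because `mutate` only ever proposes forward edges, every candidate stays a valid DAG without a separate repair step. The same `mutate` function could in principle be plugged into a different metaheuristic, such as a local search, which is the kind of reuse the paper's operators are designed for.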

Critical Analysis

The paper presents a promising approach for automated neural network design and optimization, but it also has some limitations. The authors only evaluated their framework on a single benchmark task, so it's unclear how well it would generalize to other types of problems. Additionally, the computational cost of the evolutionary optimization process may be prohibitive for some real-world applications.

It would be interesting to see the framework applied to a wider range of tasks and compared to other automated machine learning approaches. The authors also do not discuss the interpretability or explainability of the generated networks, which could be an important consideration in some domains.

Overall, the paper demonstrates the potential of using evolutionary optimization and flexible network representations to automate the design of high-performing deep neural networks. Further research and development in this area could lead to significant advancements in the field of automated machine learning.

Conclusion

The proposed algorithmic framework offers a novel approach to automatically generating and optimizing deep neural networks. By using directed acyclic graphs to represent the network architecture and evolutionary optimization techniques to explore the search space, the researchers were able to find models that outperformed established baselines on a time series prediction task.

This work demonstrates the potential of automated machine learning techniques to streamline the development of high-performing neural networks, without the need for extensive manual tuning by human experts. Further research and development in this area could lead to significant advancements in the field of deep learning and its application to a wide range of real-world problems.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers


Scalable Nested Optimization for Deep Learning

Jonathan Lorraine

Gradient-based optimization has been critical to the success of machine learning, updating a single set of parameters to minimize a single loss. A growing number of applications rely on a generalization of this, where we have a bilevel or nested optimization of which subsets of parameters update on different objectives nested inside each other. We focus on motivating examples of hyperparameter optimization and generative adversarial networks. However, naively applying classical methods often fails when we look at solving these nested problems on a large scale. In this thesis, we build tools for nested optimization that scale to deep learning setups.

7/2/2024

cs.LG cs.AI cs.NE stat.ML


Unleash Graph Neural Networks from Heavy Tuning

Lequan Lin, Dai Shi, Andi Han, Zhiyong Wang, Junbin Gao

Graph Neural Networks (GNNs) are deep-learning architectures designed for graph-type data, where understanding relationships among individual observations is crucial. However, achieving promising GNN performance, especially on unseen data, requires comprehensive hyperparameter tuning and meticulous training. Unfortunately, these processes come with high computational costs and significant human effort. Additionally, conventional searching algorithms such as grid search may result in overfitting on validation data, diminishing generalization accuracy. To tackle these challenges, we propose a graph conditional latent diffusion framework (GNN-Diff) to generate high-performing GNNs directly by learning from checkpoints saved during a light-tuning coarse search. Our method: (1) unleashes GNN training from heavy tuning and complex search space design; (2) produces GNN parameters that outperform those obtained through comprehensive grid search; and (3) establishes higher-quality generation for GNNs compared to diffusion frameworks designed for general neural networks.

5/22/2024

cs.LG


An Intelligent End-to-End Neural Architecture Search Framework for Electricity Forecasting Model Development

Jin Yang, Guangxin Jiang, Yinan Wang, Ying Chen

Recent years have witnessed exponential growth in developing deep learning (DL) models for time-series electricity forecasting in power systems. However, most of the proposed models are designed based on the designers' inherent knowledge and experience without elaborating on the suitability of the proposed neural architectures. Moreover, these models cannot be self-adjusted to dynamically changed data patterns due to the inflexible design of their structures. Although several recent studies have considered the application of the neural architecture search (NAS) technique for obtaining a network with an optimized structure in the electricity forecasting sector, their training process is computationally expensive and their search strategies are not flexible, indicating that the NAS application in this area is still at an infancy stage. In this study, we propose an intelligent automated architecture search (IAAS) framework for the development of time-series electricity forecasting models. The proposed framework contains three primary components, i.e., network function-preserving transformation operation, reinforcement learning (RL)-based network transformation control, and heuristic network screening, which aim to improve the search quality of a network structure. After conducting comprehensive experiments on two publicly-available electricity load datasets and two wind power datasets, we demonstrate that the proposed IAAS framework significantly outperforms the ten existing models or methods in terms of forecasting accuracy and stability. Finally, we perform an ablation experiment to showcase the importance of critical components in the proposed IAAS framework in improving forecasting accuracy.

6/4/2024

cs.LG

Neural Optimizer Equation, Decay Function, and Learning Rate Schedule Joint Evolution

Brandon Morgan, Dean Hougen

A major contributor to the quality of a deep learning model is the selection of the optimizer. We propose a new dual-joint search space in the realm of neural optimizer search (NOS), along with an integrity check, to automate the process of finding deep learning optimizers. Our dual-joint search space simultaneously allows for the optimization of not only the update equation, but also internal decay functions and learning rate schedules for optimizers. We search the space using our proposed mutation-only, particle-based genetic algorithm able to be massively parallelized for our domain-specific problem. We evaluate our candidate optimizers on the CIFAR-10 dataset using a small ConvNet. To assess generalization, the final optimizers were then transferred to large-scale image classification on CIFAR-100 and TinyImageNet, while also being fine-tuned on Flowers102, Cars196, and Caltech101 using EfficientNetV2Small. We found multiple optimizers, learning rate schedules, and Adam variants that outperformed Adam, as well as other standard deep learning optimizers, across the image classification tasks.

4/11/2024

cs.NE cs.AI