Fast Genetic Algorithm for feature selection -- A qualitative approximation approach

2404.03996

Published 4/8/2024 by Mohammed Ghaith Altarabichi, S{l}awomir Nowaczyk, Sepideh Pashami, Peyman Sheikholharam Mashhadi

🔍

Abstract

Evolutionary Algorithms (EAs) are often challenging to apply in real-world settings since evolutionary computations involve a large number of evaluations of a typically expensive fitness function. For example, an evaluation could involve training a new machine learning model. An approximation (also known as meta-model or a surrogate) of the true function can be used in such applications to alleviate the computation cost. In this paper, we propose a two-stage surrogate-assisted evolutionary approach to address the computational issues arising from using Genetic Algorithm (GA) for feature selection in a wrapper setting for large datasets. We define 'Approximation Usefulness' to capture the necessary conditions to ensure correctness of the EA computations when an approximation is used. Based on this definition, we propose a procedure to construct a lightweight qualitative meta-model by the active selection of data instances. We then use a meta-model to carry out the feature selection task. We apply this procedure to the GA-based algorithm CHC (Cross generational elitist selection, Heterogeneous recombination and Cataclysmic mutation) to create a Qualitative approXimations variant, CHCQX. We show that CHCQX converges faster to feature subset solutions of significantly higher accuracy (as compared to CHC), particularly for large datasets with over 100K instances. We also demonstrate the applicability of the thinking behind our approach more broadly to Swarm Intelligence (SI), another branch of the Evolutionary Computation (EC) paradigm with results of PSOQX, a qualitative approximation adaptation of the Particle Swarm Optimization (PSO) method. A GitHub repository with the complete implementation is available.

Create account to get full access

Overview

Evolutionary Algorithms (EAs) are often computationally expensive due to the need to evaluate a large number of fitness functions
This paper proposes a two-stage surrogate-assisted evolutionary approach to address the high computational cost of using Genetic Algorithms (GAs) for feature selection
The approach involves defining "Approximation Usefulness" to ensure the correctness of EA computations when using an approximation, and constructing a lightweight qualitative meta-model through active selection of data instances
The paper applies this approach to the GA-based CHC algorithm, creating CHCQX, which is shown to converge faster and find more accurate feature subset solutions, especially for large datasets
The thinking behind this approach is also demonstrated to be applicable to Swarm Intelligence (SI) algorithms, with results shown for a qualitative approximation adaptation of Particle Swarm Optimization (PSO)

Plain English Explanation

Evolutionary Algorithms (EAs) are a powerful set of techniques inspired by the process of natural selection. They're used to optimize complex problems by iteratively "evolving" potential solutions, much like how species evolve in the natural world. However, EAs can be computationally expensive because they require evaluating the "fitness" of a large number of potential solutions, which can be a time-consuming process.

For example, imagine you're trying to train a new machine learning model as part of an EA. Each time you evaluate a potential solution, you'd need to train the model and measure its performance, which could take a long time, especially for large datasets.

To address this issue, the researchers in this paper propose using an "approximation" or "surrogate" model to estimate the fitness of potential solutions, rather than having to do the full, expensive evaluation. They define a concept called "Approximation Usefulness" to help ensure that this approximation doesn't introduce errors into the EA process.

The key idea is to construct a lightweight, qualitative meta-model by actively selecting a subset of the data instances to use. This meta-model can then be used to guide the EA's search for optimal solutions, rather than having to do the full, expensive evaluations.

The researchers apply this approach to a Genetic Algorithm called CHC, creating a new version called CHCQX. They show that CHCQX is able to converge to high-quality solutions much faster than the original CHC, especially for large datasets with over 100,000 instances.

The researchers also demonstrate that the thinking behind this approach can be applied to other types of Evolutionary Computation algorithms, like Swarm Intelligence methods. They show results for a qualitative approximation adaptation of Particle Swarm Optimization (PSO), called PSOQX.

Overall, this research presents a promising approach for making Evolutionary Algorithms more computationally efficient, which could unlock their use in a wider range of real-world applications, such as materialized view optimization, target set selection, and quantization-aware neural network approximation.

Technical Explanation

The key technical elements of this paper are:

Approximation Usefulness: The researchers define a concept called "Approximation Usefulness" to capture the necessary conditions for ensuring the correctness of EA computations when using an approximation (or surrogate) of the true fitness function. This provides a framework for reasoning about the suitability of approximations in EAs.
Lightweight Qualitative Meta-model: The researchers propose a procedure to construct a lightweight, qualitative meta-model by actively selecting a subset of data instances. This meta-model is then used as an approximation of the true fitness function within the EA.
CHCQX Algorithm: The researchers apply their surrogate-assisted approach to the Genetic Algorithm called CHC, creating a new variant called CHCQX. CHCQX uses the qualitative meta-model to guide the search for optimal feature subset solutions, leading to faster convergence and higher accuracy, especially for large datasets.
PSOQX Algorithm: To demonstrate the broader applicability of their approach, the researchers also create a qualitative approximation adaptation of the Particle Swarm Optimization (PSO) algorithm, called PSOQX.

The key experimental results show that CHCQX is able to converge to feature subset solutions of significantly higher accuracy compared to the original CHC algorithm, particularly for large datasets with over 100,000 instances. The researchers also provide results demonstrating the effectiveness of PSOQX, the Swarm Intelligence adaptation of their approach.

Critical Analysis

The researchers have presented a compelling approach for addressing the computational challenges of using Evolutionary Algorithms in real-world settings. By leveraging lightweight, qualitative meta-models as approximations of the true fitness function, they are able to achieve significant speedups in convergence while maintaining high-quality solutions.

One potential limitation of the approach is the need to carefully define the "Approximation Usefulness" conditions to ensure the correctness of the EA computations. While the researchers provide a framework for this, the specific conditions may need to be tailored to the problem domain and the characteristics of the approximation being used.

Additionally, the performance of the meta-model construction process may be sensitive to the specific dataset and problem characteristics. The researchers have demonstrated the effectiveness of their approach on several datasets, but further research may be needed to understand its broader applicability and any potential limitations.

It would also be interesting to see how the proposed surrogate-assisted approach compares to other techniques for improving the efficiency of Evolutionary Algorithms, such as gradient-enhanced Gaussian process surrogates or machine learning-enhanced multi-objective optimization.

Overall, this research represents an important contribution to the field of Evolutionary Computation, providing a practical and effective approach for making these powerful optimization techniques more accessible and applicable to real-world problems.

Conclusion

This paper presents a two-stage surrogate-assisted evolutionary approach that addresses the computational challenges of using Genetic Algorithms (GAs) for feature selection in large datasets. By defining "Approximation Usefulness" and constructing a lightweight qualitative meta-model, the researchers are able to create a more efficient variant of the GA-based CHC algorithm, called CHCQX, that converges faster and finds more accurate feature subset solutions.

The key innovation of this work is the use of a qualitative meta-model as an approximation of the true fitness function, which allows the EA to explore the search space more efficiently. The researchers also demonstrate the broader applicability of this thinking by adapting it to a Swarm Intelligence algorithm, Particle Swarm Optimization (PSO), creating PSOQX.

This research has important implications for making Evolutionary Algorithms more practical and usable in real-world settings, where computational efficiency is a critical concern. By addressing these challenges, the proposed approach could unlock the use of EAs in a wider range of applications, such as materialized view optimization, target set selection, and quantization-aware neural network approximation. Overall, this work represents an important step forward in the field of Evolutionary Computation.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

🛸

Fitness Approximation through Machine Learning

Itai Tzruia, Tomer Halperin, Moshe Sipper, Achiya Elyasaf

We present a novel approach to performing fitness approximation in genetic algorithms (GAs) using machine-learning (ML) models, through dynamic adaptation to the evolutionary state. Maintaining a dataset of sampled individuals along with their actual fitness scores, we continually update a fitness-approximation ML model throughout an evolutionary run. We compare different methods for: 1) switching between actual and approximate fitness, 2) sampling the population, and 3) weighting the samples. Experimental findings demonstrate significant improvement in evolutionary runtimes, with fitness scores that are either identical or slightly lower than that of the fully run GA -- depending on the ratio of approximate-to-actual-fitness computation. Although we focus on evolutionary agents in Gymnasium (game) simulators -- where fitness computation is costly -- our approach is generic and can be easily applied to many different domains.

5/22/2024

cs.NE cs.AI cs.LG

🌐

A Genetic Algorithm-Based Support Vector Machine Approach for Intelligent Usability Assessment of m-Learning Applications

Muhammad Asghar, Imran Sarwar Bajwa, Shabana Ramzan, Hina Afreen, Saima Abdullah

In the field of human-computer interaction (HCI), the usability assessment of m-learning (mobile-learning) applications is a real challenge. Such assessment typically involves extraction of the best features of an application like efficiency, effectiveness, learnability, cognition, memorability, etc., and further ranking of those features for an overall assessment of the quality of the mobile application. In the previous literature, it is found that there is neither any theory nor any tool available to measure or assess a user perception and assessment of usability features of a m-learning application for the sake of ranking the graphical user interface of a mobile application in terms of a user acceptance and satisfaction. In this paper, a novel approach is presented by performing a mobile applications quantitative and qualitative analysis. Based on user requirements and perception, a criterion is defined based on a set of important features. Afterward, for the qualitative analysis, a genetic algorithm (GA) is used to score prescribed features for the usability assessment of a mobile application. The used approach assigns a score to each usability feature according to the user requirement and weight of each feature. GA performs the rank assessment process initially by performing feature selection and scoring the best features of the application. A comparison of assessment analysis of GA and various machine learning models, K-nearest neighbours, Naive Bayes, and Random Forests is performed. It was found that a GA-based support vector machine (SVM) provides more accuracy in the extraction of the best features of a mobile application and further ranking of those features.

4/26/2024

cs.HC

Sharpness-Aware Minimization for Evolutionary Feature Construction in Regression

Hengzhe Zhang, Qi Chen, Bing Xue, Wolfgang Banzhaf, Mengjie Zhang

In recent years, genetic programming (GP)-based evolutionary feature construction has achieved significant success. However, a primary challenge with evolutionary feature construction is its tendency to overfit the training data, resulting in poor generalization on unseen data. In this research, we draw inspiration from PAC-Bayesian theory and propose using sharpness-aware minimization in function space to discover symbolic features that exhibit robust performance within a smooth loss landscape in the semantic space. By optimizing sharpness in conjunction with cross-validation loss, as well as designing a sharpness reduction layer, the proposed method effectively mitigates the overfitting problem of GP, especially when dealing with a limited number of instances or in the presence of label noise. Experimental results on 58 real-world regression datasets show that our approach outperforms standard GP as well as six state-of-the-art complexity measurement methods for GP in controlling overfitting. Furthermore, the ensemble version of GP with sharpness-aware minimization demonstrates superior performance compared to nine fine-tuned machine learning and symbolic regression algorithms, including XGBoost and LightGBM.

5/14/2024

cs.LG cs.NE

NeuroLGP-SM: Scalable Surrogate-Assisted Neuroevolution for Deep Neural Networks

Fergal Stapleton, Edgar Galv'an

Evolutionary Algorithms (EAs) play a crucial role in the architectural configuration and training of Artificial Deep Neural Networks (DNNs), a process known as neuroevolution. However, neuroevolution is hindered by its inherent computational expense, requiring multiple generations, a large population, and numerous epochs. The most computationally intensive aspect lies in evaluating the fitness function of a single candidate solution. To address this challenge, we employ Surrogate-assisted EAs (SAEAs). While a few SAEAs approaches have been proposed in neuroevolution, none have been applied to truly large DNNs due to issues like intractable information usage. In this work, drawing inspiration from Genetic Programming semantics, we use phenotypic distance vectors, outputted from DNNs, alongside Kriging Partial Least Squares (KPLS), an approach that is effective in handling these large vectors, making them suitable for search. Our proposed approach, named Neuro-Linear Genetic Programming surrogate model (NeuroLGP-SM), efficiently and accurately estimates DNN fitness without the need for complete evaluations. NeuroLGP-SM demonstrates competitive or superior results compared to 12 other methods, including NeuroLGP without SM, convolutional neural networks, support vector machines, and autoencoders. Additionally, it is worth noting that NeuroLGP-SM is 25% more energy-efficient than its NeuroLGP counterpart. This efficiency advantage adds to the overall appeal of our proposed NeuroLGP-SM in optimising the configuration of large DNNs.

5/3/2024

cs.NE cs.AI