Unit-Aware Genetic Programming for the Development of Empirical Equations

Read original: arXiv:2405.18896 - Published 5/30/2024 by Julia Reuter, Viktor Martinek, Roland Herzog, Sanaz Mostaghim

Unit-Aware Genetic Programming for the Development of Empirical Equations

Overview

This paper explores the use of Unit-Aware Genetic Programming (UAGP) for developing empirical equations that are consistent with physical units.
The key idea is to incorporate unit awareness into the genetic programming process to ensure the resulting equations maintain dimensional consistency.
The authors demonstrate the effectiveness of UAGP on several benchmark physics-related problems and compare it to traditional genetic programming approaches.

Plain English Explanation

Equations in science and engineering often need to be consistent with the physical units of the variables involved, like meters, seconds, or kilograms. Unit-Aware Genetic Programming (UAGP) is a technique that helps ensure the equations produced by a genetic programming process maintain this dimensional consistency.

Genetic programming is a way of automatically discovering mathematical equations that fit a set of data. However, the equations it generates don't always make physical sense, since the process doesn't inherently account for the units of the variables. UAGP addresses this by building in an awareness of units from the start. This allows it to explore equation forms that are dimensionally valid, rather than wasting time on ones that don't make sense physically.

The authors of this paper applied UAGP to several benchmark physics-related problems, like predicting the drag force on a sphere or the natural frequency of a mass-spring-damper system. They found that UAGP consistently outperformed traditional genetic programming approaches, producing more accurate and physically meaningful equations. This suggests UAGP could be a valuable tool for researchers and engineers who need to develop empirical models from data while respecting the underlying physical principles.

Technical Explanation

The key innovation in this paper is the incorporation of unit awareness into the genetic programming process through the use of Unit-Aware Genetic Programming (UAGP). This involves modifying the standard genetic programming algorithm to ensure the generated equations maintain dimensional consistency.

Specifically, the authors made the following changes:

Unit-Aware Initialization: The initial population of equations is generated such that all variables are combined in dimensionally consistent ways.
Unit-Aware Mutation and Crossover: The genetic operators that modify equations during evolution are designed to preserve unit consistency.
Unit-Aware Fitness Evaluation: The fitness function explicitly considers the dimensional correctness of candidate equations, in addition to their predictive accuracy.

The authors tested UAGP on several benchmark problems from physics and engineering, including predicting the drag force on a sphere, the natural frequency of a mass-spring-damper system, and the heat transfer coefficient for flow over a cylinder. They compared the performance of UAGP to traditional genetic programming approaches, as well as other symbolic regression techniques like Sharpness-Aware Minimization and Multi-Representation Genetic Programming.

The results showed that UAGP consistently outperformed the other methods, producing more accurate and physically meaningful equations. The authors attribute this to UAGP's ability to efficiently explore the space of dimensionally valid equations, rather than wasting time on forms that violate physical constraints.

Critical Analysis

The authors acknowledge several limitations of their approach. First, UAGP relies on having a priori knowledge of the relevant physical units, which may not always be available in practical applications. The authors suggest exploring ways to automatically infer unit information from data, similar to the Causal Unit Selection method.

Additionally, the benchmarks used in the paper are relatively simple, involving a small number of input variables and well-understood physical relationships. It's unclear how well UAGP would scale to more complex problems with higher-dimensional inputs and more intricate underlying physics.

Finally, the authors do not explore the interpretability of the equations generated by UAGP. While the equations may be dimensionally consistent, they may still be overly complex or difficult for humans to understand. Techniques like Fitness Approximation through Machine Learning could potentially be combined with UAGP to produce more interpretable models.

Overall, this paper presents a promising approach for incorporating physical constraints into the genetic programming process, but further research is needed to address its limitations and explore its applicability to a wider range of real-world problems.

Conclusion

This paper introduces Unit-Aware Genetic Programming (UAGP), a technique that enhances standard genetic programming by ensuring the generated equations maintain dimensional consistency with the underlying physical units. Through experiments on several benchmark problems, the authors demonstrate that UAGP consistently outperforms traditional genetic programming and other symbolic regression methods in terms of both accuracy and physical meaningfulness of the resulting equations.

The key insight of UAGP is that by building in an awareness of units from the start of the optimization process, the algorithm can efficiently explore the space of dimensionally valid equations, rather than wasting time on forms that violate physical constraints. This suggests UAGP could be a valuable tool for researchers and engineers who need to develop empirical models from data while respecting the underlying physical principles.

While the current implementation of UAGP has some limitations, the authors provide ideas for future work to address them, such as automatically inferring unit information and improving the interpretability of the generated equations. Overall, this paper represents an important step towards more physically grounded approaches to symbolic regression and equation discovery.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Follow @aimodelsfyi on 𝕏 →

Related Papers

Unit-Aware Genetic Programming for the Development of Empirical Equations

Julia Reuter, Viktor Martinek, Roland Herzog, Sanaz Mostaghim

When developing empirical equations, domain experts require these to be accurate and adhere to physical laws. Often, constants with unknown units need to be discovered alongside the equations. Traditional unit-aware genetic programming (GP) approaches cannot be used when unknown constants with undetermined units are included. This paper presents a method for dimensional analysis that propagates unknown units as ''jokers'' and returns the magnitude of unit violations. We propose three methods, namely evolutive culling, a repair mechanism, and a multi-objective approach, to integrate the dimensional analysis in the GP algorithm. Experiments on datasets with ground truth demonstrate comparable performance of evolutive culling and the multi-objective approach to a baseline without dimensional analysis. Extensive analysis of the results on datasets without ground truth reveals that the unit-aware algorithms make only low sacrifices in accuracy, while producing unit-adherent solutions. Overall, we presented a promising novel approach for developing unit-adherent empirical equations.

5/30/2024

↗️

The Inefficiency of Genetic Programming for Symbolic Regression -- Extended Version

Gabriel Kronberger, Fabricio Olivetti de Franca, Harry Desmond, Deaglan J. Bartlett, Lukas Kammerer

We analyse the search behaviour of genetic programming for symbolic regression in practically relevant but limited settings, allowing exhaustive enumeration of all solutions. This enables us to quantify the success probability of finding the best possible expressions, and to compare the search efficiency of genetic programming to random search in the space of semantically unique expressions. This analysis is made possible by improved algorithms for equality saturation, which we use to improve the Exhaustive Symbolic Regression algorithm; this produces the set of semantically unique expression structures, orders of magnitude smaller than the full symbolic regression search space. We compare the efficiency of random search in the set of unique expressions and genetic programming. For our experiments we use two real-world datasets where symbolic regression has been used to produce well-fitting univariate expressions: the Nikuradse dataset of flow in rough pipes and the Radial Acceleration Relation of galaxy dynamics. The results show that genetic programming in such limited settings explores only a small fraction of all unique expressions, and evaluates expressions repeatedly that are congruent to already visited expressions.

4/29/2024

Towards Gaussian Process for operator learning: an uncertainty aware resolution independent operator learning algorithm for computational mechanics

New!Towards Gaussian Process for operator learning: an uncertainty aware resolution independent operator learning algorithm for computational mechanics

Sawan Kumar, Rajdip Nayek, Souvik Chakraborty

The growing demand for accurate, efficient, and scalable solutions in computational mechanics highlights the need for advanced operator learning algorithms that can efficiently handle large datasets while providing reliable uncertainty quantification. This paper introduces a novel Gaussian Process (GP) based neural operator for solving parametric differential equations. The approach proposed leverages the expressive capability of deterministic neural operators and the uncertainty awareness of conventional GP. In particular, we propose a ``neural operator-embedded kernel'' wherein the GP kernel is formulated in the latent space learned using a neural operator. Further, we exploit a stochastic dual descent (SDD) algorithm for simultaneously training the neural operator parameters and the GP hyperparameters. Our approach addresses the (a) resolution dependence and (b) cubic complexity of traditional GP models, allowing for input-resolution independence and scalability in high-dimensional and non-linear parametric systems, such as those encountered in computational mechanics. We apply our method to a range of non-linear parametric partial differential equations (PDEs) and demonstrate its superiority in both computational efficiency and accuracy compared to standard GP models and wavelet neural operators. Our experimental results highlight the efficacy of this framework in solving complex PDEs while maintaining robustness in uncertainty estimation, positioning it as a scalable and reliable operator-learning algorithm for computational mechanics.

9/18/2024

Discovering Dynamic Symbolic Policies with Genetic Programming

Sigur de Vries, Sander Keemink, Marcel van Gerven

Artificial intelligence techniques are increasingly being applied to solve control problems, but often rely on black-box methods without transparent output generation. To improve the interpretability and transparency in control systems, models can be defined as white-box symbolic policies described by mathematical expressions. While current approaches to learn symbolic policies focus on static policies that directly map observations to control signals, these may fail in partially observable and volatile environments. We instead consider dynamic symbolic policies with memory, optimised with genetic programming. The resulting policies are robust, and consist of easy to interpret coupled differential equations. Our results show that dynamic symbolic policies compare with black-box policies on a variety of control tasks. Furthermore, the benefit of the memory in dynamic policies is demonstrated on experiments where static policies fall short. Overall, we present a method for evolving high-performing symbolic policies that offer interpretability and transparency, which lacks in black-box models.

9/11/2024