Multi-View Symbolic Regression

Read original: arXiv:2402.04298 - Published 7/22/2024 by Etienne Russeil, Fabrício Olivetti de França, Konstantin Malanchev, Bogdan Burlacu, Emille E. O. Ishida, Marion Leroux, Clément Michelin, Guillaume Moinard, Emmanuel Gangler

Overview

This paper proposes Multi-View Symbolic Regression (MvSR), which extends symbolic regression to take multiple datasets into account simultaneously so that the underlying expression is recovered more reliably.
Symbolic regression is a machine learning technique that automatically discovers mathematical equations to fit observed data, without requiring pre-defined functional forms.
Rather than assuming a single dataset from a single experiment, MvSR fits each candidate expression to several datasets obtained under different experimental setups and returns a parametric family of functions f(x; theta) that can accurately fit all of them.

Plain English Explanation

The paper introduces a new way to do symbolic regression, a type of machine learning that searches for a mathematical equation that fits a dataset. Instead of working from a single dataset, the method looks at several datasets at once, each collected from the same kind of experiment under a different setup.

Each dataset is a "view" of the same underlying law. The shape of the equation is shared across all views, while its constants are adjusted separately for each one. This is similar to how a scientist repeats an experiment under different conditions to see which relationship holds throughout. The key idea is that an equation which fits only one experiment is penalized, so the search is steered toward the general form that explains all of them.

Technical Explanation

The paper presents Multi-View Symbolic Regression (MvSR), in which a single candidate expression is fitted to multiple datasets, or "views", of the same phenomenon, each obtained from an experiment with a different setup. The output is a general parametric solution rather than an expression tied to one dataset.

Specifically, within a genetic programming-based symbolic regression search, each candidate expression is fitted to every dataset independently, so the parameters theta take different values in each view while the functional form f(x; theta) stays the same. Traditional methods, which assume a single dataset, may fail to find the underlying expression because the parameters of each experiment can be different.
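The abstract describes the method as fitting the evaluated expression to each independent dataset and returning one parametric family f(x; theta). One way to write that down is sketched here. The notation (K datasets, n_k points in dataset k, a mean-squared-error loss, and an unspecified aggregation operator "agg" such as the mean or the maximum) is an assumption made for illustration and is not taken from the paper.

```latex
\hat{\theta}_k = \arg\min_{\theta} \; \frac{1}{n_k} \sum_{i=1}^{n_k}
  \left( y_i^{(k)} - f\!\left(x_i^{(k)}; \theta\right) \right)^2,
\qquad k = 1, \dots, K

\mathcal{L}(f) = \operatorname*{agg}_{k = 1, \dots, K} \;
  \frac{1}{n_k} \sum_{i=1}^{n_k}
  \left( y_i^{(k)} - f\!\left(x_i^{(k)}; \hat{\theta}_k\right) \right)^2
```

Under this reading, the search ranks functional forms f by the aggregated loss, and the fitted parameter vectors \hat{\theta}_1, ..., \hat{\theta}_K are by-products that describe each experiment.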

A candidate is judged by how well it fits all views simultaneously, so only expressions that describe every dataset score well. The authors evaluate MvSR on data generated from known expressions and on real-world data from astronomy, chemistry, and economics, and report that it obtains the correct expression more frequently than single-dataset symbolic regression and is robust to changes in hyperparameters.
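The scoring idea in the paper's abstract (one functional form, parameters refitted on each dataset, a single score across datasets) can be illustrated with a small sketch. This is not the authors' implementation. The three candidate forms, the synthetic data, the helper names, and the worst-case aggregation are all assumptions chosen for illustration, and a linear least-squares solve stands in for the search and parameter optimization a real symbolic regression engine would perform.

```python
import numpy as np

# Hypothetical candidate forms f(x; a, b) = a * g(x) + b. Each is linear in
# its parameters, so ordinary least squares can fit (a, b) directly.
CANDIDATES = {
    "a*x + b": lambda x: x,
    "a*x**2 + b": lambda x: x**2,
    "a*exp(-x) + b": lambda x: np.exp(-x),
}


def fit_view(g, x, y):
    """Fit (a, b) on a single dataset; return the parameters and the MSE."""
    design = np.column_stack([g(x), np.ones_like(x)])
    theta, *_ = np.linalg.lstsq(design, y, rcond=None)
    mse = float(np.mean((design @ theta - y) ** 2))
    return theta, mse


def multi_view_score(g, views):
    """Refit the parameters on every view, then aggregate the per-view MSEs.

    The maximum (worst case) is used as the aggregation; this is an assumed
    choice for the sketch, and the mean would be another option.
    """
    fits = [fit_view(g, x, y) for x, y in views]
    thetas = [theta for theta, _ in fits]
    worst = max(mse for _, mse in fits)
    return thetas, worst


def make_views(seed=0):
    """Synthetic 'experiments': the same law a*exp(-x) + b, different (a, b)."""
    rng = np.random.default_rng(seed)
    true_params = [(2.0, 0.5), (-1.0, 3.0), (5.0, -2.0)]
    views = []
    for a, b in true_params:
        x = rng.uniform(0.0, 4.0, size=40)
        y = a * np.exp(-x) + b + rng.normal(0.0, 0.01, size=40)
        views.append((x, y))
    return views, true_params


views, true_params = make_views()
scores = {name: multi_view_score(g, views)[1] for name, g in CANDIDATES.items()}
best = min(scores, key=scores.get)
best_thetas, _ = multi_view_score(CANDIDATES[best], views)

# Baseline for contrast: pool all views and fit one shared (a, b).
pooled_x = np.concatenate([x for x, _ in views])
pooled_y = np.concatenate([y for _, y in views])
_, pooled_mse = fit_view(CANDIDATES[best], pooled_x, pooled_y)

print("best form:", best)
print("per-view parameters:", [np.round(t, 2).tolist() for t in best_thetas])
print("pooled single-fit MSE:", round(pooled_mse, 3))
```

In this toy setup, pooling the three views and fitting one shared (a, b) leaves a large error even for the correct form, which is the failure mode the abstract attributes to methods that assume a single dataset.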

Critical Analysis

The multi-view symbolic regression approach proposed in this paper is a promising technique for enhancing the capabilities of symbolic regression. By using several datasets at once, the method can separate the shared functional form from experiment-specific parameter values, which makes it less likely to return an expression that fits only one experiment.

However, the paper does not extensively explore the limits of this approach. For example, it is unclear how the method would scale to very high-dimensional problems or how sensitive it is to the number and choice of datasets. Additionally, the authors do not provide much insight into the interpretability of the resulting parametric expressions, even though interpretability is a key strength of symbolic regression.

Further research could investigate the robustness of this approach to noisy or incomplete data, as well as its ability to generalize to new problem domains beyond the benchmarks studied here. Integrating the multi-view technique with other enhancements to symbolic regression, such as neural guidance, could also be a fruitful area of exploration.

Conclusion

This paper presents Multi-View Symbolic Regression (MvSR), which fits a single parametric expression to multiple datasets of the same phenomenon obtained under different experimental setups. The authors demonstrate that this technique recovers the correct expression more often than single-dataset methods and is robust to changes in hyperparameters.

The multi-view approach is a promising direction for advancing the state-of-the-art in symbolic regression, potentially allowing for the discovery of more accurate and interpretable mathematical models from data. While further research is needed to fully understand the capabilities and limitations of this approach, this work represents an important step forward in the field of automated equation discovery.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Multi-View Symbolic Regression

Etienne Russeil, Fabrício Olivetti de França, Konstantin Malanchev, Bogdan Burlacu, Emille E. O. Ishida, Marion Leroux, Clément Michelin, Guillaume Moinard, Emmanuel Gangler

Symbolic regression (SR) searches for analytical expressions representing the relationship between a set of explanatory and response variables. Current SR methods assume a single dataset extracted from a single experiment. Nevertheless, frequently, the researcher is confronted with multiple sets of results obtained from experiments conducted with different setups. Traditional SR methods may fail to find the underlying expression since the parameters of each experiment can be different. In this work we present Multi-View Symbolic Regression (MvSR), which takes into account multiple datasets simultaneously, mimicking experimental environments, and outputs a general parametric solution. This approach fits the evaluated expression to each independent dataset and returns a parametric family of functions f(x; theta) simultaneously capable of accurately fitting all datasets. We demonstrate the effectiveness of MvSR using data generated from known expressions, as well as real-world data from astronomy, chemistry and economy, for which an a priori analytical expression is not available. Results show that MvSR obtains the correct expression more frequently and is robust to hyperparameters change. In real-world data, it is able to grasp the group behavior, recovering known expressions from the literature as well as promising alternatives, thus enabling the use of SR to a large range of experimental scenarios.

7/22/2024

Class Symbolic Regression: Gotta Fit 'Em All

Wassim Tenachi, Rodrigo Ibata, Thibaut L. François, Foivos I. Diakogiannis

We introduce 'Class Symbolic Regression' (Class SR), a first framework for automatically finding a single analytical functional form that accurately fits multiple datasets, each realization being governed by its own (possibly) unique set of fitting parameters. This hierarchical framework leverages the common constraint that all the members of a single class of physical phenomena follow a common governing law. Our approach extends the capabilities of our earlier Physical Symbolic Optimization ($\Phi$-SO) framework for Symbolic Regression, which integrates dimensional analysis constraints and deep reinforcement learning for unsupervised symbolic analytical function discovery from data. Additionally, we introduce the first Class SR benchmark, comprising a series of synthetic physical challenges specifically designed to evaluate such algorithms. We demonstrate the efficacy of our novel approach by applying it to these benchmark challenges and showcase its practical utility for astrophysics by successfully extracting an analytic galaxy potential from a set of simulated orbits approximating stellar streams.

6/19/2024

In-Context Symbolic Regression: Leveraging Language Models for Function Discovery

Matteo Merler, Katsiaryna Haitsiukevich, Nicola Dainese, Pekka Marttinen

State of the art Symbolic Regression (SR) methods currently build specialized models, while the application of Large Language Models (LLMs) remains largely unexplored. In this work, we introduce the first comprehensive framework that utilizes LLMs for the task of SR. We propose In-Context Symbolic Regression (ICSR), an SR method which iteratively refines a functional form with an LLM and determines its coefficients with an external optimizer. ICSR leverages LLMs' strong mathematical prior both to propose an initial set of possible functions given the observations and to refine them based on their errors. Our findings reveal that LLMs are able to successfully find symbolic equations that fit the given data, matching or outperforming the overall performance of the best SR baselines on four popular benchmarks, while yielding simpler equations with better out of distribution generalization.

7/18/2024

Scalable Neural Symbolic Regression using Control Variables

Xieting Chu, Hongjue Zhao, Enze Xu, Hairong Qi, Minghan Chen, Huajie Shao

Symbolic regression (SR) is a powerful technique for discovering the analytical mathematical expression from data, finding various applications in natural sciences due to its good interpretability of results. However, existing methods face scalability issues when dealing with complex equations involving multiple variables. To address this challenge, we propose ScaleSR, a scalable symbolic regression model that leverages control variables to enhance both accuracy and scalability. The core idea is to decompose multi-variable symbolic regression into a set of single-variable SR problems, which are then combined in a bottom-up manner. The proposed method involves a four-step process. First, we learn a data generator from observed data using deep neural networks (DNNs). Second, the data generator is used to generate samples for a certain variable by controlling the input variables. Thirdly, single-variable symbolic regression is applied to estimate the corresponding mathematical expression. Lastly, we repeat steps 2 and 3 by gradually adding variables one by one until completion. We evaluate the performance of our method on multiple benchmark datasets. Experimental results demonstrate that the proposed ScaleSR significantly outperforms state-of-the-art baselines in discovering mathematical expressions with multiple variables. Moreover, it can substantially reduce the search space for symbolic regression. The source code will be made publicly available upon publication.

7/11/2024