Please use this identifier to cite or link to this item:
https://research.matf.bg.ac.rs/handle/123456789/1360
Title: | RILS-ROLS: robust symbolic regression via iterated local search and ordinary least squares |
Authors: | Kartelj, Aleksandar; Djukanović, Marko |
Affiliations: | Informatics and Computer Science |
Keywords: | Ground-truth benchmark sets; Iterated local search; Ordinary least squares; Symbolic regression |
Issue Date: | 1-Dec-2023 |
Rank: | M21a |
Publisher: | Springer |
Journal: | Journal of Big Data |
Abstract: | In this paper, we address the well-known symbolic regression problem, which has been studied intensively and has a wide range of applications. To solve it, we propose an efficient metaheuristic-based approach called RILS-ROLS. RILS-ROLS is based on two elements: (i) iterated local search, the backbone of the method, which mainly handles the combinatorial and some continuous aspects of the problem; and (ii) the ordinary least squares method, which focuses on the continuous aspect of the search space by efficiently determining the best-fitting coefficients of linear combinations within solution equations. In addition, we introduce a novel fitness function that combines important model quality measures (R² score, RMSE score, and model size, i.e., model complexity), as well as a carefully designed local search that allows systematic search in the proximity of a candidate solution. Experiments are conducted on two well-known ground-truth benchmark sets from the literature: Feynman and Strogatz. RILS-ROLS was compared to 14 other competitors from the literature and outperformed all of them with respect to the symbolic solution rate under varying levels of noise. We observed that the method is robust to noise, as the symbolic solution rate decreases relatively slowly with increasing noise. Statistical analysis of the experimental results confirmed that RILS-ROLS is a new state-of-the-art method for symbolic regression on datasets whose target variable is modelled as a closed-form equation with allowed operators. In addition to evaluation on known ground-truth datasets, we introduced a new randomly generated set of problem instances, designed to test the sensitivity of our method to incremental equation sizes under different levels of noise. We have also proposed a parallelized extension of RILS-ROLS that has proven adequate for solving several very large instances with 1 million records and up to 15 input variables. |
Description: | Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. Springer DOI: 10.1186/s40537-023-00743-2 |
URI: | https://research.matf.bg.ac.rs/handle/123456789/1360 |
DOI: | 10.1186/s40537-023-00743-2 |
Rights: | Attribution 3.0 United States |
Appears in Collections: | Research outputs |
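The abstract states that ordinary least squares determines the best-fitting coefficients of the linear combinations inside a candidate equation, and that the fitness function combines R², RMSE and model size. The following pure-Python block is a minimal sketch of that idea, not the authors' implementation. It assumes a candidate is represented as a hypothetical list of basis terms plus an intercept. The `fitness` formula is an illustrative assumption, not the one defined in the paper. The iterated local search that would propose and perturb the terms is not shown.

```python
import math
from typing import Callable, List, Sequence, Tuple

# Hypothetical representation: a candidate equation is y_hat = c0 + sum_j c_j * f_j(x),
# where the structure (the f_j) comes from the search and only the c_j are fitted by OLS.
Term = Callable[[Sequence[float]], float]


def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("normal equations are singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_ols(terms: List[Term], xs: List[Sequence[float]], ys: List[float]) -> List[float]:
    """Return [c0, c1, ...] minimizing squared error, via X^T X c = X^T y."""
    rows = [[1.0] + [t(x) for t in terms] for x in xs]  # design matrix with intercept
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(p)]
    return solve_linear(xtx, xty)


def predict(terms: List[Term], coeffs: List[float], x: Sequence[float]) -> float:
    return coeffs[0] + sum(c * t(x) for c, t in zip(coeffs[1:], terms))


def r2_rmse(ys: List[float], preds: List[float]) -> Tuple[float, float]:
    n = len(ys)
    mean = sum(ys) / n
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    ss_tot = sum((y - mean) ** 2 for y in ys)
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else 0.0
    return r2, math.sqrt(ss_res / n)


def fitness(r2: float, rmse: float, size: int) -> float:
    """HYPOTHETICAL combination of R^2, RMSE and size (lower is better).
    Not the formula from the paper; it only shows that all three measures
    can be folded into one score that rewards accuracy and penalizes size."""
    return (2.0 - r2) * (1.0 + rmse) * (1.0 + 0.01 * size)
```

For example, with noiseless data generated from `3 + 2*x0**2 - 0.5*sin(x1)`, a candidate whose terms are `x0**2` and `sin(x1)` should get coefficients close to `[3, 2, -0.5]` from `fit_ols`, and it should score better under this `fitness` than a structurally wrong candidate such as a single `x0` term. Separating structure (searched combinatorially) from linear coefficients (solved in closed form) is what lets the local search avoid tuning those coefficients by trial and error.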
Files in This Item:
File | Description | Size | Format |
---|---|---|---|
s40537-023-00743-2.pdf | | 2.81 MB | Adobe PDF |
This item is licensed under a Creative Commons License