Please use this identifier to cite or link to this item: https://research.matf.bg.ac.rs/handle/123456789/1360
Title: RILS-ROLS: robust symbolic regression via iterated local search and ordinary least squares
Authors: Kartelj, Aleksandar; Djukanović, Marko
Affiliations: Informatics and Computer Science 
Keywords: Ground-truth benchmark sets; Iterated local search; Ordinary least squares; Symbolic regression
Issue Date: 1-Dec-2023
Rank: M21a
Publisher: Springer
Journal: Journal of Big Data
Abstract: 
In this paper, we solve the well-known symbolic regression problem, which has been intensively studied and has a wide range of applications. To solve it, we propose an efficient metaheuristic-based approach called RILS-ROLS. RILS-ROLS is built on two elements: (i) iterated local search, which is the backbone of the method and mainly handles the combinatorial and some continuous aspects of the problem; (ii) the ordinary least squares method, which focuses on the continuous aspect of the search space by efficiently determining the best-fitting coefficients of linear combinations within solution equations. In addition, we introduce a novel fitness function that combines important model quality measures (R² score, RMSE, and model size, i.e., model complexity), together with a carefully designed local search that allows a systematic search in the proximity of a candidate solution. Experiments are conducted on two well-known ground-truth benchmark sets from the literature: Feynman and Strogatz. RILS-ROLS was compared to 14 other competitors from the literature and outperformed all of them with respect to the symbolic solution rate under varying levels of noise. The method is robust to noise: its symbolic solution rate decreases relatively slowly as the noise level increases. Statistical analysis of the experimental results confirmed that RILS-ROLS is a new state-of-the-art method for symbolic regression on datasets whose target variable is modelled as a closed-form equation over the allowed operators. In addition to the evaluation on known ground-truth datasets, we introduced a new set of randomly generated problem instances. The goal of this set was to test the sensitivity of our method to increasing equation sizes under different levels of noise.
We have also proposed a parallelized extension of RILS-ROLS that has proven adequate in solving several very large instances with 1 million records and up to 15 input variables.
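The abstract splits the work between a structural search (iterated local search) and ordinary least squares, which fits the coefficients of linear combinations inside a candidate equation. The following is a minimal sketch of that OLS step, assuming a candidate is represented simply as a list of basis subexpressions that are combined linearly with an intercept (a simplification chosen for illustration, not the paper's data structure). The helper names `ols_fit`, `r2_score`, `rmse`, and `fitness` are hypothetical. In particular, `fitness` is an illustrative way of combining R², RMSE, and model size; the exact formula used by RILS-ROLS is not given in this abstract.

```python
import numpy as np


def ols_fit(terms, X, y):
    """Fit y ~ c0 + sum_k c_k * term_k(X) by ordinary least squares.

    `terms` is a list of callables mapping the input matrix X
    (n_samples x n_features) to one column each; they stand in for the
    non-linear subexpressions that a structural search would propose.
    """
    # Design matrix: an intercept column of ones plus one column per term.
    A = np.column_stack([np.ones(len(y))] + [term(X) for term in terms])
    coefs, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coefs, A @ coefs


def r2_score(y, y_hat):
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return float(1.0 - ss_res / ss_tot)


def rmse(y, y_hat):
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))


def fitness(y, y_hat, size, size_weight=0.01):
    # Hypothetical multiplicative score (lower is better) that rewards a high
    # R^2 and a low RMSE and penalizes larger models. NOT the paper's formula.
    return (2.0 - r2_score(y, y_hat)) * (1.0 + rmse(y, y_hat)) * (1.0 + size_weight * size)


# Toy noiseless data: y = 1 + 3*x0*x1 + 2*sin(x1).
rng = np.random.default_rng(0)
X = rng.uniform(-2.0, 2.0, size=(200, 2))
y = 1.0 + 3.0 * X[:, 0] * X[:, 1] + 2.0 * np.sin(X[:, 1])

# A structure containing the right subexpressions: OLS recovers (1, 3, 2).
good_terms = [lambda M: M[:, 0] * M[:, 1], lambda M: np.sin(M[:, 1])]
coefs, y_hat = ols_fit(good_terms, X, y)

# A structure missing sin(x1): the fit is worse, so the score is higher.
bad_terms = [lambda M: M[:, 0] * M[:, 1]]
_, y_hat_bad = ols_fit(bad_terms, X, y)

# The size values 7 and 3 are illustrative complexity numbers only.
print(np.round(coefs, 6))
print(fitness(y, y_hat, size=7), fitness(y, y_hat_bad, size=3))
```

Because the coefficients come from a single closed-form least-squares solve, a search procedure built this way only has to propose structures; each proposal is then scored without any iterative coefficient tuning, which is one plausible reason for pairing a local search with OLS.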
Description: 
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

DOI: 10.1186/s40537-023-00743-2
URI: https://research.matf.bg.ac.rs/handle/123456789/1360
Rights: Attribution 3.0 United States
Appears in Collections: Research outputs

Files in This Item:
File: s40537-023-00743-2.pdf, Size: 2.81 MB, Format: Adobe PDF

Scopus citations: 3 (checked on Dec 20, 2024)
Page views: 16 (checked on Dec 25, 2024)
Downloads: 1 (checked on Dec 25, 2024)

This item is licensed under a Creative Commons License.