Solving the longest common subsequence problem concerning non-uniform distributions of letters in input strings

Nikolic, Bojan; Kartelj, Aleksandar; Djukanovic, Marko; Grbic, Milana; Blum, Christian; Raidl, Günther

Please use this identifier to cite or link to this item: https://research.matf.bg.ac.rs/handle/123456789/691

DC Field	Value	Language
dc.contributor.author	Nikolic, Bojan	en_US
dc.contributor.author	Kartelj, Aleksandar	en_US
dc.contributor.author	Djukanovic, Marko	en_US
dc.contributor.author	Grbic, Milana	en_US
dc.contributor.author	Blum, Christian	en_US
dc.contributor.author	Raidl, Günther	en_US
dc.date.accessioned	2022-08-14T10:03:57Z	-
dc.date.available	2022-08-14T10:03:57Z	-
dc.date.issued	2021	-
dc.identifier.uri	https://research.matf.bg.ac.rs/handle/123456789/691	-
dc.description.abstract	The longest common subsequence (LCS) problem is a prominent NP-hard optimization problem where, given an arbitrary set of input strings, the aim is to find a longest subsequence, which is common to all input strings. This problem has a variety of applications in bioinformatics, molecular biology and file plagiarism checking, among others. All previous approaches from the literature are dedicated to solving LCS instances sampled from uniform or near-to-uniform probability distributions of letters in the input strings. In this paper, we introduce an approach that is able to effectively deal with more general cases, where the occurrence of letters in the input strings follows a non-uniform distribution such as a multinomial distribution. The proposed approach makes use of a time-restricted beam search, guided by a novel heuristic named GMPSUM. This heuristic combines two complementary scoring functions in the form of a convex combination. Furthermore, apart from the close-to-uniform benchmark sets from the related literature, we introduce three new benchmark sets that differ in terms of their statistical properties. One of these sets concerns a case study in the context of text analysis. We provide a comprehensive empirical evaluation in two distinctive settings: (1) short-time execution with fixed beam size in order to evaluate the guidance abilities of the compared search heuristics; and (2) long-time executions with fixed target duration times in order to obtain high-quality solutions. In both settings, the newly proposed approach performs comparably to state-of-the-art techniques in the context of close-to-uniform instances and outperforms state-of-the-art approaches for non-uniform instances.	en_US
dc.language.iso	en	en_US
dc.publisher	MDPI	en_US
dc.relation.ispartof	Mathematics	en_US
dc.subject	Longest common subsequence problem	en_US
dc.subject	Multinomial distribution	en_US
dc.subject	Probability-based search guidance	en_US
dc.title	Solving the longest common subsequence problem concerning non-uniform distributions of letters in input strings	en_US
dc.type	Article	en_US
dc.identifier.doi	10.3390/math9131515	-
dc.identifier.scopus	2-s2.0-85109406939	-
dc.identifier.isi	000671014400001	-
dc.identifier.url	https://api.elsevier.com/content/abstract/scopus_id/85109406939	-
dc.contributor.affiliation	Informatics and Computer Science	en_US
dc.relation.issn	2227-7390	en_US
dc.description.rank	M21a	en_US
dc.relation.firstpage	Article no. 1515	en_US
dc.relation.volume	9	en_US
dc.relation.issue	13	en_US
item.fulltext	No Fulltext	-
item.grantfulltext	none	-
item.openairetype	Article	-
item.openairecristype	http://purl.org/coar/resource_type/c_18cf	-
item.languageiso639-1	en	-
item.cerifentitytype	Publications	-
crisitem.author.dept	Informatics and Computer Science	-
crisitem.author.orcid	0000-0001-9839-6039	-
Appears in Collections:	Research outputs
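
The abstract describes a beam search whose guidance heuristic is a convex combination of two scoring functions. The record does not give the definition of GMPSUM, so the following is only a minimal sketch of that general idea, not the authors' method: the two scoring functions (`score_ub`, `score_min_rest`), the weight `lam`, and all other names are hypothetical stand-ins, and the sketch uses a fixed beam width rather than the time-restricted variant mentioned in the abstract.

```python
from collections import Counter


def is_subsequence(seq, s):
    """True if `seq` can be obtained from `s` by deleting characters."""
    it = iter(s)
    return all(ch in it for ch in seq)


def extend(strings, pos, letter):
    """Advance every string past its next `letter`; None if some string lacks it."""
    new_pos = []
    for s, p in zip(strings, pos):
        i = s.find(letter, p)
        if i < 0:
            return None
        new_pos.append(i + 1)
    return tuple(new_pos)


def score_ub(strings, pos):
    """Stand-in score 1: per letter, the minimum remaining count over all strings."""
    counts = [Counter(s[p:]) for s, p in zip(strings, pos)]
    return sum(min(c[a] for c in counts) for a in set(counts[0]))


def score_min_rest(strings, pos):
    """Stand-in score 2: length of the shortest remaining suffix."""
    return min(len(s) - p for s, p in zip(strings, pos))


def guide(strings, pos, lam):
    """Convex combination of the two stand-in scores, with 0 <= lam <= 1."""
    return lam * score_ub(strings, pos) + (1 - lam) * score_min_rest(strings, pos)


def beam_search_lcs(strings, beam_width=10, lam=0.5):
    """Return a common subsequence found by a fixed-width beam search."""
    alphabet = sorted(set(strings[0]))
    beam = [(tuple(0 for _ in strings), "")]  # (positions, partial solution)
    best = ""
    while beam:
        children = {}
        for pos, seq in beam:
            for a in alphabet:
                nxt = extend(strings, pos, a)
                if nxt is not None and nxt not in children:
                    children[nxt] = seq + a
        if not children:
            break
        # Keep the beam_width children with the highest guidance value.
        ranked = sorted(children.items(),
                        key=lambda kv: guide(strings, kv[0], lam),
                        reverse=True)
        beam = ranked[:beam_width]
        best = beam[0][1]  # all nodes on one level have equal length
    return best
```

Calling `beam_search_lcs(["abcbdab", "bdcaba"], beam_width=100)` returns a common subsequence of both strings. A larger `beam_width` trades running time for solution quality, and `lam` shifts the weight between the two scores.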


Scopus citations: 5 (checked on May 19, 2026)

Page views: 22 (checked on Jan 19, 2025)
