Please use this identifier to cite or link to this item: https://research.matf.bg.ac.rs/handle/123456789/691
DC FieldValueLanguage
dc.contributor.authorNikolic, Bojanen_US
dc.contributor.authorKartelj, Aleksandaren_US
dc.contributor.authorDjukanovic, Markoen_US
dc.contributor.authorGrbic, Milanaen_US
dc.contributor.authorBlum, Christianen_US
dc.contributor.authorRaidl, Güntheren_US
dc.date.accessioned2022-08-14T10:03:57Z-
dc.date.available2022-08-14T10:03:57Z-
dc.date.issued2021-
dc.identifier.urihttps://research.matf.bg.ac.rs/handle/123456789/691-
dc.description.abstractThe longest common subsequence (LCS) problem is a prominent N P–hard optimization problem where, given an arbitrary set of input strings, the aim is to find a longest subsequence, which is common to all input strings. This problem has a variety of applications in bioinformatics, molecular biology and file plagiarism checking, among others. All previous approaches from the literature are dedicated to solving LCS instances sampled from uniform or near-to-uniform probability distributions of letters in the input strings. In this paper, we introduce an approach that is able to effectively deal with more general cases, where the occurrence of letters in the input strings follows a non-uniform distribution such as a multinomial distribution. The proposed approach makes use of a time-restricted beam search, guided by a novel heuristic named GMPSUM. This heuristic combines two complementary scoring functions in the form of a convex combination. Furthermore, apart from the close-to-uniform benchmark sets from the related literature, we introduce three new benchmark sets that differ in terms of their statistical properties. One of these sets concerns a case study in the context of text analysis. We provide a comprehensive empirical evaluation in two distinctive settings: (1) short-time execution with fixed beam size in order to evaluate the guidance abilities of the compared search heuristics; and (2) long-time executions with fixed target duration times in order to obtain high-quality solutions. In both settings, the newly proposed approach performs comparably to state-of-the-art techniques in the context of close-to-uniform instances and outperforms state-of-the-art approaches for non-uniform instances.en
dc.relation.ispartofMathematicsen
dc.subjectLongest common subsequence problemen
dc.subjectMulti-nomial distributionen
dc.subjectProbability-based search guidanceen
dc.titleSolving the longest common subsequence problem concerning non-uniform distributions of letters in input stringsen_US
dc.typeArticleen_US
dc.identifier.doi10.3390/math9131515-
dc.identifier.scopus2-s2.0-85109406939-
dc.identifier.urlhttps://api.elsevier.com/content/abstract/scopus_id/85109406939-
dc.contributor.affiliationInformatics and Computer Scienceen_US
dc.relation.volume9en
dc.relation.issue13en
item.fulltextNo Fulltext-
item.openairecristypehttp://purl.org/coar/resource_type/c_18cf-
item.cerifentitytypePublications-
item.grantfulltextnone-
item.openairetypeArticle-
crisitem.author.deptInformatics and Computer Science-
crisitem.author.orcid0000-0001-9839-6039-
Appears in Collections:Research outputs
Show simple item record

SCOPUSTM   
Citations

3
checked on Dec 22, 2024

Page view(s)

22
checked on Dec 24, 2024

Google ScholarTM

Check

Altmetric

Altmetric


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.