Finding Statistically Significant Repeats in Nucleic Acids and Proteins

Jelovic, Ana M; Mitić, Nenad; Eshafah, Samira; Beljanski, Milos V

Please use this identifier to cite or link to this item: https://research.matf.bg.ac.rs/handle/123456789/337

DC Field	Value	Language
dc.contributor.author	Jelovic, Ana M	en_US
dc.contributor.author	Mitić, Nenad	en_US
dc.contributor.author	Eshafah, Samira	en_US
dc.contributor.author	Beljanski, Milos V	en_US
dc.date.accessioned	2022-08-09T12:54:11Z	-
dc.date.available	2022-08-09T12:54:11Z	-
dc.date.issued	2018	-
dc.identifier.issn	1066-5277	-
dc.identifier.uri	https://research.matf.bg.ac.rs/handle/123456789/337	-
dc.description.abstract	DNA repeats have great importance for biological research and a large number of tools for determining repeats have been developed. Herein we define a method for extracting a statistically significant subset of a determined set of repeats. Our aim was to identify a subset of repeats in the input sequences that are not expected to occur with a number of their appearances in a random sequence of the same length. It is expected that results obtained in such manner would reduce the quantity of processed material and could thereby represent a more important biological signal. With DNA, RNA, and protein sequences serving as input material, we also examined the possibility of statistical filtering of repeats in sequences over an arbitrary alphabet. A new method for selecting statistically significant repeats from a set of determined repeats has been defined. The proposed method was tested on a large number of randomly generated sequences. The application of the method on biological sequences revealed that for some viruses, shorter repeats are more statistically significant than longer ones because of their frequent appearance, whereas for bacteria, the majority of identified repeats are statistically significant.	en_US
dc.language.iso	en	en_US
dc.publisher	Mary Ann Liebert Inc. Publishing	en_US
dc.relation.ispartof	Journal of computational biology : a journal of computational molecular cell biology	en_US
dc.subject	DNA	en_US
dc.subject	RNA	en_US
dc.subject	protein sequences	en_US
dc.subject	repeats	en_US
dc.subject	statistically significant	en_US
dc.title	Finding Statistically Significant Repeats in Nucleic Acids and Proteins	en_US
dc.type	Article	en_US
dc.identifier.doi	10.1089/cmb.2017.0046	-
dc.identifier.pmid	29272145	-
dc.identifier.scopus	2-s2.0-85045195060	-
dc.identifier.isi	000418584300001	-
dc.identifier.url	https://api.elsevier.com/content/abstract/scopus_id/85045195060	-
dc.contributor.affiliation	Informatics and Computer Science	en_US
dc.relation.issn	1557-8666	en_US
dc.description.rank	M21a	en_US
dc.relation.firstpage	375	en_US
dc.relation.lastpage	387	en_US
dc.relation.volume	25	en_US
dc.relation.issue	4	en_US
item.openairecristype	http://purl.org/coar/resource_type/c_18cf	-
item.grantfulltext	none	-
item.cerifentitytype	Publications	-
item.fulltext	No Fulltext	-
item.openairetype	Article	-
item.languageiso639-1	en	-
crisitem.author.dept	Informatics and Computer Science	-
Appears in Collections:	Research outputs

Scopus citations: 5 (checked on Feb 8, 2026)
Page view(s): 19 (checked on Jan 19, 2025)
