Filtering of repeat sequences in genomes

Jelović, Ana; Beljanski, Miloš; Mitić, Nenad

Please use this identifier to cite or link to this item: https://research.matf.bg.ac.rs/handle/123456789/2273

DC Field	Value	Language
dc.contributor.author	Jelović, Ana	en_US
dc.contributor.author	Beljanski, Miloš	en_US
dc.contributor.author	Mitić, Nenad	en_US
dc.date.accessioned	2025-07-22T09:41:07Z	-
dc.date.available	2025-07-22T09:41:07Z	-
dc.date.issued	2017	-
dc.identifier.uri	https://research.matf.bg.ac.rs/handle/123456789/2273	-
dc.description.abstract	Finding repeat sequences in nucleic acids and proteins is of great importance in biology. A number of tools are able to extract these sequences efficiently. If we search for repeated sequences in a completely random computer-generated sequence of any meaningful length, we will still find a large number of matches. We developed a method for efficiently estimating the probability that a group of found repeated sequences occurred by chance, and an accompanying program that finds the repeated sequences and then filters them based on a given probability threshold. What makes our method different from existing ones is that we group the results not only by repeat length but also by number of occurrences. Even short repeated sequences that occur many times may be statistically significant, while longer repeated sequences occurring just a few times may not be. For the large number of repeated sequences that can be found in a genome when the minimal sequence length is relatively low, our method provides a significant gain in performance and quality of results compared to outputting all the found sequences. The method can be applied to both nucleic acid and protein sequences. We have found that, as previously expected, longer repeated sequences are mostly more likely to be statistically significant, but also, counterintuitively, that for some viruses, for example, shorter repeated sequences are more important than the longer ones.	en_US
dc.language.iso	en	en_US
dc.publisher	Beograd : Matematički fakultet	en_US
dc.subject	repeat sequences	en_US
dc.subject	DNA	en_US
dc.subject	protein sequences	en_US
dc.subject	statistical filtering	en_US
dc.title	Filtering of repeat sequences in genomes	en_US
dc.type	Conference Object	en_US
dc.relation.conference	Belgrade BioInformatics Conference BelBI ([1] ; 2016 ; Belgrade)	en_US
dc.relation.publication	Proceedings of the Belgrade BioInformatics Conference BelBI 2016	en_US
dc.identifier.url	http://belbi2016.matf.bg.ac.rs/wp-content/uploads/2023/03/Proceedings.BelBi_2016.pdf	-
dc.contributor.affiliation	Informatics and Computer Science	en_US
dc.relation.isbn	978-86-7589-124-6	en_US
dc.description.rank	M63	en_US
dc.relation.firstpage	73	en_US
dc.relation.lastpage	81	en_US
item.openairecristype	http://purl.org/coar/resource_type/c_18cf	-
item.grantfulltext	none	-
item.cerifentitytype	Publications	-
item.fulltext	No Fulltext	-
item.openairetype	Conference Object	-
item.languageiso639-1	en	-
crisitem.author.dept	Informatics and Computer Science	-
Appears in Collections:	Research outputs
