Please use this identifier to cite or link to this item: https://research.matf.bg.ac.rs/handle/123456789/2273
DC Field | Value | Language
dc.contributor.author | Jelović, Ana | en_US
dc.contributor.author | Beljanski, Miloš | en_US
dc.contributor.author | Mitić, Nenad | en_US
dc.date.accessioned | 2025-07-22T09:41:07Z | -
dc.date.available | 2025-07-22T09:41:07Z | -
dc.date.issued | 2017 | -
dc.identifier.uri | https://research.matf.bg.ac.rs/handle/123456789/2273 | -
dc.description.abstract | Finding repeat sequences in nucleic acids and proteins is of great importance in biology. A number of tools are able to efficiently extract these sequences. If we search for repeated sequences in a completely random computer-generated sequence of any meaningful length, we will still find a large number of matches. We developed a method for efficiently estimating the probability that a group of found repeated sequences occurs randomly, and an accompanying program that finds the repeated sequences and then filters them based on a given probability threshold. What makes our method different from existing ones is that we do not group the results by repeat length only but also by number of occurrences. Even short repeated sequences that occur many times may be statistically significant, while longer repeated sequences occurring just a few times may not be. For the large number of repeated sequences that can be found in a genome when the minimal sequence length is relatively low, our method provides a significant gain in performance and quality of results compared to outputting all the found sequences. The method can be applied to both nucleic acid and protein sequences. We have found that, as previously expected, longer repeated sequences are mostly more likely to be statistically significant, but also, counterintuitively, that for some viruses, for example, shorter repeated sequences are more important than longer ones. | en_US
dc.language.iso | en | en_US
dc.publisher | Beograd : Matematički fakultet | en_US
dc.subject | repeat sequences | en_US
dc.subject | DNA | en_US
dc.subject | protein sequences | en_US
dc.subject | statistical filtering | en_US
dc.title | Filtering of repeat sequences in genomes | en_US
dc.type | Conference Object | en_US
dc.relation.conference | Belgrade BioInformatics Conference BelBI ([1] ; 2016 ; Belgrade) | en_US
dc.relation.publication | Proceedings of the Belgrade BioInformatics Conference BelBI 2016 | en_US
dc.identifier.url | http://belbi2016.matf.bg.ac.rs/wp-content/uploads/2023/03/Proceedings.BelBi_2016.pdf | -
dc.contributor.affiliation | Informatics and Computer Science | en_US
dc.relation.isbn | 978-86-7589-124-6 | en_US
dc.description.rank | M63 | en_US
dc.relation.firstpage | 73 | en_US
dc.relation.lastpage | 81 | en_US
item.openairecristype | http://purl.org/coar/resource_type/c_18cf | -
item.cerifentitytype | Publications | -
item.languageiso639-1 | en | -
item.openairetype | Conference Object | -
item.fulltext | No Fulltext | -
item.grantfulltext | none | -
crisitem.author.dept | Informatics and Computer Science | -
Appears in Collections: Research outputs
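
The abstract describes grouping repeats by both length and number of occurrences, and keeping only groups that are unlikely to arise by chance. The record does not give the authors' actual estimator, so the following is only an illustrative sketch under stated assumptions: symbols are independent and uniformly distributed, occurrence counts of a given k-mer are approximated by a Poisson distribution, and all function names (`count_kmers`, `poisson_tail`, `chance_probability`, `filter_repeats`) are hypothetical.

```python
import math
from collections import Counter


def count_kmers(seq, k):
    """Count every overlapping substring of length k in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def poisson_tail(lam, m):
    """P(X >= m) for X ~ Poisson(lam)."""
    if m <= 0:
        return 1.0
    if lam <= 0:
        return 0.0
    if m <= lam:
        # The tail is large here, so 1 - P(X < m) is numerically safe.
        term, cdf = math.exp(-lam), 0.0
        for i in range(m):
            cdf += term
            term *= lam / (i + 1)
        return max(0.0, 1.0 - cdf)
    # The tail is small: sum its terms directly to avoid cancellation.
    term = math.exp(-lam + m * math.log(lam) - math.lgamma(m + 1))
    total, i = 0.0, m
    while term > 1e-17 * total:
        total += term
        i += 1
        term *= lam / i
    return min(1.0, total)


def chance_probability(n, k, m, alphabet_size=4):
    """Approximate probability that a random sequence of length n contains
    at least one k-mer occurring m or more times.

    Assumption (not from the paper): i.i.d. uniform symbols and Poisson
    approximations, both for per-k-mer counts and for the number of hits.
    """
    positions = n - k + 1
    if positions < m:
        return 0.0
    lam = positions / alphabet_size ** k      # expected count of one fixed k-mer
    expected_hits = alphabet_size ** k * poisson_tail(lam, m)
    return 1.0 - math.exp(-expected_hits)


def filter_repeats(seq, k_values, threshold=0.01, alphabet_size=4):
    """Return {k-mer: (count, p)} for repeats whose (length, count) group
    has chance probability p at or below the threshold."""
    kept, cache = {}, {}
    for k in k_values:
        for kmer, count in count_kmers(seq, k).items():
            if count < 2:
                continue                      # not a repeat
            key = (k, count)
            if key not in cache:
                cache[key] = chance_probability(len(seq), k, count, alphabet_size)
            if cache[key] <= threshold:
                kept[kmer] = (count, cache[key])
    return kept


if __name__ == "__main__":
    motif = "ACGTTGCAAGCTTAGGCTAC"
    seq = motif + "T" * 50 + motif + "G" * 50 + motif
    for kmer, (count, p) in sorted(filter_repeats(seq, [20]).items()):
        print(kmer, count, p)
```

Under this simplified model, an 8-mer seen twice in a 10,000-base sequence is almost certain to occur by chance, whereas the same 8-mer seen six times is not, which mirrors the abstract's point that occurrence count matters alongside length. For protein sequences, `alphabet_size=20` would be the analogous assumption.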