Please use this identifier to cite or link to this item: https://research.matf.bg.ac.rs/handle/123456789/2273
Title: Filtering of repeat sequences in genomes
Authors: Jelović, Ana
Beljanski, Miloš
Mitić, Nenad 
Affiliations: Informatics and Computer Science 
Keywords: repeat sequences;DNA;protein sequences;statistical filtering
Issue Date: 2017
Rank: M63
Publisher: Beograd : Matematički fakultet
Related Publication(s): Proceedings of the Belgrade BioInformatics Conference BelBI 2016
Conference: Belgrade BioInformatics Conference BelBI ([1] ; 2016 ; Belgrade)
Abstract: 
Finding repeat sequences in nucleic acids and proteins is of great importance in biology, and a number of tools can extract these sequences efficiently. However, if we search for repeated sequences in a completely random, computer-generated sequence of any meaningful length, we will still find a large number of matches. We developed a method for efficiently estimating the probability that a group of found repeated sequences occurs by chance, and an accompanying program that finds the repeated sequences and then filters them against a given probability threshold. What distinguishes our method from existing ones is that we do not group the results by repeat length only, but also by number of occurrences: even short repeated sequences that occur many times may be statistically significant, while longer repeated sequences that occur just a few times may not be. Because a very large number of repeated sequences can be found in a genome when the minimal sequence length is relatively low, our method provides a significant gain in performance and quality of results compared to outputting all the found sequences. The method can be applied to both nucleic acid and protein sequences. We have found that, as expected, longer repeated sequences are mostly more likely to be statistically significant, but also, counterintuitively, that for some viruses, for example, shorter repeated sequences are more important than the longer ones.
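
The abstract does not spell out the estimator, so the following is only an illustrative sketch of the general idea of grouping repeats by (length, number of occurrences) and filtering by a probability threshold. It is not the authors' method. It assumes a uniform, independent random sequence model and a Poisson approximation, and the names `poisson_tail`, `chance_probability`, `filter_groups` and the default `threshold` are hypothetical.

```python
import math


def poisson_tail(lam, k):
    """P(X >= k) for X ~ Poisson(lam), computed in log space for stability."""
    if k <= 0:
        return 1.0
    if lam <= 0:
        return 0.0

    def term(i):
        return math.exp(-lam + i * math.log(lam) - math.lgamma(i + 1))

    if k <= lam:
        # Tail is large here, so the complement of the lower sum is accurate.
        return max(0.0, 1.0 - sum(term(i) for i in range(k)))
    # Tail is small: sum it directly to avoid cancellation in 1 - (lower sum).
    total, i = 0.0, k
    while True:
        t = term(i)
        total += t
        if t <= total * 1e-16:
            break
        i += 1
    return min(total, 1.0)


def chance_probability(seq_len, repeat_len, occurrences, alphabet_size=4):
    """Approximate probability that a random sequence of length seq_len
    contains at least one word of length repeat_len occurring at least
    `occurrences` times. Assumption: uniform i.i.d. letters, overlaps ignored."""
    positions = seq_len - repeat_len + 1
    if positions <= 0:
        return 0.0
    n_words = alphabet_size ** repeat_len
    lam = positions / n_words            # expected occurrences of one word
    expected_groups = n_words * poisson_tail(lam, occurrences)
    return -math.expm1(-expected_groups)  # 1 - exp(-expected_groups)


def filter_groups(groups, seq_len, threshold=0.01, alphabet_size=4):
    """Keep (repeat_len, occurrences) groups unlikely to arise by chance."""
    return [
        (length, count)
        for length, count in groups
        if chance_probability(seq_len, length, count, alphabet_size) < threshold
    ]


if __name__ == "__main__":
    # Hypothetical groups found in a DNA sequence of one million bases.
    found = [(8, 2), (8, 60), (12, 2), (30, 2)]
    print(filter_groups(found, seq_len=10**6))
```

Under these assumptions, a short repeat that occurs many times (length 8, 60 occurrences) passes the filter, while a longer repeat that occurs only twice (length 12) does not, which matches the point made in the abstract. For protein sequences, `alphabet_size=20` would be the analogous setting.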
URI: https://research.matf.bg.ac.rs/handle/123456789/2273
Appears in Collections: Research outputs
