Please use this identifier to cite or link to this item:
https://research.matf.bg.ac.rs/handle/123456789/2273
Title: | Filtering of repeat sequences in genomes |
Authors: | Jelović, Ana; Beljanski, Miloš; Mitić, Nenad |
Affiliations: | Informatics and Computer Science |
Keywords: | repeat sequences; DNA; protein sequences; statistical filtering |
Issue Date: | 2017 |
Rank: | M63 |
Publisher: | Beograd : Matematički fakultet |
Related Publication(s): | Proceedings of the Belgrade BioInformatics Conference BelBI 2016 |
Conference: | Belgrade BioInformatics Conference BelBI ([1] ; 2016 ; Belgrade) |
Abstract: | Finding repeat sequences in nucleic acids and proteins is of great importance in biology, and a number of tools can extract these sequences efficiently. However, a search for repeated sequences in a completely random, computer-generated sequence of any meaningful length will still find a large number of matches. We developed a method for efficiently estimating the probability that a group of found repeated sequences occurred by chance, and an accompanying program that finds the repeated sequences and then filters them against a given probability threshold. Our method differs from existing ones in that we do not group the results by repeat length only, but also by number of occurrences: even short repeated sequences that occur many times may be statistically significant, while longer repeated sequences that occur just a few times may not be. When the minimal sequence length is relatively low, a very large number of repeated sequences can be found in a genome; in that case our method provides a significant gain in performance and quality of results compared to outputting all the found sequences. The method can be applied to both nucleic acid and protein sequences. We have found that, as expected, longer repeated sequences are mostly more likely to be statistically significant, but also, counterintuitively, that for some viruses, for example, shorter repeated sequences are more important than longer ones. |
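The abstract does not give the authors' probability estimate, so the following is only a minimal illustrative sketch of the general idea of filtering repeats by both length and number of occurrences. It assumes a uniform i.i.d. background over the alphabet and a Poisson approximation for the number of occurrences of a fixed word; both are assumptions made here, not taken from the paper. All function names (`count_repeats`, `poisson_tail`, `chance_probability`, `filter_repeats`) are hypothetical.

```python
import math
from collections import Counter


def count_repeats(seq, k):
    """Return every length-k substring that occurs at least twice, with its count."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    return {s: c for s, c in counts.items() if c >= 2}


def poisson_tail(lam, c):
    """P(X >= c) for X ~ Poisson(lam), computed as 1 minus the CDF up to c - 1."""
    term = math.exp(-lam)  # P(X = 0)
    cdf = 0.0
    for i in range(c):
        cdf += term
        term *= lam / (i + 1)  # P(X = i + 1) from P(X = i)
    return max(0.0, 1.0 - cdf)


def chance_probability(n, k, c, alphabet_size=4):
    """Approximate chance that one fixed length-k word occurs >= c times
    in a random sequence of length n (assumed uniform i.i.d. background)."""
    lam = (n - k + 1) / alphabet_size ** k  # expected number of occurrences
    return poisson_tail(lam, c)


def filter_repeats(seq, k, threshold, alphabet_size=4):
    """Keep only repeats whose chance probability is at or below the threshold."""
    n = len(seq)
    return {
        s: c
        for s, c in count_repeats(seq, k).items()
        if chance_probability(n, k, c, alphabet_size) <= threshold
    }
```

Under these assumptions, a 6-mer seen 20 times in a sequence of 10,000 bases gets a smaller chance probability than an 8-mer seen twice, which matches the abstract's point that grouping by length alone can be misleading. For protein sequences, `alphabet_size=20` would be the analogous setting.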
URI: | https://research.matf.bg.ac.rs/handle/123456789/2273 |
Appears in Collections: | Research outputs |