Please use this identifier to cite or link to this item:
https://research.matf.bg.ac.rs/handle/123456789/500
Title: n-gram-based classification and unsupervised hierarchical clustering of genome sequences
Authors: Tomović, Andrija; Janičić, Predrag; Keselj, Vlado
Affiliations: Informatics and Computer Science
Keywords: Classification; Genome sequence; Hierarchical clustering; n-Gram
Issue Date: 2006
Journal: Computer methods and programs in biomedicine
Abstract: In this paper we address the problem of automated classification of isolates, i.e., the problem of determining the family of genomes to which a given genome belongs. Additionally, we address the problem of automated unsupervised hierarchical clustering of isolates according only to their statistical substring properties. For both of these problems we present novel algorithms based on nucleotide n-grams, with no required preprocessing steps such as sequence alignment. Results obtained experimentally are very positive and suggest that the proposed techniques can be successfully used in a variety of related problems. The reported experiments demonstrate better performance than some of the state-of-the-art methods. We report on a new distance measure between n-gram profiles, which shows superior performance compared to many other measures, including commonly used Euclidean distance.
URI: https://research.matf.bg.ac.rs/handle/123456789/500
ISSN: 0169-2607
DOI: 10.1016/j.cmpb.2005.11.007
Appears in Collections: Research outputs
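
The abstract refers to nucleotide n-gram profiles and to the Euclidean distance as a commonly used baseline for comparing them. The following is a minimal illustrative sketch of those two ideas only, assuming a profile is the relative frequency of each overlapping n-gram. The function names `ngram_profile` and `euclidean_distance` and the default `n = 3` are hypothetical and not taken from the paper. The paper's new distance measure is not defined in this record and is not reproduced here.

```python
from collections import Counter
from math import sqrt


def ngram_profile(sequence: str, n: int = 3) -> dict[str, float]:
    """Relative frequency of each overlapping n-gram (assumed profile definition)."""
    seq = sequence.upper()
    counts = Counter(seq[i:i + n] for i in range(len(seq) - n + 1))
    total = sum(counts.values())
    # A sequence shorter than n yields no n-grams, hence an empty profile.
    return {gram: c / total for gram, c in counts.items()} if total else {}


def euclidean_distance(p: dict[str, float], q: dict[str, float]) -> float:
    """Euclidean distance between two profiles; an absent n-gram counts as 0."""
    grams = set(p) | set(q)
    return sqrt(sum((p.get(g, 0.0) - q.get(g, 0.0)) ** 2 for g in grams))
```

Under these assumptions, two genome sequences could be compared with `euclidean_distance(ngram_profile(a), ngram_profile(b))`, with no alignment step involved.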
Scopus citations: 82 (checked on Dec 18, 2024)
Page view(s): 11 (checked on Dec 24, 2024)
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.