Please use this identifier to cite or link to this item:
https://research.matf.bg.ac.rs/handle/123456789/325
Title: N-gram analysis of COG categorized protein sequences
Authors: Marovac, Ulfeta A.; Mitić, Nenad
Affiliations: Informatics and Computer Science
Issue Date: 1-Jan-2015
Journal: Match
Abstract: The classification of proteins categorized in the Cluster of Orthologous Groups (COGs) is important for better understanding of biological processes, as well as for various pathological conditions in humans and other organisms. In this paper, a model for classification of proteins in the COG categories based on characteristic amino acid n-grams is proposed. A novel method, based on Boolean algebra, for extracting n-grams which characterize proteins belonging to a certain COG category is presented. The presented method significantly reduces the number of processed n-grams, which implies the reduction of required storage space and processing time. The obtained results show that the proteins of a certain COG category contain n-grams which satisfy specific patterns; such n-grams are unique, related to different COG categories. The model for classification based on the proposed method assigns a correct COG category to a protein with the confidence of 96%.
URI: https://research.matf.bg.ac.rs/handle/123456789/325
ISSN: 03406253
Appears in Collections: Research outputs