Please use this identifier to cite or link to this item: https://research.matf.bg.ac.rs/handle/123456789/325
Title: N-gram analysis of COG categorized protein sequences
Authors: Marovac, Ulfeta A.; Mitić, Nenad
Affiliations: Informatics and Computer Science 
Issue Date: 1-Jan-2015
Journal: Match
Abstract: 
The classification of proteins categorized in the Clusters of Orthologous Groups (COGs) is important for a better understanding of biological processes, as well as of various pathological conditions in humans and other organisms. This paper proposes a model for classifying proteins into COG categories based on characteristic amino acid n-grams. A novel method, based on Boolean algebra, is presented for extracting the n-grams that characterize proteins belonging to a given COG category. The method significantly reduces the number of n-grams processed, and with it the required storage space and processing time. The results show that the proteins of a given COG category contain n-grams that satisfy specific patterns; such n-grams are unique to their respective COG categories. The classification model based on the proposed method assigns the correct COG category to a protein with a confidence of 96%.
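
The general idea of category-specific n-grams can be illustrated with a minimal sketch. This is not the Boolean-algebra method of the paper, whose details are not given in the abstract; it is a plain set-difference approximation. The function names (`ngrams`, `characteristic_ngrams`, `classify`), the voting rule, and the toy sequences are all assumptions made for illustration.

```python
def ngrams(sequence, n):
    """Return the set of all overlapping n-grams in an amino acid sequence."""
    return {sequence[i:i + n] for i in range(len(sequence) - n + 1)}


def characteristic_ngrams(proteins_by_category, n):
    """Map each category to the n-grams found in it and in no other category.

    Assumption: "characteristic" is taken to mean a simple set difference.
    """
    per_category = {
        cat: set().union(*(ngrams(seq, n) for seq in seqs))
        for cat, seqs in proteins_by_category.items()
    }
    result = {}
    for cat, grams in per_category.items():
        others = set().union(
            *(g for c, g in per_category.items() if c != cat)
        )
        result[cat] = grams - others
    return result


def classify(sequence, characteristic, n):
    """Pick the category whose characteristic n-grams overlap the sequence most.

    Returns None when no characteristic n-gram matches (hypothetical rule).
    """
    grams = ngrams(sequence, n)
    scores = {cat: len(grams & chars) for cat, chars in characteristic.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


# Toy, made-up sequences; "J" and "K" are used only as example category labels.
training = {
    "J": ["MKVLAA", "KVLAGG"],
    "K": ["MKTTAA", "TTAGGP"],
}
characteristic = characteristic_ngrams(training, n=3)
print(classify("AKVLAT", characteristic, n=3))
```

In this sketch the 3-gram `AGG` occurs in both toy categories, so it is discarded as non-characteristic; only n-grams confined to a single category contribute to the classification score.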
URI: https://research.matf.bg.ac.rs/handle/123456789/325
ISSN: 0340-6253
Appears in Collections: Research outputs

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.