Title: N-gram analysis of COG categorized protein sequences
Authors: Marovac, Ulfeta A.
Mitić, Nenad 
Affiliations: Informatics and Computer Science 
Issue Date: 1-Jan-2015
Journal: Match
The classification of proteins categorized in the Clusters of Orthologous Groups (COGs) is important for a better understanding of biological processes, as well as of various pathological conditions in humans and other organisms. In this paper, a model for classifying proteins into COG categories based on characteristic amino acid n-grams is proposed. A novel method based on Boolean algebra is presented for extracting the n-grams that characterize proteins belonging to a given COG category. The method significantly reduces the number of n-grams that must be processed, which in turn reduces the required storage space and processing time. The results show that the proteins of a given COG category contain n-grams that satisfy specific patterns, and that such n-grams are unique to their respective COG categories. The classification model based on the proposed method assigns the correct COG category to a protein with a confidence of 96%.
ISSN: 0340-6253
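The abstract does not spell out the Boolean-algebra extraction procedure. What follows is therefore only a minimal Python sketch of the general idea, built on three simplifying assumptions:

- Each protein is reduced to its set of overlapping amino acid n-grams.
- An n-gram counts as "characteristic" of a category if it occurs in that category's training proteins and in no other category. This is a plain set difference (the Boolean expression A AND NOT B), not the authors' method.
- A protein is assigned to the category whose characteristic n-grams it shares most.

The function names `ngrams`, `characteristic_ngrams` and `classify`, the parameter `n`, and the toy sequences and category letters are all hypothetical.

```python
from typing import Dict, List, Set


def ngrams(sequence: str, n: int) -> Set[str]:
    """Return the set of distinct overlapping n-grams in an amino acid sequence."""
    return {sequence[i:i + n] for i in range(len(sequence) - n + 1)}


def characteristic_ngrams(
    proteins_by_category: Dict[str, List[str]], n: int
) -> Dict[str, Set[str]]:
    """Keep, for each category, the n-grams found in its proteins and in no other category.

    Illustrative set-difference filter only; NOT the Boolean-algebra method of the paper.
    """
    # Union of the n-gram sets of all proteins in each category.
    per_category = {
        cat: set().union(*(ngrams(seq, n) for seq in seqs))
        for cat, seqs in proteins_by_category.items()
    }
    signatures = {}
    for cat, grams in per_category.items():
        # All n-grams seen in any other category.
        others = set().union(*(g for c, g in per_category.items() if c != cat))
        # Boolean "in this category AND NOT in any other".
        signatures[cat] = grams - others
    return signatures


def classify(sequence: str, signatures: Dict[str, Set[str]], n: int) -> str:
    """Return the category whose characteristic n-grams overlap the sequence most.

    Ties are broken by dictionary order (the first category with the top score wins).
    """
    grams = ngrams(sequence, n)
    scores = {cat: len(grams & sig) for cat, sig in signatures.items()}
    return max(scores, key=scores.get)


if __name__ == "__main__":
    # Made-up toy sequences; the category letters are illustrative only.
    training = {
        "J": ["MKLVAGG", "MKLVTTR"],
        "E": ["AHHPQSW", "AGGPQRR"],
    }
    signatures = characteristic_ngrams(training, n=3)
    # "AGG" occurs in both categories, so it is dropped from both signatures.
    print("AGG" in signatures["J"], "AGG" in signatures["E"])  # False False
    print(classify("MKLVQQ", signatures, n=3))  # J
    print(classify("WAHHPQ", signatures, n=3))  # E
```

Discarding n-grams shared between categories is one simple way to shrink the set of n-grams that must be stored and compared. This loosely mirrors the storage and processing savings the abstract reports, without reproducing its method. A practical classifier would also need a better tie-breaking rule and a policy for proteins that match no characteristic n-gram at all.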
