Please use this identifier to cite or link to this item: https://research.matf.bg.ac.rs/handle/123456789/327
Title: Prediction of structural alphabet protein blocks using data mining
Authors: Maljković Ružičić, Mirjana 
Mitić, Nenad 
de Brevern, Alexandre G
Affiliations: Informatics and Computer Science 
Informatics and Computer Science 
Keywords: Amino acid sequence;Disorder predictors;Machine learning;Protein blocks;Repeats;Spider3
Issue Date: 2022
Rank: M22
Publisher: Elsevier
Journal: Biochimie
Abstract: 
3D protein structures determine proteins' biological functions. The 3D structure of the protein backbone can be approximated using prototypes of local protein conformations. Sets of these prototypes are called structural alphabets (SAs). Among several approaches to predicting 3D structures from amino acid sequences, one is based on predicting SA prototypes for a given amino acid sequence. Protein Blocks (PBs) is the best-known SA. It is composed of 16 prototypes, each five consecutive amino acids long, which were identified as optimal with respect to both their ability to approximate the local structure correctly and the accuracy with which they can be predicted from an amino acid sequence. We developed models for PB prediction from sequence information using different data mining approaches and machine learning algorithms. Besides the amino acid sequences, the outputs of the following tools were used to train the models: the Spider3 predictor of protein structure properties, several predictors of intrinsically disordered protein regions, and a tool for finding repeats in amino acid sequences. The highest accuracy of the constructed models is 80%, a significant improvement over the previous best available prediction, whose accuracy was 61%. Analysis of the models showed that the significance of the input attributes differs depending on the algorithm used to construct them. Information about whether amino acids belong to intrinsically disordered regions and repeats improves the precision of prediction for some PBs with the CART classification algorithm, but not with the C5.0 classification algorithm. Improved prediction approaches can have interesting applications in protein structural modelling approaches or computational protein design.
URI: https://research.matf.bg.ac.rs/handle/123456789/327
ISSN: 0300-9084
DOI: 10.1016/j.biochi.2022.01.019
Appears in Collections: Research outputs

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.