Protein Secondary Structure Prediction using BLAST and Relaxed Threshold Rule Induction from Coverings


Protein structure prediction has long been an important and challenging research problem in bioinformatics. Experimental determination of protein structures is time-consuming and relatively expensive, and it continues to lag far behind the rapid discovery of new protein sequences. With the recent breakthrough of combining multiple sequence alignment information with artificial intelligence algorithms, the Q3 accuracy of the best computational methods for predicting protein secondary structure has exceeded 80%. Herein we present a rule-based data-mining approach called BLAST-RT-RICO (Relaxed Threshold Rule Induction from Coverings) that uses multiple sequence alignment information to predict protein secondary structure. The method uses the PSI-BLAST algorithm to identify suitable proteins and then generates rules from these proteins that can be used to predict secondary structure. By also using known homologous template secondary structures from the Protein Data Bank (PDB), BLAST-RT-RICO achieved a Q3 score of 89.93% on the standard test dataset RS126 and a Q3 score of 87.71% on the standard test dataset CB396. These preliminary results suggest that this rule-based method may serve as a foundation for even more accurate prediction of protein secondary structure in the future.
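
For readers unfamiliar with the Q3 measure quoted above, the following is a minimal sketch of how a three-state accuracy score is commonly computed: the percentage of residues whose predicted state (helix H, strand E, or coil C) matches the observed state. The function name `q3_score` and the example strings are illustrative assumptions; this is not the BLAST-RT-RICO code or its evaluation pipeline.

```python
def q3_score(predicted: str, observed: str) -> float:
    """Return the percentage of residues whose three-state label matches.

    Both arguments are strings over the alphabet H (helix), E (strand),
    and C (coil), one character per residue. Hypothetical helper for
    illustration only.
    """
    if len(predicted) != len(observed):
        raise ValueError("sequences must have the same length")
    if not observed:
        raise ValueError("sequences must be non-empty")
    # Count positions where the predicted state equals the observed state.
    matches = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * matches / len(observed)


# Made-up 15-residue example with one mismatch (position 7: H vs. C).
observed = "CCHHHHHCCEEEECC"
predicted = "CCHHHHCCCEEEECC"
print(f"Q3 = {q3_score(predicted, observed):.2f}%")  # prints "Q3 = 93.33%"
```

A score reported for a whole dataset such as RS126 may be averaged per protein or pooled over all residues; which convention the paper uses is not stated in this abstract.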

Meeting Name

2011 IEEE Symposium on Computational Intelligence and Computational Biology, CIBCB 2011 (2011: Apr 11-15, Paris, France)


Department

Computer Science

Second Department

Biological Sciences

Keywords and Phrases

BLAST; Data Mining; Protein Secondary Structure Prediction

Document Type

Article - Conference proceedings

© 2011 Institute of Electrical and Electronics Engineers (IEEE), All rights reserved.

Publication Date

01 Apr 2011