Doctoral Dissertations
Abstract
"Protein structure prediction has always been an important research area in bioinformatics and biochemistry. Despite the recent breakthrough of combining multiple sequence alignment information and artificial intelligence algorithms to predict protein secondary structure, the Q₃ accuracy of various computational prediction methods rarely has exceeded 75%; this status has changed little since 2003 when Rost stated that 'the currently best methods reach a level around 77% three-state per-residue accuracy.' The application of artificial neural network methods to this problem is revolutionary in the sense that those techniques employ the homologues of proteins for training and prediction. In this dissertation, a different approach, RT-RICO (Relaxed Threshold Rule Induction from Coverings), is presented that instead uses association rule mining. This approach still makes use of the fundamental principle that structure is more conserved than sequence. However, rules between each known secondary structure element and its 'neighboring' amino acid residues are established to perform the predictions. This dissertation consists of five research articles that discuss different prediction techniques and detailed rule-generation algorithms. The most recent prediction approach, BLAST-RT-RICO, achieved a Q₃ accuracy score of 89.93% on the standard test dataset RS126 and a Q₃ score of 87.71% on the standard test dataset CB396, an improvement over comparable computational methods. Herein one research article also discusses the results of examining those RT-RICO rules using an existing association rule visualization tool, modified to account for the non-Boolean characterization of protein secondary structure"--Abstract, page iv.
Advisor(s)
Leopold, Jennifer
Committee Member(s)
Erçal, Fikret
Lin, Dan
Frank, Ronald L.
Wilkerson, Ralph W.
Department(s)
Computer Science
Degree Name
Ph. D. in Computer Science
Publisher
Missouri University of Science and Technology
Publication Date
Fall 2010
Journal article titles appearing in thesis/dissertation
- Protein secondary structure prediction using rule induction from coverings
- Protein secondary structure prediction using parallelized rule induction from coverings
- Protein secondary structure prediction using RT-RICO: a rule-based approach
- Rule visualization of protein motif sequence data for secondary structure prediction
Pagination
xiii, 143 pages
Note about bibliography
Includes bibliographical references.
Rights
© 2010 Leong Lee, All rights reserved.
Document Type
Dissertation - Open Access
File Type
text
Language
English
Subject Headings
Bioinformatics
Data mining -- Computer programs
Parallel processing (Electronic computers)
Proteins -- Structure -- Computer simulation
Thesis Number
T 9700
Print OCLC #
747436326
Electronic OCLC #
747497410
Recommended Citation
Lee, Leong, "Protein secondary structure prediction using BLAST and relaxed threshold rule induction from coverings" (2010). Doctoral Dissertations. 1904.
https://scholarsmine.mst.edu/doctoral_dissertations/1904