Doctoral Dissertations
Keywords and Phrases
MicroRNAs
Abstract
"The need for automating genome analysis is a result of the tremendous amount of genomic data. As of today, a high-throughput DNA sequencing machine can run millions of sequencing reactions in parallel, and it is becoming faster and cheaper to sequence the entire genome of an organism. Public databases containing genomic data are growing exponentially, and hence the rise in demand for intuitive automated methods of DNA analysis and subsequent gene identification. However, the complexity of gene organization makes automation a challenging task, and smart algorithm design and parallelization are necessary to perform accurate analyses in reasonable amounts of time. This work describes two such automated methods for the identification of novel genes within given DNA sequences. The first method utilizes negative selection patterns as an evolutionary rationale for the identification of additional members of a gene family. As input it requires a known protein coding gene in that family. The second method is a massively parallel data mining algorithm that searches a whole genome for inverted repeats (palindromic sequences) and identifies potential precursors of non-coding RNA genes. Both methods were validated successfully on the fully sequenced and well studied plant species, Arabidopsis thaliana"--Abstract, page iv.
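As a rough illustration of the inverted-repeat search the abstract mentions, the following minimal Python sketch scans a DNA string for two arms that are exact reverse complements of each other, separated by a short spacer. It is not the dissertation's algorithm: the function names, the parameters `arm_len`, `min_loop`, and `max_loop`, and the exact-match criterion are all assumptions made for illustration, and the scan is serial rather than massively parallel.

```python
# Illustrative sketch only; names and parameters are hypothetical,
# not taken from the dissertation.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase A/C/G/T string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_inverted_repeats(seq, arm_len=6, min_loop=3, max_loop=20):
    """Find exact inverted repeats (stem-loop candidates) in seq.

    Returns a list of (left_start, right_start, loop_len) tuples, where the
    arm starting at right_start is the reverse complement of the arm
    starting at left_start, and loop_len bases separate the two arms.
    """
    seq = seq.upper()
    n = len(seq)
    hits = []
    for i in range(n - 2 * arm_len - min_loop + 1):
        left = seq[i:i + arm_len]
        # Skip windows containing ambiguous bases such as N.
        if set(left) - set("ACGT"):
            continue
        target = reverse_complement(left)
        for loop_len in range(min_loop, max_loop + 1):
            j = i + arm_len + loop_len
            if j + arm_len > n:
                break
            if seq[j:j + arm_len] == target:
                hits.append((i, j, loop_len))
    return hits


if __name__ == "__main__":
    # GACTGA and TCAGTC are reverse complements, separated by TTTT.
    example = "AAGACTGATTTTTCAGTCAA"
    print(find_inverted_repeats(example, arm_len=6, min_loop=3, max_loop=10))
```

A genome-scale search would also need to tolerate mismatches in the arms and to divide the sequence among workers. This sketch does neither.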
Advisor(s)
Erçal, Fikret
Frank, Ronald L.
Committee Member(s)
Leopold, Jennifer
Chellappan, Sriram
Madria, Sanjay Kumar
Department(s)
Computer Science
Degree Name
Ph. D. in Computer Science
Publisher
Missouri University of Science and Technology
Publication Date
Summer 2010
Journal article titles appearing in thesis/dissertation
- Validation of an NSP-based (negative selection pattern) gene family identification strategy
- Automation of an NSP-based (negative selection pattern) gene family identification strategy
- Framework for automated enrichment of functionally significant inverted repeats in whole genomes
Pagination
ix, 63 pages
Note about bibliography
Includes bibliographical references.
Rights
© 2010 Cyriac Kandoth, All rights reserved.
Document Type
Dissertation - Open Access
File Type
text
Language
English
Subject Headings
DNA -- Analysis
Genes -- Identification
RNA -- Analysis
Sequence alignment (Bioinformatics)
Thesis Number
T 9659
Print OCLC #
692208267
Electronic OCLC #
752210699
Recommended Citation
Kandoth, Cyriac, "Computational methods for the discovery and analysis of genes and other functional DNA sequences" (2010). Doctoral Dissertations. 1903.
https://scholarsmine.mst.edu/doctoral_dissertations/1903