"The need for automating genome analysis is a result of the tremendous amount of genomic data. As of today, a high-throughput DNA sequencing machine can run millions of sequencing reactions in parallel, and it is becoming faster and cheaper to sequence the entire genome of an organism. Public databases containing genomic data are growing exponentially, and hence the rise in demand for intuitive automated methods of DNA analysis and subsequent gene identification. However, the complexity of gene organization makes automation a challenging task, and smart algorithm design and parallelization are necessary to perform accurate analyses in reasonable amounts of time. This work describes two such automated methods for the identification of novel genes within given DNA sequences. The first method utilizes negative selection patterns as an evolutionary rationale for the identification of additional members of a gene family. As input it requires a known protein coding gene in that family. The second method is a massively parallel data mining algorithm that searches a whole genome for inverted repeats (palindromic sequences) and identifies potential precursors of non-coding RNA genes. Both methods were validated successfully on the fully sequenced and well studied plant species, Arabidopsis thaliana"--Abstract, page iv.
Frank, Ronald L.
Madria, Sanjay Kumar
Ph. D. in Computer Science
Missouri University of Science and Technology
Journal article titles appearing in thesis/dissertation
- Validation of an NSP-based (negative selection pattern) gene family identification strategy
- Automation of an NSP-based (negative selection pattern) gene family identification strategy
- Framework for automated enrichment of functionally significant inverted repeats in whole genomes
ix, 63 pages
© 2010 Cyriac Kandoth, All rights reserved.
Dissertation - Open Access
DNA -- Analysis
Genes -- Identification
RNA -- Analysis
Sequence alignment (Bioinformatics)
Kandoth, Cyriac, "Computational methods for the discovery and analysis of genes and other functional DNA sequences" (2010). Doctoral Dissertations. 1903.