Doctoral Dissertations

Abstract

"The need for automating genome analysis is a result of the tremendous amount of genomic data. As of today, a high-throughput DNA sequencing machine can run millions of sequencing reactions in parallel, and it is becoming faster and cheaper to sequence the entire genome of an organism. Public databases containing genomic data are growing exponentially, hence the rising demand for intuitive automated methods of DNA analysis and subsequent gene identification. However, the complexity of gene organization makes automation a challenging task, and smart algorithm design and parallelization are necessary to perform accurate analyses in reasonable amounts of time. This work describes two such automated methods for the identification of novel genes within given DNA sequences. The first method utilizes negative selection patterns as an evolutionary rationale for the identification of additional members of a gene family. As input, it requires a known protein-coding gene in that family. The second method is a massively parallel data mining algorithm that searches a whole genome for inverted repeats (palindromic sequences) and identifies potential precursors of non-coding RNA genes. Both methods were validated successfully on the fully sequenced and well-studied plant species Arabidopsis thaliana."--Abstract, page iv.

Advisor(s)

Erçal, Fikret
Frank, Ronald L.

Committee Member(s)

Leopold, Jennifer
Chellappan, Sriram
Madria, Sanjay Kumar

Department(s)

Computer Science

Degree Name

Ph.D. in Computer Science

Publisher

Missouri University of Science and Technology

Publication Date

Summer 2010

Journal article titles appearing in thesis/dissertation

  • Validation of an NSP-based (negative selection pattern) gene family identification strategy
  • Automation of an NSP-based (negative selection pattern) gene family identification strategy
  • Framework for automated enrichment of functionally significant inverted repeats in whole genomes

Pagination

ix, 63 pages

Note about bibliography

Includes bibliographical references.

Rights

© 2010 Cyriac Kandoth, All rights reserved.

Document Type

Dissertation - Open Access

Subject Headings

DNA -- Analysis
Genes -- Identification
RNA -- Analysis
Sequence alignment (Bioinformatics)

Thesis Number

T 9659