Missouri S&T Scholar's Mine Research Repository
Title: Evaluation of Glycine max mRNA clusters
Author(s): Frank, Ronald L.; Ercal, Fikret
Department/Lab Affiliations: Biological Sciences; Computer Science
Keywords: Glycine max mRNA clusters; Multiple stringencies; Nucleotide and amino acid; UniGene
Issue Date: 2005-07
Publisher: BioMed Central
Citation: Frank, R.L., and Ercal, F. "Evaluation of Glycine max mRNA Clusters." BMC Bioinformatics, vol. 6 (2005).
Abstract:
Background: Clustering the ESTs from a large dataset representing a single species is a convenient starting point for a number of investigations into gene discovery, genome evolution, expression patterns, and alternatively spliced transcripts. Several methods have been developed to accomplish this, the most widely available being UniGene, a public domain collection of gene-oriented clusters for over 45 different species created and maintained by NCBI. The goal is for each cluster to represent a unique gene, but it is currently not known how closely the overall results represent that reality. UniGene's build procedure begins with initial mRNA clusters before joining ESTs. UniGene's results for soybean indicate a significant amount of redundancy among some sequences reported to be unique mRNAs. To establish a valid non-redundant known gene set for Glycine max, we applied our algorithm to the clustering of only mRNA sequences. The mRNA dataset was run through the algorithm using two different matching stringencies. The resulting cluster compositions were compared to each other and to UniGene. Clusters exhibiting differences among the three methods were analyzed by 1) nucleotide and amino acid alignment and 2) the submitting authors' conclusions to determine whether members of a single cluster represented the same gene.
Results: Of the 12 clusters that were examined closely, most contained examples of sequences that did not belong in the same cluster. However, neither of the two stringencies of PECT nor UniGene had a significantly greater record of accuracy in placing paralogs into separate clusters.
Conclusion: Our results reveal that, although each method produces some errors, using multiple stringencies for matching or a sequential hierarchical method of increasing stringencies can provide more reliable results and therefore allow greater confidence in the vast majority of clusters that contain only ESTs and no mRNA sequences.
Type: Article - Journal
DCMI Type: text
In Title: BMC Bioinformatics
Copyright Notice: This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright. In most cases, these works may not be reposted without the explicit permission of the copyright holder.
Full Copyright Information: http://www.biomedcentral.com/info/about/license
Publisher URL: http://www.biomedcentral.com/1471-2105/6/S2/S7
Link to this page: http://scholarsmine.mst.edu/post_prints/EvaluationofGlycinemaxmRNAclusters_09007dcc804d661a.html
Full Text: evaluationofglycine_09007dcc804d82d5.pdf
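
Illustrative note: the abstract describes comparing cluster compositions from three sources (two matching stringencies and UniGene) and singling out clusters whose membership differs. A minimal sketch of that comparison step is given below. It is not the authors' algorithm or UniGene's build procedure; the function names, the sequence identifiers, and the toy clusterings are all hypothetical and chosen only for illustration.

```python
def partition_signature(clusters):
    """Map each sequence ID to the frozenset of IDs that share its cluster."""
    membership = {}
    for members in clusters:
        group = frozenset(members)
        for seq_id in group:
            membership[seq_id] = group
    return membership


def discordant_sequences(*clusterings):
    """Return the IDs whose cluster membership differs between clusterings.

    A sequence missing from a clustering is treated as a singleton there.
    """
    signatures = [partition_signature(c) for c in clusterings]
    all_ids = set().union(*(s.keys() for s in signatures))
    discordant = set()
    for seq_id in all_ids:
        groups = {s.get(seq_id, frozenset([seq_id])) for s in signatures}
        if len(groups) > 1:
            discordant.add(seq_id)
    return discordant


# Hypothetical toy data: three clusterings of the same seven sequences.
low_stringency = [["mRNA_A", "mRNA_B", "mRNA_C"], ["mRNA_D", "mRNA_E"], ["mRNA_F", "mRNA_G"]]
high_stringency = [["mRNA_A", "mRNA_B"], ["mRNA_C"], ["mRNA_D", "mRNA_E"], ["mRNA_F", "mRNA_G"]]
reference = [["mRNA_A", "mRNA_B", "mRNA_C"], ["mRNA_D"], ["mRNA_E"], ["mRNA_F", "mRNA_G"]]

# Sequences flagged here are the ones that would need closer inspection,
# for example by nucleotide and amino acid alignment.
print(sorted(discordant_sequences(low_stringency, high_stringency, reference)))
```

In this toy example the cluster holding mRNA_F and mRNA_G is identical in all three clusterings, so only the other sequences are flagged for closer examination.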



Sponsor: Missouri State University
Status: Final version
Date Accessioned: 2008-04-09T15:15:28Z
Date Available: 2008-04-09T15:15:27Z