Scholars' Mine
Missouri S&T
Research Repository
Curtis Laws Wilson Library
400 W. 14th Street
Rolla, MO 65409-0060
scholarsmine@mst.edu
| Title: | Evaluation of Glycine max mRNA clusters | |
| Author(s): | Frank, Ronald L.; Ercal, Fikret | |
| Department/Lab Affiliations: | Biological Sciences; Computer Science | |
| Keywords: | Glycine max mRNA clusters; Multiple stringencies; Nucleotide and amino acid; UniGene | |
| Issue Date: | 2005-07 | |
| Publisher: | BioMed Central | |
| Citation: | Frank, R.L., and Ercal, F. "Evaluation of Glycine max mRNA Clusters." BMC Bioinformatics, vol. 6 (2005). | |
| Abstract: | Background: Clustering the ESTs from a large dataset representing a single species is a convenient starting point for a number of investigations into gene discovery, genome evolution, expression patterns, and alternatively spliced transcripts. Several methods have been developed to accomplish this, the most widely available being UniGene, a public domain collection of gene-oriented clusters for over 45 different species created and maintained by NCBI. The goal is for each cluster to represent a unique gene, but it is currently not known how closely the overall results represent that reality. UniGene's build procedure begins with initial mRNA clusters before joining ESTs. UniGene's results for soybean indicate a significant amount of redundancy among some sequences reported to be unique mRNAs. To establish a valid non-redundant known gene set for Glycine max, we applied our algorithm, PECT, to the clustering of only mRNA sequences. The mRNA dataset was run through the algorithm using two different matching stringencies. The resulting cluster compositions were compared to each other and to UniGene. Clusters exhibiting differences among the three methods were analyzed by 1) nucleotide and amino acid alignment and 2) submitting authors' conclusions to determine whether members of a single cluster represented the same gene or not. Results: Of the 12 clusters that were examined closely, most contained examples of sequences that did not belong in the same cluster. However, neither the two stringencies of PECT nor UniGene had a significantly greater record of accuracy in placing paralogs into separate clusters. Conclusion: Our results reveal that, although each method produces some errors, using multiple stringencies for matching or a sequential hierarchical method of increasing stringencies can provide more reliable results and therefore allow greater confidence in the vast majority of clusters that contain only ESTs and no mRNA sequences. | |
| Type: | Article - Journal; text | |
| In Title: | BMC Bioinformatics | |
| Copyright Notice: | This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright. In most cases, these works may not be reposted without the explicit permission of the copyright holder. | |
|
| title | Evaluation of Glycine max mRNA clusters | |
| contributor.author | Frank, Ronald L. | |
| contributor.author | Ercal, Fikret | |
| contributor.deptlab | Biological Sciences | |
| contributor.deptlab | Computer Science | |
| contributor.sponsor | Missouri State University | |
| subject | Glycine max mRNA clusters | |
| subject | Multiple stringencies | |
| subject | Nucleotide and amino acid | |
| subject | UniGene | |
| date.issued | 2005-07 | |
| publisher | BioMed Central | |
| identifier.citation | Frank, R.L., and Ercal, F. "Evaluation of Glycine max mRNA Clusters." BMC Bioinformatics, vol. 6 (2005). | |
| type | Article - Journal | |
| type.DCMIType | text | |
| type.status | Final version | |
| relation.isPartOf | BMC Bioinformatics | |
| date.accessioned | 2008-04-09T15:15:28Z | |
| date.available | 2008-04-09T15:15:27Z | |