Missouri S&T Scholar's Mine Research Repository
Title: Application of text mining in developing standardized descriptions of taxa in paleontology: A framework
Author(s): Lea, Bih-Ru
Oboh-Ikuenobe, Francisca
Yu, Vincent (Wen-Bin)
Department/Lab Affiliations: Geological Sciences & Engineering
Information Science & Technology
Business & Information Technology
Keywords: Paleontology
Taxa
Text mining
Issue Date: 2006
Publisher: U.S. Geological Survey, Information Services
Citation: Yu, Vincent (Wen-Bin), Lea, Bih-Ru, and Oboh-Ikuenobe, Francisca. "Application of Text Mining in Developing Standardized Descriptions of Taxa in Paleontology: A Framework." Geoinformatics Conference 2006, p. 37 (2006).
Abstract: As in other scientific disciplines, the discovery of new information and the modification of existing knowledge enable advancements in the field of paleontology. The process of discovering new information generates large volumes of data that can be overwhelming if not properly stored and (or) utilized. For example, the Treatise on Invertebrate Paleontology, created by Professor Raymond C. Moore at the University of Kansas, blazed the trail for similar works that came later. Many paleontological volumes provide information on fossil specimens that have been formally named. In palynology, problems can arise with palynomorph classifications and interpretations because classification is subjective, depending on human judgment and different levels of training. As a result, the same palynomorph can be interpreted or classified differently, resulting in junior synonyms and amended descriptions that can confuse students and new researchers. It is important to provide a framework for composing a standardized description of each taxon using diverse observations from various taxonomists.
The main objective of this study is to propose a framework that uses text mining techniques in developing a taxon description recommendation system. Text mining can apply intelligent methods and algorithms to extract or mine knowledge and meaningful data patterns from a large amount of unstructured texts or documents for decisionmaking; therefore, it is expected that common characteristics and features from interpretations made by different scholars can be captured and used for clustering and description to minimize the issue of subjective human judgment. The proposed framework will be illustrated using a sample database and a tutorial example. This study will provide insights on (1) how text mining can be used to develop a descriptive model, and (2) how descriptive terms generated during the text mining process can be used to provide a basic set for a standard lexicon to develop a standardized taxon description recommendation. Furthermore, advantages and drawbacks of the proposed framework will be discussed, and future research directions will be proposed.
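The abstract does not say which text mining algorithms the framework uses. As a rough illustration only of the idea that terms shared across independent descriptions could seed a standard lexicon, the following minimal sketch counts term overlap. The sample descriptions, the stopword list, and the names `terms` and `shared_terms` are all hypothetical and are not taken from the paper or from any real database.

```python
import re
from collections import Counter

# Hypothetical free-text descriptions of one taxon by three different workers
# (invented for illustration; not from the paper or any real database).
descriptions = [
    "Spherical grain with a thick wall and a reticulate surface.",
    "Grain spherical; wall thick; surface reticulate, pores absent.",
    "Oval grain, thick wall, reticulate ornament on the surface.",
]

# Assumed, deliberately tiny stopword list.
STOPWORDS = {"a", "an", "and", "the", "with", "on", "of"}


def terms(text):
    """Lowercase the text and return its set of non-stopword terms."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def shared_terms(docs, min_share=1.0):
    """Return terms that appear in at least `min_share` of the descriptions."""
    # Each description contributes a term at most once (terms() returns a set).
    counts = Counter(t for d in docs for t in terms(d))
    needed = min_share * len(docs)
    return sorted(t for t, n in counts.items() if n >= needed)


# Terms every describer used: candidates for a standardized description.
lexicon = shared_terms(descriptions)
print(lexicon)
```

Lowering `min_share` (for example to 0.5) admits terms used by only some describers, which is one simple way to surface points of disagreement for review rather than discarding them.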
Type: Article - Journal
text
In Title: Geoinformatics Conference 2006
Copyright Notice: This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright. In most cases, these works may not be reposted without the explicit permission of the copyright holder.
FULL COPYRIGHT INFORMATION:
http://www.usgs.gov/laws/info_policies.html
Publisher URL:
http://pubs.usgs.gov/sir/2006/5201/2006-5201.pdf
Link to this page:
http://scholarsmine.mst.edu/post_prints/ApplicationofTextMininginDevelopingStandardizedD_09007dcc804e1d99.html



Status: Final version
Date Accessioned: 2007-04-11T17:00:48Z
Date Available: 2008-04-16T19:56:39Z