Application of Text Mining in Developing Standardized Descriptions of Taxa in Paleontology: A Framework


As in other scientific disciplines, the discovery of new information and the modification of existing knowledge drive advances in paleontology. The process of discovery generates large volumes of data that can be overwhelming if not properly stored or utilized. For example, the Treatise on Invertebrate Paleontology, created by Professor Raymond C. Moore at the University of Kansas, blazed the trail for similar works that came later. Many paleontological volumes provide information on fossil specimens that have been formally named. In palynology, problems can arise with palynomorph classifications and interpretations because classification is subjective: it depends on human judgment and on different levels of training. As a result, the same palynomorph can be interpreted or classified differently, producing junior synonyms and amended descriptions that can confuse students and new researchers. It is therefore important to provide a framework for composing a standardized description of each taxon from the diverse observations of various taxonomists.

The main objective of this study is to propose a framework that uses text mining techniques to develop a taxon description recommendation system. Text mining applies intelligent methods and algorithms to extract knowledge and meaningful data patterns from large amounts of unstructured text or documents in support of decision making. It is therefore expected that the characteristics and features common to interpretations by different scholars can be captured and used for clustering and description, reducing the effect of subjective human judgment.

The proposed framework will be illustrated using a sample database and a tutorial example. This study will provide insights on (1) how text mining can be used to develop a descriptive model, and (2) how descriptive terms generated during the text mining process can provide a basic set for a standard lexicon from which to develop a standardized taxon description recommendation. Furthermore, the advantages and drawbacks of the proposed framework will be discussed, and future research directions will be proposed.
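As a rough illustration of point (2), the sketch below shows one simple way descriptive terms shared across several taxonomists' descriptions might be collected into a candidate lexicon. It is not the authors' method: the three descriptions are invented for the example, and the stopword list, the `shared_terms` helper, and the `min_share` threshold are all assumptions.

```python
import re
from collections import Counter

# Hypothetical stopword list; a real lexicon study would use a fuller one.
STOPWORDS = {"a", "an", "and", "the", "with", "of", "to", "is", "in", "or"}


def tokenize(text):
    """Lowercase the text and keep alphabetic words that are not stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def shared_terms(descriptions, min_share=0.5):
    """Return terms that occur in at least `min_share` of the descriptions."""
    doc_freq = Counter()
    for text in descriptions:
        # Count each term once per description (document frequency).
        doc_freq.update(set(tokenize(text)))
    threshold = min_share * len(descriptions)
    return sorted(term for term, n in doc_freq.items() if n >= threshold)


# Invented descriptions of the same palynomorph by three taxonomists.
descriptions = [
    "Spherical grain with a thick wall and reticulate ornament",
    "Grain spherical to oval, wall thick, surface reticulate",
    "Oval grain with thin wall and reticulate sculpture",
]

consensus = shared_terms(descriptions, min_share=0.6)
print(consensus)
```

Terms used by most describers form the candidate vocabulary, while terms used by only one (here "ornament", "surface", "sculpture", "thin") are left out and could be reviewed as possible synonyms or disagreements.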

Meeting Name

Geoinformatics Conference (2006: May 10-12, Reston, VA)


Department

Business and Information Technology

Second Department

Geosciences and Geological and Petroleum Engineering

Document Type

Article - Conference proceedings

© 2006 Geological Survey (U.S.). Information Services, All rights reserved.

Publication Date

01 May 2006