Masters Theses


"Text mining helps extract knowledge and useful information from unstructured data. It detects and extracts information from large collections of documents and allows the selection of data related to a particular subject.

In this study, text mining is applied to the Form 10-12B filings made by companies during corporate spin-offs. The main purposes are (1) to investigate potential and major concerns found in the financial statements filed for corporate spin-offs and (2) to identify appropriate text mining methods that can reveal these concerns.

Form 10-12B filings from thirty-four companies were collected, and only the "Risk Factors" section was used for analysis. The term weights Entropy, IDF, GF-IDF, Normal and None were applied to the input data; of these, Entropy and GF-IDF were found to be the appropriate term weights, producing results that met a human expert's expectations. The distribution of documents under these term weights formed a pattern that reflected the mood or focus of the input documents.

In addition to the analysis, this study serves as a pilot study for future work in predictive text mining of similar financial documents. For example, the descriptive terms found in this study provide an initial start word list, which eliminates the trial-and-error approach to framing a start list"--Abstract, page iii.
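The abstract names the term weights but not their formulas. The sketch below uses common textbook definitions; the weight names match the options offered by tools such as SAS Text Miner, but whether the thesis used exactly these formulas is an assumption. For term i with counts f_ij over n documents, global frequency g_i and document frequency d_i: Entropy = 1 + Σ_j p_ij log2 p_ij / log2 n (with p_ij = f_ij / g_i), IDF = 1 + log2(n / d_i), GF-IDF = g_i / d_i, Normal = 1 / sqrt(Σ_j f_ij²), None = 1. It also shows how a start list works: as the inverse of a stop list, only listed terms are kept. The function names, the sample "risk factor" sentences and the single-word regex tokenizer are all hypothetical.

```python
import math
import re
from collections import Counter


def start_list_tdm(docs, start_list):
    """Build a term-document matrix that keeps only start-list terms.

    Hypothetical helper: tokenizes with a simple lowercase regex, so only
    single-word terms are supported. Rows are terms, columns are documents.
    """
    vocab = sorted({t.lower() for t in start_list})
    counts = [Counter(re.findall(r"[a-z]+", doc.lower())) for doc in docs]
    return vocab, [[c[term] for c in counts] for term in vocab]


def global_weight(row, scheme):
    """Global weight for one term, given its raw counts across all documents."""
    n = len(row)                       # number of documents
    g = sum(row)                       # global frequency g_i
    d = sum(1 for f in row if f > 0)   # document frequency d_i
    if scheme == "none":
        return 1.0
    if g == 0:
        return 0.0  # start-list term never occurs, so it carries no information
    if scheme == "normal":
        return 1.0 / math.sqrt(sum(f * f for f in row))
    if scheme == "idf":
        return 1.0 + math.log2(n / d)
    if scheme == "gfidf":
        return g / d
    if scheme == "entropy":
        if n < 2:
            raise ValueError("entropy weighting needs at least two documents")
        h = sum((f / g) * math.log2(f / g) for f in row if f > 0)
        return 1.0 + h / math.log2(n)  # near 1: concentrated; near 0: spread evenly
    raise ValueError(f"unknown scheme: {scheme}")


# Hypothetical "Risk Factors" snippets and start list, for illustration only.
docs = [
    "Our debt may limit liquidity and debt repayment.",
    "Liquidity and tax risk.",
    "Tax law changes may affect liquidity.",
]
start_list = ["liquidity", "debt", "tax"]

vocab, tdm = start_list_tdm(docs, start_list)
for scheme in ("entropy", "gfidf"):
    print(scheme, dict(zip(vocab, (global_weight(r, scheme) for r in tdm))))
```

Here "risk" is dropped because it is not on the start list. Under Entropy, "debt" (found in only one document) gets weight 1, "liquidity" (spread evenly over all three) gets weight close to 0, and "tax" falls in between. In typical weighting schemes this global weight is then multiplied by a local frequency weight for each cell of the matrix.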


Advisor(s)

Yu, Vincent (Wen-Bin)

Committee Member(s)

Lin, Ying Chou
Lea, Bih-Ru


Department

Business and Information Technology

Degree Name

M.S. in Information Science and Technology


Missouri University of Science and Technology

Publication Date

Spring 2011


viii, 81 pages


© 2011 Aravindh Sekar, All rights reserved.

Document Type

Thesis - Open Access

Subject Headings

Corporate divestiture -- Accounting
Data mining
Financial statements

Thesis Number

T 9862
