Keywords and Phrases

Clustering; Feature Selection; Natural Language; Text mining; Vector Space; Wordnet

Abstract

Text mining using the vector space representation has proven to be a valuable tool for classification, prediction, information retrieval and extraction. The nature of text data presents several issues for these tasks, including high dimensionality and the existence of polysemous and synonymous words. A variety of techniques have been devised to overcome these shortcomings, including feature selection and word sense disambiguation. Privacy-preserving data mining is also an area of emerging interest. Existing techniques for privacy-preserving data mining require the use of secure computation protocols, which often incur a greatly increased computational cost. In this paper, a generalization-based method is presented for creating a semantic-preserving vector space model (SPVSM), which reduces dimension and addresses problems with special word types. The SPVSM also allows private text data to be safely represented without degrading cluster accuracy or performance. Further, the result produced is also usable in combination with theoretic-based techniques such as latent semantic indexing. The performance of text clustering using the semantic-preserving generalization method is evaluated and compared to existing feature selection techniques, and shown to have significant merit from a clustering perspective.

Advisor(s)

Jiang, Wei

Committee Member(s)

Leopold, Jennifer
Wunsch, Donald C.

Department(s)

Computer Science

Degree Name

M.S. in Computer Science

Publisher

Missouri University of Science and Technology

Publication Date

Fall 2012

Pagination

viii, 40 pages

Note about bibliography

Includes bibliographical references (pages 76-77).

Document Type

Thesis - Open Access

File Type

text

Language

English

Subject Headings

Text processing (Computer science)
Data protection
Data mining -- Statistical methods

Thesis Number

T 10093

Electronic OCLC #

828737701

Recommended Citation

Howard, Michael, "Semantic preserving text representation and its applications in text clustering" (2012). Masters Theses. 6946.
https://scholarsmine.mst.edu/masters_theses/6946

Included in

Computer Sciences Commons
