Principal Component Analysis as an Integral Part of Data Mining in Health Informatics
Abstract
Linear and logistic regression are well-known data mining techniques; however, their ability to deal with interdependent variables is limited. Principal component analysis (PCA) is a prevalent data reduction tool that both transforms the data orthogonally and reduces its dimensionality. In this paper, we explore an adaptive hybrid approach in which PCA is used in conjunction with logistic regression to yield models that have both a better fit and a smaller set of factors than those produced by regression analysis alone. We use an example dataset from HealthData.gov to demonstrate the simplicity, applicability, and usability of our approach.
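To make the general idea of pairing PCA with logistic regression concrete, the following is a minimal sketch, not the authors' adaptive method. It assumes synthetic data (three strongly interdependent features driven by one hidden variable) rather than a HealthData.gov dataset, approximates the leading principal component by power iteration, and fits a one-variable logistic regression on the projected scores. All function names (`center`, `covariance`, `leading_component`, `fit_logistic_1d`, `run_demo`) and parameter values are hypothetical and chosen only for illustration.

```python
import math
import random


def center(rows):
    """Subtract the column means so each feature has mean zero."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]


def covariance(centered):
    """Sample covariance matrix of already-centered rows."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]


def leading_component(cov, iters=200):
    """Approximate the top eigenvector of cov by power iteration."""
    d = len(cov)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


def sigmoid(t):
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def fit_logistic_1d(xs, ys, lr=0.2, epochs=500):
    """Fit y ~ sigmoid(w * x + b) by batch gradient descent on the log-loss."""
    w = b = 0.0
    n = len(xs)
    for _ in range(epochs):
        gw = gb = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(w * x + b) - y
            gw += err * x
            gb += err
        w -= lr * gw / n
        b -= lr * gb / n
    return w, b


def run_demo(n=200, seed=0):
    """Synthetic example: three correlated features reduced to one factor."""
    rng = random.Random(seed)
    rows, labels = [], []
    for _ in range(n):
        z = rng.gauss(0, 1)  # hidden variable driving all three features
        rows.append([z + 0.1 * rng.gauss(0, 1),
                     2 * z + 0.1 * rng.gauss(0, 1),
                     -z + 0.1 * rng.gauss(0, 1)])
        labels.append(1 if z + 0.3 * rng.gauss(0, 1) > 0 else 0)

    centered = center(rows)
    component = leading_component(covariance(centered))
    # Project each centered row onto the leading component (one score per row).
    scores = [sum(r[j] * component[j] for j in range(3)) for r in centered]
    w, b = fit_logistic_1d(scores, labels)
    preds = [1 if sigmoid(w * s + b) >= 0.5 else 0 for s in scores]
    accuracy = sum(p == y for p, y in zip(preds, labels)) / n
    return component, accuracy


if __name__ == "__main__":
    component, accuracy = run_demo()
    print("leading component:", [round(c, 3) for c in component])
    print("training accuracy:", round(accuracy, 3))
```

In this sketch the three interdependent inputs collapse into a single orthogonal factor before the regression is fitted, which is the kind of factor reduction the abstract refers to. In practice a library implementation of PCA and logistic regression would normally replace these hand-written helpers.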
Recommended Citation
C. Sabharwal and B. Anjum, "Principal Component Analysis as an Integral Part of Data Mining in Health Informatics," Proceedings of the 31st International Conference on Computers and Their Applications (2016, Las Vegas, NV), pp. 251 - 256, International Society for Computers and Their Applications (ISCA), Apr 2016.
Meeting Name
31st International Conference on Computers and Their Applications, CATA 2016 (2016: Apr. 4-6, Las Vegas, NV)
Department(s)
Computer Science
Keywords and Phrases
Big data; Data handling; Data mining; Data reduction; Regression analysis; Data analytics; Health informatics; Hybrid approach; Integral part; Logistic regressions; Yield models; Principal component analysis; Big data analytics; Healthcare analytics
International Standard Book Number (ISBN)
978-1-5108-2252-8
Document Type
Article - Conference proceedings
Document Version
Citation
File Type
text
Language(s)
English
Rights
© 2016 International Society for Computers and Their Applications (ISCA), All rights reserved.
Publication Date
01 Apr 2016