Principal Component Analysis as an Integral Part of Data Mining in Health Informatics
Linear and logistic regression are well-known data mining techniques; however, their ability to deal with interdependent variables is limited. Principal component analysis (PCA) is a prevalent data reduction tool that both transforms the data orthogonally and reduces its dimensionality. In this paper, we explore an adaptive hybrid approach in which PCA is used in conjunction with logistic regression to yield models that have both a better fit and a smaller set of factors than those produced by regression analysis alone. We use an example dataset from HealthData.gov to demonstrate the simplicity, applicability, and usability of our approach.
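The general idea of pairing PCA with logistic regression can be illustrated with a minimal sketch. This is not the authors' adaptive method, which the abstract does not detail. It assumes synthetic data in place of the HealthData.gov dataset, unstandardized predictors, a single retained component, and hypothetical helper names (center, covariance, top_component, fit_logistic). Two interdependent predictors and one noise column are projected onto the leading principal component, and a one-feature logistic regression is fitted on the resulting score.

```python
import math
import random

def center(rows):
    """Subtract each column's mean so PCA sees zero-mean data."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]

def covariance(rows):
    """Sample covariance matrix of already-centered rows."""
    n, d = len(rows), len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) / (n - 1) for j in range(d)]
            for i in range(d)]

def top_component(cov, iters=200):
    """Leading eigenvector of cov (unit length) via power iteration."""
    d = len(cov)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

def fit_logistic(xs, ys, lr=0.1, epochs=1000):
    """One-feature logistic regression fitted by batch gradient descent."""
    w, b, n = 0.0, 0.0, len(xs)
    for _ in range(epochs):
        gw = gb = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(w * x + b)))
            gw += (p - y) * x
            gb += p - y
        w -= lr * gw / n
        b -= lr * gb / n
    return w, b

# Synthetic data (an assumption): x1 and x2 both depend on a latent z,
# so they are interdependent; x3 is unrelated noise.
random.seed(0)
rows, labels = [], []
for _ in range(200):
    z = random.gauss(0, 1)
    rows.append([z + random.gauss(0, 0.3),
                 2 * z + random.gauss(0, 0.3),
                 random.gauss(0, 0.3)])
    labels.append(1 if z + random.gauss(0, 0.5) > 0 else 0)

# PCA step: project the centered data onto the leading component.
centered = center(rows)
cov = covariance(centered)
component = top_component(cov)
scores = [sum(r[j] * component[j] for j in range(3)) for r in centered]

# Share of total variance captured by the retained component.
eigenvalue = sum(component[i] * cov[i][j] * component[j]
                 for i in range(3) for j in range(3))
explained = eigenvalue / sum(cov[i][i] for i in range(3))

# Regression step: fit on the single component score instead of x1, x2, x3.
w, b = fit_logistic(scores, labels)
predictions = [1 if w * s + b >= 0 else 0 for s in scores]
accuracy = sum(p == y for p, y in zip(predictions, labels)) / len(labels)
```

In this sketch, the three original predictors are replaced by one uncorrelated factor before the regression is fitted, which is the kind of factor reduction the abstract describes. In practice, predictors measured on different scales would usually be standardized before PCA, and the number of retained components would be chosen from the explained variance.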
C. Sabharwal and B. Anjum, "Principal Component Analysis as an Integral Part of Data Mining in Health Informatics," Proceedings of the 31st International Conference on Computers and Their Applications (2016, Las Vegas, NV), pp. 251-256, International Society for Computers and Their Applications (ISCA), Apr 2016.
31st International Conference on Computers and Their Applications, CATA 2016 (2016: Apr. 4-6, Las Vegas, NV)
Keywords and Phrases
Big data; Data handling; Data mining; Data reduction; Regression analysis; Data analytics; Health informatics; Hybrid approach; Integral part; Logistic regressions; Yield models; Principal component analysis; Big data analytics; Healthcare analytics
Article - Conference proceedings
© 2016 International Society for Computers and Their Applications (ISCA), All rights reserved.