A Comprehensive Cluster and Classification Mining Procedure for Daily Stock Market Return Forecasting

Abstract

Data mining and big data analytic techniques play an important role in many application fields, including the financial markets. However, only a few studies have focused on predicting daily stock market returns, and the data mining procedures used in those studies are either incomplete or inefficient. This paper presents a comprehensive data mining process to forecast the daily direction of the S&P 500 Index ETF (SPY) return based on 60 financial and economic features. The fuzzy c-means (FCM) method is first used to cluster the preprocessed data. Principal component analysis (PCA) is then applied to the entire data set and to each of the seven clusters. The dimension of the entire cleaned data set is then reduced according to the combined results from the entire data set and each cluster. Corresponding to different levels of dimensionality reduction, twelve new data sets are generated from the entire cleaned data. Artificial neural networks (ANNs) and logistic regression models are then applied to the twelve transformed data sets for classification, in order to forecast the daily direction of future market returns and to indicate the efficiency of dimensionality reduction with PCA. A group of hypothesis tests is performed on the classification and simulation results to show that the ANNs give significantly higher classification accuracy than logistic regression, and that trading strategies guided by the comprehensive cluster and classification mining procedure based on PCA and ANNs earn higher risk-adjusted profits than the comparison benchmarks and than strategies guided by forecasts based on PCA and logistic regression models.
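The abstract names fuzzy c-means as the first step of the procedure but does not give implementation details. As an illustration only, the following is a minimal sketch of the textbook FCM updates, assuming Euclidean distance and a fuzzifier of m = 2. The function name `fuzzy_c_means`, its parameters, and the toy data are hypothetical and are not taken from the paper.

```python
import random


def fuzzy_c_means(points, n_clusters, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Textbook fuzzy c-means on a list of equal-length float tuples.

    Returns (centers, memberships), where memberships[i][k] is the degree
    to which point i belongs to cluster k (each row sums to 1).
    All defaults here are illustrative assumptions, not the paper's settings.
    """
    rng = random.Random(seed)
    n = len(points)
    dim = len(points[0])

    # Random initial memberships, each row normalised to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(n_clusters)]
        total = sum(row)
        u.append([v / total for v in row])

    centers = []
    for _ in range(n_iter):
        # Center update: mean of the points weighted by u_ik ** m.
        centers = []
        for k in range(n_clusters):
            weights = [u[i][k] ** m for i in range(n)]
            wsum = sum(weights)
            centers.append(tuple(
                sum(w * p[d] for w, p in zip(weights, points)) / wsum
                for d in range(dim)
            ))

        # Membership update: u_ik is proportional to d_ik ** (-2 / (m - 1)).
        new_u = []
        for p in points:
            dists = [
                sum((a - b) ** 2 for a, b in zip(p, c)) ** 0.5
                for c in centers
            ]
            if any(d == 0.0 for d in dists):
                # A point sitting exactly on a center belongs fully to it.
                hits = [1.0 if d == 0.0 else 0.0 for d in dists]
                total = sum(hits)
                new_u.append([h / total for h in hits])
            else:
                inv = [d ** (-2.0 / (m - 1.0)) for d in dists]
                total = sum(inv)
                new_u.append([v / total for v in inv])

        # Stop once the largest membership change falls below the tolerance.
        shift = max(
            abs(a - b)
            for old_row, new_row in zip(u, new_u)
            for a, b in zip(old_row, new_row)
        )
        u = new_u
        if shift < tol:
            break

    return centers, u


# Toy data: two well-separated groups of 2-D points (made up for this sketch).
toy_points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
              (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
toy_centers, toy_memberships = fuzzy_c_means(toy_points, n_clusters=2)
```

Unlike hard clustering, each observation receives a graded membership in every cluster. In a pipeline like the one described, each point could then be assigned to its highest-membership cluster before a per-cluster PCA, though the abstract does not state how the paper handles that step.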

Department(s)

Engineering Management and Systems Engineering

Research Center/Lab(s)

Intelligent Systems Center

Keywords and Phrases

Classification (of information); Commerce; Data mining; Economic analysis; Electronic trading; Finance; Financial data processing; Financial markets; Forecasting; Fuzzy neural networks; Fuzzy systems; Investments; Neural networks; Principal component analysis; Reduction; Regression analysis; Classification accuracy; Classification mining; Dimensionality reduction; Fuzzy C mean; Fuzzy c-means methods; Logistic regression models; Logistic regressions; Stock return forecasting; Big data; Accuracy; Article; Artificial neural networks (ANNs); Benchmarking; Cluster analysis; Financial information system; Financial management; Fuzzy c means method; Logistic regression analysis; Principal component analysis (PCA); Priority journal; Process optimization; Simulation; Stock market return; Daily stock return forecasting; Fuzzy c-means (FCM)

International Standard Serial Number (ISSN)

0925-2312

Document Type

Article - Journal

Document Version

Citation

File Type

text

Language(s)

English

Rights

© 2017 Elsevier, All rights reserved.

Publication Date

01 Dec 2017
