Abstract
This paper proposes a methodology using a fast variable selection as a modified version of the Forward-Backward algorithm. This methodology is adapted to the specificities of the data used: a very small number of samples and a high number of variables. Such data is generated, using underlying dependencies and seasonality assumptions, from meme phrases volume data. By using a resampling technique along with the proposed variable selection scheme, significant results are obtained, and the test normalized mean square error (NMSE) performances are improved. The results indicate that, with the assumptions made on the data structure, variable selection is desirable. Also, the obtained information on the selected variables seems to cluster the time series into two very different classes: a set of approximately 600 series, which yield good NMSE and seem to require very similar sets of variables for the prediction; and another set of 300-400 series, for which only the previous series value is of interest for the prediction. This first analysis clearly illustrates the future need to perform a more thorough analysis of the selected variables for each batch of series. Also, taking a close look at the possible dependences between the series inside a batch should give information as to why and how they are similar and have been grouped under the same batch.
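For readers unfamiliar with the terms used in the abstract, the following is a minimal sketch of a generic forward-backward variable selection scored by NMSE. It is not the modified algorithm of the paper, whose details are not given here. The function names (`nmse`, `fit_predict`, `forward_backward`), the linear least-squares model, the single validation split (in place of the resampling technique mentioned above), and the definition of NMSE as mean squared error divided by target variance are all assumptions made for illustration.

```python
import numpy as np

def nmse(y_true, y_pred):
    """Mean squared error divided by the variance of the targets (assumed definition)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2) / np.var(y_true))

def fit_predict(X_train, y_train, X_test, cols):
    """Least-squares linear model with intercept, restricted to the chosen columns."""
    A = np.column_stack([np.ones(len(X_train)), X_train[:, cols]])
    coef, *_ = np.linalg.lstsq(A, y_train, rcond=None)
    B = np.column_stack([np.ones(len(X_test)), X_test[:, cols]])
    return B @ coef

def forward_backward(X_train, y_train, X_val, y_val):
    """Toggle one variable at a time (add if absent, drop if present)
    and keep the change whenever the validation NMSE strictly improves."""
    selected = []
    # Baseline: predict the training mean with no variables selected.
    best = nmse(y_val, np.full(len(y_val), y_train.mean()))
    improved = True
    while improved:
        improved = False
        for j in range(X_train.shape[1]):
            if j in selected:
                trial = [c for c in selected if c != j]
            else:
                trial = selected + [j]
            if trial:
                pred = fit_predict(X_train, y_train, X_val, trial)
            else:
                pred = np.full(len(y_val), y_train.mean())
            score = nmse(y_val, pred)
            if score < best - 1e-12:
                selected, best, improved = trial, score, True
    return sorted(selected), best
```

Calling `forward_backward(X_train, y_train, X_val, y_val)` returns the sorted indices of the retained variables and the validation NMSE they achieve. Because each accepted change strictly lowers the score, the loop terminates after a finite number of passes.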
Recommended Citation
Y. Miche et al., "Fast Variable Selection for Memetracker Phrases Time Series Prediction," ACM International Conference Proceeding Series, article no. 47, Association for Computing Machinery, Dec 2012.
The definitive version is available at https://doi.org/10.1145/2413097.2413156
Department(s)
Engineering Management and Systems Engineering
International Standard Book Number (ISBN)
978-145031300-1
Document Type
Article - Conference proceedings
Document Version
Citation
File Type
text
Language(s)
English
Rights
© 2024 Association for Computing Machinery, All rights reserved.
Publication Date
01 Dec 2012