A Minimax Approach for Classification with Big-Data
In this paper, a novel methodology is presented to reduce the generalization errors that arise from domain shift in big data classification. This reduction is achieved by introducing a suitably selected domain shift into the training data via what is referred to as a "distortion model". The distortions are introduced through an affine transformation, which yields additional data samples. Next, a deep neural network (NN), referred to as the "classifier", is used to classify both the original and the additional data samples. By learning from both, the classifier compensates for the domain shift while maintaining its performance on the original data. However, the exact magnitude of the shift one would encounter in real applications is unknown a priori and difficult to predict. The objective is therefore to compensate for the optimal shift that can be introduced by the distortion model without significantly degrading the performance of the model. A two-player zero-sum game is thus designed in which the first player is the distortion model, whose aim is to increase the domain shift, and the second player is the classifier, whose aim is to minimize the impact of that shift. Finally, a direct error-driven learning scheme is utilized to minimize the impact of the classifier while maximizing the domain shift. A comprehensive simulation study is presented in which a 12% improvement in the presence of domain shift is demonstrated. The proposed approach is also shown to improve generalization by 6%.
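The two-player game described in the abstract can be illustrated with a small sketch. This is not the authors' implementation. It assumes a logistic-regression classifier in place of the deep NN, plain alternating gradient ascent and descent in place of the direct error-driven learning scheme, synthetic two-class data, and a hypothetical bound `rho` that keeps the affine distortion `(A, c)` close to the identity so the shift cannot grow without limit. All function and parameter names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy two-class data: two Gaussian blobs in 2-D (a stand-in for real data).
n = 200
X = np.vstack([rng.normal(-1.0, 0.7, (n, 2)), rng.normal(1.0, 0.7, (n, 2))])
y = np.concatenate([np.zeros(n), np.ones(n)])


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def loss_and_grads(w, b, A, c, X, y):
    """Cross-entropy of a linear classifier on distorted inputs X @ A.T + c."""
    Xd = X @ A.T + c
    p = sigmoid(Xd @ w + b)
    eps = 1e-12
    loss = -np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))
    r = (p - y) / len(y)            # d(loss)/d(logit) for each sample
    gw = Xd.T @ r                   # gradient w.r.t. classifier weights
    gb = r.sum()                    # gradient w.r.t. classifier bias
    gA = np.outer(w, X.T @ r)       # logit = w.(A x + c) + b, so d/dA = w x^T
    gc = w * r.sum()                # and d/dc = w
    return loss, gw, gb, gA, gc


def train(X, y, rho=0.2, steps=500, lr_clf=0.5, lr_dist=0.1):
    d = X.shape[1]
    I, zero = np.eye(d), np.zeros(d)
    w, b = np.zeros(d), 0.0
    A, c = I.copy(), zero.copy()
    for _ in range(steps):
        # Player 1 (distortion model): gradient ascent on the loss,
        # clipped so that A stays within rho of I and c within rho of 0.
        _, _, _, gA, gc = loss_and_grads(w, b, A, c, X, y)
        A = np.clip(A + lr_dist * gA, I - rho, I + rho)
        c = np.clip(c + lr_dist * gc, -rho, rho)
        # Player 2 (classifier): gradient descent on the average loss
        # over the original samples and the distorted samples.
        _, gw0, gb0, _, _ = loss_and_grads(w, b, I, zero, X, y)
        _, gw1, gb1, _, _ = loss_and_grads(w, b, A, c, X, y)
        w = w - lr_clf * 0.5 * (gw0 + gw1)
        b = b - lr_clf * 0.5 * (gb0 + gb1)
    return w, b, A, c


def accuracy(w, b, X, y):
    return float(np.mean((sigmoid(X @ w + b) > 0.5) == y))


w, b, A, c = train(X, y)
print("accuracy on original data: ", accuracy(w, b, X, y))
print("accuracy on distorted data:", accuracy(w, b, X @ A.T + c, y))
```

Clipping `A` and `c` is one simple way to stand in for the requirement that the shift not significantly degrade the model; the paper itself may bound the distortion differently.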
R. Krishnan et al., "A Minimax Approach for Classification with Big-Data," Proceedings of the 2018 IEEE International Conference on Big Data (2018, Seattle, WA), pp. 1437 - 1444, Institute of Electrical and Electronics Engineers (IEEE), Dec 2018.
The definitive version is available at https://doi.org/10.1109/BigData.2018.8622564
2018 IEEE International Conference on Big Data, Big Data 2018 (2018: Dec. 10-13, Seattle, WA)
Electrical and Computer Engineering
Mathematics and Statistics
Intelligent Systems Center
Second Research Center/Lab
Center for High Performance Computing Research
Keywords and Phrases
Big data; Deep neural networks; Metadata; Affine transformations; Data classification; Error-driven learning; Generalization Error; Minimax approach; Novel methodology; Real applications; Simulation studies; Classification (of information)
International Standard Book Number (ISBN)
Article - Conference proceedings
© 2018 Institute of Electrical and Electronics Engineers (IEEE), All rights reserved.
13 Dec 2018