In many applications, data come in a mixed data type format, i.e., a combination of nominal (categorical) and numerical features. A common practice for working with categorical features is to use an encoding method to transform the discrete values into a numeric representation. However, such numeric representations often neglect the innate structure of categorical features, potentially degrading the performance of learning algorithms. Relying on the numeric representation can also limit interpretation of the learned model, such as identifying the most discriminative categorical features or filtering out irrelevant attributes. In this work, we extend the iterative hard thresholding (IHT) algorithm to quantify the structure of categorical features. The proposed structured hard thresholding algorithm is evaluated empirically on both real and synthetic data sets and compared with the original hard thresholding algorithm, LASSO, and Random Forest. The results demonstrate improved performance over the original IHT.
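The abstract does not state the update rule, so the following is only a hedged illustration of one common way to give IHT categorical structure, not the authors' algorithm. It assumes a least-squares loss, treats the one-hot columns of each categorical feature as a group, and, after each gradient step, keeps the `k` groups with the largest Euclidean norm (group hard thresholding). The function names `group_hard_threshold` and `structured_iht`, the feature names, and the step-size choice are all hypothetical.

```python
from typing import Dict, List, Sequence


def group_hard_threshold(w: Sequence[float], groups: Dict[str, List[int]], k: int) -> List[float]:
    """Keep the k groups with the largest squared L2 norm; zero all other entries.

    Hypothetical helper: each group lists the column indices of one feature
    (a single column for a numeric feature, all one-hot columns for a categorical one).
    """
    norms = {g: sum(w[i] ** 2 for i in idx) for g, idx in groups.items()}
    keep = sorted(norms, key=lambda g: norms[g], reverse=True)[:k]
    out = [0.0] * len(w)
    for g in keep:
        for i in groups[g]:
            out[i] = w[i]
    return out


def structured_iht(
    X: Sequence[Sequence[float]],
    y: Sequence[float],
    groups: Dict[str, List[int]],
    k: int,
    step: float,
    iters: int = 200,
) -> List[float]:
    """Sketch of group-structured IHT for 0.5 * ||y - Xw||^2 (assumed loss).

    `step` should be small enough for stability; at most 1 / ||X||_2^2 is a
    common conservative choice (an assumption, not taken from the paper).
    """
    n, p = len(X), len(X[0])
    w = [0.0] * p
    for _ in range(iters):
        # Residual r = y - Xw
        r = [y[i] - sum(X[i][j] * w[j] for j in range(p)) for i in range(n)]
        # Negative gradient of the loss: X^T r
        g = [sum(X[i][j] * r[i] for i in range(n)) for j in range(p)]
        # Gradient step, then keep only the k strongest feature groups
        w = group_hard_threshold([w[j] + step * g[j] for j in range(p)], groups, k)
    return w


if __name__ == "__main__":
    # Toy mixed-type layout: one numeric column, a 3-level and a 2-level categorical feature.
    groups = {"age": [0], "color": [1, 2, 3], "size": [4, 5]}
    X = [[1.0 if i == j else 0.0 for j in range(6)] for i in range(6)]  # identity design
    y = [0.0, 2.0, 0.0, -1.0, 0.0, 0.0]
    print(structured_iht(X, y, groups, k=1, step=1.0))  # [0.0, 2.0, 0.0, -1.0, 0.0, 0.0]
```

Because a whole categorical feature is kept or discarded at once, the selected support can be read directly as "which original attributes matter", which is the kind of interpretability the abstract says plain numeric encodings obscure.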
T. Nguyen and T. Obafemi-Ajayi, "Structured Iterative Hard Thresholding For Categorical And Mixed Data Types," 2019 IEEE Symposium Series on Computational Intelligence, SSCI 2019, pp. 2541-2547, article no. 9002948, Institute of Electrical and Electronics Engineers, Dec 2019.
The definitive version is available at https://doi.org/10.1109/SSCI44817.2019.9002948
Electrical and Computer Engineering
Keywords and Phrases
categorical data types; sparse linear model; thresholding; feature selection.
Article - Conference proceedings
© 2023 Institute of Electrical and Electronics Engineers, All rights reserved.
01 Dec 2019