Abstract

In many applications, data are of mixed type, i.e., a combination of nominal (categorical) and numerical features. A common practice for working with categorical features is to use an encoding method to transform the discrete values into a numeric representation. However, a numeric representation often neglects the innate structure of categorical features, potentially degrading the performance of learning algorithms. Relying on the numeric representation can also limit the interpretability of the learned model, for example when identifying the most discriminative categorical features or filtering out irrelevant attributes. In this work, we extend the iterative hard thresholding (IHT) algorithm to quantify the structure of categorical features. The proposed structured hard thresholding algorithm is evaluated empirically on both real and synthetic data sets and compared with the original hard thresholding algorithm, LASSO, and Random Forest. The results show improved performance over the original IHT.
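
To make the idea in the abstract concrete, the following is a minimal sketch of iterative hard thresholding for least squares, together with one plausible structured variant. It is not the paper's algorithm: the abstract does not specify how structure is quantified, so the group-wise thresholding rule (keep the k groups of encoded columns with the largest Euclidean norm), the function names, and the step size are all assumptions made for illustration. Each group is assumed to hold the column indices produced by encoding one categorical feature.

```python
import numpy as np

def hard_threshold(w, k):
    """Keep the k largest-magnitude entries of w and zero the rest."""
    out = np.zeros_like(w)
    if k > 0:
        keep = np.argsort(np.abs(w))[-k:]
        out[keep] = w[keep]
    return out

def group_hard_threshold(w, groups, k):
    """Assumed structured rule: keep the k groups with the largest L2 norm.

    `groups` is a list of index lists, e.g. one list per categorical
    feature covering its encoded columns (hypothetical layout).
    """
    out = np.zeros_like(w)
    if k > 0:
        norms = np.array([np.linalg.norm(w[g]) for g in groups])
        for gi in np.argsort(norms)[-k:]:
            out[groups[gi]] = w[groups[gi]]
    return out

def iht(X, y, project, n_iter=200):
    """Projected gradient descent on 0.5 * ||y - Xw||^2.

    Each iteration takes a gradient step and then applies `project`,
    which enforces sparsity (plain or group-wise).
    """
    step = 1.0 / np.linalg.norm(X, 2) ** 2  # 1 / (largest singular value)^2
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        w = project(w + step * X.T @ (y - X @ w))
    return w
```

Under these assumptions, plain IHT would be called as iht(X, y, lambda w: hard_threshold(w, k)) and the group-wise variant as iht(X, y, lambda w: group_hard_threshold(w, groups, k)). The group-wise rule selects or discards all encoded columns of a categorical feature together, which is one way the selected support could be read directly as a set of features.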

Department(s)

Electrical and Computer Engineering

Keywords and Phrases

categorical data types; sparse linear model; thresholding; feature selection.

International Standard Book Number (ISBN)

978-172812485-8

Document Type

Article - Conference proceedings

Document Version

Citation

File Type

text

Language(s)

English

Rights

© 2023 Institute of Electrical and Electronics Engineers, All rights reserved.

Publication Date

01 Dec 2019
