Abstract

This paper applies machine learning feature selection techniques to the REGARDS stroke-related dataset to identify health-related biomarkers. It presents a data-driven methodological framework for evaluating multiple feature selection methods. In applying the framework, three classifiers are paired with two wrappers, and their performance is evaluated on diverse classification targets such as Current Smoker, Current Alcohol Use, and Deceased. Logistic regression, random forest, and naïve Bayes classifiers performed similarly, as quantified by the ROC Area Under Curve metric and by the features selected, but their running times differed significantly. The selected features were also evaluated by the accuracy of a prediction model built with a multi-layer perceptron (MLP) classifier.
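As a rough illustration of what wrapper-based feature selection scored by ROC AUC can look like, the sketch below runs a greedy forward wrapper around a small Gaussian naïve Bayes classifier. It is not the paper's implementation: the abstract does not name the two wrappers, so forward selection is an assumption, the data are synthetic stand-ins rather than REGARDS records, and every function name here is hypothetical.

```python
import math
import random

def make_data(n=400, seed=0):
    """Synthetic stand-in data: features 0 and 1 carry signal, 2-4 are noise."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        label = rng.randint(0, 1)
        row = [rng.gauss(1.5 * label, 1.0), rng.gauss(1.0 * label, 1.0)]
        row += [rng.gauss(0.0, 1.0) for _ in range(3)]
        X.append(row)
        y.append(label)
    return X, y

def fit_gnb(X, y, cols):
    """Fit Gaussian naive Bayes on the chosen columns (log prior, mean, variance)."""
    model = {}
    for c in (0, 1):
        rows = [x for x, t in zip(X, y) if t == c]
        stats = []
        for j in cols:
            vals = [r[j] for r in rows]
            mu = sum(vals) / len(vals)
            var = sum((v - mu) ** 2 for v in vals) / len(vals) + 1e-9
            stats.append((mu, var))
        model[c] = (math.log(len(rows) / len(X)), stats)
    return model

def score_gnb(model, x, cols):
    """Log-odds of class 1 versus class 0 for one sample."""
    def loglik(c):
        prior, stats = model[c]
        return prior + sum(
            -0.5 * math.log(2 * math.pi * var) - (x[j] - mu) ** 2 / (2 * var)
            for j, (mu, var) in zip(cols, stats))
    return loglik(1) - loglik(0)

def roc_auc(scores, labels):
    """AUC as the chance a positive outranks a negative (ties count half)."""
    pos = [s for s, t in zip(scores, labels) if t == 1]
    neg = [s for s, t in zip(scores, labels) if t == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def holdout_auc(X, y, cols, split=0.7):
    """Train on the first 70% of rows and report AUC on the remaining 30%."""
    cut = int(len(X) * split)
    model = fit_gnb(X[:cut], y[:cut], cols)
    scores = [score_gnb(model, x, cols) for x in X[cut:]]
    return roc_auc(scores, y[cut:])

def forward_select(X, y, k):
    """Greedy forward wrapper: add the feature that most improves held-out AUC."""
    selected, remaining = [], list(range(len(X[0])))
    while len(selected) < k:
        best = max(remaining, key=lambda j: holdout_auc(X, y, selected + [j]))
        selected.append(best)
        remaining.remove(best)
    return selected

X, y = make_data()
selected = forward_select(X, y, k=2)
auc = holdout_auc(X, y, selected)
print("selected feature indices:", selected)
print("held-out ROC AUC:", round(auc, 3))
```

Because the same held-out rows both guide the selection and score the result, the reported AUC is optimistic; a separate evaluation split, such as the MLP check the abstract mentions, would give a fairer estimate. Swapping in a different classifier only requires replacing the fit and score functions, which is also where running-time differences between classifiers would show up.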

Department(s)

Electrical and Computer Engineering

Keywords and Phrases

classification; feature selection; machine learning

International Standard Book Number (ISBN)

978-172811462-0

Document Type

Article - Conference proceedings

Document Version

Citation

File Type

text

Language(s)

English

Rights

© 2023 Institute of Electrical and Electronics Engineers, All rights reserved.

Publication Date

01 Jul 2019
