Comparative Analysis Of Feature Selection Methods To Identify Biomarkers In A Stroke-Related Dataset
Abstract
This paper applies machine learning feature selection techniques to the REGARDS stroke-related dataset to identify health-related biomarkers. A data-driven methodological framework is presented for evaluating multiple feature selection methods. In applying the framework, three classifiers are combined with two wrapper methods, and their performance is evaluated on diverse classification targets such as Current Smoker, Current Alcohol Use, and Deceased. Logistic regression, random forest, and naïve Bayes classifiers performed similarly, both in the area under the ROC curve (AUC) and in the features they selected. However, significant differences were observed in running time. The selected features were also evaluated by the accuracy of a prediction model built with a multi-layer perceptron (MLP) classifier.
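The evaluation loop described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not name the two wrappers, so scikit-learn's forward `SequentialFeatureSelector` is used here as an assumed stand-in. The REGARDS data is replaced by a small synthetic dataset with a binary label standing in for a target such as Current Smoker. The feature count, the number of selected features, and all hyperparameters are illustrative assumptions.

```python
import time

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler

# Assumption: synthetic stand-in for the real dataset (300 rows, 10 features).
X, y = make_classification(n_samples=300, n_features=10, n_informative=4,
                           n_redundant=2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

# Scale using statistics from the training split only.
scaler = StandardScaler().fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

# The three classifier families named in the abstract.
classifiers = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=20, random_state=0),
    "naive_bayes": GaussianNB(),
}

results = {}
for name, clf in classifiers.items():
    # Wrapper step (assumed: forward selection scored by cross-validated ROC AUC).
    start = time.perf_counter()
    selector = SequentialFeatureSelector(
        clf, n_features_to_select=4, direction="forward",
        scoring="roc_auc", cv=3)
    selector.fit(X_train, y_train)
    elapsed = time.perf_counter() - start
    mask = selector.get_support()

    # ROC AUC of the wrapped classifier on the held-out split.
    clf.fit(X_train[:, mask], y_train)
    auc = roc_auc_score(y_test, clf.predict_proba(X_test[:, mask])[:, 1])

    # Accuracy of an MLP trained only on the selected features.
    mlp = MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=0)
    mlp.fit(X_train[:, mask], y_train)
    acc = accuracy_score(y_test, mlp.predict(X_test[:, mask]))

    results[name] = {
        "selected": [i for i, keep in enumerate(mask) if keep],
        "roc_auc": auc,
        "seconds": elapsed,
        "mlp_accuracy": acc,
    }

for name, row in results.items():
    print(name, row)
```

Comparing the `selected`, `roc_auc`, and `seconds` entries across the three classifiers mirrors the comparison the abstract reports. The values produced on synthetic data say nothing about the paper's results.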
Recommended Citation
T. Clifford et al., "Comparative Analysis Of Feature Selection Methods To Identify Biomarkers In A Stroke-Related Dataset," 2019 IEEE Conference on Computational Intelligence in Bioinformatics and Computational Biology, CIBCB 2019, article no. 8791457, Institute of Electrical and Electronics Engineers, Jul 2019.
The definitive version is available at https://doi.org/10.1109/CIBCB.2019.8791457
Department(s)
Electrical and Computer Engineering
Keywords and Phrases
classification; feature selection; machine learning
International Standard Book Number (ISBN)
978-172811462-0
Document Type
Article - Conference proceedings
Document Version
Citation
File Type
text
Language(s)
English
Rights
© 2023 Institute of Electrical and Electronics Engineers, All rights reserved.
Publication Date
01 Jul 2019