Subsumption Reduces Dataset Dimensionality Without Decreasing Performance of a Machine Learning Classifier

Donald C. Wunsch, Missouri University of Science and Technology
Daniel B. Hier, Missouri University of Science and Technology

Abstract

When features in a high-dimensional dataset are organized hierarchically, there is an inherent opportunity to reduce dimensionality. Since more specific concepts are subsumed by more general concepts, subsumption can be applied successively to reduce dimensionality. We tested whether subsumption could reduce the dimensionality of a disease dataset without impairing classification accuracy. We started with a dataset that had 168 neurological patients, 14 diagnoses, and 293 unique features. We applied subsumption repeatedly to create eight successively smaller datasets, ranging from 293 dimensions in the largest dataset to 11 dimensions in the smallest. We tested an MLP classifier on all eight datasets. Precision, recall, accuracy, and validation declined only at the lowest dimensionality. Our preliminary results suggest that when features in a high-dimensional dataset are derived from a hierarchical ontology, subsumption is a viable strategy to reduce dimensionality.

Clinical Relevance - Datasets derived from electronic health records are often of high dimensionality. If features in the dataset are based on concepts from a hierarchical ontology, subsumption can reduce dimensionality.
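The idea of successive subsumption can be illustrated with a minimal Python sketch. It is not the authors' implementation: the toy hierarchy, the patient records, and the function names (`subsume_once`, `subsume_dataset`, `dimensionality`) are hypothetical, and the sketch assumes that one subsumption step replaces every feature with its immediate parent concept, which may differ from the procedure used in the paper.

```python
from typing import Dict, List, Set

# Hypothetical toy ontology (child concept -> parent concept).
# These concepts are illustrative only and are not taken from the paper.
PARENT: Dict[str, str] = {
    "resting tremor": "tremor",
    "postural tremor": "tremor",
    "tremor": "movement disorder",
    "chorea": "movement disorder",
    "foot drop": "weakness",
    "weakness": "motor finding",
    "movement disorder": "motor finding",
}

# Hypothetical patients, each described by a set of feature concepts.
PATIENTS: Dict[str, Set[str]] = {
    "p1": {"resting tremor", "foot drop"},
    "p2": {"postural tremor", "chorea"},
    "p3": {"resting tremor", "chorea"},
}


def subsume_once(features: Set[str], parent: Dict[str, str]) -> Set[str]:
    """Replace each feature with its parent; root concepts are kept as-is."""
    return {parent.get(feature, feature) for feature in features}


def subsume_dataset(
    patients: Dict[str, Set[str]], parent: Dict[str, str]
) -> Dict[str, Set[str]]:
    """Apply one subsumption step to every patient's feature set."""
    return {pid: subsume_once(feats, parent) for pid, feats in patients.items()}


def dimensionality(patients: Dict[str, Set[str]]) -> int:
    """Count the distinct features used across all patients."""
    return len(set().union(*patients.values()))


def successive_dimensions(
    patients: Dict[str, Set[str]], parent: Dict[str, str], steps: int
) -> List[int]:
    """Record the dimensionality before each of `steps` subsumption steps."""
    dims = []
    current = patients
    for _ in range(steps):
        dims.append(dimensionality(current))
        current = subsume_dataset(current, parent)
    return dims


if __name__ == "__main__":
    print(successive_dimensions(PATIENTS, PARENT, steps=4))
```

Each step merges sibling concepts into their shared parent, so the number of distinct feature columns shrinks while every patient keeps a (coarser) description. In practice each reduced dataset would then be encoded, for example as binary feature vectors, and passed to the classifier.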

Recommended Citation

D. C. Wunsch and D. B. Hier, "Subsumption Reduces Dataset Dimensionality Without Decreasing Performance of a Machine Learning Classifier," Proceedings of the Annual International Conference of the IEEE Engineering in Medicine and Biology Society, EMBS, pp. 1618 - 1621, Institute of Electrical and Electronics Engineers, Jan 2021.

The definitive version is available at https://doi.org/10.1109/EMBC46164.2021.9629897

Department(s)

Chemistry

Second Department

Electrical and Computer Engineering

International Standard Book Number (ISBN)

978-172811179-7

International Standard Serial Number (ISSN)

1557-170X

Document Type

Article - Conference proceedings

Document Version

Citation

File Type

text

Language(s)

English

Publication Date

01 Jan 2021

PubMed ID

34891595
