Abstract

Dimension reduction methods are used to visualize the output of unsupervised learning models applied to complex data. These techniques improve interpretability by transforming a high-dimensional space into a lower-dimensional space (usually 2D or 3D). The results are typically viewed as 2D scatter plots, and class centroids may be added to increase interpretability. Although useful, the relationship of these class centroids to the underlying feature space remains opaque. The innovative aspect of this work is a strong link between the dimension-reduced space and the underlying high-dimensional feature space, created by adding selected feature centroids to the 2D scatter plots. This approach visualizes the centers of both the classes and the features on the same 2D scatter plot. Since classes are often imbalanced, we provide a method to balance class sizes. We present an automated framework that performs a grid search to find the optimal dimension reduction parameters, balances the class sizes, uses an ensemble approach to find the most important features, and adds class centroids and selected feature centroids to 2D dimension-reduced plots. This is especially useful for complex, feature-rich biomedical data, where the feature centroids added to the 2D scatter plots serve as landmarks in the previously featureless dimension-reduced space. The utility of this approach is demonstrated by its application to seven classes of neurogenetic diseases with 31 defining phenotypic features.
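
The idea of placing class centroids and feature centroids on the same 2D plot can be illustrated with a minimal sketch. The abstract does not define how a feature centroid is computed, so the sketch assumes one plausible reading: for a binary (present/absent) feature, the centroid is the mean 2D position of the samples in which that feature is present. The function names, the toy embedding, and the feature names `f1` and `f2` are hypothetical and are not taken from the paper.

```python
import numpy as np

def class_centroids(embedding, labels):
    """Mean 2D position of the points belonging to each class."""
    embedding = np.asarray(embedding, dtype=float)
    labels = np.asarray(labels)
    return {c: embedding[labels == c].mean(axis=0)
            for c in np.unique(labels).tolist()}

def feature_centroids(embedding, features, names):
    """Mean 2D position of the points in which each feature is present.

    Assumption (not from the paper): `features` is an
    (n_samples, n_features) matrix of 0/1 indicators.
    """
    embedding = np.asarray(embedding, dtype=float)
    features = np.asarray(features, dtype=float)
    centroids = {}
    for j, name in enumerate(names):
        weights = features[:, j]
        total = weights.sum()
        if total > 0:  # skip features that never occur
            centroids[name] = (weights[:, None] * embedding).sum(axis=0) / total
    return centroids

# Toy 2D embedding of four samples in two classes (hypothetical data).
embedding = np.array([[0.0, 0.0], [2.0, 0.0], [10.0, 10.0], [12.0, 10.0]])
labels = np.array(["A", "A", "B", "B"])
features = np.array([[1, 0], [1, 1], [0, 1], [0, 1]])

cc = class_centroids(embedding, labels)
fc = feature_centroids(embedding, features, ["f1", "f2"])
```

Both dictionaries hold points in the same 2D coordinates as the embedding, so they could be overlaid on the scatter plot as two additional marker sets. In this toy case `f1` occurs only in class A and its centroid coincides with that class centroid, while `f2` spans both classes and lands between them.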

Department(s)

Electrical and Computer Engineering

Document Type

Article - Conference proceedings

Document Version

Citation

File Type

text

Language(s)

English

Rights

© 2024 Institute of Electrical and Electronics Engineers, All rights reserved.

Publication Date

01 Jan 2024
