An Information-Theoretic-Cluster Visualization for Self-Organizing Maps

Abstract

Improved data visualization is a significant tool for enhancing cluster analysis. In this paper, an information-theoretic method for cluster visualization using self-organizing maps (SOMs) is presented. The information-theoretic visualization (IT-vis) has the same structure as the unified distance matrix, but instead of depicting Euclidean distances between adjacent neurons, it displays the similarity between the distributions associated with adjacent neurons. Each SOM neuron has an associated subset of the data set, whose cardinality controls the granularity of the IT-vis. The first- and second-order statistics of each subset are computed and used to estimate its probability density function. These density estimates are used to calculate the similarity measure, which is based on Rényi's quadratic cross entropy and the cross information potential (CIP). The introduced visualizations combine the low computational cost and kernel estimation properties of the representative CIP and the data structure representation of a single-linkage-based grouping algorithm to generate an enhanced SOM-based visualization. The visual quality of the IT-vis is assessed by comparing it with other visualization methods on several real-world and synthetic benchmark data sets. Thus, this paper also contains a significant literature survey. The experiments demonstrate the cluster-revealing capabilities of the IT-vis, in which cluster boundaries are sharply captured. Additionally, the information-theoretic visualizations are used to perform clustering of the SOM. In the experiments reported in this paper, the IT-vis of large SOMs yielded the best results among the methods compared, with the quality of the final partitions evaluated using external validity indices.
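
As a rough illustration of the similarity measure described in the abstract, the sketch below assumes that each neuron's data subset is summarized by a Gaussian with the subset's sample mean and covariance. Under that assumption, the cross information potential (the integral of the product of the two densities) has a closed form: a Gaussian with covariance equal to the sum of the two covariances, evaluated at the difference of the means. The function names (`subset_stats`, `gaussian_cip`, `renyi_quadratic_cross_entropy`) are hypothetical, and this is not necessarily the exact estimator used in the paper.

```python
import numpy as np


def subset_stats(samples):
    """First- and second-order statistics of one neuron's data subset.

    `samples` is an (n, d) array; returns the sample mean and covariance.
    """
    samples = np.asarray(samples, dtype=float)
    mean = samples.mean(axis=0)
    cov = np.atleast_2d(np.cov(samples, rowvar=False))
    return mean, cov


def gaussian_cip(mu_a, cov_a, mu_b, cov_b):
    """Closed-form integral of the product of two Gaussian densities.

    Assumption: both distributions are Gaussian, so the result equals
    N(mu_a - mu_b; 0, cov_a + cov_b).
    """
    mu_a = np.atleast_1d(np.asarray(mu_a, dtype=float))
    mu_b = np.atleast_1d(np.asarray(mu_b, dtype=float))
    cov = (np.atleast_2d(np.asarray(cov_a, dtype=float))
           + np.atleast_2d(np.asarray(cov_b, dtype=float)))
    diff = mu_a - mu_b
    d = diff.size
    # slogdet and solve avoid forming an explicit inverse or determinant.
    _, logdet = np.linalg.slogdet(cov)
    quad = diff @ np.linalg.solve(cov, diff)
    log_cip = -0.5 * (d * np.log(2.0 * np.pi) + logdet + quad)
    return float(np.exp(log_cip))


def renyi_quadratic_cross_entropy(mu_a, cov_a, mu_b, cov_b):
    """Renyi's quadratic cross entropy, the negative log of the CIP."""
    return -np.log(gaussian_cip(mu_a, cov_a, mu_b, cov_b))
```

A larger CIP (equivalently, a smaller cross entropy) indicates that the distributions of two adjacent neurons overlap more, so a small CIP between neighbors suggests a cluster boundary between them.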

Department(s)

Electrical and Computer Engineering

Research Center/Lab(s)

Intelligent Systems Center

Second Research Center/Lab

Center for High Performance Computing Research

Keywords and Phrases

Benchmarking; Cluster analysis; Conformal mapping; Data visualization; Information theory; Neurons; Probability density function; Quality control; Visualization; Cluster visualization; Computational costs; External validities; Information potential; Second order statistics; Self organizing maps (SOMs); Synthetic benchmark; Visualization method; Self organizing maps; Clustering; Entropy; Review; Self-organizing feature maps; Survey

International Standard Serial Number (ISSN)

2162-237X; 2162-2388

Document Type

Article - Journal

Document Version

Citation

File Type

text

Language(s)

English

Rights

© 2018 Institute of Electrical and Electronics Engineers (IEEE), All rights reserved.

Publication Date

01 Jun 2018
