Statistical Comparative Analysis and Evaluation of Validation Indices for Clustering Optimization


Clustering is a useful exploratory tool for a broad range of machine learning applications because it aids the identification of meaningful subgroups. For a given clustering algorithm, multiple partitions of the same data set can be obtained by varying the algorithm's parameters. Internal validation indices provide a means to objectively evaluate how well the groupings obtained from a clustering configuration partition the data when no labeled data are available. This work presents a rigorous statistical evaluation framework that analyzes the performance of internal validation indices based on their correlation with external indices. A synthetic data generator that captures a wide range of complexity is proposed. Evaluation is conducted on a varied set of synthetic data types and real data sets to investigate the performance of the indices.
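The core idea, scoring an internal index by how well it tracks an external one across candidate partitions, can be illustrated with a small sketch. This is not the paper's framework: the choice of the silhouette width as the internal index, the adjusted Rand index as the external index, Spearman rank correlation as the agreement measure, and the hand-made toy data and candidate partitions are all assumptions made for illustration.

```python
from collections import Counter
from math import comb, dist


def adjusted_rand_index(truth, pred):
    """External index: agreement between a partition and reference labels."""
    pairs = sum(comb(c, 2) for c in Counter(zip(truth, pred)).values())
    a = sum(comb(c, 2) for c in Counter(truth).values())
    b = sum(comb(c, 2) for c in Counter(pred).values())
    expected = a * b / comb(len(truth), 2)
    max_index = (a + b) / 2
    if max_index == expected:  # degenerate case, e.g. both partitions trivial
        return 1.0
    return (pairs - expected) / (max_index - expected)


def silhouette(points, labels):
    """Internal index: mean silhouette width (needs at least two clusters)."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    widths = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:  # singleton clusters get width 0 by convention
            widths.append(0.0)
            continue
        a = sum(dist(p, q) for q in own) / (len(own) - 1)
        b = min(sum(dist(p, q) for q in other) / len(other)
                for m, other in clusters.items() if m != lab)
        widths.append((b - a) / max(a, b))
    return sum(widths) / len(widths)


def spearman(x, y):
    """Spearman rank correlation; assumes no tied values in x or y."""
    def ranks(v):
        order = sorted(range(len(v)), key=v.__getitem__)
        r = [0] * len(v)
        for rank, i in enumerate(order):
            r[i] = rank
        return r
    n = len(x)
    d2 = sum((rx - ry) ** 2 for rx, ry in zip(ranks(x), ranks(y)))
    return 1 - 6 * d2 / (n * (n * n - 1))


# Toy data (hypothetical): three well-separated groups with known labels.
points = [(0, 0), (0, 1), (1, 0), (10, 0), (10, 1), (11, 0),
          (0, 10), (1, 10), (0, 11)]
truth = [0, 0, 0, 1, 1, 1, 2, 2, 2]

# Hand-made candidate partitions standing in for different parameter settings.
candidates = {
    "correct":   [0, 0, 0, 1, 1, 1, 2, 2, 2],
    "merged":    [0, 0, 0, 1, 1, 1, 1, 1, 1],
    "split":     [0, 0, 3, 1, 1, 1, 2, 2, 2],
    "scrambled": [0, 1, 2, 0, 1, 2, 0, 1, 2],
}

internal = {name: silhouette(points, lab) for name, lab in candidates.items()}
external = {name: adjusted_rand_index(truth, lab) for name, lab in candidates.items()}
names = list(candidates)
rho = spearman([internal[n] for n in names], [external[n] for n in names])

for n in names:
    print(f"{n:10s} silhouette={internal[n]: .3f}  ARI={external[n]: .3f}")
print(f"Spearman correlation between the two indices: {rho:.3f}")
```

In a fuller study the candidate partitions would come from running a clustering algorithm under different parameter settings on many data sets, and the correlation would be examined statistically rather than read off a single toy example.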

Meeting Name

2020 IEEE Symposium Series on Computational Intelligence, SSCI (2020: Dec. 1-4, Canberra, ACT, Australia)


Mathematics and Statistics

Research Center/Lab(s)

Center for High Performance Computing Research

Keywords and Phrases

clustering; statistical analysis; validation indices

Document Type

Article - Conference proceedings

© 2020 Institute of Electrical and Electronics Engineers (IEEE), All rights reserved.

Publication Date

04 Dec 2020