Abstract
Validation is one of the most important aspects of clustering, particularly when the user is designing a trustworthy or explainable system. However, most clustering validation approaches require batch calculation. This is an important gap because of the value of clustering in real-time data streaming and other online learning applications. Therefore, interest has grown in providing online alternatives for validation. This paper extends the incremental cluster validity index (iCVI) family by presenting incremental versions of Calinski-Harabasz (iCH), Pakhira-Bandyopadhyay-Maulik (iPBM), WB index (iWB), Silhouette (iSIL), Negentropy Increment (iNI), Representative Cross Information Potential (irCIP), Representative Cross Entropy (irH), and Conn_Index (iConn_Index). This paper also provides a thorough comparative study of the effects of correct, under-, and over-partitioning on the behavior of these iCVIs, the Partition Separation (PS) index, and four recently introduced iCVIs: incremental Xie-Beni (iXB), incremental Davies-Bouldin (iDB), and incremental generalized Dunn's indices 43 and 53 (iGD43 and iGD53). Experiments were carried out using a framework that was designed to be as agnostic as possible to the clustering algorithms. The results on synthetic benchmark data sets showed that while evidence of most under-partitioning cases could be inferred from the behaviors of the majority of these iCVIs, over-partitioning was found to be a more challenging problem, detected by fewer of them. Interestingly, over-partitioning, rather than under-partitioning, was more prominently detected in the real-world data experiments within this study. The expansion of iCVIs provides significant novel opportunities for assessing and interpreting the results of unsupervised lifelong learning in real time, wherein samples cannot be reprocessed due to memory and/or application constraints.
Recommended Citation
L. E. Brito Da Silva et al., "Incremental Cluster Validity Indices for Online Learning of Hard Partitions: Extensions and Comparative Study," IEEE Access, vol. 8, pp. 22025 - 22047, Institute of Electrical and Electronics Engineers (IEEE), Jan 2020.
The definitive version is available at https://doi.org/10.1109/ACCESS.2020.2969849
Department(s)
Electrical and Computer Engineering
Research Center/Lab(s)
Center for High Performance Computing Research
Keywords and Phrases
Adaptive Resonance Theory (ART); Clustering; Data Streams; Incremental (Online) Clustering Algorithms; Incremental Cluster Validity Index (ICVI); Validation
International Standard Serial Number (ISSN)
2169-3536
Document Type
Article - Journal
Document Version
Final Version
File Type
text
Language(s)
English
Rights
© 2020 Institute of Electrical and Electronics Engineers (IEEE), All rights reserved.
Publication Date
01 Jan 2020
Comments
This research was sponsored by the Missouri University of Science and Technology Mary K. Finley Endowment and Intelligent Systems Center; the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior-Brazil (CAPES)-Finance code BEX 13494/13-9; the U.S. Dept. of Education Graduate Assistance in Areas of National Need program; the Army Research Laboratory (ARL) and the Lifelong Learning Machines program from the DARPA/Microsystems Technology Office, and it was accomplished under Cooperative Agreement Number W911NF-18-2-0260.