CGC: A Flexible and Robust Approach to Integrating Co-Regularized Multi-Domain Graph for Clustering

Abstract

Multi-view graph clustering aims to enhance clustering performance by integrating heterogeneous information collected in different domains. Each domain provides a different view of the data instances. Leveraging cross-domain information has been demonstrated to be an effective way to achieve better clustering results. Despite this success, existing multi-view graph clustering methods usually assume that different views are available for the same set of instances. Thus, instances in different domains can be treated as having a strict one-to-one relationship. In many real-life applications, however, data instances in one domain may correspond to multiple instances in another domain. Moreover, relationships between instances in different domains may be associated with weights based on prior (partial) knowledge. In this article, we propose a flexible and robust framework, Co-regularized Graph Clustering (CGC), based on non-negative matrix factorization (NMF), to tackle these challenges. CGC has several advantages over existing methods. First, it supports many-to-many cross-domain instance relationships. Second, it incorporates weights on cross-domain relationships. Third, it allows partial cross-domain mapping, so that graphs in different domains may have different sizes. Finally, it reports the extent to which the cross-domain instance relationships violate the in-domain clustering structure, and thus enables users to re-evaluate the consistency of those relationships. We develop an efficient optimization method that is guaranteed to find the globally optimal solution with a given confidence requirement. The proposed method can automatically identify noisy domains and assign smaller weights to them. This helps to obtain an optimal graph partition for the focused domain. Extensive experimental results on UCI benchmark datasets, newsgroup datasets, and biological interaction networks demonstrate the effectiveness of our approach.

Department(s)

Computer Science

Research Center/Lab(s)

Intelligent Systems Center

Comments

This work is supported by the National Science Foundation, under grants IIS-1313606, CAREER, IIS-1162374, IIS-1218036, and by the National Institutes of Health, under U01HG008488-01 and R01GM115833-01.

Keywords and Phrases

Co-Regularization; Graph Clustering; Nonnegative Matrix Factorization; Clustering Algorithms; Cobalt Compounds; Factorization

International Standard Serial Number (ISSN)

1556-4681

Document Type

Article - Journal

Document Version

Citation

File Type

text

Language(s)

English

Rights

© 2016 Association for Computing Machinery (ACM), All rights reserved.

Publication Date

01 Jul 2016
