Unsupervised Author Disambiguation using Heterogeneous Graph Convolutional Network Embedding

Abstract

In the real world, many people share the same name. When a digital library user searches for an author name, they may see a mixture of publications by different authors who have that name. Distinguishing between these authors is an important prerequisite for improving the quality of services and content in digital libraries. The general task of author disambiguation is to assign publications that carry an identical name, or names with highly similar spellings, to the distinct people who wrote them. In recent years, much research has addressed this challenging task. However, some works rely heavily on external knowledge bases and manually annotated data, and some works based on unsupervised learning require complex feature engineering. In this paper, we propose a novel and efficient author disambiguation framework that needs no labeled data. We first construct a heterogeneous publication network for each ambiguous name. Then, we use our proposed heterogeneous graph convolutional network embedding method, which encodes both graph structure and node attribute information, to learn publication representations. After that, we propose a graph-enhanced clustering method for name disambiguation that can greatly accelerate the clustering process and does not require the number of distinct persons to be known in advance. Our framework can be continually retrained and applied to the incremental disambiguation task as new publications are added. Experimental results on two datasets show that our framework clearly performs better than several state-of-the-art methods for author disambiguation.
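As a rough illustration of the problem setting only, the sketch below builds a small publication graph for one ambiguous name and groups publications without fixing the number of persons beforehand. It is not the paper's method: the relation types (shared co-author, shared venue), the record layout, the function names, and the toy data are all assumptions made for this example, and plain connected components stand in for the learned embeddings and graph-enhanced clustering described in the abstract.

```python
from collections import defaultdict
from itertools import combinations


def build_publication_graph(pubs, ambiguous_name):
    """Link two publications when they share a co-author or a venue.

    The relation types here are illustrative assumptions, not the paper's.
    """
    index = defaultdict(set)  # (relation type, value) -> publication ids
    for pid, pub in pubs.items():
        for author in pub["authors"]:
            if author != ambiguous_name:  # the ambiguous name itself carries no signal
                index[("coauthor", author)].add(pid)
        index[("venue", pub["venue"])].add(pid)

    graph = {pid: set() for pid in pubs}
    for pids in index.values():
        for a, b in combinations(sorted(pids), 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def connected_components(graph):
    """Group nodes into components; the number of groups is not given in advance."""
    seen, clusters = set(), []
    for start in sorted(graph):
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            component.add(node)
            stack.extend(graph[node] - seen)
        clusters.append(component)
    return clusters


# Hypothetical toy records for the ambiguous name "Wei Wang".
pubs = {
    "p1": {"authors": ["Wei Wang", "A. Smith"], "venue": "KDD"},
    "p2": {"authors": ["Wei Wang", "A. Smith"], "venue": "ICDM"},
    "p3": {"authors": ["Wei Wang", "B. Jones"], "venue": "CVPR"},
    "p4": {"authors": ["Wei Wang", "C. Lee"], "venue": "CVPR"},
}
graph = build_publication_graph(pubs, "Wei Wang")
clusters = connected_components(graph)
```

On this toy input, p1 and p2 are linked by a shared co-author and p3 and p4 by a shared venue, so two groups emerge. Real data would need a far less brittle grouping step, since a single spurious link merges two persons.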

Meeting Name

2019 IEEE International Conference on Big Data, Big Data 2019 (2019: Dec. 9-12, Los Angeles, CA)

Department(s)

Computer Science

Comments

The research is supported by the National Key Research and Development Plan (2017YFC1601504), the Natural Science Foundation of China (61836013), the CNTC (China National Tobacco Corporation) Science and Technology Major Project (110201901027(SJ-06)), and the Guangdong Provincial Key Laboratory of Biocomputing (2016B030301007).

Keywords and Phrases

Clustering; Graph Convolutional Network; Meta Path; Name Disambiguation; Network Embedding

International Standard Book Number (ISBN)

978-172810858-2

Document Type

Article - Conference proceedings

Document Version

Citation

File Type

text

Language(s)

English

Rights

© 2019 Institute of Electrical and Electronics Engineers (IEEE), All rights reserved.

Publication Date

01 Dec 2019
