Unsupervised Author Disambiguation using Heterogeneous Graph Convolutional Network Embedding
Abstract
Many people share the same name in the real world. When a digital library user searches for an author name, they may see a mixture of publications by different authors who have the same name. Distinguishing between these authors is an important prerequisite for improving the quality of services and content in digital libraries. The general task of author disambiguation is to assign publications that appear under an identical name, or under names with highly similar spellings, to the distinct people who wrote them. In recent years, much research has been conducted to solve this challenging task. However, some works rely heavily on external knowledge bases and manually annotated data, and some works based on unsupervised learning require complex feature engineering. In this paper, we propose a novel and efficient author disambiguation framework that needs no labeled data. We first construct a heterogeneous publication network for each ambiguous name. Then, we use our proposed heterogeneous graph convolutional network embedding method, which encodes both graph structure and node attribute information, to learn publication representations. After that, we propose a graph enhanced clustering method for name disambiguation that can greatly accelerate the clustering process and does not require the number of distinct persons to be known in advance. Our framework can be continually retrained and applied to the incremental disambiguation task when new publications are added. Experimental results on two datasets show that our framework clearly performs better than several state-of-the-art methods for author disambiguation.
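As a rough illustration of the first and last steps described in the abstract, the sketch below builds a small publication graph for one ambiguous name and groups publications without fixing the number of authors beforehand. It is a simplified stand-in under stated assumptions, not the paper's method: the linking rule (shared co-author or venue), the record fields `coauthors` and `venue`, and the use of connected components in place of the embedding and graph enhanced clustering are all hypothetical choices made for the example.

```python
from collections import defaultdict
from itertools import combinations


def build_publication_graph(pubs):
    """Link two publications when they share a co-author or a venue.

    Assumption: this linking rule is illustrative only; the paper's
    heterogeneous network and meta paths are not reproduced here.
    """
    by_attr = defaultdict(set)
    for pid, rec in pubs.items():
        for coauthor in rec["coauthors"]:
            by_attr[("coauthor", coauthor)].add(pid)
        by_attr[("venue", rec["venue"])].add(pid)

    graph = {pid: set() for pid in pubs}
    for members in by_attr.values():
        for a, b in combinations(sorted(members), 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def connected_components(graph):
    """Group nodes into components; the number of groups is not an input."""
    seen, clusters = set(), []
    for start in sorted(graph):
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            component.add(node)
            stack.extend(graph[node] - seen)
        clusters.append(component)
    return clusters


# Hypothetical publications that all list the same ambiguous author name.
pubs = {
    "p1": {"coauthors": {"A. Smith"}, "venue": "KDD"},
    "p2": {"coauthors": {"A. Smith", "B. Lee"}, "venue": "ICDM"},
    "p3": {"coauthors": {"C. Chen"}, "venue": "CHI"},
    "p4": {"coauthors": {"C. Chen"}, "venue": "UIST"},
}

clusters = connected_components(build_publication_graph(pubs))
print(clusters)
```

In this toy input, `p1` and `p2` are linked through a shared co-author, as are `p3` and `p4`, so two groups emerge without the count being specified. A learned embedding, as the abstract describes, would replace the hard linking rule with similarity between publication representations.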
Recommended Citation
Z. Qiao et al., "Unsupervised Author Disambiguation using Heterogeneous Graph Convolutional Network Embedding," Proceedings of the 2019 IEEE International Conference on Big Data (2019, Los Angeles, CA), pp. 910-919, Institute of Electrical and Electronics Engineers (IEEE), Dec 2019.
The definitive version is available at https://doi.org/10.1109/BigData47090.2019.9005458
Meeting Name
2019 IEEE International Conference on Big Data, Big Data 2019 (2019: Dec. 9-12, Los Angeles, CA)
Department(s)
Computer Science
Keywords and Phrases
Clustering; Graph Convolutional Network; Meta Path; Name Disambiguation; Network Embedding
International Standard Book Number (ISBN)
978-172810858-2
Document Type
Article - Conference proceedings
Document Version
Citation
File Type
text
Language(s)
English
Rights
© 2019 Institute of Electrical and Electronics Engineers (IEEE), All rights reserved.
Publication Date
01 Dec 2019
Comments
The research is supported by the National Key Research and Development Plan (2017YFC1601504), the Natural Science Foundation of China (61836013), the CNTC (China National Tobacco Corporation) Science and Technology Major Project (110201901027(SJ-06)), and the Guangdong Provincial Key Laboratory of Biocomputing (2016B030301007).