Research on Author Name Collaborative Disambiguation Based on Meta-path
Yang Zhao
Shanghai Jiao Tong University Library, Shanghai 200240

Abstract In the era of big data, governing the data of literature networks and accurately attributing academic achievements requires addressing the dual challenges of diversity and ambiguity in author names. This paper proposes a meta-path-based method of author name disambiguation, from the perspective of a heterogeneous co-occurrence network, to solve the collaborative disambiguation problem that arises from the coexistence of aliases, renaming, and synonymous translations of author and institution names. Using a collaborative strategy over author and institution names, the author name disambiguation problem is transformed into a heterogeneous network mining problem, and a meta-path-based disambiguation framework is constructed. A heterogeneous author co-occurrence network model is built by combining the semantic and spatial associations between objects. Meta-path-based methods for variant aggregation, name disambiguation, and institution normalization are proposed to calculate the similarity and perplexity of author and institution names. The effectiveness of the method is verified experimentally on two example data sets: English papers under the same institution and Chinese papers under the same name.
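
To make the idea of meta-path similarity between author names concrete, the following is a minimal sketch, not the method of this paper. It assumes PathSim, a widely used symmetric meta-path measure from the heterogeneous information network literature, as a stand-in for the similarity calculation, which the abstract does not specify. The toy bylines, the function names `half_path_counts` and `path_sim`, and the two meta-paths (Author-Paper-Institution-Paper-Author and Author-Paper-Author-Paper-Author) are all hypothetical choices made for illustration.

```python
from collections import Counter

# Hypothetical toy bylines (invented for illustration): the author-name
# strings and one institution-name string as they appear on each paper.
PAPERS = [
    {"authors": ["Zhao Y", "Ren J"], "institution": "Shanghai Jiao Tong Univ"},
    {"authors": ["Zhao Yang", "Ren J"], "institution": "Shanghai Jiao Tong Univ"},
    {"authors": ["Zhao Y", "Li T"], "institution": "Wuhan Univ"},
]


def half_path_counts(papers, end_type):
    """For each author name, count path instances name -> paper -> end node.

    end_type "institution" is the first half of the symmetric meta-path
    Author-Paper-Institution-Paper-Author; end_type "coauthor" is the
    first half of Author-Paper-Author-Paper-Author.
    """
    if end_type not in ("institution", "coauthor"):
        raise ValueError("end_type must be 'institution' or 'coauthor'")
    counts = {}
    for paper in papers:
        for name in paper["authors"]:
            vec = counts.setdefault(name, Counter())
            if end_type == "institution":
                vec[paper["institution"]] += 1
            else:
                vec.update(o for o in paper["authors"] if o != name)
    return counts


def path_sim(counts, x, y):
    """PathSim (assumed measure): 2*|paths x~y| / (|paths x~x| + |paths y~y|)."""
    def dot(a, b):
        # A Counter returns 0 for missing keys, so unmatched nodes add nothing.
        return sum(n * b[key] for key, n in a.items())

    vx, vy = counts.get(x, Counter()), counts.get(y, Counter())
    denom = dot(vx, vx) + dot(vy, vy)
    return 2 * dot(vx, vy) / denom if denom else 0.0


by_institution = half_path_counts(PAPERS, "institution")
by_coauthor = half_path_counts(PAPERS, "coauthor")
print(path_sim(by_institution, "Zhao Y", "Zhao Yang"))
print(path_sim(by_coauthor, "Zhao Y", "Zhao Yang"))
```

In this sketch, two name strings that share institutions or co-authors along a meta-path score closer to 1, which is the kind of evidence a variant-aggregation step could use to merge them. How the scores from different meta-paths would be weighted or thresholded is left open here, since the abstract does not say.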
Received: 13 August 2020