Method for Author Name Disambiguation in Specific Research Tasks
Wu Keye (1,2), Min Chao (1,2), Sun Jianjun (1,2), Quan Zhaoxuan (1,2)
1. School of Information Management, Nanjing University, Nanjing 210023
2. Institute of Data Research in Humanities and Social Sciences, Nanjing University, Nanjing 210023
Abstract: Author name disambiguation is often required when analyzing talent flow and evaluating scholars on the basis of their academic publications. This paper proposes an accurate and convenient name disambiguation method tailored to a specific research task. To simplify computation and to compensate for gaps in the local data, it constructs a two-stage name disambiguation framework based on heterogeneous data: the first stage fully mines the associated local data, and the second stage brings in authoritative external data. Through a sequence of steps, including representation, extraction of relevant information, construction of a relational network, and semi-fuzzy retrieval, the framework achieves comprehensive and objective name disambiguation. Finally, the advantages of the method are demonstrated and verified on publication data from the field of artificial intelligence. When checked against manually annotated data, the framework shows good disambiguation performance and resolves both synonymy (one author appearing under different names) and homonymy (different authors sharing the same name) in the original data, laying a solid foundation for subsequent research tasks.
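The two-stage framework summarized above can be illustrated with a minimal Python sketch. Every detail here is an assumption for illustration, not the authors' implementation: the hypothetical `Record` schema, the token-sorted name key used as the representation step, the rule that links records sharing a coauthor or affiliation (the stage 1 relational network), the token-Jaccard affiliation match standing in for semi-fuzzy retrieval, and the in-memory `authority` dictionary standing in for an authoritative external database.

```python
import unicodedata
from dataclasses import dataclass, field


@dataclass
class Record:
    """One authorship record from the local dataset (hypothetical schema)."""
    rid: int
    name: str
    affiliation: str
    coauthors: frozenset = field(default_factory=frozenset)


def tokens(text: str) -> list:
    """Lowercase, strip accents, and split on any non-alphanumeric character."""
    text = unicodedata.normalize("NFKD", text)
    text = "".join(c for c in text if not unicodedata.combining(c)).lower()
    return "".join(c if c.isalnum() else " " for c in text).split()


def name_key(name: str) -> str:
    """Representation step (assumed): sorted tokens, so 'Wu Keye' == 'Keye Wu'."""
    return " ".join(sorted(tokens(name)))


def similar(x: str, y: str, threshold: float = 0.5) -> bool:
    """Fuzzy affiliation match via token Jaccard similarity (threshold is arbitrary)."""
    a, b = set(tokens(x)), set(tokens(y))
    return bool(a and b) and len(a & b) / len(a | b) >= threshold


class UnionFind:
    def __init__(self, items):
        self.parent = {i: i for i in items}

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        self.parent[self.find(a)] = self.find(b)


def stage_one(records):
    """Stage 1 (local data): link same-key records sharing a coauthor or affiliation."""
    uf = UnionFind(r.rid for r in records)
    for i, a in enumerate(records):
        for b in records[i + 1:]:
            if name_key(a.name) != name_key(b.name):
                continue
            if a.coauthors & b.coauthors or a.affiliation == b.affiliation:
                uf.union(a.rid, b.rid)
    return uf


def stage_two(records, uf, authority):
    """Stage 2 (external data): semi-fuzzy retrieval against an authority file.

    The name key must match exactly; the affiliation only has to match fuzzily.
    Records resolved to the same external ID are merged into one cluster.
    """
    claimed = {}  # external ID -> rid of the first record resolved to it
    for r in records:
        for ext_id, (ext_name, ext_aff) in authority.items():
            if name_key(ext_name) == name_key(r.name) and similar(ext_aff, r.affiliation):
                if ext_id in claimed:
                    uf.union(r.rid, claimed[ext_id])
                else:
                    claimed[ext_id] = r.rid
                break  # take the first matching authority entry
    return uf


def disambiguate(records, authority):
    """Run both stages and return clusters of record IDs, one per inferred person."""
    uf = stage_two(records, stage_one(records), authority)
    clusters = {}
    for r in records:
        clusters.setdefault(uf.find(r.rid), []).append(r.rid)
    return sorted(sorted(c) for c in clusters.values())


if __name__ == "__main__":
    records = [
        Record(1, "Wu Keye", "Nanjing University", frozenset({"Min Chao"})),
        Record(2, "Keye Wu", "Nanjing Univ.", frozenset({"Min Chao"})),
        Record(3, "WU Keye", "Nanjing University, Nanjing", frozenset({"Quan Zhaoxuan"})),
        Record(4, "Wu Keye", "Peking University", frozenset({"Li Hua"})),
    ]
    authority = {"EXT-001": ("Wu Keye", "Nanjing University")}
    print(disambiguate(records, authority))  # [[1, 2, 3], [4]]
```

In this toy example, stage 1 links records 1 and 2 through a shared coauthor despite the reversed name order, stage 2 attaches record 3 because its affiliation fuzzily matches the external entry, and record 4, a namesake at a different institution, remains a separate person. Running stage 1 alone would leave record 3 isolated, which is the kind of gap in local data that the second stage is meant to fill.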
Received: 01 June 2020
|