Author Name Disambiguation Using BP Neural Networks under Missing Data
Ke Hao¹, Li Tian¹, Zhou Yue¹, Zhong Yuying¹, Yu Zhenglu², Yuan Junpeng³,⁴
1. Central University of Finance and Economics, Beijing 100081; 2. Institute of Scientific and Technical Information of China, Beijing 100038; 3. National Science Library, Chinese Academy of Sciences, Beijing 100190; 4. School of Economics and Management, University of Chinese Academy of Sciences, Beijing 100190
Abstract: Author name disambiguation plays an important role in individual-level analysis in information science, knowledge management, bibliometrics, and scientometrics. Machine learning algorithms such as clustering and classification are often employed for this task. However, insufficient information and missing fields in publication metadata can cause these algorithms to fail. To address this issue, this study focuses on accurately determining the effect of individual metadata fields on author name disambiguation when data are missing. It analyzes the available information to determine how much each field contributes to disambiguation. A combination of indicators was designed, and an algorithm for identifying duplicate author names based on BP (back propagation) neural networks was developed. The validity and reliability of the proposed algorithm were verified using papers published under the name “Wang Wei”.
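The general idea of feeding field-level indicators into a BP network when some fields are missing can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not the indicator design or network used in the study: the field names in `FIELDS`, the use of Jaccard similarity, the encoding of each field as a (similarity, is-present) pair, the hidden-layer size, and the toy records are all hypothetical.

```python
import math
import random

# Hypothetical metadata fields; the study's actual indicator set is not given here.
FIELDS = ["coauthors", "affiliation", "keywords"]

def jaccard(a, b):
    """Jaccard similarity of two value lists; None if either field is missing."""
    if not a or not b:
        return None
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)

def pair_features(p, q):
    """Per field: [similarity, 1.0] if comparable, [0.0, 0.0] if missing."""
    feats = []
    for f in FIELDS:
        s = jaccard(p.get(f), q.get(f))
        feats += [0.0, 0.0] if s is None else [s, 1.0]
    return feats

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

class BPNet:
    """One-hidden-layer network trained by plain backpropagation (SGD)."""
    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        # The last weight in each row is the bias.
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
                   for _ in range(n_hidden)]
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]

    def forward(self, x):
        xb = x + [1.0]
        h = [sigmoid(sum(w * v for w, v in zip(row, xb))) for row in self.w1]
        y = sigmoid(sum(w * v for w, v in zip(self.w2, h + [1.0])))
        return h, y

    def predict(self, x):
        return self.forward(x)[1]

    def train(self, data, epochs=2000, lr=0.5):
        for _ in range(epochs):
            for x, t in data:
                h, y = self.forward(x)
                xb, hb = x + [1.0], h + [1.0]
                d_out = y - t  # cross-entropy gradient at the output unit
                # Propagate the error back through the hidden sigmoids.
                d_hid = [d_out * self.w2[j] * h[j] * (1 - h[j])
                         for j in range(len(h))]
                self.w2 = [w - lr * d_out * v for w, v in zip(self.w2, hb)]
                for j, row in enumerate(self.w1):
                    self.w1[j] = [w - lr * d_hid[j] * v for w, v in zip(row, xb)]

# Toy records sharing one author name; None marks a missing field.
a = {"coauthors": ["Li", "Zhang"], "affiliation": ["CUFE"],
     "keywords": ["bibliometrics"]}
b = {"coauthors": ["Li", "Zhang"], "affiliation": None,
     "keywords": ["bibliometrics", "scientometrics"]}
c = {"coauthors": ["Chen"], "affiliation": ["Shandong University"],
     "keywords": ["air pollution"]}
d = {"coauthors": None, "affiliation": ["Shandong University"],
     "keywords": ["air pollution", "forecasting"]}

# Label 1.0 = same person, 0.0 = different people (made-up ground truth).
pairs = [(a, b, 1.0), (c, d, 1.0), (a, c, 0.0),
         (b, d, 0.0), (a, d, 0.0), (b, c, 0.0)]
data = [(pair_features(p, q), t) for p, q, t in pairs]

net = BPNet(n_in=2 * len(FIELDS), n_hidden=4)
net.train(data)
same_score = net.predict(pair_features(a, b))
diff_score = net.predict(pair_features(a, c))
```

The presence flag lets the network distinguish "the field was compared and did not match" from "the field could not be compared", which is one simple way to keep a classifier usable on incomplete records. The scores here come from six hand-made training pairs, so they only show the mechanics.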
Received: 04 December 2017