A Method for Institution Name Normalization Based on Institution-Author Vectors
Lyu Dongqing1, Lu Hongru1, Cheng Ying1,2, Sun Haixia1,3
1. School of Information Management, Nanjing University, Nanjing 210023
2. School of Chinese Language and Literature, Shandong Normal University, Jinan 250014
3. Institution of Medical Information, Chinese Academy of Medical Sciences, Beijing 100020
Abstract Institutional transitions are one source of variation in institution names. Normalizing institution names improves both recall in information retrieval and the reliability of bibliometric research results. This paper therefore proposes a method for institution name normalization based on the observation that the personnel of an academic institution remain largely stable in the short term. Specifically, institution-author and institution-annual vectors are constructed for each academic institution. The similarity of the integrated institution-author vectors, the number of co-authors, and a set of mapping rules are then used to identify transition relationships between two institutions, including renaming, merger, split, and reorganization. The method was tested on data from the CSSCI database covering 1999 to 2015. After controlling for the impact of personnel turnover and homonymous authors, the proposed method performed well in both accuracy and recall.
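To make the idea of comparing institutions through their authors concrete, the following is a minimal sketch rather than the authors' implementation. It assumes each record is a hypothetical `(institution, author, year)` tuple, builds a paper-count vector per institution name, and compares two names by cosine similarity and by the number of authors they share. The choice of cosine similarity, the function names, and the sample records are all illustrative assumptions; the abstract does not specify the similarity measure, any thresholds, or the mapping rules.

```python
from collections import Counter
from math import sqrt


def institution_author_vector(records, institution):
    """Count papers per author for one institution name (assumed record layout)."""
    return Counter(
        author for inst, author, _year in records if inst == institution
    )


def cosine_similarity(u, v):
    """Cosine similarity between two sparse author-count vectors."""
    dot = sum(u[a] * v[a] for a in u.keys() & v.keys())
    norm_u = sqrt(sum(c * c for c in u.values()))
    norm_v = sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # an empty vector is treated as dissimilar to everything
    return dot / (norm_u * norm_v)


def shared_authors(u, v):
    """Authors who published under both institution names."""
    return set(u) & set(v)


# Hypothetical records: (institution name, author, publication year).
records = [
    ("Old Name", "A", 2001),
    ("Old Name", "B", 2001),
    ("Old Name", "A", 2002),
    ("New Name", "A", 2003),
    ("New Name", "B", 2004),
    ("New Name", "C", 2004),
    ("Other", "D", 2002),
]

old_vec = institution_author_vector(records, "Old Name")
new_vec = institution_author_vector(records, "New Name")
other_vec = institution_author_vector(records, "Other")

print(cosine_similarity(old_vec, new_vec))    # high: mostly the same authors
print(cosine_similarity(old_vec, other_vec))  # zero: no authors in common
print(sorted(shared_authors(old_vec, new_vec)))
```

In this toy data, "Old Name" stops publishing in 2002 and "New Name" starts in 2003 with largely the same authors, which is the kind of pattern a renaming would produce. Distinguishing renaming from merger, split, or reorganization would additionally need the year information and the mapping rules mentioned in the abstract, which this sketch does not attempt.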
Received: 06 June 2019