Institution Name Normalization Based on Author and Research Theme
Hu Qian, Wu Qian, Dong Hanyu, Li Jing
School of Information Management, Central China Normal University, Wuhan 430079
Abstract Research institutions change their names as they develop and reorganize, and this name evolution seriously degrades knowledge services that depend on institution names, such as information retrieval and research evaluation. This study proposes a method for institution name normalization based on authors and research themes, with the aim of eliminating heterogeneity among the names of research institutions and improving name-based information retrieval and knowledge discovery services. After analyzing how the name evolution of research institutions manifests in the affiliation bylines of academic papers, the study constructs a model that recognizes name evolution relationships from author and research theme information and identifies four types of relationship between institution names: renaming, splitting, merging, and restructuring. The model is then verified on small-scale academic paper data. The experimental results indicate that the proposed method achieves relatively high accuracy and recall in identifying name evolution relationships for both primary and secondary institutions, and that it can also identify name evolution relationships among less well-known institutions.
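To make the idea concrete, the following Python sketch shows one way author and research-theme evidence might be combined to link an old institution name to a new one and to label the relationship. It is an illustration only, not the model from the paper: the overlap coefficient, the keyword cosine similarity, the weight `w_author`, the threshold, the rule that an old name must stop appearing before the new one starts, the degree-based labeling, and all toy names and data are assumptions introduced here.

```python
from collections import Counter, defaultdict
from math import sqrt


def author_overlap(a, b):
    """Overlap coefficient of two author sets (0.0 if either set is empty)."""
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))


def theme_cosine(a, b):
    """Cosine similarity of two keyword-frequency Counters."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def find_links(profiles, w_author=0.5, threshold=0.5):
    """Return (old, new) name pairs whose combined score passes the threshold.

    Assumption: a name can only evolve into another name that first
    appears no earlier than the old name's last appearance.
    """
    links = []
    for old, p in profiles.items():
        for new, q in profiles.items():
            if old == new or p["last_year"] > q["first_year"]:
                continue
            score = (w_author * author_overlap(p["authors"], q["authors"])
                     + (1 - w_author) * theme_cosine(p["keywords"], q["keywords"]))
            if score >= threshold:
                links.append((old, new))
    return links


def classify(links):
    """Label each link by how many successors/predecessors its endpoints have."""
    succ, pred = defaultdict(set), defaultdict(set)
    for old, new in links:
        succ[old].add(new)
        pred[new].add(old)
    labels = {}
    for old, new in links:
        many_out = len(succ[old]) > 1   # old name has several successors
        many_in = len(pred[new]) > 1    # new name has several predecessors
        if many_out and many_in:
            labels[(old, new)] = "restructuring"
        elif many_out:
            labels[(old, new)] = "splitting"
        elif many_in:
            labels[(old, new)] = "merging"
        else:
            labels[(old, new)] = "renaming"
    return labels


# Hypothetical toy profiles, each aggregated from the papers signed with that name.
toy_profiles = {
    "Dept of Info Mgmt": {
        "authors": {"a1", "a2", "a3"},
        "keywords": Counter({"retrieval": 3, "bibliometrics": 2}),
        "first_year": 2000, "last_year": 2010,
    },
    "School of Info Mgmt": {
        "authors": {"a1", "a2", "a4"},
        "keywords": Counter({"retrieval": 2, "bibliometrics": 2}),
        "first_year": 2011, "last_year": 2020,
    },
    "Dept of Physics": {
        "authors": {"p1", "p2"},
        "keywords": Counter({"optics": 4}),
        "first_year": 2012, "last_year": 2020,
    },
}

print(classify(find_links(toy_profiles)))
```

In this toy setting the two information-management names share most of their authors and keywords and appear in consecutive periods, so they are linked one-to-one, while the physics department shares neither and is left unlinked. A real system would need to tune the weight and threshold on labeled data and to handle transition periods in which both names appear.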
Received: 29 December 2022