Meta-path-Based Research on Institution Name Normalization
Yang Zhao
Shanghai Jiao Tong University Library, Shanghai 200240
Abstract: To address data governance and name standardization in the big data environment, and given the diversity and complexity of institution name data, this paper uses a co-occurrence perspective and heterogeneous network mining to explore data-driven institution name normalization, with the aim of improving the quality of bibliographic network construction, mining, and application. Building on co-occurrence-based institution identification, a three-part heterogeneous co-occurrence network model is constructed, consisting of a superior institution, an institution, and a subordinate institution. The institution name normalization problem is thereby transformed into a heterogeneous co-occurrence network mining problem, and a meta-path-based framework for institution name normalization is proposed. Meta-path-based topological features and recognition tools are systematically designed to identify hidden semantic relationships by mining the text, geographic, and relationship attributes of the heterogeneous co-occurrence network. The method is applied to the normalization of institution names in WoS (Web of Science) bibliographic data for Shanghai Jiao Tong University from 2008 to 2018, and the experimental results verify its effectiveness.
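To make the idea of a meta-path-based topological feature concrete, the following is a minimal sketch and not the paper's implementation. It assumes hypothetical co-occurrence records of the form (superior institution, institution, subordinate institution) and scores two institution name variants along the meta-path institution-subordinate-institution using PathSim, a standard meta-path similarity measure from the heterogeneous network literature; the measure actually used in the paper may differ. All names and records below are invented for illustration.

```python
from collections import defaultdict

# Hypothetical co-occurrence records (assumption, not data from the paper):
# each tuple is (superior institution, institution, subordinate institution).
RECORDS = [
    ("Shanghai Jiao Tong Univ", "Sch Med", "Dept Surg"),
    ("Shanghai Jiao Tong Univ", "Sch Med", "Dept Pediat"),
    ("Shanghai Jiao Tong Univ", "School Med", "Dept Surg"),
    ("Shanghai Jiao Tong Univ", "School Med", "Dept Pediat"),
    ("Shanghai Jiao Tong Univ", "Sch Mat Sci", "Dept Polymer"),
]


def build_adjacency(records):
    """Map each (superior, institution) node to counts of co-occurring subordinates."""
    adj = defaultdict(lambda: defaultdict(int))
    for superior, inst, sub in records:
        adj[(superior, inst)][sub] += 1
    return adj


def path_count(adj, x, y):
    """Count instances of the meta-path institution-subordinate-institution from x to y."""
    neighbors_x = adj.get(x, {})
    neighbors_y = adj.get(y, {})
    return sum(w * neighbors_y.get(sub, 0) for sub, w in neighbors_x.items())


def pathsim(adj, x, y):
    """PathSim score: 2 * paths(x, y) / (paths(x, x) + paths(y, y))."""
    denom = path_count(adj, x, x) + path_count(adj, y, y)
    return 2 * path_count(adj, x, y) / denom if denom else 0.0


adj = build_adjacency(RECORDS)
sjtu = "Shanghai Jiao Tong Univ"
same = pathsim(adj, (sjtu, "Sch Med"), (sjtu, "School Med"))
different = pathsim(adj, (sjtu, "Sch Med"), (sjtu, "Sch Mat Sci"))
print(same, different)  # prints: 1.0 0.0
```

In this toy data, "Sch Med" and "School Med" share all subordinate departments and therefore score 1.0, which would flag them as candidate variants of the same institution, while "Sch Mat Sci" shares none and scores 0.0. A fuller pipeline would combine such a relationship feature with text and geographic attributes, as the abstract describes.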
Received: 06 September 2019