Two-Stage Author Name Disambiguation Study Combining Rule-Based and Supervised Models

doi:10.3772/j.issn.1000-0135.2026.05.010

情报学报 (Journal of the China Society for Scientific and Technical Information)

2026, Vol. 45, Issue (5): 758-775

Section: Information Technology and Applications


Two-Stage Author Name Disambiguation Study Combining Rule-Based and Supervised Models

Chen Yifan^1,2, Xie Ruixia^3, Yang Ning^1,2, Hu Wei^1, Zhang Zhiqiang^1,2

1. National Science Library (Chengdu), Chinese Academy of Sciences, Chengdu 610299
2. Department of Information Resources Management, School of Economics and Management, University of Chinese Academy of Sciences, Beijing 100190
3. School of Economics and Management, Tongji University, Shanghai 200092

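Abstract: Author name disambiguation (AND) is foundational, enabling work for research in information retrieval, information integration, bibliometrics, and related fields. Because a single disambiguation model rarely meets practical disambiguation needs, this paper proposes TSD-RS (two-stage disambiguation framework for integrating rule-based and supervisory model), a two-stage automated AND framework that combines a rule-based model with a supervised model. In the first stage, a dynamic threshold method is used to improve the rule-based model and raise preliminary disambiguation performance; on this basis, 12 orderings of rule application are designed and their effects on AND are compared. In the second stage, an inter-cluster network is built with the paper clusters from preliminary disambiguation as nodes and the supervised model's predictions as edge weights, and the InfoMap algorithm partitions this network into communities to perform a second, iterative round of disambiguation. Within this stage, four automated methods for constructing training sets (positive and negative sample pairs) and four supervised models (including large language models) are compared for their AND performance. Experiments on three gold-standard datasets of different sizes show that TSD-RS performs best when the first-stage rule order is Order5, the second-stage positive-sample extraction method is 1/2-shell, and the supervised model is random forest. In that configuration the 95% confidence interval of bF1 is 0.85±0.04, a clear improvement over the baseline models.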
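The second stage described in the abstract can be pictured with a minimal Python sketch. It is not the authors' implementation. `pair_score` is a hypothetical stand-in for the trained supervised model (here, simple Jaccard overlap of coauthor sets), the `threshold` value and the toy records are illustrative assumptions, and connected components over the thresholded edges replace InfoMap community detection so that the example runs with the standard library alone.

```python
from itertools import combinations


def pair_score(cluster_a, cluster_b):
    """Hypothetical stand-in for the supervised model's prediction:
    Jaccard overlap of the coauthor sets of two first-stage clusters."""
    a = set().union(*(p["coauthors"] for p in cluster_a))
    b = set().union(*(p["coauthors"] for p in cluster_b))
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def merge_clusters(clusters, score=pair_score, threshold=0.5):
    """Treat first-stage clusters as nodes, score every cluster pair,
    and merge clusters linked by an edge whose weight reaches `threshold`.

    Assumption: connected components (union-find) are used here in
    place of InfoMap community detection.
    """
    parent = list(range(len(clusters)))

    def find(i):
        # Follow parent pointers to the root, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(clusters)), 2):
        if score(clusters[i], clusters[j]) >= threshold:
            parent[find(i)] = find(j)

    merged = {}
    for idx, cluster in enumerate(clusters):
        merged.setdefault(find(idx), []).extend(cluster)
    return list(merged.values())


# Toy first-stage output: three single-paper clusters for one ambiguous name.
clusters = [
    [{"id": "p1", "coauthors": {"Li Na", "Zhao Lei"}}],
    [{"id": "p2", "coauthors": {"Li Na", "Zhao Lei", "Sun Yu"}}],
    [{"id": "p3", "coauthors": {"Garcia M"}}],
]
result = merge_clusters(clusters)
print([[p["id"] for p in group] for group in result])
```

In this toy run the first two clusters share most of their coauthors and are merged, while the third stays separate. A community detection algorithm such as InfoMap would instead use the full weighted network rather than a hard cut on single edges.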

Keywords: two-stage name disambiguation, rule-based model, supervised model, dynamic threshold

Received: 2025-05-15

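Funding: China Association for Science and Technology project "Research on World Science and Technology Frontier Developments" (HT15022025447).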

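About the authors: Chen Yifan, born 1997, PhD candidate; research interests: domain knowledge structure detection and scientometrics. Xie Ruixia, born 1992, PhD; research interests: data mining and scientometrics. Yang Ning, born 1982, PhD, professor-level senior engineer; research interests: information organization and use, knowledge mining and services. Hu Wei, born 1997, PhD, engineer; research interests: complex network data modeling and patent bibliometric analysis. Zhang Zhiqiang (corresponding author), born 1964, PhD, research professor and doctoral supervisor; research interests: science and technology strategy and planning, science and technology policy and management, and scientometrics; E-mail: zhangzq@clas.ac.cn.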

Cite this article:

Chen Yifan, Xie Ruixia, Yang Ning, Hu Wei, Zhang Zhiqiang. Two-Stage Author Name Disambiguation Study Combining Rule-Based and Supervised Models[J]. 情报学报, 2026, 45(5): 758-775.

Link to this article:

https://qbxb.istic.ac.cn/CN/10.3772/j.issn.1000-0135.2026.05.010 or https://qbxb.istic.ac.cn/CN/Y2026/V45/I5/758
