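An Exploration of the Novelty Measurement Task of Scientific Literature Driven by a Large Language Model (大模型驱动的科技论文新颖性测度探索)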

doi:10.3772/j.issn.1000-0135.2025.09.003

Journal of the China Society for Scientific and Technical Information (情报学报)

2025, Vol. 44, Issue 9: 1099-1113

Information Theory and Methods


Zhang Lin (张琳)^1,2,3, Li Sijia (李思佳)^1,2, Shi Shunshun (施顺顺)^1,2, Gou Zhenyu (苟震宇)^1,2, Huang Ying (黄颖)^1,2,3

1. School of Information Management, Wuhan University, Wuhan 430072
2. Center for Science, Technology & Education Assessment (CSTEA), Wuhan University, Wuhan 430072
3. Centre for R&D Monitoring (ECOOM) and Department of MSI, KU Leuven, Leuven B-3000


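Abstract: Measuring the novelty of scientific papers is an important component of innovation evaluation. To analyze and improve the usability and interpretability of large language models (LLMs) in this task, this paper starts from knowledge units such as a paper's research problem, research method, and research conclusion, and proposes, on an exploratory basis, an LLM-driven method for measuring the novelty of scientific papers. We design a prompt template for the knowledge unit extraction task and fine-tune the open-source Qwen2-72B-Instruct LLM with supervised fine-tuning (SFT) and direct preference optimization (DPO) to extract the "problem", "method", and "conclusion" knowledge units from scientific papers. We then compute semantic embeddings of the knowledge units, apply average aggregation to obtain semantic embeddings of knowledge unit combinations, and measure the novelty of a "new" paper by comparing its semantic embedding vectors with those of a set of "old" reference papers. The results show that, on the knowledge unit extraction task, the fine-tuned model outperforms the baseline model. Compared with existing methods for computing paper novelty, the proposed knowledge-unit-based model captures finer-grained novelty differences at the semantic level of knowledge units and their combinations. Overall, the LLM-driven method performs the novelty measurement task well and enriches the set of available novelty measurement methods. The experiments cover only a collection of Chinese-language paper abstracts in computer science and technology, so applicability to other fields remains to be examined, and using LLMs still requires human assistance to improve the interpretability and reliability of the results.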
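The embedding-and-comparison step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say which embedding model, similarity measure, or scoring rule the paper uses. The sketch assumes that each knowledge unit already has an embedding vector, that a combination of units is embedded by element-wise averaging, and that novelty is taken as one minus the highest cosine similarity to any paper in the "old" reference set. The function names (`mean_pool`, `cosine_similarity`, `novelty_score`) and the three-dimensional toy vectors are hypothetical.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def mean_pool(vectors: List[Vector]) -> List[float]:
    """Element-wise average of equal-length vectors (the average aggregation step)."""
    if not vectors:
        raise ValueError("need at least one vector")
    dim = len(vectors[0])
    if any(len(v) != dim for v in vectors):
        raise ValueError("all vectors must have the same dimension")
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def novelty_score(
    new_paper: Dict[str, Vector],
    reference_papers: List[Dict[str, Vector]],
    units: Sequence[str] = ("problem", "method"),
) -> float:
    """Assumed scoring rule: 1 - max cosine similarity to the reference set.

    Each paper maps a knowledge unit name to its embedding vector. The
    selected units are averaged into one combination embedding per paper.
    """
    if not reference_papers:
        raise ValueError("reference set must not be empty")
    new_vec = mean_pool([new_paper[u] for u in units])
    best = max(
        cosine_similarity(new_vec, mean_pool([ref[u] for u in units]))
        for ref in reference_papers
    )
    return 1.0 - best


# Toy data: hypothetical 3-dimensional embeddings standing in for real ones.
old_papers = [
    {"problem": [1.0, 0.0, 0.0], "method": [0.0, 1.0, 0.0]},
    {"problem": [0.0, 1.0, 0.0], "method": [0.0, 1.0, 0.0]},
]
new_paper = {"problem": [1.0, 0.0, 0.0], "method": [0.0, 0.0, 1.0]}

print(novelty_score(new_paper, old_papers))
```

Passing `units=("problem",)` or `units=("problem", "method", "conclusion")` scores a single unit or a larger combination with the same function, provided every paper dictionary contains those keys. Under the assumed rule, a paper whose combination embedding matches a reference paper scores close to 0.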


Keywords: large language model, knowledge unit, knowledge embedding, paper novelty

Received: 2024-12-02

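Funding: General Program of the National Natural Science Foundation of China, "From Measurement to Understanding: Research on Output Classification, Collaboration Patterns, and Impact Diffusion in Interdisciplinary Research" (72374160).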

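About the authors: Zhang Lin, female, born in 1980, PhD, professor and doctoral supervisor, whose main research interests are scientometrics and the evaluation of science and education policy, E-mail: linzhang1117@whu.edu.cn. Li Sijia, female, born in 2000, master's student, whose main research interest is innovation evaluation empowered by data intelligence. Shi Shunshun, female, born in 1993, doctoral student, whose main research interests are scientometrics and the evaluation of science and education policy. Gou Zhenyu, male, born in 1998, doctoral student, whose main research interests are scientometrics and altmetrics. Huang Ying, male, born in 1990, PhD, associate professor and doctoral supervisor, whose main research interests are the bibliometrics of scientific and technical literature and science and technology management.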

Cite this article:

Zhang Lin, Li Sijia, Shi Shunshun, Gou Zhenyu, Huang Ying. An Exploration of the Novelty Measurement Task of Scientific Literature Driven by a Large Language Model[J]. Journal of the China Society for Scientific and Technical Information, 2025, 44(9): 1099-1113.

Link to this article:

https://qbxb.istic.ac.cn/CN/10.3772/j.issn.1000-0135.2025.09.003 or https://qbxb.istic.ac.cn/CN/Y2025/V44/I9/1099
