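Abstract: To explore the rich connotations of novelty in academic papers more deeply, this study draws on combinatorial innovation theory to measure the novelty of academic papers on the basis of lexical functions. Using papers indexed in the ACM (Association for Computing Machinery) Digital Library as data, we propose two methods. The first computes term novelty with a language model further pre-trained for the CS (computer science) domain. The second is a procedure that computes the novelty of problem-method combinations from semantic similarity. With these methods we compute the semantic novelty of problem terms, method terms, problem-method combinations, and whole papers, and we compare our semantic novelty measure with an existing novelty measure based on word-frequency co-occurrence. The results show that papers in the ACM Digital Library are highly novel in both their research methods and their research problems. Compared with existing paper-novelty measures, the proposed method also captures finer-grained differences in novelty at the semantic level.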
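The term-level and combination-level computation summarized above can be illustrated with a minimal sketch. It assumes that every problem or method term already has an embedding vector. In the study these vectors would come from the CS-adapted language model; here they are hand-written toy values. Novelty is taken as one minus the highest clipped cosine similarity to anything seen in earlier papers. Problem and method similarities are combined with `min()`, and a paper is scored by its most novel pair. Several pieces are illustrative assumptions rather than the authors' published formulas: the combination rule, the paper-level aggregation, and every function name (`term_novelty`, `pair_novelty`, `paper_novelty`, `exact_match_novelty`). In particular, `exact_match_novelty` is a deliberately simplified string-matching stand-in. It is not the word-frequency co-occurrence method the study compares against.

```python
import math
from typing import Dict, List, Sequence, Set, Tuple

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def familiarity(u: Vector, v: Vector) -> float:
    """Cosine similarity clipped to [0, 1], read as 'how familiar u is given v'."""
    return min(1.0, max(0.0, cosine(u, v)))


def term_novelty(term: Vector, prior_terms: List[Vector]) -> float:
    """1 minus the similarity to the closest earlier term; 1.0 if there are none."""
    if not prior_terms:
        return 1.0
    return 1.0 - max(familiarity(term, t) for t in prior_terms)


def pair_novelty(pair: Tuple[Vector, Vector],
                 prior_pairs: List[Tuple[Vector, Vector]]) -> float:
    """Novelty of a (problem, method) pair against earlier pairs.

    Assumption: an earlier pair covers the new one only as far as its weaker
    component match, so the two component similarities are combined with min().
    """
    if not prior_pairs:
        return 1.0
    problem, method = pair
    best = max(min(familiarity(problem, p), familiarity(method, m))
               for p, m in prior_pairs)
    return 1.0 - best


def paper_novelty(pair_scores: List[float]) -> float:
    """Assumption: a paper is as novel as its most novel problem-method pair."""
    return max(pair_scores, default=0.0)


def exact_match_novelty(pair: Tuple[str, str], seen: Set[Tuple[str, str]]) -> float:
    """Simplified string baseline: 1.0 if this exact pair was never seen, else 0.0."""
    return 0.0 if pair in seen else 1.0


# Hypothetical 3-d vectors standing in for embeddings from a CS-adapted encoder.
emb: Dict[str, List[float]] = {
    "image classification": [1.0, 0.0, 0.0],
    "image recognition": [0.9, 0.1, 0.0],        # near-synonym of the line above
    "cnn": [0.0, 1.0, 0.0],
    "convolutional network": [0.0, 0.95, 0.05],  # near-synonym of "cnn"
    "graph search": [0.0, 0.0, 1.0],
}
known_pairs = [("image classification", "cnn")]
new_pair = ("image recognition", "convolutional network")

semantic = pair_novelty((emb[new_pair[0]], emb[new_pair[1]]),
                        [(emb[p], emb[m]) for p, m in known_pairs])
baseline = exact_match_novelty(new_pair, set(known_pairs))
print(f"semantic novelty = {semantic:.3f}, exact-match novelty = {baseline:.1f}")
```

With these toy vectors, the reworded pair scores close to 0 under the semantic measure, while the exact-match baseline marks it as fully novel (1.0). Pairing a known problem with an unrelated method such as "graph search" scores 1.0 under both measures. This mirrors, in miniature, why a semantic measure can separate near-synonymous rewordings from genuinely new combinations.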
Luo Zhuoran, Lu Wei, Cai Le, Cheng Qikai. Lexical function recognition in academic texts: application of lexical functions in novelty measurement of academic papers[J]. Journal of the China Society for Scientific and Technical Information (情报学报), 2022, 41(7): 720-732.