Advances in Identification, Organization, and Application of Innovative Content in Scientific Literature
Xu Lei¹, Zhang Yafei², Ye Junling²
1. Semantic Publishing and Knowledge Service Laboratory, Wuhan University, Wuhan 430072
2. Cultural Heritage Intelligent Computing Laboratory, Wuhan University, Wuhan 430072
Abstract: As the core object of scholarly communication, content plays an important role in scientific innovation activities. Through literature analysis, this paper clarifies the concepts related to scientific innovation, summarizes the main practices and open problems in identifying and extracting innovative content from scientific literature, and compares the main data models for structurally organizing innovative content together with their application scenarios. It finds significant disciplinary differences in how innovative content is identified, extracted, structurally organized, and applied, as well as differences in the granularity at which the related data models organize innovative content. Data models and applications for innovative content have broad prospects for future development; however, large-scale practice in this field remains scarce. Finally, this paper designs a scholarly communication framework centered on scientific innovation content, proposes two macro-level practice paths for implementing it, and identifies three main tasks and challenges in realizing the framework.
Received: 28 November 2022
|
|
|
|
|
|