Data Intelligence Empowered Innovation: An Exploration of the Innovation Assistance Framework Based on Data Intelligence Technology
Lu Wei1,2, Ma Yongqiang1,2, Liu Jiawei1,2, Yang Jinqing3, Cheng Qikai1,2
1. School of Information Management, Wuhan University, Wuhan 430072; 2. Information Retrieval and Knowledge Mining Laboratory, Wuhan University, Wuhan 430072; 3. School of Information Management, Central China Normal University, Wuhan 430079
Abstract Large language models (LLMs) such as ChatGPT (Chat Generative Pre-trained Transformer) have shown excellent performance in text generation and human-machine dialogue. In the context of LLMs, technologies such as big data and artificial intelligence have demonstrated important practical value in empowering scientific research and innovation. Although current science and technology (S&T) information resource management and knowledge services can provide relatively accurate information and routine knowledge aggregation services for scientific research and innovation, they have not yet been deeply integrated with research and innovation activities. Researchers also face challenges such as insufficient information processing capacity and limited cognitive capacity during the research and innovation process. This article analyzes the new characteristics of scientific research activities in the data intelligence era. It then proposes an innovation assistance framework based on data intelligence technologies and discusses in depth its functional positioning, service mode, and key empowerment paths across the entire innovation process. As big data and artificial intelligence technologies continue to mature, S&T information resource management enabled by data intelligence technology will be further embedded in the entire process of scientific research and innovation activities. Innovation assistance services based on data intelligence technologies, such as assistance for literature reading, experiment design, and article writing, can provide researchers with personalized, fine-grained knowledge and scenario-based solutions, and thereby better serve scientific research and innovation activities.
Received: 05 April 2023