Large Language Model-Driven Academic Text Mining: Construction and Evaluation of Inference-End Prompting Strategy
Lu Wei 1,2, Liu Yinpeng 1,2, Shi Xiang 1,2, Liu Jiawei 1,2, Cheng Qikai 1,2, Huang Yong 1,2, Wang Lei 1,2
1. School of Information Management, Wuhan University, Wuhan 430072; 2. Information Retrieval and Knowledge Mining Laboratory, Wuhan University, Wuhan 430072
|
|
Abstract  The task-comprehension and instruction-following abilities of large language models allow users to complete complex information-processing tasks with simple interactive instructions. Scientific-literature analysts are actively exploring applications of large language models, yet the capability boundaries of these models have not been studied systematically. Focusing on academic text mining, this study designs inference-end prompting strategies and establishes a comprehensive evaluation framework for large language model-driven academic text mining, spanning text classification, information extraction, text reasoning, and text generation across six tasks. Mainstream instruction-tuned models were selected for the experiments to compare the prompting strategies and the models' professional capabilities. The experiments indicate that complex prompting strategies such as few-shot and chain-of-thought are not effective in classification tasks but perform well in more challenging tasks such as extraction and generation, where models at the trillion-parameter scale achieve results comparable to those of fully trained deep-learning models. For models at the billion- or ten-billion-parameter scale, however, inference-end prompting strategies show a clear upper limit. Achieving deep integration of large language models into the field of scientific intelligence requires adapting the model to the domain at the tuning end.
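To make the inference-end prompting strategies named in the abstract concrete, the minimal sketch below shows how zero-shot, few-shot, and chain-of-thought prompts might be assembled for one of the evaluated task types (sentence-level structure classification in abstracts). The label set, prompt wordings, demonstration sentences, and the print-out in place of an actual model call are illustrative assumptions, not the paper's actual templates or evaluation code.

# A minimal sketch of three inference-end prompting strategies (zero-shot, few-shot,
# chain-of-thought) for an academic sentence-classification task. All prompt wordings,
# labels, and examples are illustrative assumptions, not the paper's actual templates.

LABELS = ["Background", "Method", "Result", "Conclusion"]

def zero_shot_prompt(sentence: str) -> str:
    """Task description only: the model relies on its pre-trained knowledge."""
    return (
        f"Classify the following sentence from a scientific abstract into one of "
        f"{LABELS}.\nSentence: {sentence}\nLabel:"
    )

def few_shot_prompt(sentence: str, demonstrations: list[tuple[str, str]]) -> str:
    """Prepend labeled demonstrations so the model can imitate the mapping in context."""
    demo_block = "\n".join(f"Sentence: {s}\nLabel: {l}" for s, l in demonstrations)
    return (
        f"Classify sentences from scientific abstracts into one of {LABELS}.\n\n"
        f"{demo_block}\n\nSentence: {sentence}\nLabel:"
    )

def chain_of_thought_prompt(sentence: str) -> str:
    """Ask the model to reason step by step before committing to a label."""
    return (
        f"Classify the following sentence from a scientific abstract into one of "
        f"{LABELS}. Think step by step about the sentence's rhetorical role, "
        f"then give the final label on the last line.\nSentence: {sentence}"
    )

if __name__ == "__main__":
    sentence = "We propose a graph-based framework for citation intent classification."
    demos = [
        ("Prior work has focused on keyword extraction.", "Background"),
        ("Our model improves F1 by 3.2 points over the baseline.", "Result"),
    ]
    for prompt in (zero_shot_prompt(sentence),
                   few_shot_prompt(sentence, demos),
                   chain_of_thought_prompt(sentence)):
        print(prompt, end="\n\n")  # in practice, send each prompt to an instruction-tuned model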
|
Received: 08 January 2024
|
|
|
|
|
|
|