Large Language Model Driven Academic Text Mining: Parameter-Efficient Fine-Tuning Strategy from the Tuning End
Liu Yinpeng1,2, Lu Wei1,2, Shi Xiang1,2, Liu Jiawei1,2, Cheng Qikai1,2, Huang Yong1,2 |
1. School of Information Management, Wuhan University, Wuhan 430072; 2. Institute of Intelligence and Innovation Governance, Wuhan University, Wuhan 430072
|
|
Abstract The ability to deeply understand academic texts has become crucial support for intelligence work, and large language models (LLMs) have shown great potential in this area. LLMs can enhance knowledge extraction and utilization from both the inference end and the tuning end. In academic text mining, the various instruction-engineering techniques available at the inference end struggle to fully exploit the deep semantic understanding capabilities of LLMs. Adapting model parameters to domain-specific tasks through techniques such as parameter-efficient fine-tuning (PEFT) at the tuning end has therefore become the key to empowering academic text mining with LLMs. However, the performance and efficiency of different PEFT methods applied to LLMs have not yet been systematically explored. This study constructs a PEFT framework and evaluation system for academic text mining, evaluating the performance metrics and cost-efficiency of seven instruction-tuned LLMs under seven PEFT methods and probing the capability boundaries of PEFT strategies and instruction-tuned LLMs in academic text mining. The experiments show that fine-tuning achieves the best performance among the tuning methods, but its advantage is not pronounced; by contrast, quantized low-rank adaptation (QLoRA) incurs the lowest computational cost, making it the most cost-effective PEFT method overall. Performance differences after tuning across LLMs of varying sizes and architectures are minimal: the comparatively small Mistral-7B-Instruct-v0.1, when tuned with QLoRA, achieves performance comparable to that of 70B-parameter models. After tuning, the LLMs improve substantially on tasks such as citation function identification, scientific entity extraction, and scientific text reasoning, surpassing their inference-end performance by a significant margin.
Compared with traditional deep learning models, LLMs at the tuning end comprehensively outperform on academic text reasoning tasks and perform similarly to smaller models on scientific entity extraction and citation function identification. LLMs are therefore better suited to tasks of higher difficulty, whereas small models remain more advantageous for simpler sequence labeling and classification tasks.
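The cost advantage of LoRA-style methods discussed above comes from replacing the full weight update with a trainable low-rank product, W' = W + (alpha / r) * B A. The following is a minimal illustrative sketch of that idea in plain Python (a toy for exposition only, not the paper's experimental code; the function names are hypothetical):

```python
# Low-rank adaptation (LoRA) sketch: instead of updating the full
# d_out x d_in weight matrix W, train two small factors B (d_out x r)
# and A (r x d_in), then merge as W' = W + (alpha / r) * (B @ A).

def lora_param_counts(d_out: int, d_in: int, r: int) -> tuple[int, int]:
    """Return (full fine-tuning params, LoRA trainable params) for one matrix."""
    full = d_out * d_in          # every entry of W is trainable
    lora = r * (d_out + d_in)    # only the entries of B and A are trainable
    return full, lora

def merge_lora(W, A, B, alpha: float, r: int):
    """Merge a LoRA update into W (plain nested lists, no dependencies)."""
    scale = alpha / r
    d_out, d_in = len(W), len(W[0])
    merged = [row[:] for row in W]
    for i in range(d_out):
        for j in range(d_in):
            delta = sum(B[i][k] * A[k][j] for k in range(r))
            merged[i][j] += scale * delta
    return merged

# One 4096 x 4096 attention projection at rank 16:
full, lora = lora_param_counts(4096, 4096, 16)
print(full, lora)  # 16777216 vs 131072 trainable parameters (~0.8%)
```

QLoRA pushes the cost down further by keeping the frozen base weights W in 4-bit quantized form while training only the low-rank factors, which is why it emerges as the most cost-effective method in the experiments.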
Received: 10 October 2024
|
|
|
|
|
|
|