Optimization of LLM Generation Strategies Based on Rich Semantic Tokens
Cheng Qikai¹,², Shi Xiang¹,², Yu Fengchang¹,², Huang Shengzhi¹,²
1. School of Information Management, Wuhan University, Wuhan 430072
2. Institute of Intelligence and Innovation Governance, Wuhan University, Wuhan 430072