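An Idiomatic Metaphorical Word-Formation Method Based on a Large Language Model and Its Application: Knowledge Reorganization, Backtracking, and Discovery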

doi:10.3772/j.issn.1000-0135.2025.09.002

Journal of the China Society for Scientific and Technical Information (情报学报)

2025, Vol. 44, Issue 9: 1083-1098

Information Theory and Methods

An Idiomatic Metaphorical Word-Formation Method Based on a Large Language Model and Its Application: Knowledge Reorganization, Backtracking, and Discovery

Zhang Wei^1,2, Wang Dongbo^1,2, Liu Liu^1,2

1.College of Information Management, Nanjing Agricultural University, Nanjing 210095
2.Research Center for Humanities and Social Computing, Nanjing Agricultural University, Nanjing 210095

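Abstract: In the era of digital intelligence, generative artificial intelligence (GenAI) has given new momentum to the organization, mining, and production of traditional humanities knowledge. Using the artificial intelligence generated content (AIGC) paradigm to recast the information behaviors by which the ancients excerpted, reused, and fixed idioms from classical texts into an intelligent word-formation pattern is of great significance for the structural reorganization, historical backtracking, and concept discovery of the existing humanities knowledge system. From the perspectives of cultural gene theory and word formation, this paper proposes an idiom metaphorical word-formation method based on large language models. First, for idiom sources, it defines a metaphorical word-formation knowledge scheme of <phrase structure, object-image label (source domain), sentiment label (target domain)> and constructs a question-answering dataset from a parallel corpus that aligns each source with its word-formation scheme. Second, it introduces generative large models for multi-task learning of idiom word formation, covering phrase extraction and metaphor identification, and focuses on how injecting dependency-syntax knowledge enhances instruction fine-tuning of the word-formation model. The study finds that the trained models can effectively generate metaphorical word-formation structures from idiom source texts. The "Xunzi" (荀子) model outperforms general-purpose large models such as qwen7b, llama3_8b, and GPT-4o on every metric across multiple tasks. Dependency-syntax knowledge effectively stimulates the model's comprehension ability, further raising the recognition accuracy for lexical structure, object-image labels, and sentiment labels to 86.11%, 87.82%, and 85.39%, respectively. A digital humanities application of the large model to the Complete Tang Poems (《全唐诗》) shows three results: idiom recognition within verses enables chained "idiom-poem-poet" knowledge reorganization; time-series analysis of the model's outputs traces the sources of 130 idioms back in time (by more than 1,000 years at most); and, by inheriting the metaphorical cultural genes of idioms, the method achieves knowledge discovery of new phrases at scale, yielding a compiled imagery thesaurus with practical value for the cultural industry.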
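The abstract describes pairing each idiom source with a <phrase structure, object-image label, sentiment label> triple and optionally injecting dependency-syntax knowledge into the instruction data. The following is a minimal sketch of what one such question-answering sample might look like. Everything in it is an assumption made for illustration: the field names, the prompt wording, the serialization format, the label values, and the hand-written dependency arc are hypothetical and are not taken from the paper.

```python
import json
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple

# Hypothetical container for the triple named in the abstract.
@dataclass(frozen=True)
class WordFormation:
    phrase: str     # extracted phrase (phrase structure)
    imagery: str    # object-image label, i.e. the metaphor's source domain
    sentiment: str  # sentiment label, i.e. the metaphor's target domain

    def serialize(self) -> str:
        # Assumed answer format; the paper's actual format is not given here.
        return f"<{self.phrase}, {self.imagery}, {self.sentiment}>"

# A dependency arc as (head, relation, dependent); the notation is an assumption.
Arc = Tuple[str, str, str]

def build_sample(source_text: str,
                 target: WordFormation,
                 arcs: Optional[Sequence[Arc]] = None) -> dict:
    """Build one instruction-tuning record, optionally with a syntax hint."""
    instruction = (
        "From the source text, extract the idiom-like phrase and answer as "
        "<phrase, object-image label, sentiment label>."
    )
    prompt = source_text
    if arcs:
        # Knowledge injection here is plain text appended to the input.
        hint = "; ".join(f"{h} -{r}-> {d}" for h, r, d in arcs)
        prompt = f"{source_text}\nDependency arcs: {hint}"
    return {"instruction": instruction,
            "input": prompt,
            "output": target.serialize()}

if __name__ == "__main__":
    # Labels and the arc are hand-written for illustration, not parser output.
    sample = build_sample(
        "青，取之于蓝，而青于蓝",
        WordFormation("青出于蓝", "plant dye", "praise"),
        arcs=[("取", "VOB", "之")],
    )
    print(json.dumps(sample, ensure_ascii=False, indent=2))
```

Keeping the syntax hint as an optional argument makes it easy to produce matched datasets with and without injected dependency information, which is the kind of comparison the abstract reports.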

Keywords: digital humanities, large language model, idiom metaphor, word formation, phrase extraction

Received: 2024-12-15

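Funding: National Natural Science Foundation of China, Young Scientists Fund project "Research on Reconstruction Patterns for the Digital Memory of Cultural Relics Driven by Event-Sentiment Knowledge Association" (72404131); National Social Science Fund of China, Major Project "Construction and Application of a Cross-Lingual Knowledge Base of Ancient Chinese Classics" (21&ZD331); Ministry of Education Humanities and Social Sciences Youth Fund project "Allusion Metaphor Recognition in Classical Poetry and Humanities Computing Based on Knowledge Reorganization" (24YJC870016); Nanjing Agricultural University Humanities and Social Sciences Research Fund, under the Fundamental Research Funds for the Central Universities, project "Semantic Parsing and Knowledge Association of Imagery Narrative Structures in Classical Poetry from a Computational Humanities Perspective" (SKYC2024019).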

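About the authors: Zhang Wei (张卫), male, born 1994, PhD, assistant researcher; main research areas: knowledge extraction and ontology learning, digital humanities and affective computing. Wang Dongbo (王东波), corresponding author, male, born 1981, PhD, professor, doctoral supervisor; main research areas: natural language processing and text mining, informetrics; E-mail: db.wang@njau.edu.cn. Liu Liu (刘浏), male, born 1989, PhD, associate professor, master's supervisor; main research areas: computational humanities, text knowledge mining.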

Cite this article:

Zhang Wei, Wang Dongbo, Liu Liu. An Idiomatic Metaphorical Word-Formation Method Based on a Large Language Model and Its Application: Knowledge Reorganization, Backtracking, and Discovery[J]. 情报学报, 2025, 44(9): 1083-1098.

Link to this article:

https://qbxb.istic.ac.cn/CN/10.3772/j.issn.1000-0135.2025.09.002 or https://qbxb.istic.ac.cn/CN/Y2025/V44/I9/1083

(Responsible editor: Wei Ruibin)