An Idiomatic Metaphorical Word-Formation Method Based on a Large Language Model and Its Application: Knowledge Reorganization, Backtracking, and Discovery
Zhang Wei 1,2, Wang Dongbo 1,2, Liu Liu 1,2
1. College of Information Management, Nanjing Agricultural University, Nanjing 210095
2. Research Center for Humanities and Social Computing, Nanjing Agricultural University, Nanjing 210095
|
|
Abstract In the age of digital intelligence, generative artificial intelligence (GenAI) has given new impetus to traditional humanistic knowledge organization, mining, and production. Using the artificial intelligence generated content (AIGC) paradigm to reshape the information behaviors by which idioms are excerpted from, handed down through, and solidified in ancient literature into an intelligent word-formation mode is of great significance for the structural reorganization, historical retrospection, and conceptual discovery of existing humanities knowledge systems. This study proposes a metaphorical word-formation method for idioms based on a large language model (LLM), approached from the perspective of cultural genes and word formation. First, a metaphorical word-formation knowledge system is defined based on the origins of idioms, covering phrases, objects (source domains), and emotions (target domains), and a question-answering (QA) dataset is constructed from a "source text and word-formation system" corpus. A generative LLM is then introduced for keyphrase extraction and metaphor recognition as multi-task learning over idiom word formation, with a focus on how instruction fine-tuning enhances the word-formation LLM when dependency-syntax knowledge is injected. The trained LLM can effectively generate metaphorical word-formation structures for idiom source texts. The Xunzi model outperforms general LLMs such as qwen7b, llama3_8b, and GPT-4o on various indicators across different tasks. Dependency-syntax knowledge effectively stimulates the understanding ability of the LLM, raising the recognition accuracy for vocabulary structure, object labels, and emotion labels to 86.11%, 87.82%, and 85.39%, respectively. Taking "Tang Dynasty Poetry" as an example, recognizing idioms in poems achieves a chained knowledge reorganization linking idioms, poems, and poets.
Time-series analysis of the results generated by the LLM traces the origins of 130 idioms (back more than 1,000 years) and completes the knowledge discovery of large-scale new phrases that inherit the metaphorical cultural genes of idioms, compiling a thematic imagery vocabulary with practical value for the cultural industry.
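The abstract describes building instruction-tuning QA samples in which dependency-syntax knowledge is injected alongside the idiom source text, and the model outputs the phrase, object (source domain), and emotion (target domain). A minimal sketch of how one such sample might be assembled is shown below; all field names, label values, and the serialization format are illustrative assumptions, not the authors' actual data schema.

```python
# Hypothetical sketch: assembling one instruction-tuning QA sample for the
# metaphorical word-formation task, with dependency-syntax knowledge injected
# into the prompt, as the abstract describes.

def build_sample(source_text, dep_triples, phrase, obj_label, emo_label):
    """Assemble an instruction/input/output triple for supervised fine-tuning."""
    # Serialize dependency triples (head, relation, dependent) as plain text
    # so the LLM can condition on the syntactic structure of the source text.
    dep_str = "; ".join(f"{h} -{rel}-> {d}" for h, rel, d in dep_triples)
    return {
        "instruction": (
            "Given an idiom source text and its dependency parse, extract the "
            "word-formation phrase, the object (source domain), and the "
            "emotion (target domain)."
        ),
        "input": f"Source text: {source_text}\nDependency parse: {dep_str}",
        "output": f"Phrase: {phrase} | Object: {obj_label} | Emotion: {emo_label}",
    }

# Illustrative example (labels are placeholders, not the paper's annotations).
sample = build_sample(
    source_text="百川东到海，何时复西归",
    dep_triples=[("到", "SBV", "百川"), ("到", "VOB", "海")],
    phrase="百川归海",
    obj_label="river (natural object)",
    emo_label="inevitability",
)
```

Samples of this shape could then feed a standard instruction fine-tuning pipeline (the paper uses LoRA-style parameter-efficient tuning), with the dependency string simply omitted from the input field for the syntax-free ablation.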
|
Received: 15 December 2024
|
|
|
|
|
|
|